The R Journal: accepted article

This article will be copy edited and may be changed before publication.

Working with CRSP/COMPUSTAT in R: Reproducible Empirical Asset Pricing
Majeed Simaan

Abstract SAS and Stata manuals are common in academic empirical finance research. Despite the popularity of open-source programming languages such as R, however, far fewer R resources cover widely used databases such as CRSP and COMPUSTAT. This article aims to bridge that gap by illustrating how to use R to work with both datasets. As an application, we show how to form size-value portfolios following Fama and French (1993) and study how sensitive the results are to different inputs. Ultimately, the article advocates reproducible finance research and contributes to the recent idea of “Open Source Cross-Sectional Asset Pricing” proposed by Chen and Zimmermann (2020).

Received: 2020-10-30; online 2021-06-08
CRAN packages: data.table, lubridate, ggplot2, parallel, plyr, dplyr, rollRegres
CRAN Task Views implied by cited CRAN packages: TimeSeries, Databases, Finance, Graphics, HighPerformanceComputing, ModelDeployment, Phylogenetics, ReproducibleResearch, TeachingStatistics

CC BY 4.0
This article is licensed under a Creative Commons Attribution 4.0 International license.

  author = {Majeed Simaan},
  title = {{Working with CRSP/COMPUSTAT in R: Reproducible Empirical
          Asset Pricing}},
  year = {2021},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2021-047},
  url = {}