Making Provenance Work for You

To be useful, scientific results must be reproducible and trustworthy. Data provenance—the history of data and how it was computed—underlies reproducibility of, and trust in, data analyses. Our work focuses on collecting data provenance from R scripts and providing tools that use the provenance to increase the reproducibility of and trust in analyses done in R. Specifically, our “End-to-end provenance tools” (“E2ETools”) use data provenance to: document the computing environment and inputs and outputs of a script’s execution; support script debugging and exploration; and explain differences in behavior across repeated executions of the same script. Use of these tools can help both the original author and later users of a script reproduce and trust its results.

Barbara Lerner (Mount Holyoke College), Emery Boose (Harvard University), Orenna Brand (Columbia University), Aaron M. Ellison (Sound Solutions for Sustainable Science), Elizabeth Fong (Mount Holyoke College), Matthew Lau (University of Hawaii West Oahu), Khanh Ngo (Mount Holyoke College), Thomas Pasquier (University of British Columbia), Luis A. Perez (Harvard College[^1]), Margo Seltzer (University of British Columbia), Rose Sheehan (Mount Holyoke College), Joseph Wonsil (University of British Columbia)

Supplementary materials

Supplementary materials are available for download in addition to this article.



Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Lerner, et al., "Making Provenance Work for You", The R Journal, 2023

BibTeX citation

  author = {Lerner, Barbara and Boose, Emery and Brand, Orenna and Ellison, Aaron M. and Fong, Elizabeth and Lau, Matthew and Ngo, Khanh and Pasquier, Thomas and Perez, Luis A. and Seltzer, Margo and Sheehan, Rose and Wonsil, Joseph},
  title = {Making Provenance Work for You},
  journal = {The R Journal},
  year = {2023},
  note = {},
  doi = {10.32614/RJ-2023-003},
  volume = {14},
  issue = {4},
  issn = {2073-4859},
  pages = {141-159}