To be useful, scientific results must be reproducible and trustworthy. Data provenance—the history of data and how it was computed—underlies reproducibility of, and trust in, data analyses. Our work focuses on collecting data provenance from R scripts and providing tools that use the provenance to increase the reproducibility of and trust in analyses done in R. Specifically, our “End-to-end provenance tools” (“E2ETools”) use data provenance to: document the computing environment and inputs and outputs of a script’s execution; support script debugging and exploration; and explain differences in behavior across repeated executions of the same script. Use of these tools can help both the original author and later users of a script reproduce and trust its results.
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2023-003.zip.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lerner, et al., "The R Journal: Making Provenance Work for You", The R Journal, 2023
BibTeX citation
@article{RJ-2023-003,
  author = {Lerner, Barbara and Boose, Emery and Brand, Orenna and Ellison, Aaron M. and Fong, Elizabeth and Lau, Matthew and Ngo, Khanh and Pasquier, Thomas and Perez, Luis A. and Seltzer, Margo and Sheehan, Rose and Wonsil, Joseph},
  title = {The R Journal: Making Provenance Work for You},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-003},
  doi = {10.32614/RJ-2023-003},
  volume = {14},
  issue = {4},
  issn = {2073-4859},
  pages = {141-159}
}