There is a strong software engineering culture in the R developer community. We recommend creating, updating and vetting packages as well as keeping up with community standards. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers.
The R programming language was originally created for statisticians, by statisticians, but evolved over time to attract a “massive pool of talent that was previously untapped” (Hadley Wickham in Thieme (2018)). Although most R users are academic researchers and business data analysts without a background in software engineering, we are witnessing a rapid rise in software engineering practices within the community. In this comment we spotlight recent progress in tooling, dissemination and support, including specific efforts led by the rOpenSci project. We hope that readers will take advantage of, and participate in, the tools and practices we describe.
The basic infrastructure for creating, building, installing, and checking packages has been in place since the early days of the R language. During this early period (1998-2011), however, the barriers to entry were very high, and support and Q&A for beginners were extremely limited. With the introduction of the devtools (Wickham et al. 2021b) package in 2011, creating and updating packages became substantially easier. Documentation also became simpler to maintain: the roxygen2 (Wickham et al. 2021a) package allows developers to keep documentation in sync with changes in code, similar to the Doxygen approach embraced in more mature languages. Alongside the rise in popularity of StackOverflow and the growth of rstats blogs, the number of new packages added to the Comprehensive R Archive Network (CRAN) each year skyrocketed from 400 in 2010 to 1000 by 2014. As of this writing, there are nearly 19,000 packages on CRAN.
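To make the roxygen2 workflow concrete, here is a minimal sketch of a documented function as it might appear in a package's `R/` directory. The function `celsius_to_fahrenheit()` is hypothetical and used only for illustration; the `@param`, `@return`, `@examples` and `@export` tags are standard roxygen2 tags.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' A hypothetical example function, used only to illustrate roxygen2 tags.
#'
#' @param x A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#' @export
celsius_to_fahrenheit <- function(x) {
  # Give a clear, early error for non-numeric input
  stopifnot(is.numeric(x))
  x * 9 / 5 + 32
}
```

Running `devtools::document()` (or `roxygen2::roxygenise()`) from the package root regenerates the corresponding `.Rd` help file under `man/` and the `export()` entry in `NAMESPACE`, so the documentation lives right next to the code it describes.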
For novices without substantial software engineering experience, the early testing frameworks were also difficult to use. With the release of testthat (Wickham 2011), testing became smoother as well. There are now several actively maintained testing frameworks, such as tinytest (van der Loo 2020), as well as testthat-compatible specialized tooling for testing database interactions with dittodb (Keane and Vargas 2020) and web resources with vcr (Chamberlain 2021), httptest (Richardson 2021), and webfakes (Csárdi 2021). The latter provides an embedded C/C++ web server for testing HTTP clients such as httr2 (Wickham 2021).
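As a minimal sketch of what a testthat unit test looks like, assuming the testthat package is installed (the function under test is hypothetical). Inside a package, such tests would normally live in a file under `tests/testthat/` and be run with `devtools::test()`.

```r
library(testthat)

# Hypothetical function under test
celsius_to_fahrenheit <- function(x) {
  stopifnot(is.numeric(x))
  x * 9 / 5 + 32
}

test_that("celsius_to_fahrenheit() matches known reference points", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
  # -40 is the same temperature on both scales
  expect_equal(celsius_to_fahrenheit(-40), -40)
})

test_that("celsius_to_fahrenheit() rejects non-numeric input", {
  expect_error(celsius_to_fahrenheit("hot"))
})
```

Each `test_that()` block groups related expectations under a readable description, so a failure report points directly at the behavior that broke.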
The testthat package has recently been improved with snapshot tests, which record outputs such as printed text, errors, or files (for example, plots) and compare them against previously accepted versions. The rOpenSci project has released autotest (Padgham 2021), a package that supports automatic mutation testing.
Beyond checking for compliance with R CMD check, several other packages, such as goodpractice (Csárdi and Frick 2018), riskmetric (R Validation Hub et al. 2021), and rOpenSci’s pkgcheck (Padgham and Salmon 2021), check packages against a large list of actionable, community-recommended best practices for software development. Collectively, these tools allow domain researchers to release software packages that meet high standards for software engineering.
The development and testing ecosystem of R is rich and has sometimes borrowed successful implementations from other languages (e.g., the vcr R package is a port, i.e. a translation to R, of the vcr Ruby gem, and testthat snapshot tests were inspired by the JavaScript testing framework Jest).
As underlined in Thieme (2018), community is the strong suit of the R language. Many organizations and venues offer dedicated support for package developers. Examples include Q&A on the r-package-devel mailing list, the package development category of the RStudio community forum, and the rstats section of StackOverflow. Traditionally, R package developers have been mostly male and white. Although the status quo remains similar, efforts from groups such as R-Ladies meetups, Minorities in R (Scott and Smalls-Perkins 2020), and the package development modules offered by Forwards for underrepresented groups have made considerable inroads towards improving diversity. These groups have worked hard to put the spotlight on developers beyond the “usual suspects”.
The rOpenSci organization (Boettiger et al. 2015) is an attractive venue for developers and supporters of scientific R software. One of our most successful and long-running initiatives is our Software Peer Review system (Ram et al. 2019), which combines academic peer review with industry-style code review.
Volunteers have reviewed about 150 packages to date, improving those packages, growing the knowledge base in our development guide (rOpenSci et al. 2021), and building a living community of practice.
Our model has been the fundamental inspiration for projects such as the Journal of Open Source Software (Smith et al. 2018) and pyOpenSci (Wasser and Holdgraf 2019; Trizna et al. 2021).
We are continuously improving our system and reducing the cognitive load on editors and reviewers by automating repetitive tasks. Most recently, we have expanded our offerings to the peer review of packages that implement statistical methods (Statistical Software Peer Review) (Padgham et al. 2021).
Besides software review, the rOpenSci community is a safe, welcoming and informative place for package developers, with Q&A happening on our public forum and semi-open Slack workspace (Butland and LaZerte 2020).
The aforementioned tools, venues and organizations benefit from and support crucial dissemination efforts.
Publishing technical know-how is crucial for the progress of the R community. R news circulates on Twitter, R Weekly and R-Bloggers.
Some sources are aimed more specifically at R package developers of varying experience and interests.
While “Writing R Extensions” is the official and exhaustive reference on writing R packages, it is a reference rather than a learning resource. Many R package developers who do not simply learn by example are introduced to package development through introductory blog posts or tutorials. The R Packages book by Hadley Wickham and Jenny Bryan (Wickham 2015; Wickham and Bryan), which accompanies the devtools suite of packages, is freely available online and strives to improve the R package development experience.
The rOpenSci guide “rOpenSci Packages: Development, Maintenance, and Peer Review” (rOpenSci et al. 2021) contains our community-contributed guidance on how to develop packages and review them.
It features opinionated requirements, such as the use of roxygen2 (Wickham et al. 2021a) for package documentation; criteria to help make informed decisions on gray-area topics, such as limiting dependencies; and advice on both widely accepted and emerging tools.
Because it is a living document that also serves as a reference for editorial decisions, we maintain a changelog and summarize each release in a blog post.
rOpenSci also hosts a book on a specialized topic, HTTP testing in R, which presents both principles for testing packages that interact with web resources and the relevant packages.
Besides these examples of long-form documentation, knowledge about R software engineering is shared through blogs and talks.
In the R blogging world, rOpenSci blog posts, technical notes and a section of our monthly newsletter cover topics relevant to package developers, as do some of the posts on the Tidyverse blog.
The blog of the R-hub project covers package development topics, in particular common problems such as sharing data via R packages or understanding CRAN checks.
Expert programmers have been sharing their R-specific wisdom as well as software engineering lessons learned from other languages (e.g., Jenny Bryan’s useR! keynote address “Code smells and feels”).
In summary, we observe that there is already a strong software engineering culture in the R developer community. By surfacing this rich suite of resources to new developers, we hope to contribute to the continued success of all the initiatives mentioned above. We recommend creating, updating and vetting packages with the tools we described, and keeping up with community standards through the venues we described. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers. Thanks to these efforts, we hope the R community will continue to be a thriving place for applying software engineering, by diverse practitioners from many different paths.
devtools, roxygen2, testthat, tinytest, dittodb, vcr, httptest, webfakes, httr2, autotest, goodpractice, riskmetric, pkgcheck
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Salmon & Ram, "The R Developer Community Does Have a Strong Software Engineering Culture", The R Journal, 2021
BibTeX citation
@article{RJ-2021-110, author = {Salmon, Maëlle and Ram, Karthik}, title = {The R Developer Community Does Have a Strong Software Engineering Culture}, journal = {The R Journal}, year = {2021}, note = {https://doi.org/10.32614/RJ-2021-110}, doi = {10.32614/RJ-2021-110}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {18-21} }