Software Engineering and R Programming: A Call for Research

Although R programming has been part of research since its origins in the 1990s, few studies address scientific software development from a Software Engineering (SE) perspective. The past few years have seen unparalleled growth in the R community, and it is time to push the boundaries of SE research and R programming forward. This paper discusses relevant studies that close this gap. Additionally, it proposes a set of good practices derived from those findings, aiming to act as a call-to-arms for both the R and RSE (Research SE) communities to explore specific, interdisciplinary paths of research.

Melina Vidoni (Australian National University, School of Computing)
2021-12-14

1 Introduction

R is a multi-paradigm statistical programming language, based on the S statistical language (Morandat et al. 2012), developed in the 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. It is maintained by the R Development Core Team (Thieme 2018). Though CRAN (the Comprehensive R Archive Network) was created for users to suggest improvements and report bugs, nowadays it is the official venue for submitting user-generated R packages (Ihaka 2017). R has gained popularity for work related to statistical analysis and mathematical modelling, and has been one of the fastest-growing programming languages (Muenchen 2017). In July 2020, R ranked 8th in the TIOBE index, which measures the popularity of programming languages; by comparison, one year earlier (July 2019), TIOBE ranked R in the 20th position (TIOBE 2020). According to Korkmaz et al. (2018), "this has led to the development and distribution of over 10,000 packages, each with a specific purpose". Furthermore, it has a vibrant end-user programming community, where the majority of contributors and core members are "not software engineers by trade, but statisticians and scientists", with diverse technical backgrounds and application areas (German et al. 2013).

R programming has become an essential part of computational science: the "application of computer science and Software Engineering (SE) principles to solving scientific problems" (Hasselbring et al. 2019). As a result, there are numerous papers discussing R packages explicitly developed to close a particular gap or assist in the analysis of data across a myriad of disciplines. Regardless of the language used, the development of software to assist in research has been termed 'research SE' (RSE) (Cohen et al. 2021). Overall, RSE differs from traditional software development in several ways, such as the lifecycles used, the software's goals and life expectancy, and the requirements elicitation. This type of software is often "constructed for a particular project, and rarely maintained beyond this, leading to rapid decay, and frequent 'reinvention of the wheel'" (Rosado de Souza et al. 2019).

However, both RSE and SE for R programming remain under-explored, with little SE-specific knowledge tailored to these two areas. This poses several problems, given that research software is a central asset in computational science. Moreover, although most RSE-ers (the academics writing software for research) come from the research community, only a small number arrive from a professional programming background (Pinto et al. 2018; Cohen et al. 2021). Previous research showed that R programmers do not consider themselves developers (Pinto et al. 2018) and that few of them are aware of the intricacies of the language (Morandat et al. 2012). This is problematic because a lack of formal programming training can lead to lower-quality (Hasselbring et al. 2019) and less-robust (Vidoni 2021a) software. Ensuring sustainable development focused on code quality and maintenance is essential for the evolution of research across computational science disciplines, as faulty and low-quality software can potentially affect research results (Cohen et al. 2018).

As a result, this paper aims to provide insights into three core areas:

- Relevant studies bridging SE research and R programming.
- A set of good practices for R developers, derived from the findings of those studies.
- A call to action for both the R and RSE communities to explore specific, interdisciplinary paths of research.

The rest of this paper is organised as follows. Section 2 presents the related works, introducing them one by one. Section 3 outlines the proposed best practices, and Section 4 concludes this work with a call-to-action for the community.

2 Related Works

This Section discusses relevant works organised in four sub-areas related to software development: coding in R, testing packages, reviewing them, and developers’ experiences.

Area: Coding in R

Code quality is often related to technical debt. Technical Debt (TD) is a metaphor used to reflect the implied cost of additional rework caused by choosing an easy solution in the present, instead of using a better approach that would take longer (Samarthyam et al. 2017).

Claes et al. (2015) mined software repositories (MSR) to evaluate the maintainability of R packages published on CRAN. They focused on function clones: the practice of duplicating functions from other packages to reduce the number of dependencies, often done by copying the code of an external function directly into the package under development or by re-exporting the function under an alias. Code clones are harmful because they lead to redundancy through code duplication, and they are a code smell (i.e., a practice that reduces code quality, making maintenance more difficult).

The authors identified that cloning, in CRAN packages alone, is often caused by several reasons: coexisting package versions (with some packages having cloned lines in the order of hundreds or thousands), forked packages, packages that are cloned more than others, utility packages (i.e., those that bundle functions from other packages to simplify importing), popular packages (with functions cloned more often than in other packages), and popular functions (specific functions being cloned by a large number of packages).

Moreover, they analysed the cloning trend for packages published in CRAN. They determined that the ratio of packages impacted by cloning appears stable but, overall, it represents over a quarter-million lines of code in CRAN. Quoting the authors, "those lines are included in packages representing around 50% of all code lines in CRAN" (Claes et al. 2015). Relatedly, Korkmaz et al. (2019) found that the more dependencies a package has, the less likely it is to have a higher impact. Likewise, other studies have demonstrated that scantily updated packages that depend on frequently updated ones are prone to more errors caused by incompatible dependencies (Plakidas et al. 2017), leading developers to clone functions rather than import them.
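
To make the practice concrete, the hypothetical sketch below shows both forms of cloning described above: copying a simplified re-implementation of an external function (stringr::str_trim() is used purely as an illustration) into the package under development, and re-exporting it under an alias, where a documented import would avoid the duplication.

```r
# Form 1: a simplified re-implementation of stringr::str_trim() is
# copied into the package instead of importing it (a function clone).
str_trim_clone <- function(string, side = "both") {
  side <- match.arg(side, c("both", "left", "right"))
  switch(side,
    left  = sub("^\\s+", "", string),
    right = sub("\\s+$", "", string),
    both  = gsub("^\\s+|\\s+$", "", string)
  )
}

# Form 2: the external function is re-exported under an alias,
# duplicating the binding rather than declaring a proper import.
trim <- stringr::str_trim
```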

Code quality is also reflected by the comments developers write in their code. The notion of Self-Admitted Technical Debt (SATD) indicates the case where programmers are aware that the current implementation is not optimal and write comments alerting others to the problems of the solution (Potdar and Shihab 2014). Vidoni (2021b) conducted a three-part mixed-methods study to understand SATD in R programming, mining over 500 packages publicly available on GitHub and surveying their developers through an anonymous online questionnaire. Overall, this study uncovered that:

This work extended previous findings obtained exclusively for object-oriented (OO) languages, identifying specific debt instances as developers perceive them. However, a limitation of the findings is that the dataset was manually generated. For the moment, there is no tool or package providing support to detect SATD comments in R programming automatically.
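
As a hedged starting point, keyword matching over comments, in the spirit of the patterns used for OO studies (Potdar and Shihab 2014), could be prototyped in a few lines of R. The markers and helper below are illustrative, not a validated detector:

```r
# Naive SATD detector (a sketch, not a validated tool): flag source
# comments containing common self-admitted-debt markers.
satd_patterns <- "\\b(TODO|FIXME|HACK|XXX|WORKAROUND|KLUDGE)\\b"

find_satd <- function(path) {
  files <- list.files(path, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # Keep only the comment part of each line, if any.
    comments <- sub("^[^#]*#", "#", lines)
    idx <- grepl("#", lines, fixed = TRUE) &
      grepl(satd_patterns, comments, ignore.case = TRUE, perl = TRUE)
    if (!any(idx)) return(NULL)
    data.frame(file = f, line = which(idx), text = lines[idx])
  })
  do.call(rbind, hits)
}

# Example: find_satd("R/") over a package's source directory.
```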

Area: Testing R Packages

Vidoni (2021a) conducted a mixed-methods MSR (Mining Software Repositories) study that combined mining GitHub repositories with a developer survey to study testing technical debt (TTD), the test dimension of TD, in R programming.

Overall, this study determined that R package testing has poor quality, specifically caused by the situations summarised in Table 1. A key finding concerns the types of tests being carried out. When designing test cases, good practices indicate that developers should test common cases (the "traditional" or "most used" path of an algorithm or function) as well as edge cases (values that require special handling, hence assessing boundary conditions of an algorithm or function) (Daka and Fraser 2014). Nonetheless, this study found that almost four-fifths of the tests are common cases, and that the vast majority of alternative paths (e.g., those accessible after a condition) are not being assessed.
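
As a minimal sketch of the distinction (mean_ratio() is a hypothetical function under test), a testthat suite covering both kinds of cases could look as follows:

```r
library(testthat)

# Hypothetical function under test: ratio of two means.
mean_ratio <- function(x, y) {
  if (length(x) == 0 || length(y) == 0) stop("empty input")
  mean(x, na.rm = TRUE) / mean(y, na.rm = TRUE)
}

test_that("mean_ratio handles the common case", {
  expect_equal(mean_ratio(c(2, 4), c(1, 3)), 1.5)
})

test_that("mean_ratio handles edge cases", {
  expect_error(mean_ratio(numeric(0), 1), "empty input")  # empty input
  expect_equal(mean_ratio(c(2, NA), c(1, 1)), 2)          # NA handling
})
```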

Moreover, this study also determined that the available testing tools are limited regarding their documentation and the examples provided (as indicated by survey respondents). This includes the usability of the provided assertions (given that most developers use custom-defined ones) and the lack of tools to automate the initialisation of data for testing, which often causes test suites to fail due to problems in the suite itself.

Table 1: Problems found by Vidoni (2021a) regarding unit testing of R packages.
Smell | Definition (Samarthyam et al. 2017) | Reason (Vidoni 2021a)
Inadequate Unit Tests | The test suite is not ideal to ensure quality testing. | Many relevant lines remain untested. Alternative paths (i.e., those accessible after a condition) are mostly untested. There is large variability in the coverage of packages from the same area (e.g., biostatistics). Developers focus on common cases only, leading to incomplete testing.
Obscure Unit Tests | When unit tests are obscure, it becomes difficult to understand the unit test code and the production code for which the tests are written. | Multiple asserts have unclear messages. Multiple asserts are mixed in the same test function. Excessive use of user-defined asserts instead of relying on the available tools.
Improper Asserts | Wrong or non-optimal usage of asserts leads to poor testing and debugging. | Testing is concentrated on common cases. Excessive use of custom asserts. Developers still uncover bugs in their code even when the tests are passing.
Inexperienced Testers | Testers, and their domain knowledge, are the main strength of exploratory testing; low tester fitness and non-uniform test accuracy over the whole system accumulate residual defects. | Survey participants are reportedly highly experienced, yet their most common challenge was lack of testing knowledge and poor documentation of tools.
Limited Test Execution | Executing or running only a subset of tests to reduce the time required; a shortcut that increases the possibility of residual defects. | A large number of mined packages (about 35%) only used manual testing, with no automated suite (e.g., testthat). The survey responses confirmed this proportion.
Improper Test Design | Since executing all combinations of test cases is an effort-intensive process, testers often run only known, less problematic tests (i.e., those less prone to make the system fail); this increases the risk of residual defects. | The study found a lack of support for automatically testing plots. The mined packages used testthat functions to generate a plot that was later (manually) inspected by a human to evaluate readability, suitability, and other subjective values. Survey results confirmed developers struggle with plot assessment.

Křikava and Vitek (2018) conducted an MSR study to inspect R packages' source code, making available a tool that automatically generates unit tests. In particular, they identified several testing challenges caused by the language itself, namely its extreme dynamism, coerciveness, and lack of types, which hinder the efficacy of traditional test-extraction techniques.

In particular, the authors worked with execution traces, "the sequence of operations performed by a program for a given set of input values" (Křikava and Vitek 2018), to provide genthat, a package to optimise the unit testing of a target package (Krikava 2018). genthat records the execution traces of a target package, allowing the extraction of unit-test functions; however, this is limited to the public interface or the internal implementation of the target package. Overall, its process comprises installation, extraction, tracing, checking, and minimisation.
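
Conceptually, a trace is turned into a test by replaying the recorded call and asserting that it reproduces the recorded result. The sketch below illustrates this idea in plain R; it is not genthat's actual implementation:

```r
# Conceptual sketch of trace-based test extraction: wrap a function so
# every call records its arguments and result, then replay each
# recorded trace as an assertion.
traces <- list()

traced <- function(f) {
  function(...) {
    result <- f(...)
    traces[[length(traces) + 1]] <<- list(f = f, args = list(...),
                                          result = result)
    result
  }
}

square <- traced(function(x) x^2)
square(3)   # behaves normally while recording a trace
square(10)

# Each trace becomes a unit test: re-running the recorded call must
# reproduce the recorded result.
for (tr in traces) {
  stopifnot(identical(do.call(tr$f, tr$args), tr$result))
}
```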

Both genthat and the study performed by these authors are highly valuable to the community, since the package's minimisation phase checks the unit tests, discards those that fail, and eliminates redundant test cases according to the coverage they add. Although this is not a solution to the lack of edge cases detected in another study (Vidoni 2021a), genthat assists developers and can potentially reduce the workload required to obtain a baseline test suite. However, this work's main limitation is its emphasis on the coverage measure, which is not an accurate reflection of test quality.

Finally, Russell et al. (2019) focused on the maintainability of R packages in terms of their testing and performance. The authors conducted an MSR study of 13,500 CRAN packages, demonstrating that "reproducible and replicable software tests are frequently not available". This aligns with the findings of other authors mentioned in this Section. They concluded with recommendations to improve the long-term maintenance of a package in terms of testing and optimisation, reviewed in Section 3.
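
Coverage itself is straightforward to measure with the covr package; a brief sketch (the package path is illustrative) is shown below. The zero-coverage report helps spot the untested alternative paths discussed above, but, as noted, a high percentage alone does not guarantee that edge cases are assessed.

```r
library(covr)

# Measure the test coverage of a package (path is illustrative).
cov <- package_coverage("path/to/package")

percent_coverage(cov)  # single number: % of lines executed by tests
zero_coverage(cov)     # lines never executed: candidate untested paths
```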

Area: Reviewing Packages

The increased relevance of software in data science, statistics, and research has amplified the need for reproducible, quality-coded software (Howison and Herbsleb 2011). Several community-led organisations were created to organise and review packages, among them rOpenSci (Ram et al. 2019; rOpenSci et al. 2021) and BioConductor (Gentleman et al. 2004). In particular, rOpenSci has established a thorough peer-review process for R packages, based on the intersection of academic peer review and software review.

As a result, Codabux et al. (2021) studied rOpenSci's open peer-review process. They extracted the reviews of completed and accepted packages, broke them down into individual comments, and applied a card-sorting approach to determine which types of TD were most commonly discussed.

One of their main contributions is a taxonomy of TD types extending the current definitions to R programming. It also groups debt types by perspective, representing who is most affected by a type of debt. They also provided examples of rOpenSci peer-review comments referring to each specific debt. This taxonomy is summarised in Table 2, which also includes recapped definitions.

Table 2: Taxonomy of TD types and perspectives for R packages, proposed by Codabux et al. (2021).
Perspective | TD Type | Description
User | Usability | In the context of R, usability debt encompasses anything related to usability, interfaces, visualisation, and so on.
User | Documentation | For R, this is anything related to roxygen2 (or alternatives such as LaTeX or Markdown generation), readme files, vignettes, and even pkgdown websites.
User | Requirements | Refers to trade-offs made concerning which requirements the development team needs to implement, or how to implement them.
Developer | Test | In the context of R, test debt encompasses anything related to coverage, unit testing, and test automation.
Developer | Defect | Refers to known defects, usually identified by testing activities or by the user and reported on bug-tracking systems.
Developer | Design | For R, this debt is related to any OO feature, including visibility, internal functions, the triple-colon operator, placement of functions in files and folders, use of roxygen2 for imports, returns of objects, and so on.
Developer | Code | In the context of R, examples of code debt are anything related to renaming classes and functions, <- vs. =, parameters and arguments in functions, FALSE/TRUE vs. F/T, and print vs. warning/message (see the snippet after this table).
CRAN | Build | In the context of R, examples of build debt are anything related to Travis, Codecov.io, GitHub Actions, CI, AppVeyor, CRAN, CMD checks, and devtools::check.
CRAN | Versioning | Refers to problems in source-code versioning, such as unnecessary code forks.
CRAN | Architecture | For example, violation of modularity, which can affect architectural requirements (e.g., performance, robustness).
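
To make the Code row concrete, the hypothetical snippet below contrasts the discouraged and preferred forms mentioned in Table 2:

```r
# Illustrative instances of code debt from Table 2 (hypothetical code).

# Discouraged: '=' for assignment, T/F shortcuts, print() for warnings.
n = 10
flag = T
if (flag) print("value is missing")

# Preferred: '<-' for assignment, full TRUE/FALSE, warning()/message().
n <- 10
flag <- TRUE
if (flag) warning("value is missing")
```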

Additionally, they uncovered that almost one-third of the debt discussed is documentation debt, related to how well packages are being documented. This was followed by code debt, providing a different distribution than the one obtained by Vidoni (2021b). This difference is caused by rOpenSci reviewers focusing on documentation (e.g., comments written by reviewers account for most of the documentation debt), while developers' comments concentrate on code debt. The entire classification process is detailed in the original study (Codabux et al. 2021).

Area: Developers’ Experiences

Developers' perspectives on their work are fundamental to understanding how they develop software. However, scientific software developers have a different point of view than 'traditional' programmers (Howison and Herbsleb 2011).

Pinto et al. (2018) used an online questionnaire to survey over 1,500 R developers, enriching the results with metadata extracted from the GitHub profiles provided by respondents. Overall, they found that scientific developers are primarily self-taught but still consider peer learning a valuable secondary source. Interestingly, the participants did not perceive themselves as programmers, but rather as members of their own disciplines. This aligns with findings from other works (Morandat et al. 2012; German et al. 2013). Though understandable, such a perception may pose a risk to the development of quality software, as developers may feel 'justified' in not following good coding practices (Pinto et al. 2018).

Additionally, this study found that scientific developers work alone or in small teams (up to five people). Interestingly, it also found that people spend a significant amount of time focused on coding and testing, and perform ad-hoc requirements elicitation, mostly 'deciding by themselves' what to work on next rather than following any development lifecycle.

When enquired about commonly faced challenges, the participants of this study mentioned the following: cross-platform compatibility; poor documentation (a central topic for reviewers (Codabux et al. 2021)); interruptions while coding; lack of time (also mentioned by developers in another study (Vidoni 2021b)); scope bloat; lack of user feedback (also related to validation, as opposed to verification testing); and the lack of a formal reward system (e.g., the work is not credited in the scientific community (Howison and Herbsleb 2011)).

Table 3: Recommendations of best practices, according to the issues found in previous work and good practices established in the SE community.
Area | Main Problem | Recommended Practice
Lifecycles | The lack of proper requirements elicitation and development organisation was identified as a critical problem for developers (Pinto et al. 2018; Wiese et al. 2020), who often resort to writing comments in the source code to remind themselves of tasks they later do not address (Vidoni 2021b). | There are extremely lightweight agile lifecycles (e.g., Extreme Programming, Crystal Clear, Kanban) that can be adapted for a single developer or small groups. Using these can provide a project-management framework that can also organise a research project that depends on creating scientific software.
Teaching | Most scientific developers do not perceive themselves as programmers and are self-taught (Pinto et al. 2018). This hinders their background knowledge and the tools they have available to detect TD and other problems, potentially leading to low-quality code (German et al. 2013). | Since graduate school is considered fundamental for these developers (Pinto et al. 2018), providing a solid foundation of SE-oriented R programming for candidates whose research relies heavily on software can prove beneficial. The topics taught should be carefully selected to keep them practical and relevant, yet still valuable for the candidates.
Coding | Some problems discussed were function clones, incorrect imports, non-semantic or non-meaningful names, and improper visibility or file distribution of functions, among others. | Avoid duplicating (i.e., copy-pasting or re-exporting) functions from other packages; instead, use proper selective imports, such as roxygen2's @importFrom (or similar LaTeX documentation styles; see the sketch after this table). Avoid leaving unused functions or pieces of code that are 'commented out' to be nullified; proper use of version control enables developers to remove such segments and revisit them through previous commits. Code comments are meant to be meaningful and should not be used as a planning tool; comments indicating problems or errors should be addressed (either when found, if the problem is small, or at a specifically planned time, if the problem is significant). Names should be semantic and meaningful, maintaining consistency across the whole project; though there is no pre-established convention for R, previous work provides an overview (Baath 2012), as do packages such as the tidyverse's style guide.
Testing | Current tests leave many relevant paths unexplored, often ignoring the testing of edge cases and damaging the robustness of the packaged code (Russell et al. 2019; Vidoni 2021a). | All alternative paths should be tested (e.g., those limited by conditionals). Exceptional cases should be tested, e.g., evaluating that a function throws an exception or error when it should, and evaluating other cases such as (but not limited to) nulls, NAs, NaNs, warnings, large numbers, and empty strings or variables (e.g., character(0)), among others. Other specific testing cases, including performance evaluation and profiling, are discussed and exemplified by Russell et al. (2019).
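
As an example of the selective-import practice recommended for the Coding area, the sketch below uses roxygen2 tags; the function and the imported package are purely illustrative:

```r
# Recommended: selectively import an external function with roxygen2,
# instead of cloning its code or re-exporting it under an alias.

#' Trim and standardise column names.
#'
#' @param df A data frame.
#' @importFrom stringr str_trim
#' @export
clean_names <- function(df) {
  names(df) <- str_trim(tolower(names(df)))
  df
}
```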

This study (Pinto et al. 2018) was followed up to create a taxonomy of problems commonly faced by scientific developers (Wiese et al. 2020). The authors worked with over 2,100 qualitatively-reported problems and grouped them into three axes; given the size of their taxonomy, only the larger groups are summarised below:

These two works provide valuable insight into scientific software developers. As with other works mentioned in this article, although there are similarities with traditional software development (both in terms of programming paradigms and goals), the differences are notable enough to warrant further specialised investigation.

3 Towards Best Practices

Based on well-known practices for traditional software development (Sommerville 2015), this Section outlines a proposal of best practices for R developers, targeting the weaknesses identified by the studies discussed in Section 2. The list is intended as a baseline that future research can improve upon and further tailor to the needs of scientific software development and the R community itself.

The practices discussed span from overarching concerns (e.g., those related to processes) to specific activities. They are summarised in Table 3.

4 Call to Action

Scientific software and R programming have become ubiquitous across numerous disciplines, providing essential analysis tools that enable work that could not be completed otherwise. Although R developers are reportedly struggling in several areas, academic literature centred on the development of scientific software is scarce. As a result, this Section provides two calls to action: one for R users and another for RSE academics.

Research Software Engineering Call: SE for data science and scientific software development is crucial for advancing research outcomes. As a result, interdisciplinary works are increasingly needed to approach specific areas. Some suggested topics to kickstart this research are as follows:

R Community Call: The following suggestions are centred on the abilities of the R community:

There is a wide range of possibilities and areas in which to work, all derived from diversifying R programming and RSE. This paper highlighted meaningful work in this space and proposed a call to action to further this area of research. However, these ideas need to be repeatedly evaluated and refined to remain valuable to R users.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The author is grateful to both R-Ladies and rOpenSci communities that fostered the interest in this topic and to Prof. Dianne Cook for extending the invitation for this article.

5 Packages Mentioned

The following packages were mentioned in this article:


genthat, roxygen2, pkgdown, covr, testthat, tidyverse


References

N. S. R. Alves, T. S. Mendes, M. G. de Mendonça, R. O. Spínola, F. Shull and C. Seaman. Identification and management of technical debt: A systematic mapping study. Information and Software Technology, 70: 100–121, 2016. DOI 10.1016/j.infsof.2015.10.008.
R. Baath. The state of naming conventions in R. The R Journal, 4: 74–75, 2012. DOI 10.32614/RJ-2012-018.
M. Claes, T. Mens, N. Tabout and P. Grosjean. An empirical study of identical function clones in CRAN. In 2015 IEEE 9th International Workshop on Software Clones (IWSC), pages. 19–25 2015. DOI 10.1109/IWSC.2015.7069885.
Z. Codabux, M. Vidoni and F. Fard. Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study. In 2021 International Conference on Mining Software Repositories, pages. 1–11 2021. Madrid, Spain: IEEE. URL https://arxiv.org/abs/2103.09340.
J. Cohen, D. S. Katz, M. Barker, N. Chue Hong, R. Haines and C. Jay. The four pillars of research software engineering. IEEE Software, 38(1): 97–105, 2021. DOI 10.1109/MS.2020.2973362.
J. Cohen, D. S. Katz, M. Barker, R. Haines and N. Chue Hong. Building a sustainable structure for research software engineering activities. In 2018 IEEE 14th international conference on e-science (e-science), pages. 31–32 2018. DOI 10.1109/eScience.2018.00015.
E. Daka and G. Fraser. A survey on unit testing practices and problems. In 2014 IEEE 25th international symposium on software reliability engineering, pages. 201–211 2014.
R. C. Gentleman, V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, et al. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5(10): R80, 2004. URL https://doi.org/10.1186/gb-2004-5-10-r80.
D. M. German, B. Adams and A. E. Hassan. The Evolution of the R Software Ecosystem. In 2013 17th European Conference on Software Maintenance and Reengineering, pages. 243–252 2013. DOI 10.1109/CSMR.2013.33. ISSN: 1534-5351.
W. Hasselbring, L. Carr, S. Hettrick, H. Packer and T. Tiropanis. FAIR and open computer science research software. 2019. URL https://arxiv.org/abs/1908.05986.
J. Howison and J. D. Herbsleb. Scientific software production: Incentives and collaboration. In Proceedings of the ACM 2011 conference on computer supported cooperative work, pages. 513–522 2011. New York, NY, USA: Association for Computing Machinery. ISBN 9781450305563. URL https://doi.org/10.1145/1958824.1958904.
R. Ihaka. The R project: A brief history and thoughts about the future. 2017. URL https://www.stat.auckland.ac.nz/~ihaka/downloads/Massey.pdf.
G. Korkmaz, C. Kelling, C. Robbins and S. Keller. Modeling the impact of Python and R packages using dependency and contributor networks. Social Network Analysis and Mining, 10(1): 7, 2019. URL https://doi.org/10.1007/s13278-019-0619-1 [online; last accessed July 14, 2020].
G. Korkmaz, C. Kelling, C. Robbins and S. A. Keller. Modeling the Impact of R Packages Using Dependency and Contributor Networks. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages. 511–514 2018. DOI 10.1109/ASONAM.2018.8508255. ISSN: 2473-991X.
F. Krikava. fikovnik/ISSTA18-artifact: ISSTA’18 Artifact release. 2018. URL https://doi.org/10.5281/zenodo.1306437.
F. Křikava and J. Vitek. Tests from traces: Automated unit test extraction for R. In Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, pages. 232–241 2018. New York, NY, USA: Association for Computing Machinery. ISBN 9781450356992. URL https://doi.org/10.1145/3213846.3213863.
J. Liu, Q. Huang, X. Xia, E. Shihab, D. Lo and S. Li. Is Using Deep Learning Frameworks Free? Characterizing Technical Debt in Deep Learning Frameworks. In Proceedings of the ACM/IEEE 42nd international conference on software engineering: Software engineering in society, pages. 1–10 2020. New York, NY, USA: Association for Computing Machinery. ISBN 9781450371244. URL https://doi.org/10.1145/3377815.3381377.
F. Morandat, B. Hill, L. Osvald and J. Vitek. Evaluating the Design of the R Language. In ECOOP 2012 – Object-Oriented Programming, Ed J. Noble pages. 104–131 2012. Berlin, Heidelberg: Springer. ISBN 978-3-642-31057-7. DOI 10.1007/978-3-642-31057-7_6.
B. Muenchen. R’s growth continues to accelerate. 2017. URL https://www.r-bloggers.com/rs-growth-continues-to-accelerate/.
G. Pinto, I. Wiese and L. F. Dias. How do scientists develop scientific software? An external replication. In 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER), pages. 582–591 2018. DOI 10.1109/SANER.2018.8330263.
K. Plakidas, D. Schall and U. Zdun. Evolution of the R software ecosystem: Metrics, relationships, and their impact on qualities. Journal of Systems and Software, 132: 119–146, 2017. URL https://www.sciencedirect.com/science/article/pii/S0164121217301371.
A. Potdar and E. Shihab. An Exploratory Study on Self-Admitted Technical Debt. In 2014 IEEE international conference on software maintenance and evolution, pages. 91–100 2014. DOI 10.1109/ICSME.2014.31.
K. Ram, C. Boettiger, S. Chamberlain, N. Ross, M. Salmon and S. Butland. A community of practice around peer review for long-term research software sustainability. Computing in Science Engineering, 21(2): 59–65, 2019. DOI 10.1109/MCSE.2018.2882753.
rOpenSci, B. Anderson, S. Chamberlain, L. DeCicco, J. Gustavsen, A. Krystalli, M. Lepore, L. Mullen, K. Ram, N. Ross, et al. rOpenSci Packages: Development, Maintenance, and Peer Review. 2021. URL https://doi.org/10.5281/zenodo.4554776.
M. Rosado de Souza, R. Haines, M. Vigo and C. Jay. What makes research software sustainable? An interview study with research software engineers. In 2019 IEEE/ACM 12th international workshop on cooperative and human aspects of software engineering (CHASE), pages. 135–138 2019. DOI 10.1109/CHASE.2019.00039.
S. Russell, T. D. Bennett and D. Ghosh. Software engineering principles to improve quality and performance of R software. PeerJ Computer Science, 5: e175, 2019. DOI 10.7717/peerj-cs.175. Publisher: PeerJ Inc.
G. Samarthyam, M. Muralidharan and R. K. Anna. Understanding Test Debt. In Trends in software testing, pages. 1–17 2017. Singapore: Springer Singapore. ISBN 978-981-10-1415-4. URL https://doi.org/10.1007/978-981-10-1415-4_1.
I. Sommerville. Software engineering. 10th ed Pearson, 2015.
N. Thieme. R generation. Significance, 15(4): 14–19, 2018. DOI 10.1111/j.1740-9713.2018.01169.x.
TIOBE. TIOBE index - the software quality company. 2020. URL https://www.tiobe.com/tiobe-index/.
M. Vidoni. Evaluating unit testing practices in r packages. In Proceedings of the 43rd international conference on software engineering (ICSE), pages. 1–12 2021a. Madrid, Spain: IEEE.
M. Vidoni. Self-Admitted Technical Debt in R Packages: An Exploratory Study. In 2021 International Conference on Mining Software Repositories, pages. 1–11 2021b. Madrid, Spain: IEEE.
I. Wiese, I. Polato and G. Pinto. Naming the pain in developing scientific software. IEEE Software, 37(4): 75–82, 2020. DOI 10.1109/MS.2019.2899838.


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Vidoni, "Software Engineering and R Programming: A Call for Research", The R Journal, 2021

BibTeX citation

@article{RJ-2021-108,
  author = {Vidoni, Melina},
  title = {Software Engineering and R Programming: A Call for Research},
  journal = {The R Journal},
  year = {2021},
  note = {https://rjournal.github.io/},
  volume = {13},
  issue = {2},
  issn = {2073-4859},
  pages = {6-14}
}