Rejoinder: Software Engineering and R Programming

It is a pleasure to take part in such a fruitful discussion about the relationship between Software Engineering and R programming, and what could be gained by allowing each to look more closely at the other. Several discussants make valuable arguments that ought to be further discussed.

Melina Vidoni (Australian National University, School of Computing)
2021-12-14

The Roles

It is worth arguing about the difference between research software engineers and software engineering researchers. While the former can be anyone developing scientific software for computation/data sciences (regardless of their technical background or "home" discipline), the latter are academics investigating software engineering in different domains.

Software engineering researchers aim to produce research that is translatable and usable by practitioners, and when investigating R programming (or any other type of scientific software) the "practitioners" are research software engineers. This distinction is relevant as one cannot work without the other. In other words, software engineering researchers ought to study research software engineers just as they study, e.g., web developers, with the goal of uncovering their "pain points" and proposing solutions to them. Likewise, research software engineers depend on software engineering researchers and expect them to produce the new knowledge they need.

However, what a research software engineer does will vary with the programming language they use and what they aim to achieve with it. In terms of R programming, as one discussant pointed out, there can be a difference between an "R user" (who uses R to perform data analysis) and an "R developer" (who, besides using the language, also develops it by creating publicly shared packages). So far, however, research has used both terms interchangeably, which opens a possible avenue of work on the "human aspects of R programming".

The Software

This is where the next link appears: the tools and packages mentioned in the commentaries were developed with the intention of translating/migrating knowledge acquired/produced by software engineering researchers to the domain of R programming, to be used by research software engineers. For example, the package covr streamlines the process of calculating the unit testing coverage of a package, and the original papers presenting such measures can be traced back to the late '80s (DeMillo 1987; Frankl and Weyuker 1988). Although coverage as a measure is known to have evolved and changed over time (and continues to do so), it is an excellent example of an outcome produced by software engineering researchers who successfully translated their findings to "practitioners" (in this case, research software engineers).

Therefore, a package is part of the "translation" of the knowledge acquired through software engineering research into an accessible, usable framework. However, the tool itself is not enough: without the "environment" changing, growing, and learning, the tool may not be used to its full potential. Note that "environment" is used to refer (widely and loosely) to a person’s programming habits, acceptance of change, past experiences (e.g., time/effort spent in solving a bug, or domains worked on), and even the people around them (e.g., doing/not doing something because of what others do/do not do) that influence their vision, attitude and expectations regarding programming.

Moreover, tools and packages are not finite, static elements; because they are software, they evolve. And when the requirements of a community change, so must the tools. This acts as a reminder not to assign a "silver bullet" status to a tool meant to solve a particular, static problem, when it has long been known that software (and thus the practices used to develop it) evolves, and that its problems may even become unmanageable, never to be fully solved (Brooks 1987).

The Goal

Another related aspect is that "scientific software" has broader, different goals than "traditional" (namely, non-scientific) software development: it has been argued that "scientific software development" is concerned with knowledge acquisition rather than software production (Kelly 2015). For example, a "tool" can be an RMarkdown document that allows performing an analysis (hence, using the language). Relatedly, "scientific software" uses diverse paradigms whose goals differ from those of "traditional software", such as literate programming, which has been considered a programming paradigm for a few decades (Cordes and Brown 1991), and scripting, which in turn continues to elicit mixed stances from software engineering researchers (Loui 2008).

Thus, what "software engineering practices" mean for "scientific software" remains ambiguous, and some authors have argued that the "gap" between software engineering and scientific programming threatens the production of reliable scientific results (Storer 2017). The following are some example questions meant to illustrate how these other aspects of "scientific software" may still be related to software engineering practices:

Could text in a literate programming file be considered documentation? Is scripting subject to code smells, such as incorrect naming or code reuse? Does self-admitted technical debt exist in literate/scripting programming? What is the usability of a literate programming document? Should analytical scripts be meant for reuse?

The original article was intended to highlight some of the efforts made by software engineering researchers to bridge this gap of software engineering knowledge for "scientific programming". Nonetheless, software engineering researchers have perhaps focused more strongly on R packages because of their similarities to their current research (namely, "traditional software" development), thus making the translation of knowledge slightly more straightforward. Approaching other aspects, paradigms, tools and processes of "scientific software" development remains a research gap that should be further studied.

The Community

The community is the next link in this chain: they motivate software engineering researchers' investigations, and are both the subjects and the beneficiaries. Yet many times, they can also be the cause of their own "pain points". For example, research has shown that although StackOverflow is nowadays a staple for any programmer, many solutions derived from it can be outright insecure (Acar et al. 2016; Fischer et al. 2017; Rahman et al. 2019), have poor quality and code smells (Zhang et al. 2018; Meldrum et al. 2020), be outdated (Zhang 2020; Zerouali et al. 2021), or have low performance (Toro 2021), among others. This is but a facet of the concept of "there is no silver bullet" (Brooks 1987), and the only way of solving such a situation (partially, and temporarily) is to look at it from multiple points of view. This action is what the original paper aimed to highlight.

Final words

In the end, the differences between software engineering researchers and research software engineers are blurry, and the translation of concepts from "traditional software" development/research to "scientific software" development/research may not be as straightforward as both groups of stakeholders consider. However, for the R community to continue evolving, both can (and should) work together and learn from each other.

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

References

Y. Acar, M. Backes, S. Fahl, D. Kim, M. L. Mazurek and C. Stransky. You get where you’re looking for: The impact of information sources on code security. In 2016 IEEE symposium on security and privacy (SP), pages 289–305, 2016. DOI 10.1109/SP.2016.25.
F. Brooks Jr. No silver bullet: Essence and accidents of software engineering. IEEE Computer, 20: 10–19, 1987. DOI 10.1109/MC.1987.1663532.
D. Cordes and M. Brown. The literate-programming paradigm. Computer, 24(6): 52–61, 1991. DOI 10.1109/2.86838.
R. A. DeMillo. Software testing and evaluation. Menlo Park, Calif.: Benjamin/Cummings Pub. Co., 1987.
F. Fischer, K. Böttinger, H. Xiao, C. Stransky, Y. Acar, M. Backes and S. Fahl. Stack overflow considered harmful? The impact of copy&paste on android application security. In 2017 IEEE symposium on security and privacy (SP), pages 121–136, 2017. DOI 10.1109/SP.2017.31.
P. G. Frankl and E. J. Weyuker. An applicable family of data flow testing criteria. IEEE Transactions on Software Engineering, 14(10): 1483–1498, 1988. DOI 10.1109/32.6194.
D. Kelly. Scientific software development viewed as knowledge acquisition. Journal of Systems and Software, 109(C): 50–61, 2015. URL https://doi.org/10.1016/j.jss.2015.07.027.
R. P. Loui. In praise of scripting: Real programming pragmatism. Computer, 41(7): 22–26, 2008. DOI 10.1109/MC.2008.228.
S. Meldrum, S. A. Licorish, C. A. Owen and B. T. R. Savarimuthu. Understanding stack overflow code quality: A recommendation of caution. Science of Computer Programming, 199: 102516, 2020. URL https://www.sciencedirect.com/science/article/pii/S0167642320301246.
A. Rahman, E. Farhana and N. Imtiaz. Snakes in paradise?: Insecure python-related coding practices in stack overflow. In 2019 IEEE/ACM 16th international conference on mining software repositories (MSR), pages 200–204, 2019. DOI 10.1109/MSR.2019.00040.
T. Storer. Bridging the chasm: A survey of software engineering practice in scientific programming. ACM Comput. Surv., 50(4), 2017. DOI 10.1145/3084225.
M. L. Toro. Understanding the consistency of stack overflow code: A cautionary suggestion. LC International Journal of STEM (ISSN: 2708-7123), 2(1): 40–47, 2021.
A. Zerouali, C. Velázquez-Rodríguez and C. De Roover. Identifying versions of libraries used in stack overflow code snippets. In 2021 IEEE/ACM 18th international conference on mining software repositories (MSR), pages 341–345, 2021. DOI 10.1109/MSR52588.2021.00046.
H. Zhang. On the maintenance of crowdsourced knowledge on stack overflow. 2020.
T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan and M. Kim. Are code examples on an online Q&A forum reliable?: A study of API misuse on stack overflow. In 2018 IEEE/ACM 40th international conference on software engineering (ICSE), pages 886–896, 2018. DOI 10.1145/3180155.3180260.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Vidoni, "Rejoinder: Software Engineering and R Programming", The R Journal, 2021

BibTeX citation

@article{RJ-2021-112,
  author = {Vidoni, Melina},
  title = {Rejoinder: Software Engineering and R Programming},
  journal = {The R Journal},
  year = {2021},
  note = {https://rjournal.github.io/},
  volume = {13},
  issue = {2},
  issn = {2073-4859},
  pages = {25-27}
}