We give a selection of the most important changes in R 4.0.0 and in the R 3.6 release series. Some statistics on source code commits and bug tracking activities are also provided.
R 4.0.0 (codename “Arbor Day”) was released on 2020-04-24. The following gives a selection of the most important changes.
matrix
objects now also inherit from class "array"
, so e.g.,
class(diag(1))
is c("matrix", "array")
. S3 methods for class
"array"
are now dispatched for matrix
objects. This reduces the
need of code duplication between "array"
and "matrix"
classes,
but invalidates code incorrectly assuming that class(matrix_obj))
has length one. In principle, to check whether an object inherits
from (any) class, one should always use inherits()
(or is()
).
See Martin Maechler’s blog
post
for more details.r"(...)"
with ...
any character
sequence not containing the sequence )"
. This makes it easier to
write strings that contain backslashes and/or both single and double
quotes:
r"(c:\Program files\R)"
specifies a Windows directory without
escaping backslashes. r"(use both "double" and 'single' quotes)"
mixes single and double quotes without the need to escape either of
them. For more details see ?Quotes
.stringsAsFactors = FALSE
default, and hence by
default no longer converts strings to factors in calls to
data.frame()
and read.table()
. Automatic conversion of strings
to factors regardless of the context of the study at hand seems
conceptually wrong. In addition, when the automatically applied
order is lexicographical order, the result is locale dependent and
even so when only ASCII characters are used. Historically, automatic
conversions to factors could have been disabled on demand, but
unfortunately that meant that all code dealing with data frames
would have to support both ways. That was not the case, leading to
surprising or unpredictable results. A large number of packages
relied on the previous behavior and so have needed updating. Unlike
in the case of matrices being treated as arrays, this was a change
to documented behavior, so even correct package code was affected.
See Kurt Hornik’s blog
post
for more details.NAMED
mechanism for
determining when objects can be safely mutated in base C code. This
reduces the need for copying in some cases and should allow further
optimizations in the future. It should help make the internal code
easier to maintain. In principle, even the NAMED
mechanism was a
variant of reference counting, but a simple one where the number of
references could only increase (up to a maximum value). Even as
simple operations as passing an R object (value) to a function that
would only read it would permanently increase the reference count of
that object, even after that reading-only function would return. Any
modification of that object later on would require a copy. This is
one of the scenarios fixed by the new mechanism where the reference
counts can and often do decrease as well, so that R knows much more
often that some R values are in fact private and can be modified in
place. This change should not impact existing code (does not break
packages) using supported coding practices in C/C++. It has no
direct impact on R code other than performance/memory usage..S3method()
to register S3 methods in R scripts. See
Kurt Hornik’s blog
post
for more details.palette()
function has a new default set of colors which are
less saturated and have better accessibility properties. There are
also some new built-in palettes, which are listed by the new
palette.pals()
function. The new palette.colors()
function
allows a subset of colors to be selected from any of the built-in
palettes. See a blog
post
of Achim Zeileis, Paul Murrell, Martin Maechler, and Deepayan Sarkar
for more details.unitType()
and unit.psum()
. Packages that were directly accessing elements of
the unit implementation needed updating. See a blog
post
by Paul Murrell and Thomas Lin Pedersen for more details.See https://CRAN.R-project.org/doc/manuals/r-patched/NEWS.html for all changes in the current release series of R, which at the time of this writing is R 4.0.z. Overall, there are 156 news entries for the 4.0.0 release, including 5 significant user-visible changes, 65 new features and 55 bug fixes.
R 3.6.0 (codename “Planting of a Tree”) was released on 2019-04-26 and the R 3.6 series closed with the release of R 3.6.3 (“Holding the Windsock”) on 2020-02-29, marking the 20th anniversary of the R 1.0.0 release. The following gives a selection of the most important changes in the 3.6 series.
sample()
, for instance) has been changed.
This addresses the fact, pointed out by Ottoboni and
Stark, that the previous method
made sample()
noticeably non-uniform on large populations. See
PR#17494
for a discussion. The previous method can be requested using
RNGkind()
or RNGversion()
if necessary for reproduction of old
results. Thanks to Duncan Murdoch for contributing the patch and
Gabe Becker for further assistance.
The output of RNGkind()
has been changed to also return the ‘kind’
used by sample()
.save()
, serialize()
, saveRDS()
,
compiler::cmpfile()
). Serialized data in format 3 cannot be read
by versions of R prior to version 3.5.0. Serialization format
version 2 is still supported and can be selected by version = 2
in
the save/serialization functions. The default can be changed back
for the whole R session by setting environment variables
R_DEFAULT_SAVE_VERSION
and R_DEFAULT_SERIALIZE_VERSION
to 2
.
For maximal back-compatibility, files vignette.rds
and
partial.rdb
generated by R CMD build
are in serialization format
version 2, and resave by default produces files in serialization
format version 2 (unless the original is already in format version
3). The new serialization format is already supported since R
version 3.5.0. It allows compact representation of ALTREP objects,
so that e.g. compact integer sequences are saved as compact. All
elements of such sequence have to be enumerated in format version 2.
The new serialization format also saves the current local encoding
at the time of serialization and strings in native encoding are
translated when de-serialized in an R session with different native
encoding.library()
and require()
now allow more control over handling
search path conflicts when packages are attached. The policy is
controlled by the new conflicts.policy
option. See Luke Tierney’s
blog
post
for more details.hcl.colors()
function to provide wide range of HCL-based color
palettes with much better perceptual properties than the existing
RGB/HSV-based palettes like rainbow()
. Also a new hcl.pals()
function to list available palette names for hcl.colors()
.
Contributed by Achim Zeileis. See blog
post
of Achim Zeileis and Paul Murrell for more details.keep.parse.data
and
keep.parse.data.pkgs
, which control whether parse data are
included into source (source references) when keep.source
or
keep.source.pkgs
is TRUE
. By default, keep.parse.data.pkgs
is
now FALSE
, which changes previous behavior and significantly
reduces space and time overhead when sources are kept when
installing packages. See Tomas Kalibera’s blog
post
for more details on this and other performance optimizations in the
parser.R_PreserveInMSet
and
R_ReleaseFromMSet
have been introduced to replace UNPROTECT_PTR
,
which is not safe to mix with UNPROTECT
(and with
PROTECT_WITH_INDEX
). Intended for use in parsers only. See Tomas
Kalibera’s blog
post
for more details.S3method()
directives in NAMESPACE
can now also be used to
perform delayed S3 method registration. Again, see Kurt Hornik’s
blog
post
for more details.See https://CRAN.R-project.org/doc/manuals/r-devel/NEWS.3.html for all changes in the R 3.y.z releases. Overall, there are 233 news entries for the 3.6.z releases, including 2 significant user-visible changes, 75 new features and 106 bug fixes.
From the source code Subversion repository, changes between April 27, 2019 and April 24, 2020, so the overall code change between R 3.6.0 and R 4.0.0 was: over 24,000 added lines, 12,000 deleted lines and 900 changed files. This is rounded to thousands/hundreds and excludes changes to common generated files, partially generated files, bulk re-organizations, etc. (translations, parsers, autoconf, LAPACK, R Journal bibliography, test outputs).
Figure 1 shows commits by month and weekday, respectively, counting line-based changes in individual commits, excluding the files as above. A noticeable increase of activity is in March, so right before code freeze for the release. A secondary peak of the number of commits can be observed in August. The low amount of changes in July 2019 may be due to conferences and vacations.
Changes between April 23, 2018 and April 26, 2019, so the overall code change between R 3.5.0 and R 3.6.0 was: nearly 27,000 added lines, over 17,000 deleted lines and nearly 800 changed files. This is again rounded to thousands/hundreds and excludes changes to common generated files.
Figure 2 again shows large changes in March before code freeze and in August, and decreased activity in July during R conferences and usual vacations. The right panel suggests that R Core members work a lot even during the weekends and it was even more so when working on R 3.6.0 than on R 4.0.0 (compare Saturday and Wednesday).
Summaries of bug-related activities during the development of R 4.0.0 (from April 27, 2019 to April 24, 2020) were derived from the database underlying R’s Bugzilla system. Figure 3 shows statistics of reported/closed bugs and number of added comments (on any bug report) by calendar month and weekday, respectively.
Comments are added by reporters of the bugs, R Core members and external volunteers. When a bug report is closed, the bug is either fixed or the report is found invalid. In principle, this can happen multiple times for a single report, but those cases are rare. Hence the number of comments is a measure of effort (yet a coarse one which does not distinguish thorough analyses from one-liners) and the number of bug closures is a measure of success in dealing with bugs.
The numbers were impacted by an increase in external contributions to analyzing bugs following a blog post of Tomas Kalibera and Luke Tierney, published October 9, 2019, asking the R community for help, and to contribute those analyzes in the form of comments to R bug reports. There was a considerable increase of comments in October which has lasted (at least) until April. Note that the April numbers don’t cover a full month and are mostly from the 24 days of R 4.0 development in 2020, so after the blog post (4 days are from April 2019). The rate of closing bugs has increased as well since October. What the numbers don’t show is that this is also due to increased activity of R Core that followed increased input from external volunteers. The numbers also seem to suggest that even new bug reports are submitted at a higher rate once more external volunteers focus on analyzing bugs in R.
From the numbers by weekday in the right panel of Figure 3 we again see that the R community keeps working during the weekends.
Figure 4 summarizes bug tracking activities during the development of R 3.6.0 (from April 23, 2018 to April 26, 2019). The decline observed in coding activity in July does not exist in bug-related activities; the number of closed bugs actually peaked in July.
Tomas Kalibera’s work on the article and R development has received funding from the Czech Ministry of Education, Youth and Sports from the Czech Operational Programme Research, Development, and Education, under grant agreement No.CZ.02.1.01/0.0/0.0/15_003/0000421, and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, under grant agreement No. 695412.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kalibera, et al., "Changes in R 3.6--4.0", The R Journal, 2020
BibTeX citation
@article{RJ-2020-2-core, author = {Kalibera, Tomas and Meyer, Sebastian and Hornik, Kurt}, title = {Changes in R 3.6--4.0}, journal = {The R Journal}, year = {2020}, note = {https://rjournal.github.io/}, volume = {12}, issue = {2}, issn = {2073-4859}, pages = {402-406} }