Collections in R: Review and Proposal

R is a powerful tool for data processing, visualization, and modeling. However, R is slower than other languages used for similar purposes, such as Python. One reason for this is that R lacks base support for collections, abstract data types that store, manipulate, and return data (e.g., sets, maps, stacks). An exciting recent trend in the R extension ecosystem is the development of collection packages, packages that provide classes that implement common collections. At least 12 collection packages are available across the two major R extension repositories, the Comprehensive R Archive Network (CRAN) and Bioconductor. In this article, we compare collection packages in terms of their features, design philosophy, ease of use, and performance on benchmark tests. We demonstrate that, when used well, the data structures provided by collection packages are in many cases significantly faster than the data structures provided by base R. We also highlight current deficiencies among R collection packages and propose avenues of possible improvement. This article provides useful recommendations to R programmers seeking to speed up their programs and aims to inform the development of future collection-oriented software for R.

Timothy Barry
2018-06-13

CRAN packages used

Rcpp, hashr, hashFunction, filehashSQLite, tictoc, DSL, bit64, bit, Oarray, sets, filehash, hash, hashmap, rstackdeque, rstack, liqueueR, dequer, flifo, listenv, stdvectors, microbenchmark, neuroim, FindMinIC

CRAN Task Views implied by cited packages

HighPerformanceComputing, MedicalImaging, NumericalMathematics

Bioconductor packages used

S4Vectors

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Barry, "Collections in R: Review and Proposal", The R Journal, 2018

BibTeX citation

@article{RJ-2018-037,
  author = {Barry, Timothy},
  title = {Collections in R: Review and Proposal},
  journal = {The R Journal},
  year = {2018},
  note = {https://doi.org/10.32614/RJ-2018-037},
  doi = {10.32614/RJ-2018-037},
  volume = {10},
  issue = {1},
  issn = {2073-4859},
  pages = {455-471}
}