iotools: High-Performance I/O Tools for R

The iotools package provides a set of tools for input- and output-intensive data processing in R. The functions chunk.apply and read.chunk iteratively load contiguous blocks of data into memory as raw vectors. These raw vectors can then be converted efficiently into matrices and data frames with the iotools functions mstrsplit and dstrsplit. These functions minimize copying of data and avoid intermediate strings in order to drastically improve performance. Finally, we also provide read.csv.raw, which reads an entire dataset into memory using the same efficient parsing code. In this paper, we present these functions through a set of examples, with an emphasis on the flexibility provided by chunk-wise operations. We also provide benchmarks comparing the speed of read.csv.raw to data loading functions provided in base R and other contributed packages.
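As a rough illustration of the chunk-wise workflow described above, the following minimal sketch reads a small file block by block and parses each raw block with dstrsplit. The file contents, the column types, and the variable names (path, reader, total, per_chunk) are assumptions made for this example only; it is not taken from the paper's own examples.

```r
library(iotools)

# Hypothetical input: a small headerless CSV file written to a temporary path.
path <- tempfile(fileext = ".csv")
writeLines(c("1,2.5,a", "2,3.5,b", "3,4.5,c", "4,5.5,d"), path)

# Explicit loop: read.chunk returns blocks of complete lines as raw vectors
# and a zero-length raw vector once the input is exhausted.
con <- file(path, "rb")
reader <- chunk.reader(con)
total <- 0
while (length(chunk <- read.chunk(reader)) > 0) {
  # Parse the raw block straight into a data frame with the assumed column types.
  df <- dstrsplit(chunk, col_types = c("integer", "numeric", "character"), sep = ",")
  total <- total + sum(df[[2]])
}
close(con)

# The same computation with chunk.apply: the function is applied to each raw
# chunk and the per-chunk results are combined with CH.MERGE.
per_chunk <- chunk.apply(
  path,
  function(chunk) {
    sum(dstrsplit(chunk, col_types = c("integer", "numeric", "character"), sep = ",")[[2]])
  },
  CH.MERGE = c
)
```

With a file this small, everything fits in a single chunk; on larger inputs the loop body and the function passed to chunk.apply would run once per block, so only one block needs to be held in memory at a time.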

Taylor Arnold, Michael J. Kane, Simon Urbanek
2017-05-10

CRAN packages used

bigmemory, ff, readr, foreach, iterators, iotools, Matrix

CRAN Task Views implied by cited packages

HighPerformanceComputing, Econometrics, Multivariate, NumericalMathematics

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Arnold, et al., "iotools: High-Performance I/O Tools for R", The R Journal, 2017

BibTeX citation

@article{RJ-2017-001,
  author = {Arnold, Taylor and Kane, Michael J. and Urbanek, Simon},
  title = {iotools: High-Performance I/O Tools for R},
  journal = {The R Journal},
  year = {2017},
  note = {https://doi.org/10.32614/RJ-2017-001},
  doi = {10.32614/RJ-2017-001},
  volume = {9},
  issue = {1},
  issn = {2073-4859},
  pages = {6-13}
}