The R Journal: accepted article

This article will be copy edited and may be changed before publication.

iotools: High-Performance I/O Tools for R
Taylor Arnold, Michael J. Kane and Simon Urbanek

Abstract The iotools package provides a set of tools for input- and output-intensive data processing in R. The functions chunk.apply and read.chunk iteratively load contiguous blocks of data into memory as raw vectors. These raw vectors can then be converted efficiently into matrices and data frames with the iotools functions mstrsplit and dstrsplit, which minimize copying of data and avoid intermediate strings in order to drastically improve performance. Finally, read.csv.raw allows users to read an entire dataset into memory with the same efficient parsing code. In this paper, we present these functions through a set of examples, with an emphasis on the flexibility provided by chunk-wise operations, and we provide benchmarks comparing the speed of read.csv.raw to the data loading functions in base R and other contributed packages.
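
As a rough illustration of the workflow the abstract describes, the following minimal sketch computes column sums chunk by chunk and then reads the same file in one pass. It is not taken from the article: the toy data, the temporary file, and the variable names (path, col_sums, total, d) are assumptions made for this example, and the file is small enough that it will most likely be processed as a single chunk.

```r
library(iotools)

# Toy data (an assumption for this sketch): two comma-separated columns, no header.
path <- tempfile(fileext = ".csv")
writeLines(c("1,1.5", "2,2.5", "3,3.5", "4,4.5"), path)

# Chunk-wise processing: chunk.apply passes each block to the function as a
# raw vector, and mstrsplit parses that raw vector into a numeric matrix.
col_sums <- chunk.apply(path, function(chunk) {
  m <- mstrsplit(chunk, sep = ",", type = "numeric")
  colSums(m)
}, CH.MERGE = rbind)

# Combine the per-chunk results (one row per chunk) into overall column sums.
total <- colSums(rbind(col_sums))

# Whole-file read with the same parsing code, using explicit column classes.
d <- read.csv.raw(path, header = FALSE, sep = ",",
                  colClasses = c("integer", "numeric"))
```

Because only per-chunk summaries are kept, the chunk-wise version never needs the full dataset in memory, whereas read.csv.raw is the simpler choice when the data fits.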

Received: 2015-03-20; online 2017-05-10
CRAN packages: bigmemory, ff, readr, foreach, iterators, iotools
CRAN Task Views implied by cited CRAN packages: HighPerformanceComputing


CC BY 4.0
This article is licensed under a Creative Commons Attribution 4.0 International license.

@article{RJ-2017-001,
  author = {Taylor Arnold and Michael J. Kane and Simon Urbanek},
  title = {{iotools: High-Performance I/O Tools for R}},
  year = {2017},
  journal = {{The R Journal}},
  url = {https://journal.r-project.org/archive/2017/RJ-2017-001/index.html}
}