Regular expressions are powerful tools for manipulating non-tabular textual data. For many tasks (visualization, machine learning, etc.), tables of numbers must be extracted from such data before they can be processed by other R functions. We present the R package namedCapture, which facilitates such tasks by providing a new user-friendly syntax for defining regular expressions in R code. We begin by describing the history of regular expressions and their usage in R. We then describe the new features of the namedCapture package, and provide detailed comparisons with related R packages (rex, stringr, stringi, tidyr, rematch2, re2r).
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2019-050.zip
Packages used: namedCapture, rex, stringr, stringi, tidyr, rematch2, re2r, microbenchmark
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hocking, "Comparing namedCapture with other R packages for regular expressions", The R Journal, 2019
BibTeX citation
@article{RJ-2019-050,
  author = {Hocking, Toby Dylan},
  title = {Comparing namedCapture with other R packages for regular expressions},
  journal = {The R Journal},
  year = {2019},
  note = {https://doi.org/10.32614/RJ-2019-050},
  doi = {10.32614/RJ-2019-050},
  volume = {11},
  issue = {2},
  issn = {2073-4859},
  pages = {328-346}
}