GrpString: An R Package for Analysis of Groups of Strings

The R package GrpString was developed as a comprehensive toolkit for quantitatively analyzing and comparing groups of strings. It offers functions for researchers and data analysts to prepare strings from event sequences, extract common patterns from strings, and compare patterns between string vectors. The package also computes transition matrices and the complexity of strings, determines clusters in a string vector, and examines the statistical difference between two groups of strings.

Hui Tang, Elizabeth L. Day, Molly B. Atkinson, Norbert J. Pienta

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at

CRAN packages used

stringr, stringb, stringi, gsubfn, uniqtag, stringdist, TraMineR, informR, GrpString, entropy

CRAN Task Views implied by cited packages

NaturalLanguageProcessing, OfficialStatistics, Survival


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Tang, et al., "The R Journal: GrpString: An R Package for Analysis of Groups of Strings", The R Journal, 2018

BibTeX citation

  author = {Tang, Hui and Day, Elizabeth L. and Atkinson, Molly B. and Pienta, Norbert J.},
  title = {The R Journal: GrpString: An R Package for Analysis of Groups of Strings},
  journal = {The R Journal},
  year = {2018},
  note = {},
  doi = {10.32614/RJ-2018-002},
  volume = {10},
  issue = {1},
  issn = {2073-4859},
  pages = {359-369}