The R Journal: article published in 2014, volume 6:1

The stringdist Package for Approximate String Matching
Mark P.J. van der Loo, The R Journal (2014) 6:1, pages 111-122.

Abstract Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconsistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser-known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.
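
To make the distance families named in the abstract concrete, the following is a minimal illustrative sketch in Python, not the package's C implementation or its R interface. It shows one edit-based distance (Levenshtein), one q-gram-based distance, and a simple approximate lookup in the spirit of an inexact match and %in%. The function names `levenshtein`, `qgram_distance`, `approx_match`, and `approx_in`, and the `max_dist` parameter, are hypothetical names chosen for this sketch.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit-based distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """q-gram distance: sum of absolute differences between q-gram counts."""
    ca = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    cb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(ca[g] - cb[g]) for g in ca.keys() | cb.keys())


def approx_match(x, table, max_dist=1, dist=levenshtein):
    """Hypothetical inexact lookup: index of the closest entry of `table`
    within `max_dist` of `x` (0-based), or None if no entry qualifies."""
    best_index, best_dist = None, max_dist + 1
    for i, candidate in enumerate(table):
        d = dist(x, candidate)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def approx_in(x, table, max_dist=1, dist=levenshtein):
    """Hypothetical inexact membership test built on approx_match."""
    return approx_match(x, table, max_dist, dist) is not None
```

For example, under these assumptions `approx_match("leia", ["uhura", "leela"], max_dist=2)` selects the second entry, because "leia" is two edits away from "leela", while `approx_in` with `max_dist=1` reports no match for the same inputs.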

Received: 2013-11-04; online 2014-04-27
CRAN packages: kernlab, RecordLinkage, MiscPsycho, cba, Mkmisc, deducorrect, vwr, stringdist, textcat, TraMineR
CRAN Task Views implied by cited CRAN packages: OfficialStatistics, Cluster, NaturalLanguageProcessing, Graphics, MachineLearning, Multivariate, Optimization, Survival

CC BY 4.0
This article is licensed under a Creative Commons Attribution 3.0 Unported license.

  author = {Mark P.J. van der Loo},
  title = {{The stringdist Package for Approximate String Matching}},
  year = {2014},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2014-011},
  url = {},
  pages = {111--122},
  volume = {6},
  number = {1}