The stringdist Package for Approximate String Matching
Mark P.J. van der Loo
The R Journal (2014) 6:1, pages 111-122.
Abstract Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconsistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms, which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser-known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.
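To make the idea of an inexact equivalent of match and %in% concrete, the following is a minimal, self-contained sketch in Python. It is not the package's C implementation and does not reproduce its interface. The names levenshtein, approx_match, approx_in, and max_dist are hypothetical and chosen for illustration only. The sketch assumes the Levenshtein distance as the edit-based distance and returns the position of the closest table entry within a maximum distance. Positions are 0-based here, whereas R's match is 1-based.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def approx_match(x: str, table: Sequence[str], max_dist: int) -> Optional[int]:
    """Hypothetical inexact 'match': index of the closest entry in table
    whose distance to x is at most max_dist, or None if there is none."""
    best_index, best_dist = None, max_dist + 1
    for index, candidate in enumerate(table):
        dist = levenshtein(x, candidate)
        if dist < best_dist:  # strict '<' keeps the first of equal candidates
            best_index, best_dist = index, dist
    return best_index


def approx_in(x: str, table: Sequence[str], max_dist: int) -> bool:
    """Hypothetical inexact '%in%': is any table entry within max_dist of x?"""
    return approx_match(x, table, max_dist) is not None
```

With max_dist set to 0, both helpers reduce to exact matching, which is the sense in which they generalize the exact lookups.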
Received: 2013-11-04; online 2014-04-27

@article{RJ-2014-011,
  author = {Mark P.J. van der Loo},
  title = {{The stringdist Package for Approximate String Matching}},
  year = {2014},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2014-011},
  url = {https://doi.org/10.32614/RJ-2014-011},
  pages = {111--122},
  volume = {6},
  number = {1}
}