## Fast String Correction with Levenshtein-Automata (2002)

Venue: International Journal of Document Analysis and Recognition

Citations: 27 (4 self)

### BibTeX

@ARTICLE{Schulz02faststring,

author = {Klaus Schulz and Stoyan Mihov},

title = {Fast String Correction with Levenshtein-Automata},

journal = {INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION},

year = {2002},

volume = {5},

pages = {67--85}

}

### Abstract

The Levenshtein-distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein-automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein-distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein-automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein-distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein-automata and leads to even better efficiency. We also describe how to extend both methods to variants of the Levenshtein-distance where further primitive edit operations (transpositions, merges and splits) may be used.
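
To make the definitions in the abstract concrete, here is a minimal Python sketch of the distance and of a bounded lexicon search. It is not the construction from the paper: instead of a precomputed deterministic Levenshtein-automaton, it carries one dynamic-programming row down a trie and prunes any branch whose row minimum already exceeds n. This substitute technique generates the same set of lexical words V, but without the linear-time automaton construction the abstract claims. The names `levenshtein`, `build_trie` and `search`, and the `"$"` end-of-word marker, are illustrative assumptions.

```python
def levenshtein(v: str, w: str) -> int:
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(w) + 1))  # distances from the empty prefix of v
    for i, cv in enumerate(v, start=1):
        curr = [i]
        for j, cw in enumerate(w, start=1):
            cost = 0 if cv == cw else 1
            curr.append(min(prev[j] + 1,           # delete cv
                            curr[j - 1] + 1,       # insert cw
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def build_trie(words):
    """Nested-dict trie; assumes "$" never occurs in a lexicon word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # marks the end of a complete word
    return root


def search(trie, w: str, n: int):
    """Return, sorted, all lexicon words V with levenshtein(V, w) <= n."""
    results = []

    def walk(node, prefix, row):
        # row[j] is the distance between prefix and w[:j]
        if "$" in node and row[-1] <= n:
            results.append(prefix)
        for ch, child in node.items():
            if ch == "$":
                continue
            new_row = [row[0] + 1]
            for j, cw in enumerate(w, start=1):
                cost = 0 if ch == cw else 1
                new_row.append(min(row[j] + 1,
                                   new_row[j - 1] + 1,
                                   row[j - 1] + cost))
            # Row minima never decrease along a branch, so pruning is safe.
            if min(new_row) <= n:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(w) + 1)))
    return sorted(results)
```

For example, with the lexicon `["cat", "cart", "car", "dog", "cast", "coat"]`, calling `search(build_trie(lexicon), "cat", 1)` returns `["car", "cart", "cast", "cat", "coat"]`, and `dog` is rejected as soon as its first letter pushes every entry of the row above the bound. Each trie edge costs time proportional to the length of W in this sketch, which is the overhead a precomputed deterministic automaton is meant to avoid.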