Fast String Correction with Levenshtein-Automata (2002)
| Venue: | INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION |
| Citations: | 19 - 3 self |
BibTeX
@ARTICLE{Schulz02faststring,
author = {Klaus Schulz and Stoyan Mihov},
title = {Fast String Correction with Levenshtein-Automata},
journal = {INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION},
year = {2002},
volume = {5},
pages = {67--85}
}
Abstract
The Levenshtein-distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein-automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein-distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein-automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein-distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein-automata and leads to even improved efficiency. We also describe how to extend both methods to variants of the Levenshtein-distance where further primitive edit operations (transpositions, merges and splits) may be used.
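
To make the abstract's two central ideas concrete, the following is a minimal Python sketch, not the paper's construction. It computes the Levenshtein-distance by standard dynamic programming and then uses the same row-by-row computation to control a search through a trie, so that only lexical words within the bound n are generated. It does not build the deterministic Levenshtein-automaton in linear time as the paper does; the dynamic-programming row merely stands in for the automaton state. The names `levenshtein`, `build_trie`, `correct` and `END` are hypothetical and chosen for illustration only.

```python
END = ""  # end-of-word marker; assumed safe because no single character equals ""


def levenshtein(v, w):
    """Minimal number of insertions, deletions or substitutions turning v into w."""
    prev = list(range(len(w) + 1))
    for i, cv in enumerate(v, start=1):
        cur = [i]
        for j, cw in enumerate(w, start=1):
            cur.append(min(prev[j] + 1,                # delete cv
                           cur[j - 1] + 1,             # insert cw
                           prev[j - 1] + (cv != cw)))  # substitute, or free match
        prev = cur
    return prev[-1]


def build_trie(words):
    """Store the lexicon as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def correct(trie, w, n):
    """Return, sorted, all lexical words V with levenshtein(V, w) <= n."""
    results = []

    def walk(node, prefix, row):
        # row[j] is the distance between the current prefix and w[:j].
        if END in node and row[-1] <= n:
            results.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(w) + 1):
                new_row.append(min(row[j] + 1,
                                   new_row[j - 1] + 1,
                                   row[j - 1] + (w[j - 1] != ch)))
            # The row minimum never decreases as the prefix grows, so a
            # subtree can be abandoned once every entry exceeds the bound.
            if min(new_row) <= n:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(w) + 1)))
    return sorted(results)
```

For example, with the toy lexicon `["child", "chill", "children", "cold", "world"]`, calling `correct(build_trie(lexicon), "chold", 1)` returns `["child", "cold"]`. Each trie edge costs time proportional to the length of W in this sketch, which is the per-step work that a precomputed deterministic automaton would avoid.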







