## Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction (2001)

Venue: Lecture Notes in Computer Science

Citations: 10 (0 self-citations)

### BibTeX

@INPROCEEDINGS{Savary01typographicalnearest-neighbor,

author = {Agata Savary},

title = {Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction},

booktitle = {Lecture Notes in Computer Science},

year = {2001},

pages = {260}

}

### Abstract

A method of error-tolerant lookup in a finite-state lexicon is described, as well as its application to automatic spelling correction. We compare our method to the algorithm by K. Oflazer [14]. While Oflazer's algorithm searches for all possible corrections of a misspelled word that are within a given similarity threshold, our approach is to retain only the most similar corrections (nearest neighbours), dynamically reducing the search space in the lexicon, and to reach the first correction as soon as possible.
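To make the nearest-neighbour idea concrete, here is a minimal sketch, not the author's algorithm. It assumes the lexicon is stored as an unminimised trie and measures distance with unit-cost insertions, deletions, replacements and inversions of adjacent letters. It keeps only the closest corrections found so far and lowers the search bound each time a closer word appears, so fewer branches survive. The class and function names (`TrieNode`, `build_trie`, `nearest_neighbours`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative sketch only: names and design are hypothetical, not from the paper.


class TrieNode:
    """State of a deterministic acyclic automaton (here an unminimised trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.final = False


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.final = True
    return root


def nearest_neighbours(root: TrieNode, word: str,
                       max_dist: int) -> Tuple[Optional[int], List[str]]:
    """Return (d, words): all lexicon words at the minimal distance d <= max_dist.

    Distance = unit-cost insertions, deletions, replacements and inversions
    of adjacent letters (restricted Damerau distance). (None, []) if no hit.
    """
    n = len(word)
    best = max_dist          # search bound, lowered whenever a closer word appears
    found: List[str] = []

    def visit(node: TrieNode, prefix: str, row: List[int],
              prev_row: Optional[List[int]]) -> None:
        nonlocal best, found
        # row[j] = distance between the lexicon prefix and word[:j]
        if node.final and row[n] <= best:
            if row[n] < best:            # strictly closer: drop older candidates
                best, found = row[n], []
            found.append(prefix)
        for ch, child in sorted(node.children.items()):
            new = [row[0] + 1]
            for j in range(1, n + 1):
                cost = 0 if word[j - 1] == ch else 1
                d = min(row[j] + 1,          # letter missing from the input
                        new[j - 1] + 1,      # superfluous letter in the input
                        row[j - 1] + cost)   # match or replacement
                if (prev_row is not None and j > 1
                        and ch == word[j - 2] and prefix[-1] == word[j - 1]):
                    d = min(d, prev_row[j - 2] + 1)  # inverted adjacent letters
                new.append(d)
            # Row minima never decrease along a path, so this prunes safely.
            if min(new) <= best:
                visit(child, prefix + ch, new, row)

    visit(root, "", list(range(n + 1)), None)
    return (best, found) if found else (None, [])
```

For example, with the lexicon `["farm", "fork", "form", "forms", "from"]`, the query `"fomr"` with threshold 2 first collects `farm` and `fork` at distance 2. Once `form` is reached at distance 1, those candidates are discarded and the bound drops to 1.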

### Citations

688 | The string-to-string correction problem
- Wagner, Fischer
- 1974
Citation Context ... another. Different sequences of editing operations may be allowed and different cost functions may be assigned to these editing operations. With the distance measure called edit distance proposed in [18, 11], editing operations may be assigned arbitrary costs, and they may act on arbitrary positions in the string in arbitrary order (e.g. ca can be obtained from abc by two operations: deletion of b, inver...
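The quoted context refers to edit distance with arbitrary operation costs. Below is a minimal sketch of the classic Wagner–Fischer dynamic program. It assumes only insertions, deletions and replacements, each with a caller-chosen weight. Inversions are not modelled, so `abc` → `ca` costs 3 here rather than the two operations mentioned above. The function name and parameters are illustrative.

```python
def wagner_fischer(source: str, target: str, w_ins: float = 1,
                   w_del: float = 1, w_sub: float = 1) -> float:
    """Weighted edit distance (insert/delete/replace only); illustrative sketch."""
    m, n = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del        # delete all of source[:i]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins        # insert all of target[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,       # deletion
                          d[i][j - 1] + w_ins,       # insertion
                          d[i - 1][j - 1] + sub)     # match or replacement
    return d[m][n]
```

With `w_sub=2`, a replacement is never cheaper than a deletion plus an insertion, so `wagner_fischer("kitten", "sitting", w_sub=2)` gives 5 instead of 3.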

369 | Techniques for automatically correcting words in text
- Kukich
- 1992
Citation Context ...n [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the field of natural language processing, and it has a very rich bibliography, a good review of which is presented in [9]. The author divides the existing approaches into three classes: nonword error detection, isolated-word error correction, and context-dependent word correction. Many problems faced by the methods of th...

342 | Regular models of phonological rule systems
- Kaplan, Kay
- 1994
Citation Context ...minimal distance from the input word, and the first solution can be obtained rapidly. Section 2, Related Work: Many aspects of a natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the fiel...

219 | A technique for computer detection and correction of spelling errors
- Damerau
- 1964
Citation Context ...er addresses only typing errors. They are traditionally interpreted as resulting from one or more editing operations on letters: insertions, deletions, replacements and inversions of adjacent letters [3]. Their correction is related to the theoretical problem of approximate string matching [6], in which the distance between two strings is the minimum cost of all sequences of editing operations that t...
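The four editing operations listed above (insertion, deletion, replacement and inversion of adjacent letters) can be illustrated by generating every string one operation away from a typed word and intersecting the result with a lexicon. This generate-and-test approach is a common textbook technique, not the finite-state method of the paper. The function name `single_edits` and the lowercase ASCII alphabet are assumptions.

```python
from string import ascii_lowercase
from typing import Set


def single_edits(word: str, alphabet: str = ascii_lowercase) -> Set[str]:
    """All strings one Damerau operation away from `word` (illustrative sketch)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    inversions = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replacements = {left + c + right[1:]
                    for left, right in splits if right
                    for c in alphabet if c != right[0]}
    insertions = {left + c + right for left, right in splits for c in alphabet}
    # Inverting two identical letters reproduces the word itself; exclude it.
    return (deletions | inversions | replacements | insertions) - {word}
```

For a word of length n over an alphabet of size k, this yields at most n + (n − 1) + n(k − 1) + (n + 1)k candidates. For instance, `single_edits("fomr") & {"form", "from", "farm"}` keeps only `form`.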

137 | Approximate string matching
- Hall, Dowling
- 1980
Citation Context ...r more editing operations on letters: insertions, deletions, replacements and inversions of adjacent letters [3]. Their correction is related to the theoretical problem of approximate string matching [6], in which the distance between two strings is the minimum cost of all sequences of editing operations that transform one string into another. Different sequences of editing operations may be allowed ...

56 | Finite-State Language Processing
- Roche, Schabes
- 1997
Citation Context ...minimal distance from the input word, and the first solution can be obtained rapidly. Section 2, Related Work: Many aspects of a natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the fiel...

55 | Combining trigram-based and feature-based methods for context-sensitive spelling correction
- Golding, Schabes
- 1996
Citation Context ...(e.g. from → form) requires approaches of the third class, based mostly on a syntactic and/or stochastic analysis of a local context of words supposed to be erroneous (e.g. [17, 5]). In the second type of approach, i.e. isolated error correction, errors are most often of typing origin, of phonetic origin (e.g. [10]), or both. This paper addresses only typing errors. They are trad...

49 | Taxonomies and Toolkits of Regular Language Algorithms
- Watson
- 1995
Citation Context ...natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the field of natural language processing, and it has a very rich bibliography, a good review of which is presented in [9]. The auth...

45 | Development of a spelling list
- McIlroy
- 1982
Citation Context ...es into three classes: nonword error detection, isolated-word error correction, and context-dependent word correction. Many problems faced by the methods of the first class in the early research (e.g. [12]), due to the size of the lexicon and its access time, found a solution in the finite-state model of the lexicon. One of the main remaining problems, the recognition of spelling errors resulting in val...

42 | Incremental construction of minimal acyclic finite-state automata
- Daciuk, Mihov, et al.
- 2000
Citation Context ...natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the field of natural language processing, and it has a very rich bibliography, a good review of which is presented in [9]. The auth...
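The space efficiency that minimisation brings to a finite-state lexicon can be illustrated with a minimal sketch of incremental construction from lexicographically sorted input, in the spirit of the work cited above. When a new word arrives, the states of the previous word's suffix that are no longer shared are either merged with an equivalent registered state or registered themselves, deepest first. The class and function names (`State`, `MinimalDawg`, `count_states`) are hypothetical, and this is an illustration rather than the authors' code.

```python
from typing import Dict, List, Tuple

# Illustrative sketch; names are hypothetical, not taken from the cited paper.


class State:
    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[str, "State"] = {}

    def signature(self) -> Tuple:
        # Children are already unique (registered), so their identity suffices.
        return (self.final,
                tuple(sorted((c, id(s)) for c, s in self.edges.items())))


class MinimalDawg:
    """Minimal acyclic automaton built from words added in sorted order."""

    def __init__(self) -> None:
        self.root = State()
        self.register: Dict[Tuple, State] = {}
        self.unchecked: List[Tuple[State, str, State]] = []  # path of last word
        self.previous = ""

    def add(self, word: str) -> None:
        if word < self.previous:
            raise ValueError("words must be added in sorted order")
        common = 0
        for a, b in zip(word, self.previous):
            if a != b:
                break
            common += 1
        self._minimise(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            nxt = State()
            node.edges[ch] = nxt
            self.unchecked.append((node, ch, nxt))
            node = nxt
        node.final = True
        self.previous = word

    def finish(self) -> None:
        self._minimise(0)

    def _minimise(self, down_to: int) -> None:
        # Merge or register the states below depth `down_to`, deepest first.
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            key = child.signature()
            if key in self.register:
                parent.edges[ch] = self.register[key]
            else:
                self.register[key] = child

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final


def count_states(root: State) -> int:
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)
```

For the words `tap`, `taps`, `top`, `tops`, a plain trie needs 8 states. In this sketch the `a` and `o` branches share their common suffix, leaving 5 states.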

23 | Minimization of Sequential Transducers
- Mohri
- 1994
Citation Context ...natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the field of natural language processing, and it has a very rich bibliography, a good review of which is presented in [9]. The auth...

17 | A model and a fast algorithm for multiple errors spelling correction
- Du, Chang
- 1992
Citation Context ...nd c). However, an efficient algorithm for edit distance calculation exists only if $W_I + W_D \le 2W_S$, where $W_S$, $W_I$ and $W_D$ are the costs assigned to inversion, insertion and deletion operations, respectively. In [4] this distance measure is modified and renamed error distance by assigning cost 1 to each editing operation and by assuming that errors occur in linear order from left to right so that a later ope...
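With unit costs, $W_I + W_D = 2 = 2W_S$, so the condition above holds. The sketch below contrasts two unit-cost distances. The first lets operations act in arbitrary order, using the well-known Lowrance–Wagner-style algorithm for unrestricted Damerau distance. The second is restricted: an inverted pair may not be edited again (often called optimal string alignment). We assume, without confirmation from the truncated context, that the restricted variant approximates the left-to-right "error distance" of [4]. Function names are hypothetical.

```python
from typing import Dict

# Illustrative sketch; function names are hypothetical.


def restricted_distance(a: str, b: str) -> int:
    """Unit costs; an inverted pair may not be edited again."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent inversion
    return d[len(a)][len(b)]


def unrestricted_distance(a: str, b: str) -> int:
    """Unit costs; operations may act in arbitrary order."""
    inf = len(a) + len(b)
    last_row: Dict[str, int] = {}  # last row of `a` in which each letter occurred
    # d is shifted by one: d[i + 1][j + 1] is the distance of a[:i] and b[:j]
    d = [[inf] * (len(b) + 2) for _ in range(len(a) + 2)]
    for i in range(len(a) + 1):
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[1][j + 1] = j
    for i in range(1, len(a) + 1):
        last_match_col = 0
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                           # match or replacement
                d[i + 1][j] + 1,                          # insertion
                d[i][j + 1] + 1,                          # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1))  # inversion, with edits between
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]
```

The example `abc` → `ca` separates the two variants. The unrestricted distance is 2 (delete `b`, then invert `a` and `c`), while the restricted distance is 3, because the inversion would act on letters made adjacent by an earlier deletion.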

16 | Incremental Construction of Finite-State Automata and Transducers, and their use
- Daciuk
- 1998
Citation Context ...ult to adapt the error distance calculation to a particular application or language, e.g. by considering phonetically motivated interchanges of certain letters or groups of letters, as was done in [1] for Polish. A correction candidate may be reached several times with different intermediate error distance values. For example wh...

14 | Extended Finite State Models of Language
- Kornai
- 1999
Citation Context ... the input word, and the first solution can be obtained rapidly. Section 2, Related Work: Many aspects of a natural language can be treated through finite-state machines in their classical [16, 7] and extended [8] versions, due to their time and space efficiency obtained by determinisation and minimisation [19, 13, 2]. Automatic spelling correction is one of the oldest applications in the field of natural lang...

10 | Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction
- Oflazer
- 1996
Citation Context ...r Abstract. A method of error-tolerant lookup in a finite-state lexicon is described, as well as its application to automatic spelling correction. We compare our method to the algorithm by K. Oflazer [14]. While Oflazer's algorithm searches for all possible corrections of a misspelled word that are within a given similarity threshold, our approach is to retain only the most similar corrections (neares...

7 | Morphosyntactic correction in natural language interfaces
- Veronis
- 1988
Citation Context ...(e.g. from → form) requires approaches of the third class, based mostly on a syntactic and/or stochastic analysis of a local context of words supposed to be erroneous (e.g. [17, 5]). In the second type of approach, i.e. isolated error correction, errors are most often of typing origin, of phonetic origin (e.g. [10]), or both. This paper addresses only typing errors. They are trad...

4 | The typology of unknown words: an experimental study of two corpora
- Ren, Perrault
- 1992
Citation Context ...nion, the second approach is preferable for many applications for three reasons: statistical studies show that words with multiple errors are rare (0.17% to 1.99% of unknown words in a corpus, see [15]), users are easily discouraged by long lists of correction candidates, and the search time grows exponentially with the admitted distance threshold. Therefore, the tolerant lookup algorithm we propose...

1 | Vérification et correction orthographiques assistées par ordinateur (Computer-assisted spelling verification and correction), Actes de la Convention IA 89 (Proceedings of the IA 89 Convention)
- Laporte, Silberztein
- 1989
Citation Context ...is of a local context of words supposed to be erroneous (e.g. [17, 5]). In the second type of approach, i.e. isolated error correction, errors are most often of typing origin, of phonetic origin (e.g. [10]), or both. This paper addresses only typing errors. They are traditionally interpreted as resulting from one or more editing operations on letters: insertions, deletions, replacements and inversions o...