## A Fast Algorithm for Finding the Nearest Neighbor of a Word in a Dictionary (1993)

Venue: | In Proc. 2nd Int. Conference on Document Analysis and Recognition ICDAR’93 |

Citations: | 7 - 1 self |

### BibTeX

@INPROCEEDINGS{Bunke93afast,

author = {Horst Bunke},

title = {A Fast Algorithm for Finding the Nearest Neighbor of a Word in a Dictionary},

booktitle = {Proc. 2nd Int. Conference on Document Analysis and Recognition ICDAR’93},

year = {1993},

pages = {632--637}

}

### Abstract

In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, whereas in [11] both strings to be compared may be given online, our algorithm assumes that one of the two strings is a dictionary entry known a priori. In an off-line phase carried out beforehand, this dictionary word is converted into a special type of deterministic finite state automaton. Given an input string corresponding to a word with possible OCR errors and the automaton derived from the dictionary word, computing the edit distance between the two strings corresponds to a traversal of the states of the automaton. This procedure needs time that is only linear in the length of the OCR word and is independent of the length of the dictionary word. Given not only one but N different dictionary words, their corresponding automata can be combined into a single deterministic finite state automaton. Thus the co...
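
For orientation, the classical approach cited as [11] (the Wagner & Fischer dynamic program) can be sketched as follows. This is a minimal illustration and not the automaton construction proposed in the paper: it assumes unit costs for insertion, deletion, and substitution, and the names `edit_distance` and `nearest_neighbor` are hypothetical. It processes the OCR word one character at a time and keeps a single column of distances, since each new column depends only on the previous column and the current input character. The brute-force search shown here costs time proportional to the total length of all dictionary words for every query.

```python
from typing import Iterable, List


def edit_distance(dictionary_word: str, ocr_word: str) -> int:
    """Unit-cost edit distance via the classical dynamic program (assumed costs: 1 each)."""
    # column[i] = distance between dictionary_word[:i] and the OCR prefix read so far.
    column: List[int] = list(range(len(dictionary_word) + 1))
    for j, ocr_char in enumerate(ocr_word, start=1):
        new_column = [j]  # the empty dictionary prefix needs j edits to match j OCR characters
        for i, dict_char in enumerate(dictionary_word, start=1):
            substitution_cost = 0 if dict_char == ocr_char else 1
            new_column.append(min(
                column[i] + 1,                      # ocr_char has no counterpart in the dictionary word
                new_column[i - 1] + 1,              # dict_char has no counterpart in the OCR word
                column[i - 1] + substitution_cost,  # match or substitute
            ))
        column = new_column
    return column[-1]


def nearest_neighbor(ocr_word: str, dictionary: Iterable[str]) -> str:
    """Brute-force baseline: compare the OCR word against every dictionary entry."""
    return min(dictionary, key=lambda word: edit_distance(word, ocr_word))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(nearest_neighbor("chcck", ["check", "cheque", "clock"]))
```

The inner loop over the dictionary word is the per-character work that, according to the abstract, the off-line automaton is meant to replace with a single state transition.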

### Citations

3836 |
Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company
- Hopcroft, Ullman
- 1979
Citation Context ... $b_1 b_2 \ldots b_m)) = \delta(I, b_1 b_2 \ldots b_m)$. In this Lemma, the function $\delta : Q \times V \to Q$ has been extended to $\delta : Q \times V^* \to Q$ in the standard fashion; see, for example, chapter 2 of [20]. Proof of Lemma 4.4: The proof is by induction on the length of $B$. For $B = \varepsilon$ we have $\delta(I, \varepsilon) = I = (1, 1, \ldots, 1)$. Clearly, $S(0, \varepsilon) = C(0)$ and $T(C(0)) = (1, 1, \ldots, 1)$. Now assume t... |

658 |
The string-to-string correction problem
- Wagner, Fischer
- 1974
Citation Context ...m for Finding the Nearest Neighbor of a Word in a Dictionary Horst Bunke Abstract In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, while in [11] the two strings to be compared may be given online, our algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This diction... |

132 |
Approximate string matching
- Hall, Dowling
- 1980
Citation Context ...arch for many years. Two algorithms with a better asymptotic time complexity have been published [12,13]. A general discussion of string edit distance including various applications is contained in [14]. For a recent survey see [15]. Stochastic versions of string matching and their applications to the correction of distorted words have been described in [16,17]. One of the apparent problems in the a... |

31 |
Experiments in Text Recognition with Binary N-Gram and Viterbi Algorithms
- Hull, Srihari
- 1982
Citation Context ...For an earlier collection of papers addressing the same problem domain see [6]. There are different categories of contextual postprocessing methods. One class of methods is based on n-gram statistics [7,8]. Such methods rely on transition probabilities between consecutive letters of a word. As an advantage of n-gram based methods, these transition probabilities can be determined off-line, for example, ... |

26 |
A review of segmentation and contextual analysis techniques for text recognition
- Elliman, Lancaster
- 1990
Citation Context ...ading device, nevertheless, the application of postprocessing techniques using contextual information is considered very useful. A recent survey of contextual postprocessing methods has been given in [5]. For an earlier collection of papers addressing the same problem domain see [6]. There are different categories of contextual postprocessing methods. One class of methods is based on n-gram statistic... |

24 |
A contextual postprocessing system for error correction using binary n-grams
- Riseman, Hanson
- 1974
Citation Context ...For an earlier collection of papers addressing the same problem domain see [6]. There are different categories of contextual postprocessing methods. One class of methods is based on n-gram statistics [7,8]. Such methods rely on transition probabilities between consecutive letters of a word. As an advantage of n-gram based methods, these transition probabilities can be determined off-line, for example, ... |

23 |
Computer Text Recognition and Error Correction
- Srihari
- 1984
Citation Context ...ontextual information is considered very useful. A recent survey of contextual postprocessing methods has been given in [5]. For an earlier collection of papers addressing the same problem domain see [6]. There are different categories of contextual postprocessing methods. One class of methods is based on n-gram statistics [7,8]. Such methods rely on transition probabilities between consecutive lette... |

10 |
Spelling correction using probabilistic methods
- Kashyap, Oommen
- 1984
Citation Context ...ding various applications is contained in [14]. For a recent survey see [15]. Stochastic versions of string matching and their applications to the correction of distorted words have been described in [16,17]. One of the apparent problems in the application of string edit distance to the correction of OCR-output is the high computational complexity. This problem is particularly serious if the underlying... |

10 |
A string correction algorithm for cursive script recognition
- Bozinovic, Srihari
Citation Context ...ding various applications is contained in [14]. For a recent survey see [15]. Stochastic versions of string matching and their applications to the correction of distorted words have been described in [16,17]. One of the apparent problems in the application of string edit distance to the correction of OCR-output is the high computational complexity. This problem is particularly serious if the underlying... |

7 | Document image analysis systems - O'Gorman, Kasturi - 1992 |

5 |
Algorithms for Approximate String Matching
- Ukkonen
- 1985
Citation Context ...son. Improving the time complexity of the Wagner & Fischer algorithm has been a major subject of research for many years. Two algorithms with a better asymptotic time complexity have been published [12,13]. A general discussion of string edit distance including various applications is contained in [14]. For a recent survey see [15]. Stochastic versions of string matching and their applications to the c... |

3 |
A Spelling Correction Method and its Application to an OCR
- unknown authors
- 1990
Citation Context ...articularly serious if the underlying dictionary is large. Then some preselection technique is indispensable in order to quickly determine a small set of potential candidate words from the dictionary [18]. In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, while in [11] the two strings to be compared may be given online,... |

2 | Optical Character Recognition, Special Issue of - Pavlidis, Mori - 1992 |

2 |
The use of a trie structured dictionary as a contextual aid to recognition of handwritten British postal addresses
- Downton, Tregidgo
- 1991
Citation Context ...ormation present in a dictionary and may thus not lead to a performance as good as that of methods that use a full dictionary. Another category of postprocessing methods is based on dictionary search [9,10]. If a word output by an OCR device is present in the dictionary it is assumed that all its characters have been correctly recognized. If it is not contained, then the most similar dictionary entries ... |

1 | Document Image Analysis Techniques - Kasturi, O'Gorman |

1 |
Recognition of Cursive Words
- Leroux, Salome, et al.
- 1991
Citation Context ...ormation present in a dictionary and may thus not lead to a performance as good as that of methods that use a full dictionary. Another category of postprocessing methods is based on dictionary search [9,10]. If a word output by an OCR device is present in the dictionary it is assumed that all its characters have been correctly recognized. If it is not contained, then the most similar dictionary entries ... |

1 |
A Faster Algorithm for Comparing String Edit Distances
- Masek, Paterson
- 1980
Citation Context ...rings to be matched, provided that the other string has undergone some preprocessing in an off-line phase. This is a great improvement over other known algorithms for string edit distance computation [11,12,13]. The algorithm can be extended to matching a word against a dictionary of any size. In this case the time complexity is independent of the length of the dictionary words, and the number of entries in... |

1 |
Very Fast Recognition of Giro Check Form
- unknown authors
- 1993
Citation Context ...lso, we analyze its computational complexity. Potential applications are beyond the scope of this paper. However, the practical use of the algorithm in the context of a system for reading check forms [19] is currently under investigation. The rest of this paper is organized in the following way. Section 2 introduces the basic terminology and briefly reviews the Wagner & Fischer algorithm. Then, some f... |