Results 1 - 8 of 8
Fast String Correction with Levenshtein Automata
 INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION, 2002
Abstract

Cited by 28 (5 self)
The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and leads to even better efficiency. We also describe how to extend both methods to variants of the Levenshtein distance where further primitive edit operations (transpositions, merges and splits) may be used.
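To make the definitions concrete, here is a minimal Python sketch. The function names `levenshtein` and `within_degree` are assumptions of mine, and the brute-force dictionary scan merely stands in for the automaton-controlled lexicon search the abstract describes:

```python
def levenshtein(v, w):
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions), O(|v|*|w|)."""
    prev = list(range(len(w) + 1))
    for i, a in enumerate(v, 1):
        cur = [i]
        for j, b in enumerate(w, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution
        prev = cur
    return prev[-1]

def within_degree(lexicon, w, n):
    """All lexical words V with Levenshtein distance to W at most n
    (naive scan; the paper's automaton avoids testing every word)."""
    return [v for v in lexicon if levenshtein(v, w) <= n]
```

For example, `within_degree(["hello", "help", "halo"], "hell", 1)` keeps `"hello"` and `"help"` but rejects `"halo"`, whose distance to `"hell"` is 2.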
Fast Approximate Search in Large Dictionaries
 COMPUTATIONAL LINGUISTICS, 2004
Abstract

Cited by 14 (4 self)
The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a "universal Levenshtein automaton," we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly, the choice of an optimal filtering method depends on the length of the input words.
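The candidate-selection task can be illustrated with a well-known simplification: walking a dictionary trie while carrying one edit-distance DP row per node, and pruning any branch whose best row value already exceeds k. This is not the paper's universal Levenshtein automaton or its filtering methods, and all names here are assumptions:

```python
def build_trie(words):
    """Nested-dict trie; '$' marks end of word and stores the word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root

def candidates(root, p, k):
    """Dictionary words within Levenshtein distance k of input P."""
    results = []
    first_row = list(range(len(p) + 1))

    def walk(node, prev_row):
        word = node.get("$")
        if word is not None and prev_row[-1] <= k:
            results.append(word)
        for ch, child in node.items():
            if ch == "$":
                continue
            # Extend the DP table by one row for this trie edge.
            row = [prev_row[0] + 1]
            for j in range(1, len(p) + 1):
                row.append(min(row[j - 1] + 1,          # insertion
                               prev_row[j] + 1,         # deletion
                               prev_row[j - 1] + (p[j - 1] != ch)))
            if min(row) <= k:  # prune branches that cannot recover
                walk(child, row)

    walk(root, first_row)
    return results
```

Because whole subtries are discarded as soon as every cell of the current row exceeds k, only a small fraction of the lexicon is visited for small bounds, which is the same effect the automaton-controlled search achieves more systematically.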
A Survey on Off-Line Cursive Script Recognition
, 2002
Abstract

Cited by 12 (0 self)
This paper presents a survey on off-line Cursive Word Recognition. The approaches to the problem are described in detail. Each step of the process leading from raw data to the final result is analyzed. This survey is divided into two parts, the first one dealing with the general aspects of Cursive Word Recognition, the second one focusing on the applications presented in the literature.
Use of Lexicon Density in Evaluating Word Recognizers
 IEEE TRANS PAMI, 2000
Abstract

Cited by 10 (4 self)
We have developed the notion of lexicon density as a metric to measure the expected accuracy of handwritten word recognizers. Thus far, researchers have used the size of the lexicon as a gauge for the difficulty of the handwritten word recognition task. For example, the literature mentions recognizers with accuracy for lexicons of sizes 10, 100, 1000, and so forth, implying that the difficulty of the task increases (and hence recognition accuracy decreases) with increasing lexicon sizes across recognizers. Lexicon ...
On the Dependence of Handwritten Word Recognizers on Lexicons
 IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Abstract

Cited by 8 (1 self)
The performance of any word recognizer depends on the lexicon presented. Usually, large lexicons or lexicons containing similar entries pose greater difficulty for recognizers.
Slice Distance
, 2000
Abstract
We introduce a novel way of computing an image-independent, recognizer-dependent distance between ASCII words. This distance is a natural generalization of the minimum edit distance in the context of handwriting recognition. It is based on confusion matrices for only parts of characters called character "slices". This "slice distance" naturally explains the confusions and misrecognitions encountered in recognition of handwritten words and phrases by a particular recognizer and could be used to exploit both the weak and the strong points of the recognizer. Even though we describe the techniques for computing the slice distance using an example of a particular word recognizer, our methods can be easily generalized to almost any segmentation-based handwritten-word recognition system. 1 Introduction: Researchers in the area of word recognition naturally seek a measure that would allow them to predict in advance how difficult it would be for their word recognizer to distinguish amo...
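The character-level weighted edit distance that the slice distance generalizes can be sketched as follows. The `sub_cost` confusion function here is a hypothetical stand-in: the paper works with confusion matrices over character slices, not whole characters:

```python
def weighted_edit_distance(a, b, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance where the substitution cost of a character pair
    comes from a recognizer-derived confusion function, so visually
    confusable pairs are 'closer' than arbitrary pairs."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + del_cost,
                          d[i][j - 1] + ins_cost,
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[m][n]
```

With a confusion function that makes, say, "u" and "v" cheap to interchange, the distance between "vase" and "uase" falls well below 1, mirroring how a recognizer-dependent distance captures likely misrecognitions.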
A String Matching Approach for Visual Retrieval and Classification
Abstract
We present an approach to measuring similarities between visual data based on approximate string matching. In this approach, an image is represented by an ordered list of feature descriptors. We show the extraction of local feature sequences from two types of 2D signals: scene and shape images. The similarity of these two images is then measured by 1) solving a correspondence problem between two ordered sets of features and 2) calculating similarities between matched features and dissimilarities between unmatched features. Our experimental study shows that such a globally ordered and locally unordered representation is more discriminative than a bag-of-features representation and the similarity measure based on string matching is effective. We illustrate the application of the proposed approach to scene classification and shape retrieval, and demonstrate superior performance to existing solutions.
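Steps 1) and 2) can be sketched as a global alignment over two ordered descriptor lists, where matched pairs contribute a similarity score and unmatched features (gaps) a penalty. The Euclidean-based similarity and the fixed gap penalty below are illustrative assumptions, not the paper's actual measure:

```python
import math

def descriptor_similarity(f, g):
    """Toy similarity between two feature descriptors: 1 for identical
    vectors, decaying toward 0 with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(f, g))

def sequence_similarity(seq_a, seq_b, gap_penalty=0.3):
    """Alignment score over two ordered descriptor lists: each matched
    pair adds its similarity, each unmatched feature subtracts a penalty."""
    m, n = len(seq_a), len(seq_b)
    s = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] - gap_penalty
    for j in range(1, n + 1):
        s[0][j] = s[0][j - 1] - gap_penalty
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s[i][j] = max(s[i - 1][j] - gap_penalty,      # skip feature in A
                          s[i][j - 1] - gap_penalty,      # skip feature in B
                          s[i - 1][j - 1]                  # match the pair
                          + descriptor_similarity(seq_a[i - 1], seq_b[j - 1]))
    return s[m][n]
```

The dynamic program respects the global ordering of the feature lists while leaving individual correspondences free, which is the "globally ordered, locally unordered" property the abstract credits for the measure's discriminative power.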