Generalizing edit distance to incorporate domain information: Handwritten text recognition as a case study (1996)

by G Seni, V Kripasundar, R K Srihari
Venue: Pattern Recognition

Results 1 - 7 of 7

Fast String Correction with Levenshtein-Automata

by Klaus Schulz, Stoyan Mihov - International Journal of Document Analysis and Recognition, 2002
Cited by 19 (3 self)
The Levenshtein-distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein-automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein-distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein-automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein-distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein-automata and leads to even improved efficiency. We also describe how to extend both methods to variants of the Levenshtein-distance where further primitive edit operations (transpositions, merges and splits) may be used.
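
The definition of the Levenshtein-distance given in this abstract can be illustrated with the textbook dynamic-programming computation. The sketch below is not the automaton construction of the paper; it only computes the distance and the membership test that a Levenshtein-automaton of degree n would decide. The function names `levenshtein` and `within_bound` are chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions turning a into b."""
    # Before row i is computed, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] versus the empty prefix of b: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def within_bound(v: str, w: str, n: int) -> bool:
    """True if v is in the set that a degree-n Levenshtein-automaton for w accepts."""
    return levenshtein(v, w) <= n
```

This runs in time proportional to the product of the two word lengths for every pair of words, which is the per-word cost that the automaton-based methods are designed to avoid when searching a large dictionary.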

Fast Approximate Search in Large Dictionaries

by Stoyan Mihov, Klaus U. Schulz - Computational Linguistics, 2004
Cited by 8 (2 self)
The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a "universal Levenshtein automaton," we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly the choice of an optimal filtering method depends on the length of the input words.
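
To make the notion of a candidate set concrete, here is a minimal sketch that selects all dictionary words within Levenshtein distance k of an input by walking a trie and carrying one dynamic-programming row per trie node. This is a simpler, well-known technique and not the universal Levenshtein automaton or the reversed-dictionary filtering described in the article. The names `build_trie`, `candidates` and `END` are assumptions made for this example.

```python
END = ""  # hypothetical end-of-word key; never collides with a one-character edge label


def build_trie(words):
    """Build a nested-dict trie; node[END] stores the word that ends at that node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def candidates(trie, pattern, k):
    """Return sorted (word, distance) pairs for dictionary words within distance k."""
    results = []

    def walk(node, row):
        # row[j] = distance between the trie prefix read so far and pattern[:j]
        if END in node and row[-1] <= k:
            results.append((node[END], row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for j, pc in enumerate(pattern, start=1):
                new_row.append(min(
                    new_row[j - 1] + 1,       # pattern character pc is missing in the prefix
                    row[j] + 1,               # trie character ch is extra in the prefix
                    row[j - 1] + (pc != ch),  # substitute, or match at no cost
                ))
            # The row minimum never decreases along a path, so this subtree
            # can be skipped once every entry exceeds the bound.
            if min(new_row) <= k:
                walk(child, new_row)

    walk(trie, list(range(len(pattern) + 1)))
    return sorted(results)
```

For example, with the dictionary `["house", "mouse", "horse", "hose", "chair"]`, the call `candidates(build_trie(words), "hause", 1)` yields only `("house", 1)`, while raising the bound to 2 also admits "horse", "hose" and "mouse".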

Use of Lexicon Density in Evaluating Word Recognizers

by Venu Govindaraju, Petr Slavik, Hanhong Xue - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Cited by 8 (3 self)
We have developed the notion of lexicon density as a metric to measure the expected accuracy of handwritten word recognizers. Thus far, researchers have used the size of the lexicon as a gauge for the difficulty of the handwritten word recognition task. For example, the literature mentions recognizers with accuracy for lexicons of sizes 10, 100, 1000, and so forth, implying that the difficulty of the task increases (and hence recognition accuracy decreases) with increasing lexicon sizes across recognizers. Lexicon

A Survey on Off-Line Cursive Script Recognition

by Alessandro Vinciarelli, 2002
Cited by 7 (0 self)
This paper presents a survey on off-line Cursive Word Recognition. The approaches to the problem are described in detail. Each step of the process leading from raw data to the final result is analyzed. This survey is divided into two parts, the first one dealing with the general aspects of Cursive Word Recognition, the second one focusing on the applications presented in the literature. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Survey; Off-line cursive word recognition; Handwriting recognition

On the Dependence of Handwritten Word Recognizers on Lexicons

by Hanhong Xue, Venu Govindaraju - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 7 (1 self)
The performance of any word recognizer depends on the lexicon presented. Usually large lexicons or lexicons containing similar entries pose greater difficulty for recognizers.

Offline Cursive Handwriting: From Word to Text Recognition

by Alessandro Vinciarelli, 2003
Cited by 2 (0 self)
Contents: 1 Introduction; 2 State of the art; 2.2 State of the Art; 2.3 Structure of a CWR System; 2.3.1 Normalization; 2.3.2 The segmentation; 2.3.3 Feature extraction; 2.3.4 Lexicon reduction; 2.3.5 The data; 2.3.6 The recognition; 2.3.7 Human Reading Inspired Systems; 2.3.8 Holistic approaches

Slice Distance

by Petr Slavík, CEDAR, 2000
We introduce a novel way of computing an image-independent recognizer-dependent distance between ASCII words. This distance is a natural generalization of the minimum edit distance in the context of handwriting recognition. It is based on confusion matrices for only parts of characters called character "slices". This "slice distance" naturally explains the confusions and misrecognitions encountered in recognition of handwritten words and phrases by a particular recognizer and could be used to exploit both the weak and the strong points of the recognizer. Even though we describe the techniques for computing the slice distance using an example of a particular word recognizer, our methods can be easily generalized to almost any segmentation-based handwritten-word recognition system. 1 Introduction Researchers in the area of word recognition naturally seek a measure that would allow them to predict in advance how difficult it would be for their word recognizer to distinguish amo...
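
As a rough illustration of how confusion information can generalize the minimum edit distance, the sketch below computes a weighted edit distance whose substitution cost depends on how easily two characters are confused. It is not the slice distance itself, which works on parts of characters and on a particular recognizer's confusion matrices. The `CONFUSION` table and its cost values are invented for this example.

```python
# Hypothetical confusion costs: pairs a recognizer might mix up are cheap to substitute.
CONFUSION = {
    frozenset("un"): 0.25,
    frozenset("il"): 0.25,
}


def sub_cost(x: str, y: str) -> float:
    """Substitution cost: 0 for a match, a reduced cost for confusable pairs, else 1."""
    if x == y:
        return 0.0
    return CONFUSION.get(frozenset((x, y)), 1.0)


def weighted_edit_distance(a: str, b: str, sub_cost, indel_cost: float = 1.0) -> float:
    """Edit distance with a caller-supplied substitution cost function."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute with a confusion-aware cost
            ))
        prev = curr
    return prev[-1]
```

Under these assumed costs, "hand" and "haud" are at distance 0.25 while "hand" and "hard" are at distance 1.0, so the first pair would count as harder for the hypothetical recognizer to tell apart.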