Offline recognition of unconstrained handwritten texts using HMMs and statistical language models
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 39 (8 self)
This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of Statistical Language Models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, the error rate is reduced by ∼50% for single writer data and by ∼25% for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature for the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.
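The abstract does not spell out how the language model is combined with the HMM. A common formulation in HMM-based recognition, assumed here rather than taken from the paper, scores a candidate transcription W of an observation sequence O as log P(O|W) + s·log P(W) + p·|W|, with a grammar scale factor s and a word insertion penalty p. The sketch below pairs that score with an add-k smoothed word bigram model; the class and function names and the default values of `scale` and `penalty` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


class BigramLM:
    """Add-k smoothed word bigram model (hypothetical illustration)."""

    def __init__(self, sentences: Iterable[List[str]], k: float = 1.0):
        self.k = k
        self.history = Counter()  # how often each token appears as a bigram history
        self.bigrams = Counter()
        vocab = set()
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(vocab)

    def log_prob(self, words: List[str]) -> float:
        """Natural-log probability of a word sequence, including sentence boundaries."""
        tokens = ["<s>"] + words + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            num = self.bigrams[(prev, cur)] + self.k
            den = self.history[prev] + self.k * self.vocab_size
            total += math.log(num / den)
        return total


def combined_score(hmm_log_lik: float, lm: BigramLM, words: List[str],
                   scale: float = 10.0, penalty: float = 0.0) -> float:
    """Assumed HMM + language model score: log P(O|W) + scale*log P(W) + penalty*|W|."""
    return hmm_log_lik + scale * lm.log_prob(words) + penalty * len(words)
```

With equal HMM likelihoods, the language model term decides: a word order seen in training outranks an unseen one.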
Text Line Segmentation and Word Recognition in a System for General Writer Independent Handwriting Recognition
In Sixth International Conference on Document Analysis and Recognition, 2001
Cited by 22 (4 self)
In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system, which are preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Then each line of text is normalized with respect to skew, slant, vertical position and width. After these steps, text lines are segmented into single words. For this purpose, distances between connected components are measured. Using a threshold, the distances are divided into distances within a word and distances between different words. A line of text is segmented at positions where the distances are larger than the chosen threshold. From each image representing a single word, a sequence of features is extracted. These features are input to a recognition procedure which is based on hidden Markov models. To investigate the stability of the segmentation algorithm ...
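The threshold-based word segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: each connected component is reduced to its horizontal extent `(x_start, x_end)`, and the "distance" is the horizontal gap between a component and the right edge of the word built so far. The paper's actual distance measure and how it chooses the threshold are not given in the abstract, and the function name is hypothetical.

```python
from typing import List, Tuple

Component = Tuple[int, int]  # assumed (x_start, x_end) extent of a connected component


def segment_line_into_words(components: List[Component],
                            threshold: float) -> List[List[Component]]:
    """Group connected components of a text line into words by gap thresholding."""
    comps = sorted(components)  # process components left to right
    if not comps:
        return []
    words = [[comps[0]]]
    right_edge = comps[0][1]  # rightmost x covered by the current word
    for comp in comps[1:]:
        gap = comp[0] - right_edge
        if gap > threshold:
            # Large gap: treat it as a space between two words.
            words.append([comp])
            right_edge = comp[1]
        else:
            # Small (or negative, i.e. overlapping) gap: same word.
            words[-1].append(comp)
            right_edge = max(right_edge, comp[1])
    return words
```

For example, components `[(0, 10), (12, 20), (40, 50), (51, 60), (90, 100)]` with a threshold of 8 yield three words, because only the gaps of 20 and 30 exceed the threshold.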
Holistic Word Recognition for Handwritten Historical Documents
2004
Cited by 21 (9 self)
Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters) which are recognized separately. The recognition result of a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, which is motivated by the fact that for severely degraded documents a segmentation of words into characters will produce very poor results. The quality of the original documents does not allow us to recognize them with high accuracy; our goal here is to produce transcriptions that will allow successful retrieval of images, which has been shown to be feasible even in such noisy environments. We believe that this is the first systematic approach to recognizing words in historical manuscripts with extensive experiments. Our experiments show a recognition accuracy of 65%, which exceeds the performance of other systems that operate on non-degraded input images (non-historical documents).
Word spotting for historical documents
International Journal on Document Analysis and Recognition, 2007
Cited by 20 (1 self)
Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques, including dynamic time warping, are compared. The word similarities are then used for clustering ...
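Dynamic time warping, one of the matching techniques named above, can be sketched as below. The sketch assumes each word image has already been converted into a sequence of per-column feature vectors (for instance projection-profile values); the squared Euclidean local cost, the normalization by `n + m`, and the function name are illustrative choices, not details taken from the paper.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Dynamic time warping distance between two feature-vector sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # D[i][j] holds the cost of the best alignment of a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: squared Euclidean distance between column feature vectors.
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Allowed steps: advance in a only, in b only, or in both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Normalize so that long and short words yield comparable distances.
    return D[n][m] / (n + m)
```

A sequence that is a horizontally stretched copy of another (one column repeated) gets distance 0, which is the property that lets DTW tolerate width variations between instances of the same word. The pairwise distances could then be fed to any clustering algorithm.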
A New Normalization Technique for Cursive Handwritten Words
2000
Cited by 16 (7 self)
This paper presents new techniques for slant and slope removal in cursive handwritten words. Neither method relies on heuristics or requires manual parameter tuning, which avoids the heavy experimental effort needed to find the optimal configuration of a parameter set. The new deslanting technique was compared with the method proposed by Bozinovic and Srihari by measuring the performance of a word recognition system on different databases. The new technique is shown to improve the recognition rate of the system and to avoid the long exploration of the parameter space needed by the other method.
Recognition of Cursive Roman Handwriting - Past, Present and Future
In Proc. 7th Int. Conf. on Document Analysis and Recognition, 2003
Cited by 16 (6 self)
This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or, more generally, some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.
Segmentation and Recognition of Handwritten Dates
In Proc. 8th IWFHR, 2002
Cited by 14 (8 self)
This paper presents an HMM-MLP hybrid system to recognize complex date images written on Brazilian bank cheques. The system first implicitly segments a date image into sub-fields through an HMM-based recognition process. Afterwards, the system processes the three obligatory date sub-fields (day, month and year). A neural approach is used for strings of digits, and a Markovian strategy is used to recognize and verify words. We also introduce the concept of meta-classes of digits, which is used to reduce the lexicon size of the day and year sub-fields and to improve the precision of their segmentation and recognition. Experiments show interesting results on date recognition.
Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System
In Proc. 7th Int. Conference on Document Analysis and Recognition (ICDAR 2003), 2003
Cited by 13 (2 self)
A perturbation model for generating synthetic text lines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied: geometrical transformations and thinning/thickening operations. The proposed perturbation model is evaluated under different experimental conditions.
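The two kinds of perturbation named above can be illustrated on a binary image stored as a list of rows. This is a minimal sketch, not the paper's perturbation model: the geometric transformation is assumed to be a horizontal shear, and thinning/thickening are approximated by 3x3 morphological erosion and dilation (erosion is not true skeleton thinning). All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def shear(img: Image, factor: float) -> Image:
    """Horizontal shear: each row moves right by factor * (its distance from the bottom row)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        dx = round(factor * (h - 1 - y))
        for x in range(w):
            src = x - dx
            if 0 <= src < w:  # pixels shifted past the border are dropped
                out[y][x] = img[y][src]
    return out


def _morph(img: Image, dilate: bool) -> Image:
    """3x3 binary dilation (dilate=True) or erosion (dilate=False), clipped at the border."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(any(window)) if dilate else int(all(window))
    return out


def thicken(img: Image) -> Image:
    """Stroke thickening approximated by dilation."""
    return _morph(img, dilate=True)


def thin(img: Image) -> Image:
    """Stroke thinning approximated by erosion."""
    return _morph(img, dilate=False)
```

Randomly drawing the shear factor and choosing whether to thicken or thin for each synthetic copy is one plausible way to produce varied training lines; the abstract does not say how the paper samples its parameters.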
Rejection Strategies for Offline Handwritten Sentence Recognition
In 17th International Conference on Pattern Recognition, 2004
Cited by 12 (6 self)
This paper investigates three different rejection strategies for offline handwritten sentence recognition. The rejection strategies are implemented as a postprocessing step of a Hidden Markov Model based text recognition system and are based on confidence measures derived from a list of candidate sentences produced by the recognizer. The better performing confidence measures make use of the fact that the recognizer integrates a word bigram language model. Experimental results on extracted sentences from the IAM database validate the effectiveness of the proposed rejection strategies.
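The abstract names confidence measures derived from the N-best candidate list but does not give their formulas. The sketch below shows one generic measure of that kind, assumed here and not claimed to be any of the paper's three strategies: the softmax-normalized share of the best candidate among the N-best log scores (which, in a system like the one described, would already include the bigram language model term). The function names, `scale`, and the threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def best_candidate_confidence(log_scores: List[float], scale: float = 1.0) -> float:
    """Softmax-normalized share of the best-scoring candidate in an N-best list."""
    if not log_scores:
        raise ValueError("N-best list is empty")
    top = max(log_scores)
    # Subtract the maximum before exponentiating for numerical stability;
    # the best candidate then contributes exp(0) = 1 to the sum.
    total = sum(math.exp(scale * (s - top)) for s in log_scores)
    return 1.0 / total


def accept_or_reject(nbest: List[Tuple[str, float]], threshold: float,
                     scale: float = 1.0) -> Optional[str]:
    """Return the top sentence if its confidence reaches the threshold, else None (reject)."""
    confidence = best_candidate_confidence([score for _, score in nbest], scale)
    if confidence >= threshold:
        return max(nbest, key=lambda item: item[1])[0]
    return None
```

When two candidates tie, the confidence drops to 0.5 and a threshold of 0.9 rejects the sentence; a clear winner is accepted.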
Offline Recognition of Large Vocabulary Cursive Handwritten Text
Cited by 11 (4 self)
This paper presents a system for the offline recognition of cursive handwritten lines of text. The system is based on continuous density HMMs and Statistical Language Models. The system recognizes data produced by a single writer. No a priori knowledge about the content of the text to be recognized is used. Changes in the experimental setup with respect to the recognition of single words are highlighted. The results show a recognition rate of ∼85% with a lexicon containing 50,000 words. The experiments were performed on a publicly available database.

