Offline recognition of unconstrained handwritten texts using HMMs and statistical language models
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 39 (8 self)
This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of Statistical Language Models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, error rate is reduced by ∼50% for single writer data and by ∼25% for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature to deal with the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.
A full English sentence database for off-line handwriting recognition
- In Proc. Int. Conf. on Document Analysis and Recognition
, 1999
Cited by 31 (0 self)
In this paper we present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which subsequently were filled out by persons with their handwriting. Up to now (December 1998) the database includes 556 forms produced by approximately 250 different writers. The database consists of full English sentences. It can serve as a basis for a variety of handwriting recognition tasks. The main focus, however, is on recognition techniques that use linguistic knowledge beyond the lexicon level. This knowledge can be automatically derived from the corpus or it can be supplied from external sources.
Keywords: handwriting recognition, database, unconstrained English sentences, corpus, linguistic knowledge
1 Introduction
Standard databases have become very important in handwriting recognition research...
Text Line Segmentation and Word Recognition in a System for General Writer Independent Handwriting Recognition
- In Sixth International Conference on Document Analysis and Recognition
, 2001
Cited by 22 (4 self)
In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system, which are preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Then each line of text is normalized with respect to skew, slant, vertical position and width. After these steps, text lines are segmented into single words. For this purpose distances between connected components are measured. Using a threshold, the distances are divided into distances within a word and distances between different words. A line of text is segmented at positions where the distances are larger than the chosen threshold. From each image representing a single word, a sequence of features is extracted. These features are input to a recognition procedure which is based on hidden Markov models. To investigate the stability of the segmentation algorithm ...
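The gap-threshold word segmentation described in this abstract can be illustrated with a minimal sketch. It assumes each connected component is already reduced to its horizontal extent `(left, right)` in pixels; the function name `segment_words`, the box representation, and the example threshold are hypothetical and are not taken from the paper, which may measure distances differently.

```python
from typing import List, Tuple

# Assumed representation: (left, right) horizontal extent of one connected component.
Box = Tuple[int, int]


def segment_words(components: List[Box], threshold: int) -> List[List[Box]]:
    """Group the components of one text line into words.

    A new word starts wherever the horizontal gap to the components
    seen so far is larger than `threshold`.
    """
    if not components:
        return []
    ordered = sorted(components)          # left-to-right reading order
    words: List[List[Box]] = [[ordered[0]]]
    right_edge = ordered[0][1]            # rightmost pixel covered so far
    for left, right in ordered[1:]:
        gap = left - right_edge           # negative when components overlap
        if gap > threshold:
            words.append([(left, right)])     # gap between different words
        else:
            words[-1].append((left, right))   # gap within the same word
        right_edge = max(right_edge, right)
    return words


line = [(0, 10), (12, 20), (35, 50), (52, 60)]
print(segment_words(line, threshold=5))
```

With the sample line, the gaps are 2, 15 and 2 pixels, so a threshold of 5 splits the line into two words of two components each.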
Holistic Word Recognition for Handwritten Historical Documents
, 2004
Cited by 21 (9 self)
Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters) which are recognized separately. The recognition result of a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, which is motivated by the fact that for severely degraded documents a segmentation of words into characters will produce very poor results. The quality of the original documents does not allow us to recognize them with high accuracy - our goal here is to produce transcriptions that will allow successful retrieval of images, which has been shown to be feasible even in such noisy environments. We believe that this is the first systematic approach to recognizing words in historical manuscripts with extensive experiments. Our experiments show a recognition accuracy of 65%, which exceeds the performance of other systems that operate on non-degraded input images (non-historical documents).
Word spotting for historical documents
- International Journal on Document Analysis and Recognition
, 2007
Cited by 20 (1 self)
Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques including dynamic time warping are compared. The word similarities are then used for clustering
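Dynamic time warping, one of the matching techniques named in this abstract, can be sketched as follows. This is a generic textbook formulation, assuming each word image has already been reduced to a one-dimensional sequence of per-column feature values; the function name `dtw_distance` and the absolute-difference local cost are illustrative choices, not the paper's feature set or cost function.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in `a`, in `b`, or in both.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])      # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],       # a advances, b repeats
                cost[i][j - 1],       # b advances, a repeats
                cost[i - 1][j - 1],   # both advance
            )
    return cost[n][m]


# A stretched copy of the same profile aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may repeat elements, two profiles of the same word written at different widths can still match closely, which is why such distances are a natural input to clustering.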
Recognition of Cursive Roman Handwriting - Past, Present and Future
- In Proc. 7th Int. Conf. on Document Analysis and Recognition
, 2003
Cited by 16 (6 self)
This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or - more generally - some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We'll survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.
Handwritten Sentence Recognition
- In Proc. Int. Conf. on Pattern Recognition
, 2000
Cited by 15 (1 self)
In this paper we present a system for reading handwritten sentences and paragraphs. The system's main components are preprocessing, feature extraction and recognition. In contrast to other systems, whole lines of text are the basic units for the recognizer. Thus the difficult problem of segmenting a line of text into individual words can be avoided. Another novel feature of the system is the incorporation of a statistical language model into the recognizer. Experiments on the database described in [8] have shown that a recognition rate on the word level of 79.5% and 60.05% for small (776 words) and larger (7719 words) vocabularies can be reached. These figures increase to 84.3% and 67.32% if the top ten choices are taken into regard.
1 Introduction
In the last years the field of handwriting recognition was the topic of intensive research. While the first systems read segmented characters, later systems aimed at the recognition of cursively handwritten words. Only short time ago the f...
Feature Selection Using Genetic Algorithms for Handwritten Character Recognition
, 2000
Cited by 13 (0 self)
In this paper, we introduce a feature selection method, which can minimize most of the problems found in the conventional approaches, by applying genetic algorithms (GA), which have recently received considerable attention regarding their potential as an optimization technique for complex problems. Genetic algorithms are stochastic search techniques based on the mechanisms of natural selection and natural genetics.
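As a rough illustration of how a genetic algorithm can search over feature subsets, the sketch below evolves bit masks in which bit i means "keep feature i". Everything here is an assumption made for illustration: the function name `select_features`, the truncation selection, one-point crossover, mutation rate, and the toy fitness function. The abstract does not specify the paper's encoding, operators, or fitness measure.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Evolve feature masks; requires n_features >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (simple elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)        # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [
                1 - bit if rng.random() < mutation_rate else bit
                for bit in child
            ]                                            # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(mask: Mask) -> float:
    """Hypothetical score: features 0 and 2 help, every other feature costs 0.5."""
    useful = mask[0] + mask[2]
    useless = sum(mask) - useful
    return useful - 0.5 * useless


best = select_features(6, toy_fitness)
print(best, toy_fitness(best))
```

In a real recognizer the fitness would typically be something like validation accuracy of a classifier trained on the selected features, which is far more expensive to evaluate than the toy score used here.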
Line Detection and Segmentation in Historical Church Registers
, 2001
Cited by 11 (2 self)
To automatically acquire the information recorded in church registers and other historical scriptures, the writing on these documents has to be recognized. This paper describes algorithms for transforming the paper documents into a representation of text apt to be used as input for an automatic text recognizer. The automatic recognition of old handwritten scriptures is difficult for two main reasons. Lines of text in general are not straight and ascenders and descenders of adjacent lines interfere. The algorithms described in this paper provide ways to reconstruct the path of the lines of text using an approach of gradually constructing line segments until a unique line of text is formed. In addition, the single lines are segmented and an output in the form of a raster image is provided. The method was applied to church registers written between the 17th and 19th centuries. Line segmentation was found to be successful in 97% of all samples.
Boosted decision trees for word recognition in handwritten document retrieval
- in: 28th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2005
Cited by 10 (2 self)
Recognition and retrieval of historical handwritten material is an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts, based upon word image classification as a key step. Decision trees with normalized pixels as features form the basis of a highly accurate AdaBoost classifier, trained on a corpus of word images that have been resized and sampled at a pyramid of resolutions. To stem problems from the highly skewed distribution of class frequencies, word classes with very few training samples are augmented with stochastically altered versions of the originals. This increases recognition performance substantially. On a standard corpus of 20 pages of handwritten material from the George Washington collection, recognition performance improves substantially over previously published results (75% vs. 65%). Following word recognition, retrieval is done using a language model over the recognized words. Retrieval performance also shows substantially improved results over previously published results on this database. Recognition/retrieval results on a more challenging database of 100 pages from the George Washington collection are also presented.

