## Classification Models for Historic Manuscript Recognition (2005)

Venue: | Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR) |

Citations: | 4 - 0 self |

### BibTeX

```bibtex
@INPROCEEDINGS{Feng05classificationmodels,
  author    = {S. L. Feng and R. Manmatha},
  title     = {Classification Models for Historic Manuscript Recognition},
  booktitle = {Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR)},
  year      = {2005}
}
```


### Abstract

This paper investigates different machine learning models to solve the historical handwritten manuscript recognition problem. In particular, we test and compare support vector machines, conditional maximum entropy models and Naive Bayes with kernel density estimates and explore their behaviors and properties when solving this problem. We focus on a whole-word problem to avoid having to do character segmentation, which is difficult with degraded handwritten documents. Our results on a publicly available standard dataset of 20 pages of George Washington’s manuscripts show that Naive Bayes with Gaussian kernel density estimates significantly outperforms the other models and prior work using hidden Markov models on this heavily unbalanced dataset.
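The abstract's winning model — Naive Bayes with per-feature Gaussian kernel density estimates — can be illustrated with a minimal sketch. This is not the authors' implementation; the class names, fixed bandwidth, and feature handling below are our own illustrative assumptions:

```python
import numpy as np

def gaussian_kde_1d(train_vals, x, bandwidth):
    """1-D Gaussian KDE: average of Gaussian kernels centred on training values."""
    diffs = (x - train_vals[:, None]) / bandwidth          # (n_train, n_query)
    dens = np.mean(np.exp(-0.5 * diffs ** 2), axis=0)
    return dens / (bandwidth * np.sqrt(2.0 * np.pi))

class NaiveBayesKDE:
    """Naive Bayes with an independent Gaussian KDE per feature and class."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        self.samples_ = {c: X[y == c] for c in self.classes_}
        return self

    def predict(self, X):
        log_post = []
        for c in self.classes_:
            # Naive Bayes assumption: sum log-densities over independent features.
            feat_ll = np.zeros(len(X))
            for j in range(X.shape[1]):
                d = gaussian_kde_1d(self.samples_[c][:, j], X[:, j], self.bandwidth)
                feat_ll += np.log(d + 1e-300)              # guard against log(0)
            log_post.append(np.log(self.priors_[c]) + feat_ll)
        return self.classes_[np.argmax(np.vstack(log_post), axis=0)]
```

A KDE-based class-conditional density needs no parametric assumption per word class, which is one plausible reason such a model can cope with the sparse, unbalanced per-word training sets described below.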

### Citations

1168 | A Maximum Entropy Approach to Natural Language Processing
- Berger, Pietra, et al.
- 1996
Citation Context: ...2.2. Conditional Maximum Entropy Models. Maximum Entropy models have recently been widely applied in domains involving sequential data learning, e.g. natural languages [7, 8], biological sequence analysis [2], and very promising results have been achieved. Since maximum entropy models utilize information based on the entire history of a sequence, unlike HMM whose predicat... |

260 | A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language
- Rosenfeld
- 1996
Citation Context: ...2.2. Conditional Maximum Entropy Models. Maximum Entropy models have recently been widely applied in domains involving sequential data learning, e.g. natural languages [7, 8], biological sequence analysis [2], and very promising results have been achieved. Since maximum entropy models utilize information based on the entire history of a sequence, unlike HMM whose predicat... |

203 | The class imbalance problem: A systematic study. Intelligent Data Analysis 6
- Japkowicz, Stephen
- 2002
Citation Context: ...The unbalance and sparsity of training data for different words make the multi-classification problem intractable for some standard classifiers such as decision trees and neural networks, as shown in [6]. Here, we investigate the three models in 2 and their behaviors when dealing with this unbalanced data problem. 3.1. Results on Different Models 3.1.1. SVMs We use the MATLAB Support Vector Machine T... |

105 | Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system - Marti, Bunke |

87 | Simple Introduction to Maximum Entropy Models for Natural Language Processing
- Ratnaparkhi
- 1997
Citation Context: ...With these constraints, the maximum conditional entropy principle picks the model maximizing the conditional entropy (3): $H(p) = -\sum_x \tilde{p}(x) \sum_y p(y|x) \log p(y|x)$. It has been shown [9] that there is always a unique distribution that satisfies the constraints and maximizes the conditional entropy. This distribution has the exponential form $p(y|x) = \frac{1}{Z(x)} \exp\left(\sum_i \lambda_i f_i(x, y)\right)$, where $Z(x)$ is a normalization constant such th... |

41 | Probabilistic Retrieval of OCR-degraded Text Using N-Grams
- Harding, Croft, et al.
- 1997
Citation Context: ...is formulated as a multi-class classification problem on a large vocabulary. Classification models are investigated on how to accommodate them to the specific task. Results from information retrieval [1] show that for print optical character recognition (OCR), the retrieval performance doesn't drop significantly even for high word error rates. By analogy although the output will not satisfy the standa... |

39 | Holistic word recognition for handwritten historical documents
- Lavrenko, Rath, et al.
- 2004
Citation Context: ...racter boundaries are difficult to determine, this is done by jointly segmenting and recognizing the characters. In this paper, we directly recognize the entire word without character segmentation, as [4] did, and the recognition problem is formulated as a multi-class classification problem on a large vocabulary. Classification models are investigated on how to accommodate them to the specific task. Re... |

13 | Offline recognition of large vocabulary cursive handwritten text
- Vinciarelli, Bengio, et al.
Citation Context: ...recognition has only been successful in small-vocabulary and highly constrained domains. Only very recently have people started to look at offline recognition of large-vocabulary handwritten documents [3]. Marti et al. [5] proposed to use a hidden Markov model (HMM) for handwritten material recognition. Each character is represented using a hidden Markov model with 14 states. Words and lines are modell... |

3 | Maximum Entropy Methods for Biological Sequence Modeling
- Buehler, Ungar
- 2001
Citation Context: ...Conditional Maximum Entropy Models. Maximum Entropy models have recently been widely applied in domains involving sequential data learning, e.g. natural languages [7, 8], biological sequence analysis [2], and very promising results have been achieved. Since maximum entropy models utilize information based on the entire history of a sequence, unlike HMM whose predications are usually based only on a s... |

1 | The Perceptron Algorithm with Uneven Margins
- Li, Zaragoza, et al.
Citation Context: ..., while for some other words abundant training samples are available. When SVMs deal with unbalanced data, equal margins for negative instances and positive instances may be inappropriate. Uneven margins [10] for SVM may alleviate the effects of the unbalance of the dataset. Kernel selection is the key to SVM once the feature sets have been fixed. There still aren't very good theoretical methods for automatic... |
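The uneven-margins idea in the snippet above can be sketched as a toy perceptron that demands a larger margin from the rare positive class than from the abundant negative class. The function name, threshold values, and update rule below are illustrative assumptions in the spirit of [10], not the cited paper's exact algorithm:

```python
import numpy as np

def uneven_margin_perceptron(X, y, tau_pos=1.0, tau_neg=0.1, lr=0.1, epochs=100):
    """Perceptron with uneven margins: an example triggers an update
    unless it is classified with at least its class-specific margin.
    Labels y must be +1 (rare class) or -1 (abundant class)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            tau = tau_pos if yi == 1 else tau_neg
            if yi * (w @ xi + b) <= tau:       # margin violated -> update
                w += lr * yi * xi
                b += lr * yi
    return w, b
```

Pushing the decision boundary away from the positive class this way is one simple countermeasure to the class imbalance the snippet describes, at the cost of more false positives on the majority class.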