Hidden Markov processes
IEEE Trans. Inform. Theory, 2002
Cited by 93 (2 self)
An overview of statistical and information-theoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discrete-time finite-state homogeneous Markov chain observed through a discrete-time memoryless invariant channel. In recent years, the work of Baum and Petrie on finite-state finite-alphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximum-likelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes, which generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms: Baum–Petrie algorithm, entropy ergodic theorems, finite-state channels, hidden Markov models, identifiability, Kalman filter, maximum-likelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality.
A Computational Theory of Visual Word Recognition
1988
Cited by 14 (6 self)
A computational theory of the visual recognition of words of text is developed. The theory, based on previous studies of how people read, includes three stages: hypothesis generation, hypothesis testing, and global contextual analysis. Hypothesis generation uses gross visual features, such as those that could be extracted from the peripheral presentation of a word, to provide expectations about word identity. Hypothesis testing integrates the information determined by hypothesis generation with more detailed features that are extracted from the word image. Global contextual analysis provides syntactic and semantic information that influences hypothesis testing.
Algorithmic realization of the computational theory also consists of three stages. Hypothesis generation is implemented by extracting simple features from an input word and using those features to find a set of dictionary words with those features in common. Hypothesis testing uses this set of words to drive further selective image analysis that matches the input to one of the members of this set. This is done with a tree of feature tests that can be executed in several different ways to recognize an input word. Global contextual analysis is implemented with a process that uses knowledge of typical word-class transitions to improve the performance of the hypothesis testing stage. This is executable in parallel with hypothesis testing.
This methodology is in sharp contrast to conventional machine reading algorithms, which usually segment a word into characters and recognize the individual characters, so that a word decision is arrived at as a composite of character decisions. The algorithm presented here avoids the segmentation stage and does not require an exhaustive analysis of each character; it is thus a word recognition algorithm rather than a character recognition algorithm.
Statistical projections show the viability of all three stages of the proposed approach. Experiments with images of text show that the methodology performs well in difficult situations, such as touching and overlapping characters.
Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts
Cryptologia, 1993
Cited by 10 (2 self)
We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the oth...
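As a rough illustration of the second problem (known language versus uniform noise), the sketch below computes a log-likelihood ratio under a first-order Markov model and compares it with a threshold, which is the general shape of a Neyman-Pearson test. The function names, the two-letter toy model, and the threshold of 0 are assumptions made for this example and are not taken from the paper; all model probabilities are assumed to be nonzero.

```python
import math

def log_likelihood_ratio(text, initial, transition, alphabet_size):
    """Return log P(text | Markov language) - log P(text | uniform noise)."""
    if not text:
        return 0.0
    # First-order Markov chain: initial probability times transition probabilities.
    log_language = math.log(initial[text[0]])
    for prev, cur in zip(text, text[1:]):
        log_language += math.log(transition[prev][cur])
    # Uniform noise: every symbol has probability 1 / alphabet_size.
    log_noise = -len(text) * math.log(alphabet_size)
    return log_language - log_noise

def looks_like_language(text, initial, transition, alphabet_size, threshold=0.0):
    """Accept 'language' when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(text, initial, transition, alphabet_size) > threshold

# Hypothetical toy language over {a, b} that strongly prefers alternation.
initial = {"a": 0.5, "b": 0.5}
transition = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}}

alternating = looks_like_language("abab", initial, transition, 2)  # True
repeated = looks_like_language("aaaa", initial, transition, 2)     # False
```

In practice the threshold would be chosen to meet a target false-alarm rate rather than fixed at 0.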
Serially Concatenated Systems: An Iterative Decoding Approach with Application to Continuous Phase Modulation
1999
Cited by 6 (0 self)
Iterative methods for concatenated coding and modulation in digital communication systems are considered. It is assumed that the code and modulation can be described by finite-state machines (FSM). An iterative decoder for such a system typically consists of a posteriori probability (APP) algorithms for the constituent FSMs. Starting with a detailed examination of these algorithms, it is found that their initialization values can be formally justified. Then, possible iterative methods such as fix-point iteration, Jacobi over-relaxation, damped substitution, and Newton's method are presented and evaluated. The result is that fix-point iteration seems to be the best choice in most situations.
Word Discrimination Based on Bigram Co-occurrences
Proceedings of the 6th International Conference on Document Analysis and Recognition, 2001
Cited by 6 (3 self)
Very few pairs of English words share exactly the same letter bigrams. This linguistic property can be exploited to bring lexical context into the classification stage of a word recognition system. The lexical n-gram matches between every word in a lexicon and a subset of reference words can be precomputed. If a match function can detect matching segments of at least n-gram length from the feature representation of words, then an unknown word can be recognized by determining the subset of reference words having an n-gram match at the feature level with the unknown word. We show that with a reasonable number of reference words, bigrams represent the best compromise between the recall ability of single letters and the precision of trigrams. Our simulations indicate that using a longer reference list can compensate for errors in feature extraction. The algorithm is fast enough, even with a slow processor, for human-computer interaction.
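The linguistic property this abstract relies on can be checked on a small word list. The sketch below groups words by their set of letter bigrams and reports any groups with more than one member. It illustrates only the lexical property, not the feature-level match function of the paper; treating "the same letter bigrams" as set equality, the function names, and the six-word lexicon are assumptions made for this example.

```python
from collections import defaultdict

def letter_bigrams(word):
    """Return the set of adjacent letter pairs in a word."""
    return frozenset(word[i:i + 2] for i in range(len(word) - 1))

def bigram_collisions(lexicon):
    """Group words whose bigram sets are identical; return groups of size > 1."""
    groups = defaultdict(list)
    for word in lexicon:
        groups[letter_bigrams(word)].append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]

# Hypothetical toy lexicon; a real experiment would use a full dictionary.
lexicon = ["assess", "assesses", "banana", "bandana", "word", "sword"]
collisions = bigram_collisions(lexicon)
# collisions == [["assess", "assesses"]]
```

Here "assess" and "assesses" collide because repeating a bigram does not change the set, while the other words are all told apart by at least one bigram.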
Speech Processing with Linear and Neural Network Models
1996
Cited by 4 (0 self)
ion, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied. The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nod
Convergence of the Maximum a Posteriori Path Estimator in Hidden Markov Models
2002
In a hidden Markov model (HMM) the underlying finite-state Markov chain cannot be observed directly but only by an additional process. We are interested in estimating the unknown path of the Markov chain. The most widely used estimator is the maximum a posteriori path estimator (MAP path estimator). It can be calculated effectively by the Viterbi algorithm as is, e.g., frequently done in the field of coding theory, correction of intersymbol interference and speech recognition. We investigate (componentwise) convergence of the MAP path estimator. Convergence is shown under the condition of unbounded likelihood ratios. This condition is satisfied in the important case of HMMs with additive white Gaussian noise. We also prove convergence, if the Markov chain has two states. The so-called Viterbi paths are an important tool for obtaining these results.
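For readers unfamiliar with the Viterbi algorithm mentioned in this abstract, a minimal log-domain sketch for a discrete HMM follows. It shows only how the MAP path is computed for a fixed observation sequence and says nothing about the convergence results of the paper. The two-state model, its probabilities, and all names are assumptions made for this example.

```python
import math

def viterbi(observations, states, log_init, log_trans, log_emit):
    """Return the most probable state path and its joint log-probability."""
    # best[s]: highest log-probability of any path ending in state s.
    best = {s: log_init[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor of s at the previous time step.
            pred = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            best[s] = prev_best[pred] + log_trans[pred][s] + log_emit[s][obs]
            pointers[s] = pred
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}

# Hypothetical two-state model: "sticky" states with noisy binary outputs.
states = ["A", "B"]
log_init = to_log({"A": 0.5, "B": 0.5})
log_trans = {"A": to_log({"A": 0.9, "B": 0.1}), "B": to_log({"A": 0.1, "B": 0.9})}
log_emit = {"A": to_log({0: 0.9, 1: 0.1}), "B": to_log({0: 0.1, 1: 0.9})}

path, log_prob = viterbi([0, 0, 1, 1], states, log_init, log_trans, log_emit)
# path == ["A", "A", "B", "B"]
```

Working in the log domain avoids underflow on long sequences; the sketch assumes a non-empty observation sequence and nonzero model probabilities.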
On the Entropy of a Hidden Markov Process
Proceedings of the Data Compression Conference, Snowbird, 2004
We study the entropy rate of a binary hidden Markov process (HMP) defined by observing the output of a binary symmetric channel whose input is a first-order binary Markov process. Despite the simplicity of the models involved, the characterization of this entropy is a long-standing open problem. By presenting the probability of a sequence under the model as a product of random matrices, we show that the entropy rate sought is a top Lyapunov exponent of the product, which explains the difficulty in its explicit computation. We apply the same product of random matrices to derive an explicit expression for a first order Taylor approximation of the entropy rate with respect to the parameter of the binary symmetric channel. The accuracy of the approximation is validated against empirical simulation results. We also extend our results to Rényi's entropy of any order.
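The setting of this abstract can be explored numerically. The sketch below simulates a binary symmetric channel driven by a binary Markov chain and estimates the entropy rate as -(1/n) log2 P(y_1, ..., y_n), where the sequence probability is accumulated by the normalised forward recursion (one matrix-vector product per output symbol). This is a plain Monte Carlo estimate, not the Taylor approximation derived in the paper. The symmetric chain with a single flip probability, the parameter values, and all names are assumptions made for this example.

```python
import math
import random

def simulate_hmp(n, flip_prob, crossover, rng):
    """Sample n channel outputs for a symmetric binary Markov chain input."""
    state = rng.randrange(2)  # the stationary law of a symmetric chain is uniform
    outputs = []
    for _ in range(n):
        # The channel flips the current state with probability `crossover`.
        outputs.append(state ^ (rng.random() < crossover))
        if rng.random() < flip_prob:
            state ^= 1
    return outputs

def entropy_rate_estimate(outputs, flip_prob, crossover):
    """Return -(1/n) log2 P(outputs), in bits per symbol."""
    belief = [0.5, 0.5]  # P(state | past outputs), started at the stationary law
    total_log2 = 0.0
    for y in outputs:
        # Weight each state by the channel likelihood of the observed symbol.
        joint = [belief[s] * ((1 - crossover) if s == y else crossover)
                 for s in (0, 1)]
        scale = joint[0] + joint[1]  # P(y | past outputs)
        total_log2 += math.log2(scale)
        posterior = [j / scale for j in joint]
        # Propagate the posterior through the Markov transition.
        belief = [
            posterior[0] * (1 - flip_prob) + posterior[1] * flip_prob,
            posterior[1] * (1 - flip_prob) + posterior[0] * flip_prob,
        ]
    return -total_log2 / len(outputs)

rng = random.Random(0)
samples = simulate_hmp(20000, flip_prob=0.1, crossover=0.05, rng=rng)
estimate = entropy_rate_estimate(samples, flip_prob=0.1, crossover=0.05)
```

Two limiting cases give a sanity check: with a crossover probability of 0.5 the output is fair coin flipping and the estimate is 1 bit, and with a crossover probability of 0 it approaches the entropy rate of the Markov chain itself.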
Handwriting Recognition Using Position Sensitive Letter N-Gram Matching
We propose a further improvement of a handwriting recognition method that avoids segmentation while being able to recognize words that were never seen before in handwritten form. This method is based on the fact that few pairs of English words share exactly the same set of letter bigrams and even fewer share longer n-grams. The lexical n-gram matches between every word in a lexicon and a set of reference words can be precomputed. A position-based match function then detects the matches between the handwritten signal of a query word and each reference word. We show that with a reasonable set of reference words, the recognition of lexicon words exceeds 90%.
On-Line Handwriting Recognition Based on Bigram Co-occurrences
We propose a handwriting recognition method that utilizes the n-gram statistics of the English language. It is based on the linguistic property that very few pairs of English words share exactly the same letter bigrams. This property is exploited to bring context to the recognition stage and to avoid segmentation. The recognition is based on detecting bigram co-occurrences. Even with naive features and a limited reference set, it recognizes over 45% of lexicon words that it has never seen before in handwritten form.

