Convexity, Maximum Likelihood and All That
1996
Cited by 3 (0 self)
Abstract
This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology.
1 Introduction
The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something that generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by y, a random variable that takes on values in some alphabet Y. The modelling problem is to come up with an accurate (in a sense made precise later) model p(y) of the process. If the identity of y is influenced by some conditioning information x ∈ X, then we might seek instead a conditional m...
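For the unconditional model p(y) over a finite alphabet Y, the maximum likelihood estimate is simply the relative frequency of each symbol in the sample; that no other distribution scores higher follows from the concavity of log (Gibbs' inequality), one of the convexity properties the note builds on. The Python sketch below illustrates this, and the analogous count(x, y) / count(x) estimate for a conditional model p(y | x). The function names and the sample string are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def ml_estimate(observations):
    """Maximum likelihood estimate of p(y): the relative frequency of each symbol."""
    counts = Counter(observations)
    total = len(observations)
    return {y: c / total for y, c in counts.items()}

def ml_conditional(pairs):
    """Maximum likelihood estimate of p(y | x) from (x, y) pairs: count(x, y) / count(x)."""
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    return {(x, y): c / marginal[x] for (x, y), c in joint.items()}

def log_likelihood(model, observations):
    """Sum of log p(y) over the sample; -inf if any observed y has zero probability."""
    total = 0.0
    for y in observations:
        p = model.get(y, 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total

sample = list("abracadabra")  # hypothetical observations over Y = {a, b, c, d, r}
p_hat = ml_estimate(sample)

# Moving probability mass away from the relative frequencies can only lower the likelihood.
perturbed = dict(p_hat)
perturbed["a"] -= 0.05
perturbed["b"] += 0.05
print(log_likelihood(p_hat, sample) >= log_likelihood(perturbed, sample))  # True
```

The same comparison holds for any perturbation that keeps a valid distribution, which is exactly what the inequality guarantees.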
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
1996
Cited by 3 (1 self)
Abstract
A speech recognition machine would offer many benefits and could improve people's quality of life. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and the back-end. The front-end converts the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. An optimal feature-set should provide improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems make it difficult to find optimal features: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
Topics In Computational Hidden State Modeling
1997
Cited by 3 (3 self)
Abstract
Motivated by the goal of establishing stochastic and information-theoretic foundations for the study of intelligence and the synthesis of intelligent machines, this thesis probes several topics relating to hidden-state stochastic models. Finite Growth Models (FGMs) are introduced. These are nonnegative functionals that arise from parametrically weighted directed acyclic graphs and a tuple observation that affects these weights. Using FGMs, the parameters of a highly general form of stochastic transducer can be learned from examples, and the particular case of stochastic string edit distance is developed. Experiments are described that illustrate the application of learned string edit distance to the problem of recognizing a spoken word given a phonetic transcription of the acoustic signal. With FGMs one may direct learning by criteria beyond simple maximum likelihood. The MAP (maximum a posteriori) and MDL (minimum description length) criteria are discussed along with the application to cau...
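Stochastic string edit distance can be computed with a forward-style dynamic program: the joint probability of a string pair is the sum, over every sequence of insertions, deletions and substitutions that turns one string into the other, of the product of the operation probabilities, times a termination probability; the distance is its negative logarithm. The Python sketch below assumes a memoryless transducer of this kind (operation probabilities do not depend on context). The parameter table from the hypothetical `toy_parameters` helper is invented for illustration, not learned from examples as in the thesis.

```python
import math

EPS = None  # stands for the empty string in an edit operation

def joint_probability(x, y, delta, p_end):
    """p(x, y) under a memoryless stochastic edit transducer.

    delta maps substitutions (a, b), deletions (a, EPS) and insertions (EPS, b)
    to probabilities; p_end is the probability of stopping.
    alpha[t][v] = probability of generating the prefixes x[:t] and y[:v].
    """
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t == 0 and v == 0:
                continue
            p = 0.0
            if v > 0:  # insert y[v-1]
                p += delta.get((EPS, y[v - 1]), 0.0) * alpha[t][v - 1]
            if t > 0:  # delete x[t-1]
                p += delta.get((x[t - 1], EPS), 0.0) * alpha[t - 1][v]
            if t > 0 and v > 0:  # substitute x[t-1] -> y[v-1] (identity if equal)
                p += delta.get((x[t - 1], y[v - 1]), 0.0) * alpha[t - 1][v - 1]
            alpha[t][v] = p
    return alpha[T][V] * p_end

def stochastic_edit_distance(x, y, delta, p_end):
    """-log p(x, y): small when the strings are likely to be edits of each other."""
    return -math.log(joint_probability(x, y, delta, p_end))

def toy_parameters(alphabet, p_end=0.1):
    """Hypothetical parameters: identities weighted 10x, all other operations 1x."""
    weights = {}
    for a in alphabet:
        for b in alphabet:
            weights[(a, b)] = 10.0 if a == b else 1.0
        weights[(a, EPS)] = 1.0
        weights[(EPS, a)] = 1.0
    # Normalise so that all operation probabilities plus p_end sum to 1.
    scale = (1.0 - p_end) / sum(weights.values())
    return {op: w * scale for op, w in weights.items()}, p_end

delta, p_end = toy_parameters("ab")
print(stochastic_edit_distance("ab", "ab", delta, p_end)
      < stochastic_edit_distance("ab", "ba", delta, p_end))  # True
```

Because every operation has nonzero probability in this toy table, `math.log` never receives zero; a sparser table would need a guard for unreachable pairs.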
On the Strange a Posteriori Degeneracy of Normal Mixtures, and Related Reparameterization Theorems
1996
Cited by 1 (1 self)
Abstract
This short paper illuminates certain fundamental aspects of the nature of normal (Gaussian) mixtures. Thinking of each mixture component as a class, we focus on the corresponding a posteriori class probability functions. It is shown that the relationship between these functions and the mixture's parameters is highly degenerate, and that the precise nature of this degeneracy leads to somewhat unusual and counter-intuitive behavior. Even complete knowledge of a mixture's a posteriori class behavior reveals essentially nothing of its absolute nature, i.e. mean locations and covariance norms. Consequently, a mixture whose means are located in a small ball anywhere in space can project arbitrary class structure everywhere in space. The well-known expectation-maximization (EM) algorithm for Maximum Likelihood (ML) optimization may be thought of as a reparameterization of the problem in which the search takes place over the space of sample point weights. Motivated by EM we characterize th...
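The degeneracy is easy to see in one dimension. For two normal components with a shared variance σ², the posterior log-odds are linear in x: log P(1|x)/P(2|x) = (μ1 − μ2)x/σ² + (μ2² − μ1²)/(2σ²) + log(π1/π2). Any mixture with the same slope and intercept therefore produces exactly the same a posteriori class probabilities. The Python sketch below is a hypothetical illustration restricted to this equal-variance, two-component case: it builds a mixture whose means lie 0.04 apart near the origin and checks that it reproduces the posteriors of a mixture with means at −2 and +2.

```python
import math

def log_odds_params(pi1, mu1, mu2, var):
    """Slope w and intercept b of log P(1|x)/P(2|x) = w*x + b (equal variances)."""
    w = (mu1 - mu2) / var
    b = (mu2 ** 2 - mu1 ** 2) / (2 * var) + math.log(pi1 / (1 - pi1))
    return w, b

def posterior_class1(x, pi1, mu1, mu2, var):
    """P(class 1 | x), computed directly from the two weighted normal densities."""
    # The shared 1/sqrt(2*pi*var) factor cancels in the ratio, so it is omitted.
    log_a = math.log(pi1) - (x - mu1) ** 2 / (2 * var)
    log_b = math.log(1 - pi1) - (x - mu2) ** 2 / (2 * var)
    m = max(log_a, log_b)  # subtract the max for numerical stability
    ea, eb = math.exp(log_a - m), math.exp(log_b - m)
    return ea / (ea + eb)

def matching_mixture(w, b, mu2, var):
    """Pick mu1 and pi1 so a mixture with this mu2 and var has log-odds w*x + b."""
    mu1 = mu2 + w * var
    log_prior_ratio = b - (mu2 ** 2 - mu1 ** 2) / (2 * var)
    pi1 = 1 / (1 + math.exp(-log_prior_ratio))
    return pi1, mu1

# Reference mixture: means at -2 and +2, unit variance, equal priors.
ref = dict(pi1=0.5, mu1=2.0, mu2=-2.0, var=1.0)
w, b = log_odds_params(**ref)

# A second mixture with variance 0.01 whose means sit at 0.0 and 0.04.
pi1_small, mu1_small = matching_mixture(w, b, mu2=0.0, var=0.01)
small = dict(pi1=pi1_small, mu1=mu1_small, mu2=0.0, var=0.01)

xs = [i / 10 for i in range(-30, 31)]
max_gap = max(abs(posterior_class1(x, **ref) - posterior_class1(x, **small)) for x in xs)
print(max_gap)  # agreement up to floating-point rounding
```

The posteriors agree on the whole grid even though the two mixtures place their means, and hence their data, in very different regions.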
Approximation of Stochastic Processes by Hidden Markov Models
1998
Cited by 1 (1 self)
Abstract
In this thesis we restrict ourselves to stationary and discrete-valued stochastic processes. A pair of stochastic processes (X, Y) is a Hidden Markov Model (HMM) if X (the state process) is a Markov process and Y (the observable process) is an incomplete observation of X. The observation can be deterministic or noisy, and the observable can be a state or a state transition. Hence we have four possible types of HMMs. First, we establish that all types of HMMs are equivalent, in the sense that, given any HMM of arbitrary type, we can construct an HMM of any other type such that the two models have identical observable processes. Therefore all types of HMMs have the same modelling power. Second, we consider the problem of Representation: what kind of stochastic processes can we approximate with Hidden Markov Models? To make the question meaningful we define two types of stochastic process approximation: (a) weak approximation, based on the weak convergence of probability measures, and (b) cross entropy approximation, based on the Kullback-Leibler informational divergence. Then we prove that there is a sequence of HMMs (of increasing size) that approximates any ergodic stochastic process in the weak and cross entropy sense. Third, we consider the problem of Consistent Estimation. To approximate an ergodic process we need a sequence of HMMs of increasing size. For a fixed-size Hidden Markov Model we can use the very efficient Baum algorithm to find the Maximum Likelihood parameter estimates. But will the sequence of estimates be consistent (i.e. will it converge to the true process)? The answer is: the sequence of Maximum Likelihood Estimates will be consistent if the original process is ergodic, has strictly positive probability, and has conditional probability bounded away from zero. Fourth...
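The likelihood that the Baum algorithm increases is computed by the forward recursion over hidden states. The Python sketch below assumes the state-emitting, noisy-observation type of HMM (one of the four types listed above) with discrete states and symbols, and rescales at every step so that long sequences do not underflow. The parameter names and toy values are hypothetical.

```python
import math

def forward_log_likelihood(obs, init, trans, emit):
    """log P(y_1, ..., y_T) for a discrete HMM, via the scaled forward recursion.

    init[i]     = P(X_1 = i)
    trans[i][j] = P(X_{t+1} = j | X_t = i)
    emit[i][k]  = P(Y_t = k | X_t = i)
    """
    n = len(init)
    log_lik = 0.0
    alpha = None
    for t, y in enumerate(obs):
        if t == 0:
            alpha = [init[i] * emit[i][y] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][y]
                     for j in range(n)]
        scale = sum(alpha)                  # P(y_t | y_1, ..., y_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # now P(X_t = i | y_1, ..., y_t)
    return log_lik

# Hypothetical two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_log_likelihood([0, 1, 0], init, trans, emit))
```

For a single observation y the result reduces to log Σ_i init[i]·emit[i][y]; for longer sequences it equals the log of the sum over all state paths, computed in time linear in the sequence length.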
A Comparison Of Hybrid HMM Architectures Using Global Discriminative Training
Cited by 1 (0 self)
Abstract
This paper presents a comparison of different model architectures for TIMIT phoneme recognition. The baseline is a conventional diagonal-covariance Gaussian mixture HMM. This system is compared to two different hybrid MLP/HMMs, both adhering to the same restrictions regarding input context and output states as the Gaussian mixtures. All free parameters in the three systems are jointly optimised using the same global discriminative criterion. A Forward decoder, with total likelihood scoring, is used for recognition. While the global discriminative training method is found to improve the baseline HMM significantly, the differences between Gaussian and MLP-based architectures are small. The Gaussian mixture system, however, performs slightly better at the lowest complexity levels.
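"Total likelihood scoring" sums over all state paths (the forward recursion), whereas Viterbi scoring keeps only the single best path. The Python sketch below contrasts the two for one model in the log domain; it is not a full phoneme decoder. The per-frame, per-state log scores are assumed to be precomputed (from Gaussian mixtures or, hypothetically, from MLP outputs), and all values are invented for illustration.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def score_sequence(log_emit, log_init, log_trans, total=True):
    """Log score of a frame sequence under one HMM.

    log_emit[t][j] = log score of frame t in state j (assumed precomputed)
    total=True  -> forward recursion, summing over all state paths
    total=False -> Viterbi recursion, keeping only the best path
    """
    combine = logsumexp if total else max
    n = len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(n)]
    for frame in log_emit[1:]:
        delta = [combine([delta[i] + log_trans[i][j] for i in range(n)]) + frame[j]
                 for j in range(n)]
    return combine(delta)

log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
log_emit = [[log(0.8), log(0.2)], [log(0.3), log(0.7)], [log(0.6), log(0.4)]]

forward_score = score_sequence(log_emit, log_init, log_trans, total=True)
viterbi_score = score_sequence(log_emit, log_init, log_trans, total=False)
```

The only difference between the two recursions is the combining operator (log-sum-exp versus max), so the forward score is always at least the Viterbi score.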
Approximation of Stochastic Processes by Hidden Markov Models
1992
Cited by 1 (0 self)
Abstract
In this thesis we restrict ourselves to stationary and discrete-valued stochastic processes. A pair of stochastic processes (X, Y) is a Hidden Markov Model (HMM) if X (the state process) is a Markov process and Y (the observable process) is an incomplete observation of X. The observation can be deterministic or noisy, and the observable can be a state or a state transition. Hence we have four possible types of HMMs. First, we establish that all types of HMMs are equivalent, in the sense that, given any HMM of arbitrary type, we can construct an HMM of any other type such that the two models have identical observable processes. Therefore all types of HMMs have the same modelling power. Second, we consider the problem of Representation: what kind of stochastic processes can we approximate with Hidden Markov Models? To make the question meaningful we define two types of stochastic process approximation: (a) weak approximation, based on the weak convergence of probability measures, and (b) cross entropy approximation, based on the Kullback-Leibler informational divergence. Then we prove that there is a sequence of HMMs (of increasing size) that approximates any ergodic stochastic process in the weak and cross entropy sense. Third, we consider the problem of Consistent Estimation. To approximate an ergodic process we need a sequence of HMMs of increasing size. For a fixed-size Hidden Markov Model we can use the very efficient Baum algorithm to find the Maximum Likelihood parameter estimates. But will the sequence of estimates be consistent (i.e. will it converge to the true process)? The answer is: the sequence of Maximum Likelihood Estimates will be consistent if the original process is ergodic, has strictly positive probability, and has conditional probability bounded away from zero. Fourth, we develop HMMs of the raw speech signal and demonstrate numerically the consistency of Maximum Likelihood estimation. Finally, we develop Hidden Gibbs Models, an analogue of HMMs, and use these to model one-dimensional speech signals and two-dimensional images.
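The Baum algorithm mentioned above is the Baum-Welch re-estimation procedure, an instance of EM: each iteration computes state and transition posteriors with a forward-backward pass and re-estimates the parameters from these expected counts, and the likelihood cannot decrease from one iteration to the next. The Python sketch below assumes a discrete, state-emitting HMM trained on a single sequence; the observation sequence and starting parameters are hypothetical.

```python
import math

def forward_backward(obs, init, trans, emit):
    """Scaled forward-backward pass for a discrete HMM.

    Returns (gamma, xi, log_lik): gamma[t][i] = P(X_t=i | obs),
    xi[t][i][j] = P(X_t=i, X_{t+1}=j | obs), log_lik = log P(obs).
    """
    n, T = len(init), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            a = [init[i] * emit[i][obs[0]] for i in range(n)]
        else:
            a = [sum(alpha[-1][i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                 for j in range(n)]
        s = sum(a)
        scales.append(s)
        alpha.append([v / s for v in a])  # scaled forward variables
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] / scales[t + 1]
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return gamma, xi, sum(math.log(s) for s in scales)

def baum_welch_step(obs, init, trans, emit):
    """One Baum-Welch (EM) update; also returns log P(obs) under the OLD parameters."""
    n, m = len(init), len(emit[0])
    gamma, xi, log_lik = forward_backward(obs, init, trans, emit)
    new_init = list(gamma[0])
    new_trans = []
    for i in range(n):
        expected_from_i = sum(g[i] for g in gamma[:-1])
        new_trans.append([sum(x[i][j] for x in xi) / expected_from_i for j in range(n)])
    new_emit = []
    for i in range(n):
        expected_in_i = sum(g[i] for g in gamma)
        new_emit.append([sum(g[i] for g, o in zip(gamma, obs) if o == k) / expected_in_i
                         for k in range(m)])
    return new_init, new_trans, new_emit, log_lik

# Hypothetical toy data and starting parameters.
obs = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
init, trans, emit = [0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]], [[0.7, 0.3], [0.4, 0.6]]
history = []
for _ in range(20):
    init, trans, emit, ll = baum_welch_step(obs, init, trans, emit)
    history.append(ll)
```

The recorded log-likelihoods in `history` are non-decreasing, which is the EM property that connects this procedure to the convexity arguments in the first entry of this listing.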
Learning in Sequential Pattern Recognition
"... [A unifying review for optimization-oriented speech recognition] ..."

