Results 11–20 of 20
The Stochastic Segment Model for Continuous Speech Recognition
 In Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers
, 1991
Abstract

Cited by 5 (1 self)
A new direction in speech recognition via statistical methods is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, which provides explicit modeling of time correlation as well as a formalism for incorporating segmental features. In this work, the focus is on modeling time correlation within a segment. We consider three Gaussian model variations based on different assumptions about the form of statistical dependency, including a Gauss-Markov model, a dynamical system model and a target state model, all of which can be formulated in terms of the dynamical system model. Evaluation of the different modeling assumptions is in terms of both phoneme classification performance and the predictive power of linear models.

1 Introduction

Most of the existing speaker-independent ...
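The Gauss-Markov variant described above models each frame of a segment as a linear function of the previous frame plus Gaussian noise, which is what lets the model capture time correlation explicitly. A minimal one-dimensional sketch of that idea, with illustrative scalar parameters that are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Gauss-Markov segment: each frame depends linearly on the
# previous frame plus Gaussian noise (F and q are illustrative values).
F, q = 0.8, 0.5                     # autoregression coefficient, noise variance
frames = [rng.normal(0, 1)]         # initial frame ~ N(0, 1)
for _ in range(9):
    frames.append(F * frames[-1] + rng.normal(0, np.sqrt(q)))

# Segment log-likelihood under the same model: the time correlation enters
# through the conditional terms p(x_t | x_{t-1}) = N(F * x_{t-1}, q).
ll = -0.5 * (frames[0] ** 2 + np.log(2 * np.pi))
for t in range(1, len(frames)):
    resid = frames[t] - F * frames[t - 1]
    ll += -0.5 * (resid ** 2 / q + np.log(2 * np.pi * q))
print(ll)
```

A frame-based HMM with independent emissions would drop the `F * frames[t-1]` dependence; the residual terms above are exactly what the segment model adds.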
Convexity, Maximum Likelihood and All That
, 1996
Abstract

Cited by 4 (0 self)
This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology.

1 Introduction

The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something which generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by y, a random variable which takes on values in some alphabet Y. The modelling problem is to come up with an accurate (in a sense made precise later) model p(y) of the process. If the identity of y is influenced by some conditioning information x ∈ X, then we might seek instead a conditional m...
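As a concrete instance of the EM algorithm this tutorial introduces, here is a minimal sketch fitting a two-component one-dimensional Gaussian mixture to synthetic data; the data, initial values, and iteration count are illustrative choices, not taken from the note:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic observations from two Gaussians (the "stochastic process").
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

# Initial guesses for the mixture weight, means and variances.
w, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: posterior responsibility of component 0 for each point.
    p0 = w * np.exp(-(data - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(var[0])
    p1 = (1 - w) * np.exp(-(data - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(var[1])
    r = p0 / (p0 + p1)
    # M-step: closed-form maximization of the expected log-likelihood.
    w = r.mean()
    mu = np.array([(r * data).sum() / r.sum(),
                   ((1 - r) * data).sum() / (1 - r).sum()])
    var = np.array([(r * (data - mu[0]) ** 2).sum() / r.sum(),
                    ((1 - r) * (data - mu[1]) ** 2).sum() / (1 - r).sum()])

print(sorted(mu))   # ≈ [-2, 3], the true component means
```

Each iteration provably does not decrease the data log-likelihood, which is the convexity argument (via Jensen's inequality) that the note develops.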
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
, 1996
Abstract

Cited by 3 (1 self)
The benefits of a speech recognition machine would be many, resulting in the improvement of the quality of life for people. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and back-end. The front-end deals with the conversion of the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems that make it difficult to find optimal features are: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
Topics In Computational Hidden State Modeling
, 1997
Abstract

Cited by 3 (3 self)
Motivated by the goal of establishing stochastic and information theoretic foundations for the study of intelligence and synthesis of intelligent machines, this thesis probes several topics relating to hidden state stochastic models. Finite Growth Models (FGM) are introduced. These are nonnegative functionals that arise from parametrically-weighted directed acyclic graphs and a tuple observation that affects these weights. Using FGMs the parameters of a highly general form of stochastic transducer can be learned from examples, and the particular case of stochastic string edit distance is developed. Experiments are described that illustrate the application of learned string edit distance to the problem of recognizing a spoken word given a phonetic transcription of the acoustic signal. With FGMs one may direct learning by criteria beyond simple maximum-likelihood. The MAP (maximum a posteriori estimate) and MDL (minimum description length) are discussed along with the application to cau...
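The learned string edit distance developed in this thesis generalizes the classical dynamic-programming edit distance by learning the edit costs from examples. For orientation, here is the fixed-cost skeleton that is being generalized; a stochastic edit distance would replace the unit costs below with negative log probabilities of each edit operation:

```python
def edit_distance(a, b, sub=1, ins=1, dele=1):
    # Classic DP over a (len(a)+1) x (len(b)+1) table; d[i][j] is the cost
    # of transforming the first i symbols of a into the first j symbols of b.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = i * dele
    for j in range(1, len(b) + 1):
        d[0][j] = j * ins
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + dele,                 # delete a[i-1]
                          d[i][j - 1] + ins,                  # insert b[j-1]
                          d[i - 1][j - 1]                     # substitute/match
                          + (0 if a[i - 1] == b[j - 1] else sub))
    return d[len(a)][len(b)]

print(edit_distance("kitten", "sitting"))  # → 3
```

In the word-recognition experiment described above, the strings would be phone sequences rather than characters, with per-pair costs learned from transcribed examples.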
A Comparison Of Hybrid HMM Architectures Using Global Discriminative Training
Abstract

Cited by 2 (0 self)
This paper presents a comparison of different model architectures for TIMIT phoneme recognition. The baseline is a conventional diagonal covariance Gaussian mixture HMM. This system is compared to two different hybrid MLP/HMMs, both adhering to the same restrictions regarding input context and output states as the Gaussian mixtures. All free parameters in the three systems are jointly optimised using the same global discriminative criterion. A Forward decoder, with total likelihood scoring, is used for recognition. While the global discriminative training method is found to improve the baseline HMM significantly, the differences between Gaussian and MLP-based architectures are small. The Gaussian mixture system, however, performs slightly better at the lowest complexity levels.
On the Strange a Posteriori Degeneracy of Normal Mixtures, and Related Reparameterization Theorems
, 1996
Abstract

Cited by 1 (1 self)
This short paper illuminates certain fundamental aspects of the nature of normal (Gaussian) mixtures. Thinking of each mixture component as a class, we focus on the corresponding a posteriori class probability functions. It is shown that the relationship between these functions and the mixture's parameters is highly degenerate, and that the precise nature of this degeneracy leads to somewhat unusual and counterintuitive behavior. Even complete knowledge of a mixture's a posteriori class behavior reveals essentially nothing of its absolute nature, i.e. mean locations and covariance norms. Consequently a mixture whose means are located in a small ball anywhere in space can project arbitrary class structure everywhere in space. The well-known expectation maximization (EM) algorithm for Maximum Likelihood (ML) optimization may be thought of as a reparameterization of the problem in which the search takes place over the space of sample point weights. Motivated by EM we characterize th...
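The claim that means confined to a small ball can project the same class structure as widely separated means can be checked numerically in one dimension: shrinking the gap between two means while also shrinking the shared variance and adjusting the priors leaves the a posteriori class probabilities unchanged everywhere. A small sketch of this, with values chosen purely for illustration:

```python
import numpy as np

def posterior_class1(x, mu, var, priors):
    # Bayes' rule for a two-component 1-D normal mixture with a shared
    # variance, computed via the log-odds so tiny variances don't underflow.
    log_odds = (np.log(priors[0] / priors[1])
                + ((x - mu[1]) ** 2 - (x - mu[0]) ** 2) / (2 * var))
    return 1.0 / (1.0 + np.exp(-log_odds))

x = np.linspace(-5.0, 5.0, 201)

# Mixture A: well-separated means 0 and 1, unit variance, equal priors.
post_a = posterior_class1(x, (0.0, 1.0), 1.0, (0.5, 0.5))

# Mixture B: both means inside a ball of radius 0.01; the variance and
# priors are chosen so its posterior log-odds line matches mixture A's.
c = 0.01
w1 = 1.0 / (1.0 + np.exp(-(1.0 - c) / 2.0))
post_b = posterior_class1(x, (0.0, c), c, (w1, 1.0 - w1))

print(np.max(np.abs(post_a - post_b)))   # agreement to machine precision
```

The construction works because, with a shared variance, the posterior log-odds is linear in x with slope (mu1 - mu2)/var: scaling the mean gap and variance together preserves the slope, and the priors absorb the intercept.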
Approximation of Stochastic Processes by Hidden Markov Models
, 1998
Abstract

Cited by 1 (1 self)
In this thesis we restrict ourselves to stationary and discrete valued stochastic processes. A pair of stochastic processes (X, Y) is a Hidden Markov Model (HMM) if X (the state process) is a Markov process and Y (the observable process) is an incomplete observation of X. The observation can be deterministic or noisy and the observable can be a state or a state transition. Hence we have four possible types of HMM's. First we establish that all types of HMM's are equivalent, in the sense that, given any HMM of arbitrary type we can construct a HMM of any other arbitrary type, such that the two models have identical observable processes. Therefore all types of HMM's have the same modelling power. Second, we consider the problem of Representation: what kind of stochastic processes can we approximate with Hidden Markov Models? To make the question meaningful we define two types of stochastic process approximation: (a) weak approximation, based on the weak convergence of probability measures and (b) cross entropy approximation, based on the Kullback-Leibler informational divergence. Then we prove that there is a sequence of HMM's (of increasing size) that approximate any ergodic stochastic process in the weak and cross entropy sense. Third, we consider the problem of Consistent Estimation. To approximate an ergodic process we need a sequence of HMM's of increasing size. For a fixed size Hidden Markov Model we can use the very efficient Baum algorithm to find the Maximum Likelihood parameter estimates. But will the sequence of estimates be consistent (i.e. will it converge to the true process)? The answer is: the sequence of Maximum Likelihood Estimates will be consistent if the original process is ergodic, has strictly positive probability and conditional probability bounded away from zero. Fourth...
Approximation of Stochastic Processes by Hidden Markov Models
, 1992
Abstract

Cited by 1 (0 self)
In this thesis we restrict ourselves to stationary and discrete valued stochastic processes. A pair of stochastic processes (X, Y) is a Hidden Markov Model (HMM) if X (the state process) is a Markov process and Y (the observable process) is an incomplete observation of X. The observation can be deterministic or noisy and the observable can be a state or a state transition. Hence we have four possible types of HMM's. First we establish that all types of HMM's are equivalent, in the sense that, given any HMM of arbitrary type we can construct a HMM of any other arbitrary type, such that the two models have identical observable processes. Therefore all types of HMM's have the same modelling power. Second, we consider the problem of Representation: what kind of stochastic processes can we approximate with Hidden Markov Models? To make the question meaningful we define two types of stochastic process approximation: (a) weak approximation, based on the weak convergence of probability measures and (b) cross entropy approximation, based on the Kullback-Leibler informational divergence. Then we prove that there is a sequence of HMM's (of increasing size) that approximate any ergodic stochastic process in the weak and cross entropy sense. Third, we consider the problem of Consistent Estimation. To approximate an ergodic process we need a sequence of HMM's of increasing size. For a fixed size Hidden Markov Model we can use the very efficient Baum algorithm to find the Maximum Likelihood parameter estimates. But will the sequence of estimates be consistent (i.e. will it converge to the true process)? The answer is: the sequence of Maximum Likelihood Estimates will be consistent if the original process is ergodic, has strictly positive probability and conditional probability bounded away from zero. Fourth, we develop HMM models of the raw speech signal and demonstrate numerically consistency of Maximum Likelihood estimation.
Finally, we develop Hidden Gibbs Models, an analogue of HMM, and use these to model one dimensional speech signals and two dimensional images.
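The Baum machinery mentioned above rests on the forward recursion, which evaluates the likelihood of an observation sequence under an HMM by summing over all hidden state paths in O(T·N²) time rather than O(Nᵀ). A minimal sketch for a discrete-observation HMM, with made-up illustrative parameters:

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # Forward algorithm: alpha[i] accumulates the probability of the
    # observations so far and the state path ending in state i.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Hypothetical 2-state, 2-symbol model (illustrative numbers only).
pi = np.array([0.6, 0.4])            # initial state distribution
A  = np.array([[0.7, 0.3],           # state transition matrix
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # per-state symbol emission matrix
               [0.2, 0.8]])
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

Baum-Welch reestimation combines this forward pass with a symmetric backward pass to update pi, A and B; in practice the recursion is run with per-step scaling or in log space to avoid underflow on long sequences.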
Learning in Sequential Pattern Recognition
"... [A unifying review for optimization-oriented speech recognition] ..."
Robustness in ASR: An Experimental Study of the Interrelationship between Discriminant Feature-Space Transformation, Speaker Normalization and Environment Compensation
, 2007
Abstract
This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These interactions were quantified by performing experiments on the connected digit utterances comprising the Aurora 2 database, using continuous density hidden Markov models (HMM) representing individual digits. Firstly, given that the performance of speaker normalization techniques degrades in the presence of noise, it is shown that reducing the effects of noise through environmental compensation, prior to speaker normalization, leads to substantial improvements in ASR performance. The speaker normalization techniques considered here were vocal tract length normalization (VTLN) and the augmented state-space acoustic decoder (MATE). Secondly, given that discriminant feature-space transformations (DFT) are known to increase