Results 11-20 of 28
The Stochastic Segment Model for Continuous Speech Recognition
In Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers
, 1991
Abstract

Cited by 5 (1 self)
A new direction in speech recognition via statistical methods is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, which provides explicit modeling of time correlation as well as a formalism for incorporating segmental features. In this work, the focus is on modeling time correlation within a segment. We consider three Gaussian model variations based on different assumptions about the form of statistical dependency, including a Gauss-Markov model, a dynamical system model and a target state model, all of which can be formulated in terms of the dynamical system model. Evaluation of the different modeling assumptions is in terms of both phoneme classification performance and the predictive power of linear models. 1 Introduction Most of the existing speaker-independent ...
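The first-order Gauss-Markov variant mentioned in this abstract can be illustrated with a minimal sketch of segment generation; the parameter names (`a`, `mu`, `sigma`) and values are illustrative assumptions, not taken from the paper:

```python
import random

def sample_gauss_markov_segment(length, a=0.9, mu=0.0, sigma=1.0, seed=0):
    # Draw one segment from a first-order Gauss-Markov (AR(1)) process:
    #   x[t] = mu + a * (x[t-1] - mu) + e[t],   e[t] ~ N(0, sigma^2)
    # This models the within-segment time correlation that a frame-wise
    # independence assumption (as in a plain HMM state) would ignore.
    rng = random.Random(seed)
    x = [mu + rng.gauss(0.0, sigma)]
    for _ in range(length - 1):
        x.append(mu + a * (x[-1] - mu) + rng.gauss(0.0, sigma))
    return x
```

With `a` close to 1 adjacent frames are strongly correlated; with `a = 0` the segment degenerates to independent frames.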
Discriminative speaker adaptation using articulatory features
 Speech Communication
, 2007
Abstract

Cited by 4 (0 self)
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
, 1996
Abstract

Cited by 4 (1 self)
The benefits of a speech recognition machine would be many, resulting in the improvement of the quality of life for people. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and back-end. The front-end deals with the conversion of the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems that make it difficult to find optimal features are: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
Convexity, Maximum Likelihood and All That
, 1996
Abstract

Cited by 4 (0 self)
This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology. 1 Introduction The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something which generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by y, a random variable which takes on values in some alphabet Y. The modelling problem is to come up with an accurate (in a sense made precise later) model p(y) of the process. If the identity of y is influenced by some conditioning information x ∈ X, then we might seek instead a conditional m...
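As a concrete (and much simplified) instance of the EM iteration this note introduces, here is a minimal sketch of EM for a two-component 1-D Gaussian mixture; the initialisation and variance floor are ad-hoc choices for illustration, not prescriptions from the tutorial:

```python
import math

def em_gmm_1d(data, n_iter=50):
    # EM for a two-component 1-D Gaussian mixture (sketch only:
    # no log-domain arithmetic or other numerical safeguards).
    mu = [min(data), max(data)]      # crude initialisation
    var = [1.0, 1.0]
    w = [0.5, 0.5]                   # mixture weights
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for y in data:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(y - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * y for r, y in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (y - mu[k]) ** 2
                             for r, y in zip(resp, data)) / nk, 1e-6)
    return w, mu, var
```

Each iteration provably does not decrease the data likelihood, which is the convexity argument (via Jensen's inequality) that the note develops.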
Topics In Computational Hidden State Modeling
, 1997
Abstract

Cited by 3 (3 self)
Motivated by the goal of establishing stochastic and information theoretic foundations for the study of intelligence and synthesis of intelligent machines, this thesis probes several topics relating to hidden state stochastic models. Finite Growth Models (FGM) are introduced. These are nonnegative functionals that arise from parametrically-weighted directed acyclic graphs and a tuple observation that affects these weights. Using FGMs the parameters of a highly general form of stochastic transducer can be learned from examples, and the particular case of stochastic string edit distance is developed. Experiments are described that illustrate the application of learned string edit distance to the problem of recognizing a spoken word given a phonetic transcription of the acoustic signal. With FGMs one may direct learning by criteria beyond simple maximum-likelihood. The MAP (maximum a posteriori estimate) and MDL (minimum description length) are discussed along with the application to cau...
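The stochastic string edit distance this thesis develops generalizes the classic Levenshtein distance; as background, a minimal sketch of the deterministic dynamic-programming version (the learned, stochastic variant would replace the unit costs below with learned negative log edit probabilities):

```python
def edit_distance(a, b):
    # Classic Levenshtein distance between strings a and b via
    # dynamic programming over a (len(a)+1) x (len(b)+1) table.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                           # i deletions
    for j in range(n + 1):
        d[0][j] = j                           # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,    # delete a[i-1]
                          d[i][j - 1] + 1,    # insert b[j-1]
                          d[i - 1][j - 1] + sub)  # substitute / match
    return d[m][n]
```

The same table structure, with `min` replaced by a sum over edit operations, yields the forward pass used to train the stochastic version from example pairs.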
A Comparison Of Hybrid HMM Architectures Using Global Discriminative Training
Abstract

Cited by 2 (0 self)
This paper presents a comparison of different model architectures for TIMIT phoneme recognition. The baseline is a conventional diagonal covariance Gaussian mixture HMM. This system is compared to two different hybrid MLP/HMMs, both adhering to the same restrictions regarding input context and output states as the Gaussian mixtures. All free parameters in the three systems are jointly optimised using the same global discriminative criterion. A Forward decoder, with total likelihood scoring, is used for recognition. While the global discriminative training method is found to improve the baseline HMM significantly, the differences between Gaussian and MLP-based architectures are small. The Gaussian mixture system, however, performs slightly better at the lowest complexity levels.
On the Strange a Posteriori degeneracy of Normal Mixtures, and Related Reparameterization Theorems
, 1996
Abstract

Cited by 1 (1 self)
This short paper illuminates certain fundamental aspects of the nature of normal (Gaussian) mixtures. Thinking of each mixture component as a class, we focus on the corresponding a posteriori class probability functions. It is shown that the relationship between these functions and the mixture's parameters, is highly degenerate  and that the precise nature of this degeneracy leads to somewhat unusual and counterintuitive behavior. Even complete knowledge of a mixture's a posteriori class behavior, reveals essentially nothing of its absolute nature, i.e. mean locations and covariance norms. Consequently a mixture whose means are located in a small ball anywhere in space, can project arbitrary class structure everywhere in space. The wellknown expectation maximization (EM) algorithm for Maximum Likelihood (ML) optimization may be thought of as a reparameterization of the problem in which the search takes place over the space of sample point weights. Motivated by EM we characterize th...
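The degeneracy can be seen directly in a two-component, shared-variance 1-D case: shrinking the two means into a tiny interval, while shrinking the variance to match the posterior log-odds slope and re-weighting the priors to match its intercept, leaves every a posteriori class probability unchanged. A minimal sketch (the specific constants are illustrative):

```python
import math

def posterior(y, mu, var, w):
    # P(class 1 | y) for a two-component 1-D normal mixture
    # with means mu, variances var and prior weights w.
    p = [w[k] / math.sqrt(2 * math.pi * var[k])
         * math.exp(-(y - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
    return p[1] / (p[0] + p[1])

# Mixture A: unit-separated means, unit variance, equal priors.
# Its posterior log-odds are slope*y + intercept with slope 1, intercept -0.5.
A = dict(mu=[0.0, 1.0], var=[1.0, 1.0], w=[0.5, 0.5])

# Mixture B: means squeezed into [0, 0.01].  Matching the slope
# (mu1 - mu0) / var forces var = delta; the priors are then chosen so
# the intercept -(mu1^2 - mu0^2)/(2 var) + log(w1/w0) also matches.
delta = 0.01
quad_b = -(delta ** 2) / (2 * delta)   # quadratic part of B's intercept
ratio = math.exp(-0.5 - quad_b)        # required w1/w0 for mixture B
B = dict(mu=[0.0, delta], var=[delta, delta],
         w=[1.0 / (1.0 + ratio), ratio / (1.0 + ratio)])
```

Both mixtures induce identical class posteriors everywhere, even though B's means sit in an interval of width 0.01.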
Approximation of Stochastic Processes by Hidden Markov Models
, 1992
Abstract

Cited by 1 (1 self)
In this thesis we restrict ourselves to stationary and discrete-valued stochastic processes. A pair of stochastic processes (X, Y) is a Hidden Markov Model (HMM) if X (the state process) is a Markov process and Y (the observable process) is an incomplete observation of X. The observation can be deterministic or noisy, and the observable can be a state or a state transition. Hence we have four possible types of HMMs. First we establish that all types of HMMs are equivalent, in the sense that, given any HMM of arbitrary type, we can construct an HMM of any other arbitrary type such that the two models have identical observable processes. Therefore all types of HMMs have the same modelling power. Second, we consider the problem of Representation: what kind of stochastic processes can we approximate with Hidden Markov Models? To make the question meaningful we define two types of stochastic process approximation: (a) weak approximation, based on the weak convergence of probability measures, and (b) cross entropy approximation, based on the Kullback-Leibler informational divergence. Then we prove that there is a sequence of HMMs (of increasing size) that approximates any ergodic stochastic process in the weak and cross entropy sense. Third, we consider the problem of Consistent Estimation. To approximate an ergodic process we need a sequence of HMMs of increasing size. For a fixed-size Hidden Markov Model we can use the very efficient Baum algorithm to find the Maximum Likelihood parameter estimates. But will the sequence of estimates be consistent (i.e. will it converge to the true process)? The answer is: the sequence of Maximum Likelihood estimates will be consistent if the original process is ergodic, has strictly positive probability, and has conditional probability bounded away from zero. Fourth, we develop HMM models of the raw speech signal and numerically demonstrate consistency of Maximum Likelihood estimation.
Finally, we develop Hidden Gibbs Models, an analogue of the HMM, and use these to model one-dimensional speech signals and two-dimensional images.
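The Baum re-estimation the abstract refers to is built on the forward recursion for the likelihood of the observable process; a minimal sketch for a discrete-observation HMM (the toy parameter values below are illustrative, not from the thesis):

```python
def forward_likelihood(init, trans, emit, obs):
    # Forward algorithm: total likelihood P(obs) under a discrete HMM.
    # init[i]: initial state probabilities, trans[i][j]: transition
    # probabilities, emit[i][o]: emission probabilities, obs: symbol indices.
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha[j] accumulates probability over all state paths ending in j.
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)
```

This costs O(T n^2) instead of the O(n^T) of summing over all state paths explicitly, which is what makes Maximum Likelihood estimation of HMMs practical.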
Learning in Sequential Pattern Recognition
"... [A unifying review for optimization-oriented speech recognition] ..."
WHAT HMMS CAN’T DO
Abstract
Hidden Markov models (HMMs) are the predominant methodology for automatic speech recognition (ASR) systems. Ever since their inception, it has been said that HMMs are an inadequate statistical model for such purposes. Results over the years have shown, however, that HMM-based ASR performance continually improves given enough training data and engineering effort. In this paper, we argue that, in theory at least, there are no limitations to the class of probability distributions representable by HMMs. In search of a model to supersede the HMM for ASR, therefore, we should search for models with better parsimony, computational properties, and noise insensitivity, and that better utilize high-level knowledge sources.