Results 1 - 10 of 15
Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
- IEEE Transactions on Speech and Audio Processing, 1994
- Cited by 372 (36 self)
In this paper a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely the choice of prior distribution family, the specification of the parameters of prior densities and the evaluation of the MAP estimates, are addressed. Using HMMs with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normal-Wishart densities. The classical maximum likelihood estimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications: parameter smoothing and model adaptation, and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition
- IEEE Transactions on Speech and Audio Processing, 1996
- Cited by 86 (14 self)
Ananth Sankar and Chin-Hui Lee, Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974. Recently there has been much interest in the problem of improving the performance of automatic speech recognition (ASR) systems in adverse environments. When there is a mismatch between the training and testing environments, ASR systems suffer a degradation in performance. The goal of robust speech recognition is to remove the effect of this mismatch so as to bring the recognition performance as close as possible to the matched conditions. In speech recognition, the speech is usually modeled by a set of hidden Markov models (HMMs) X. During recognition the observed utterance Y is decoded using these models. Due to the mismatch between training and testing conditions, this often results in a degradation in performance compared to the matched conditions. The mismatch b...
Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures
- IEEE Transactions on Speech and Audio Processing, 1995
- Cited by 65 (2 self)
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both ...
Speaker Adaptation Using Combined Transformation and Bayesian Methods
- 1994
- Cited by 38 (4 self)
Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
- Proc. IEEE, 2000
- Cited by 16 (3 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
Choice of Basis for Laplace Approximation
- Machine Learning, 1998
- Cited by 13 (1 self)
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
Training Data Clustering For Improved Speech Recognition
- Proceedings of EUROSPEECH, 1995
- Cited by 11 (3 self)
We present an approach to cluster the training data for automatic speech recognition (ASR). A relative-entropy based distance metric between training data clusters is defined. This metric is used to hierarchically cluster the training data. The metric can also be used to select the closest training data clusters given a small amount of data from the test speaker. The selected clusters are then used to estimate a set of hidden Markov models (HMMs) for recognizing the speech from the test speaker. We present preliminary experimental results of the clustering algorithm and its application to ASR. While progress in ASR has been encouraging, it has become increasingly clear that ASR systems must perform well in the presence of mismatches between the training and testing environments. ASR systems trained in one environment often perform poorly in a new environment due to mismatches between the training and testing conditions. Common sources of mismatches include different tran...
Speech Recognition System Design Based on Automatically Derived Units
- 1999
- Cited by 10 (0 self)
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
Writer adaptation techniques in HMM based Off-Line Cursive Script Recognition
- Pattern Recognition Letters, 2002
- Cited by 9 (0 self)
This work presents the application of HMM adaptation techniques to the problem of Off-Line Cursive Script Recognition. Instead of training a new model for each writer, one first creates a unique model with a mixed database and then adapts it for each different writer using his own small dataset. Experiments on a publicly available benchmark database show that an adapted system has an accuracy higher than 80% even when fewer than 30 word samples are used during adaptation, while a system trained using only the data of the single writer needs at least 200 words in order to achieve the same performance as the adapted models.
The Use of Speaker Correlation Information for Automatic Speech Recognition
- 1998
- Cited by 7 (3 self)
This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker independent systems, as can be seen by the severe drop in performance exhibited by systems between their speaker dependent mode and their speaker independent mode. The typical solution to this problem is to apply speaker adaptation to the models of the speaker independent system. This approach is examined in this thesis with the explicit goal of improving the rapid adaptation capabilities of the system by incorporating within-speaker correlation information into the adaptation process. This is achieved through the creation of an adaptation technique called reference speaker weighting and in the development of a speaker clustering technique called speaker cluster weighting. However, speaker adaptation is just one way in which the independence assumption can be attacked. This dissertation also introduces a novel speech recognition technique called consistency modeling. This technique utilizes a priori knowledge about the within-speaker correlations which exist between different phonetic events for the purpose of incorporating speaker constraint into a speech recognition system without explicitly applying speaker adaptation. These new techniques are implemented within a segment-based speech recognition system and evaluation results are reported on the DARPA Resource Management recognition task.

