Results 1–10 of 41
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
, 1995
"... ..."
Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition
 Computer Speech and Language
, 1998
"... This paper examines the application of linear transformations for speaker and environmental adaptation in an HMMbased speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias ..."
Abstract

Cited by 406 (56 self)
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-based linear transforms are considered. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform (sometimes referred to as feature-space transforms). Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient "full" variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed. (The author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.)
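The constrained/unconstrained distinction above can be illustrated numerically. Below is a minimal numpy sketch: the transform matrices A, b and H are made-up values (in the paper they are estimated by EM to maximise the adaptation-data likelihood), and the check shows why a constrained model-space transform is equivalent to a feature-space transform of the observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# A speaker-independent Gaussian component (3-dim, diagonal covariance
# for simplicity). A, b and H are hypothetical adaptation transforms.
mu = rng.normal(size=3)
sigma = np.diag(rng.uniform(0.5, 2.0, size=3))

A = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
b = rng.normal(size=3)
H = np.eye(3) + 0.05 * rng.normal(size=(3, 3))

# (i) Unconstrained: mean and variance transformed independently.
mu_u, sigma_u = A @ mu + b, H @ sigma @ H.T

# (ii) Constrained: the variance shares the mean's matrix, so the
# model-space transform is equivalent to a feature-space transform.
mu_c, sigma_c = A @ mu + b, A @ sigma @ A.T

def gauss_logpdf(x, m, S):
    d = x - m
    return -0.5 * (len(x) * np.log(2 * np.pi)
                   + np.linalg.slogdet(S)[1]
                   + d @ np.linalg.solve(S, d))

# Equivalence check for the constrained case: evaluating the adapted model
# at x equals evaluating the original model at A^-1 (x - b), minus log|A|.
x = rng.normal(size=3)
lhs = gauss_logpdf(x, mu_c, sigma_c)
rhs = (gauss_logpdf(np.linalg.solve(A, x - b), mu, sigma)
       - np.linalg.slogdet(A)[1])
print(np.isclose(lhs, rhs))
```

This equivalence is what makes the constrained form attractive for speaker adaptive training: the transform can be applied once to the features rather than to every Gaussian.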
Semi-Tied Covariance Matrices For Hidden Markov Models
 IEEE Transactions on Speech and Audio Processing
, 1999
"... There is normally a simple choice made in the form of the covariance matrix to be used with continuousdensity HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or blockdiagonal matrix is used, where all ..."
Abstract

Cited by 181 (27 self)
There is normally a simple choice made in the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately when using full or block-diagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a large-vocabulary speech-recognition task. In initial experiments the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time.
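The tying described above can be sketched numerically: one shared matrix B plays the role of a "full" covariance shared over many components, while each component keeps only its own diagonal variances. This minimal numpy sketch (all values illustrative, not the paper's notation) checks that evaluating the semi-tied Gaussian directly agrees with transforming the feature once and then using the cheap diagonal form.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Shared "semi-tied" matrix B; each component keeps its own diagonal
# variances lam. Names here are illustrative.
B = np.eye(d) + 0.2 * rng.normal(size=(d, d))
A = np.linalg.inv(B)                    # feature-space view of the tying
logdet_A = np.linalg.slogdet(A)[1]

mu = rng.normal(size=d)
lam = rng.uniform(0.5, 2.0, size=d)     # per-component diagonal variances
x = rng.normal(size=d)

# Direct evaluation with the implied full covariance B diag(lam) B^T.
Sigma = B @ np.diag(lam) @ B.T
diff = x - mu
full = -0.5 * (d * np.log(2 * np.pi)
               + np.linalg.slogdet(Sigma)[1]
               + diff @ np.linalg.solve(Sigma, diff))

# Tied evaluation: transform the feature once with A = B^-1, then use a
# diagonal Gaussian plus the shared Jacobian term log|det A|.
z, zm = A @ x, A @ mu
tied = (-0.5 * (d * np.log(2 * np.pi) + np.log(lam).sum()
                + ((z - zm) ** 2 / lam).sum())
        + logdet_A)

print(np.isclose(full, tied))
```

Because A can be applied once per feature vector and shared across many components, the per-Gaussian cost stays close to that of a plain diagonal system.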
Mean and Variance Adaptation within the MLLR Framework
 Computer Speech & Language
, 1996
"... One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker dependent (SD) performance with only small amounts of speaker specific data, and are often based on initi ..."
Abstract

Cited by 109 (15 self)
One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker dependent (SD) performance with only small amounts of speaker-specific data, and are often based on initial speaker independent (SI) recognition systems. Some of these speaker adaptation techniques may also be applied to the task of adaptation to a new acoustic environment. In this case an SI recognition system trained in, typically, a clean acoustic environment is adapted to operate in a new, noise-corrupted, acoustic environment. This paper examines the Maximum Likelihood Linear Regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters to maximise the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture Gaussian HMM systems. In this paper MLLR is extended to also update the Gaussian variances, and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. The use of mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation.
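As a sketch of the parameterisation described above (generic notation, not necessarily the paper's exact re-estimation formulae): for a Gaussian with mean $\mu$ and covariance $\Sigma$, mean MLLR applies an affine transform shared by all components in a regression class, and the variance extension applies a further linear transform:

```latex
\hat{\mu} = A\mu + b = W\xi, \qquad
\xi = \left[1,\ \mu^{\mathsf{T}}\right]^{\mathsf{T}}, \qquad
\hat{\Sigma} = H\,\Sigma\,H^{\mathsf{T}},
```

where $W = [\,b \;\; A\,]$ and $H$ are estimated to maximise the likelihood of the adaptation data, so one small set of transform parameters updates every Gaussian in the class.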
The LIMSI Broadcast News Transcription System
 Speech Communication
, 2002
"... This paper reports on activites at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to realworld or `found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. T ..."
Abstract

Cited by 102 (8 self)
This paper reports on activities at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or 'found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. Two main problems needed to be addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and the different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers).
Flexible Speaker Adaptation Using Maximum Likelihood Linear Regression
 Proc. ARPA Spoken Language Technology Workshop
, 1995
"... The maximum likelihood linear regression (MLLR) approach for speaker adaptation of continuous density mixture Gaussian HMMs is presented and its application to static and incremental adaptation for both supervised and unsupervised modes described. The approach involves computing a transformation for ..."
Abstract

Cited by 81 (2 self)
The maximum likelihood linear regression (MLLR) approach for speaker adaptation of continuous density mixture Gaussian HMMs is presented, and its application to static and incremental adaptation for both supervised and unsupervised modes is described. The approach involves computing a transformation for the mixture component means using linear regression. To allow adaptation to be performed with limited amounts of data, a small number of transformations are defined and each one is tied to a number of mixture components. In previous work, the tyings were predetermined based on the amount of available data. Recently we have used dynamic regression class generation, which chooses the appropriate number of classes and transform tying during the adaptation phase. This allows complete unsupervised operation with arbitrary adaptation data. Results are given for static supervised adaptation for non-native speakers and also unsupervised incremental adaptation. Both show the effectiveness and flexibi...
Support vector machines for speech recognition
 Proceedings of the International Conference on Spoken Language Processing
, 1998
"... Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative informati ..."
Abstract

Cited by 74 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
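To make the maximum-margin idea concrete, here is a small self-contained sketch: a linear SVM trained with the Pegasos sub-gradient method on toy two-class data. This only illustrates the classifier family the paper uses; it is not the authors' SVM/HMM hybrid, and the data and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two linearly separable clusters (labels -1/+1), separable through the origin.
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

lam = 0.01                      # regularisation strength (illustrative)
w = np.zeros(2)
for t in range(1, 2001):
    i = rng.integers(len(X))
    eta = 1.0 / (lam * t)       # Pegasos step size schedule
    if y[i] * (X[i] @ w) < 1:   # point inside the margin: hinge loss active
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:                       # margin satisfied: only shrink (regularise)
        w = (1 - eta * lam) * w

acc = float(np.mean(np.sign(X @ w) == y))
print(f"training accuracy: {acc:.2f}")
```

The hinge-loss/regularisation trade-off in the loop is exactly the generalization control the abstract refers to: points already outside the margin contribute nothing but shrinkage.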
The Development Of The 1994 HTK Large Vocabulary Speech Recognition System
"... This paper describes recent developments of the HTK large vocabulary continuous speech recognition system. The system uses tiedstate crossword contextdependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to ..."
Abstract

Cited by 64 (5 self)
This paper describes recent developments of the HTK large vocabulary continuous speech recognition system. The system uses tied-state cross-word context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to produce word lattices to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models. The system vocabulary can now be up to 65k words, the final acoustic models have been extended to be sensitive to more acoustic context (quinphones), a 4-gram language model has been used and unsupervised incremental speaker adaptation incorporated. The resulting system gave the lowest error rates on both the H1-P0 and H1-C1 hub tasks in the November 1994 ARPA CSR evaluation. 1. INTRODUCTION This paper describes recent improvements to the HTK large vocabulary speech recognition system. The system uses state-clustere...
The Generation And Use Of Regression Class Trees For MLLR Adaptation
, 1996
"... Maximum likelihood linear regression (MLLR) is an adaptation technique suitable for both speaker and environmental modelbased adaptation. The models are adapted using a set of linear transformations, estimated in a maximum likelihood fashion from the available adaptation data. As these transformati ..."
Abstract

Cited by 62 (8 self)
Maximum likelihood linear regression (MLLR) is an adaptation technique suitable for both speaker and environmental model-based adaptation. The models are adapted using a set of linear transformations, estimated in a maximum likelihood fashion from the available adaptation data. As these transformations can capture general relationships between the original model set and the current speaker, or new acoustic environment, they can be effective in adapting all the HMM distributions with limited adaptation data. Two important decisions that must be made are (i) how to cluster components together, such that they all have a similar transformation matrix, and (ii) how many transformation matrices to generate for a given block of adaptation data. This paper addresses both problems. Firstly it describes two optimal clustering techniques, in the sense of maximising the likelihood of the adaptation data. The first assigns each component to one of the regression classes. This may be used to generat...
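The back-off behaviour of a regression class tree can be sketched in a few lines: each Gaussian component sits under a leaf, a transform is estimated at the deepest node whose adaptation-data occupation count clears a threshold, and components without enough data fall back to an ancestor's transform. Everything below (tree shape, counts, threshold) is illustrative, not the paper's actual configuration.

```python
THRESHOLD = 100.0   # minimum occupation count (frames) to estimate a transform

class Node:
    def __init__(self, name, components, children=()):
        self.name = name
        self.components = components    # Gaussian components under this node
        self.children = children

    def count(self, occ):
        # Total adaptation-data occupation count for this node.
        return sum(occ.get(c, 0.0) for c in self.components)

def assign_transforms(node, occ, classes, fallback=None):
    """Map each component to the name of the node whose transform it uses."""
    if node.count(occ) >= THRESHOLD:
        fallback = node                 # enough data: a transform lives here
    if any(ch.count(occ) >= THRESHOLD for ch in node.children):
        for ch in node.children:
            assign_transforms(ch, occ, classes, fallback)
    else:
        for c in node.components:       # back off to the deepest usable ancestor
            classes[c] = fallback.name if fallback else None

left = Node("left", ["a", "b"])
right = Node("right", ["c", "d"])
root = Node("root", ["a", "b", "c", "d"], (left, right))

occ = {"a": 90.0, "b": 40.0, "c": 10.0, "d": 5.0}   # adaptation-data counts
classes = {}
assign_transforms(root, occ, classes)
print(classes)
```

Here components a and b have enough combined data for their own class, while the sparse c and d share the root transform, mirroring how the tree adapts all distributions even with very little data.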
Error-responsive feedback mechanisms for speech recognizers
, 1997
"... This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's stateoftheart recognizers are not designed to be situated naturally in an error feedback loop, they are illpositioned for inclusion in multimodal ..."
Abstract

Cited by 47 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multimodal interfaces, multimedia databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is based only on the measurement of word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.