Results 1–10 of 75
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
, 1995
"... ..."
Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition
 Computer Speech and Language
, 1998
"... This paper examines the application of linear transformations for speaker and environmental adaptation in an HMMbased speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias ..."
Abstract

Cited by 406 (56 self)
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-based linear transforms are considered. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform (sometimes referred to as feature-space transforms). Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient "full" variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed. The author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
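The unconstrained/constrained distinction the abstract draws can be sketched numerically. This is an illustrative example only, not the paper's estimation procedure: the transform values A, b, and H below are assumed, and only the application of an already-estimated transform to one Gaussian is shown.

```python
import numpy as np

# Hypothetical Gaussian parameters (2-D features): mean and diagonal covariance.
mu = np.array([1.0, -0.5])
Sigma = np.diag([0.4, 0.9])

# Assumed adaptation transform: rotation/scaling A and bias b.
A = np.array([[1.1, 0.0],
              [0.2, 0.9]])
b = np.array([0.05, -0.1])

# Unconstrained model-based adaptation: mean and variance transforms may differ.
H = np.array([[1.2, 0.0],
              [0.0, 0.8]])          # separate variance transform
mu_unc = A @ mu + b                 # adapted mean
Sigma_unc = H @ Sigma @ H.T         # adapted covariance

# Constrained adaptation: the variance uses the SAME transform as the mean,
# which is why it is equivalent to a feature-space transform of the data.
mu_con = A @ mu + b
Sigma_con = A @ Sigma @ A.T
```

The constrained case needs only one matrix per regression class, which is what makes it attractive for the speaker adaptive training experiments the abstract mentions.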
Mean and Variance Adaptation within the MLLR Framework
 Computer Speech & Language
, 1996
"... One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker dependent (SD) performance with only small amounts of speaker specific data, and are often based on initi ..."
Abstract

Cited by 109 (15 self)
One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data, and are often based on initial speaker-independent (SI) recognition systems. Some of these speaker adaptation techniques may also be applied to the task of adaptation to a new acoustic environment. In this case an SI recognition system trained in, typically, a clean acoustic environment is adapted to operate in a new, noise-corrupted, acoustic environment. This paper examines the Maximum Likelihood Linear Regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters to maximise the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture Gaussian HMM systems. In this paper MLLR is extended to also update the Gaussian variances, and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. The use of mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation.
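The mean transform MLLR estimates is conventionally written mu_hat = W xi, with xi the mean extended by a leading 1 so the bias folds into W. The sketch below applies that form plus a diagonal variance scaling of the kind this extension introduces; all numerical values are invented for illustration.

```python
import numpy as np

# MLLR mean adaptation in extended-vector form: mu_hat = W @ xi,
# where xi = [1, mu] and W = [b A] groups the bias with the rotation.
mu = np.array([1.0, -0.5])
xi = np.concatenate(([1.0], mu))            # extended mean vector
W = np.array([[0.05, 1.1, 0.0],             # first column is the bias b
              [-0.1, 0.2, 0.9]])
mu_hat = W @ xi

# Variance extension (illustrative diagonal case): scale each variance by a
# factor assumed to have been estimated from the adaptation data.
var = np.array([0.4, 0.9])
h = np.array([1.3, 0.7])                    # assumed variance scaling terms
var_hat = h * var
```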
Cluster Adaptive Training Of Hidden Markov Models
 IEEE Transactions on Speech and Audio Processing
, 1999
"... When performing speaker adaptation there are two conicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt ..."
Abstract

Cited by 57 (15 self)
When performing speaker adaptation there are two conflicting requirements. First, the transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speaker-independent task CAT reduced the word error rate using very little adaptation data. In addition, when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
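The interpolation the abstract describes reduces per-speaker adaptation to estimating one small weight vector. A minimal sketch, with invented cluster means and weights (the ML weight estimation itself is in the paper and is not reproduced here):

```python
import numpy as np

# Cluster adaptive training (sketch): the speaker-specific mean is a linear
# interpolation of the cluster means, mu_s = M @ lam, where M holds one
# cluster mean per column and lam is the per-speaker weight vector.
M = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 1.5]])   # three hypothetical cluster means (2-D)
lam = np.array([0.5, 0.3, 0.2])    # assumed ML-estimated interpolation weights
mu_s = M @ lam                     # this speaker's adapted mean
```

Only `lam` (here 3 numbers) is speaker specific, which is why CAT can adapt from very little data; picking a single cluster is the special case where `lam` is a one-hot vector.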
Pronunciation Modeling By Sharing Gaussian Densities Across Phonetic Models
 Computer Speech and Language
, 1999
"... Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decisiontrees to generate alternate word pro ..."
Abstract

Cited by 54 (3 self)
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of such pronunciation models during recognition is known to improve accuracy. This paper describes the use of such pronunciation models during acoustic model training. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Analysis of this paradox leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones predicted by the pronunciation model, the HMM states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model of the alternate realization. Qualitatively, this amounts to making a soft decision about which surface form is realized. Quantitatively, experiments on the Switchboard corpus show that this method improves accuracy by 1.7% (absolute).
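The "soft decision" can be pictured as the canonical state's output density becoming a mixture over both surface forms. The sketch below is a one-dimensional caricature with made-up parameters and weight, not the paper's training procedure:

```python
import numpy as np

def gauss(x, mu, var):
    # Univariate Gaussian density.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical Gaussians for the canonical phone's HMM state and for the
# alternate surface form's state.
canon = (0.0, 1.0)      # (mean, variance)
alt = (1.5, 0.8)

# Sharing components: the canonical state scores a frame with a mixture over
# both Gaussians, i.e. a soft decision about which surface form occurred,
# instead of a hard choice of one alternate phone.
w = 0.7                 # assumed mixture weight for the canonical component
x = 0.9                 # one observed feature value
p_shared = w * gauss(x, *canon) + (1 - w) * gauss(x, *alt)
```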
Speaker Adaptation Using Combined Transformation and Bayesian Methods
, 1994
"... Adapting the parameters of a statistical speakerindependent continuousspeech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixturedensity hidden Markov models the number of component densities is typical ..."
Abstract

Cited by 47 (4 self)
Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for non-native speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.
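The Bayesian side of such a combination is often realized as a MAP mean update that interpolates the speaker-independent prior with the ML estimate, weighted by how much data was seen. This is the textbook MAP form, shown with invented numbers; it is a sketch of the idea, not necessarily the paper's exact combination rule:

```python
import numpy as np

# MAP (Bayesian) mean update: mu_map = (tau * mu_prior + n * x_bar) / (tau + n).
# With little data the prior dominates; with lots of data it converges to ML.
tau = 10.0                          # assumed prior weight
mu_prior = np.array([0.0, 0.0])     # speaker-independent mean
x = np.array([[1.0, 0.5],
              [0.8, 0.7],
              [1.2, 0.3]])          # hypothetical adaptation frames
n = len(x)
x_bar = x.mean(axis=0)              # ML estimate from adaptation data
mu_map = (tau * mu_prior + n * x_bar) / (tau + n)
```

This behavior is exactly the complementarity the abstract exploits: transformation methods help when data is scarce, while the Bayesian update keeps improving as adaptation data grows.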
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
 Proc. IEEE
, 2000
"... Recent advances in automatic speech recognition are accomplished by designing a plugin maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and ..."
Abstract

Cited by 27 (4 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
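The plug-in MAP decision rule picks the word sequence maximizing the product of the estimated language-model prior and acoustic likelihood. A toy sketch in log domain, with two hypothetical hypotheses and invented scores:

```python
import math

# Plug-in MAP rule: W* = argmax_W  P(W) * p(X | W), computed in log domain.
# Each hypothesis maps to (log LM prior, log acoustic likelihood); both are
# plug-in estimates, i.e. trained models standing in for the true distributions.
hyps = {
    "hello world": (math.log(0.6), -120.0),
    "hollow world": (math.log(0.4), -118.5),
}
best = max(hyps, key=lambda w: sum(hyps[w]))
```

Here the weaker prior of "hollow world" is outweighed by its better acoustic score, illustrating why the rule is only as good as the plugged-in estimates, which is the motivation for the adaptive training the abstract argues for.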
Discriminative speaker adaptation with conditional maximum likelihood linear regression
 In Eurospeech
, 2001
"... We present a simplified derivation of the extended BaumWelch procedure, which shows that it can be used for Maximum Mutual Information (MMI) of a large class of continuous emission density hidden Markov models (HMMs). We use the extended BaumWelch procedure for discriminative estimation of MLLRty ..."
Abstract

Cited by 25 (2 self)
We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.
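The extended Baum-Welch flavor of an MMI update contrasts numerator statistics (from the reference transcription) with denominator statistics (from the recognition lattice). Below is the commonly quoted Gaussian mean update in that style, with invented statistics and smoothing constant; the paper applies the procedure to transform parameters rather than to means directly:

```python
import numpy as np

# Extended Baum-Welch mean update for MMI (sketch):
#   mu_new = (x_num - x_den + D * mu_old) / (gamma_num - gamma_den + D)
# Numerator terms come from the correct transcription, denominator terms from
# competing hypotheses; D is a smoothing constant that keeps the update stable.
gamma_num, gamma_den = 8.0, 5.0          # hypothetical occupation counts
x_num = np.array([4.0, -2.0])            # numerator-weighted observation sum
x_den = np.array([2.0, -1.0])            # denominator-weighted observation sum
mu_old = np.array([0.4, -0.2])
D = 10.0
mu_new = (x_num - x_den + D * mu_old) / (gamma_num - gamma_den + D)
```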
Use of Speech Recognition in Computerassisted Language Learning
, 1999
"... inear Model Combination and Model Merging. These algorithms are based on the assumption that the mothertongue of a nonnative speaker is known. The basic idea underlying most ndings of this thesis is that nonnative speech can be modeled with a mixture of sounds of a speaker's native language and t ..."
Abstract

Cited by 20 (0 self)
Linear Model Combination and Model Merging. These algorithms are based on the assumption that the mother tongue of a non-native speaker is known. The basic idea underlying most findings of this thesis is that non-native speech can be modeled with a mixture of sounds of a speaker's native language and the target language. The newly developed speaker adaptation algorithms combine the acoustic models of the source and target language of a non-native speaker. The algorithms only differ in the details of how the model sets are combined. A database of non-native English was recorded for the purpose of testing these adaptation algorithms. This database mostly consists of utterances of Japanese and Latin-American Spanish accented English. The recordings were transcribed by trained phoneticians to obtain transcriptions corresponding to the actual phoneme sequence uttered by the student, as opposed to canonical transcriptions obtained from a standard
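The two combination styles the abstract names can be caricatured in one dimension. These are not the thesis's actual equations; the models and the weight are invented, and only the mean is merged here (a full merge would also match variances):

```python
import numpy as np

def gauss(x, mu, var):
    # Univariate Gaussian density.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical models of the "same" phone in the speaker's native (source)
# and target language, plus an assumed interpolation weight.
src = (0.0, 1.0)        # (mean, variance)
tgt = (0.5, 1.2)
w = 0.4                 # assumed weight of the native-language model

# Linear Model Combination: interpolate the OUTPUT DENSITIES of the models.
x = 0.3
p_combined = w * gauss(x, *src) + (1 - w) * gauss(x, *tgt)

# Model Merging: collapse the two Gaussians into one by weighted averaging of
# their PARAMETERS (mean shown here).
mu_merged = w * src[0] + (1 - w) * tgt[0]
```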
Discounted likelihood linear regression for rapid speaker adaptation
 Computer Speech & Language
, 2001
"... Rapid adaptation schemes that employ the EM algorithm may suffer from overtraining problems when used with small amounts of adaptation data. An algorithm to alleviate this problem is derived within the information geometric framework of Csiszár and Tusnády, and is used to improve MLLR adaptation on ..."
Abstract

Cited by 17 (2 self)
Rapid adaptation schemes that employ the EM algorithm may suffer from overtraining problems when used with small amounts of adaptation data. An algorithm to alleviate this problem is derived within the information geometric framework of Csiszár and Tusnády, and is used to improve MLLR adaptation on NAB and Switchboard adaptation tasks. It is shown how this algorithm approximately optimizes a discounted likelihood criterion.
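One crude way to picture a discounted-likelihood criterion, without claiming it is the paper's derivation: shrink the data-derived sufficient statistics by a discount factor and back off toward the unadapted model, so a handful of frames cannot drag the parameters far. All values below are invented:

```python
import numpy as np

# Discounted statistics sketch: blend first-order adaptation statistics with
# the unadapted mean, weighted by a discount factor alpha in (0, 1].
# alpha = 1 recovers the plain ML/EM update; smaller alpha trusts the data less.
alpha = 0.5
gamma = 3.0                       # occupation count from very little data
x_sum = np.array([3.3, -1.2])     # first-order statistics from adaptation data
mu_prior = np.array([1.0, -0.3])  # mean under the unadapted model
mu_hat = (alpha * x_sum + (1 - alpha) * gamma * mu_prior) / gamma
```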