Results 1  10
of
55
Maximum Likelihood Linear Transformations for HMMBased Speech Recognition
 Computer Speech and Language
, 1998
"... This paper examines the application of linear transformations for speaker and environmental adaptation in an HMMbased speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias ..."
Abstract

Cited by 406 (56 self)
 Add to MetaCart
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMMbased speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear featurespace transformations are inappropriate in this case. Hence, only modelbased linear transforms are considered. The paper compares the two possible forms of modelbased transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform (sometimes referred to as featurespace transforms). Reestimation formulae for all appropriate cases of transform are given. This includes a new and efficient "full" variance transform and the extension of the constrained modelspace transform from the simple diagonal case to the full or blockdiagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two modelspace transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained modelspace transform for speaker adaptive training are detailed. 1 The author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA 1
SemiTied Covariance Matrices For Hidden Markov Models
 IEEE Transactions on Speech and Audio Processing
, 1999
"... There is normally a simple choice made in the form of the covariance matrix to be used with continuousdensity HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or blockdiagonal matrix is used, where all ..."
Abstract

Cited by 181 (27 self)
 Add to MetaCart
There is normally a simple choice made in the form of the covariance matrix to be used with continuousdensity HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or blockdiagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately when using full or blockdiagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few \full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own \diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique ts within the standard maximumlikelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a largevocabulary speechrecognition task. In initial experiments the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time. 2 1
Cluster Adaptive Training Of Hidden Markov Models
 IEEE Transactions on Speech and Audio Processing
, 1999
"... When performing speaker adaptation there are two conicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt ..."
Abstract

Cited by 57 (15 self)
 Add to MetaCart
When performing speaker adaptation there are two conicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple reestimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speakerindependent task CAT reduced the word error rate using very little adaptation data. In addition when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speakerindependent model set. 2 1
Uncertainty decoding for noise robust speech recognition
 in Proc. Interspeech
, 2004
"... This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings ..."
Abstract

Cited by 36 (12 self)
 Add to MetaCart
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
A comparative study of adaptation methods for speaker verification
 in ICSLP, 2002
"... Reallife speaker verification systems are often implemented using client model adaptation methods, since the amount of data available for each client is often too low to consider plain Maximum Likelihood methods. While the Bayesian Maximum A Posteriori (MAP) adaptation method is commonly used in sp ..."
Abstract

Cited by 34 (13 self)
 Add to MetaCart
Reallife speaker verification systems are often implemented using client model adaptation methods, since the amount of data available for each client is often too low to consider plain Maximum Likelihood methods. While the Bayesian Maximum A Posteriori (MAP) adaptation method is commonly used in speaker verification, other methods have proven to be successful in related domains such as speech recognition. This paper reports on experimental comparison between three wellknown adaptation methods, namely MAP, Maximum Likelihood Linear Regression, and finally EigenVoices. All three methods are compared to the more classical Maximum Likelihood method, and results are given for a subset of the 1999 NIST Speaker Recognition Evaluation database. 1.
Fast Speaker Adaptation Using EigenspaceBased Maximum Likelihood Linear Regression
, 2000
"... This paper presents an eigenspacebased fast speaker adaptation approach which can improve the modeling accuracy of the conventional maximum likelihood linear regression (MLLR) techniques when only very limited adaptation data is available. The proposed eigenspacebased MLLR approach was developed b ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
This paper presents an eigenspacebased fast speaker adaptation approach which can improve the modeling accuracy of the conventional maximum likelihood linear regression (MLLR) techniques when only very limited adaptation data is available. The proposed eigenspacebased MLLR approach was developed by introducing a priori knowledge analysis on the training speakers via PCA, so as to construct an eigenspace for MLLR full regression matrices as well as to derive a set of bases called eigenmatrices. The full regression matrices for each outside speaker are then constrained to be located in the space spanned by the first K eigenmatrices. The proposed eigenspacebased regression matrices, serving as an initial estimate of the speakerspecific MLLR transformation, effectively reduces the number of free parameters, while precise modeling for the interdimensional correlation among the model parameters by full matrices was maintained. Experimental results showed that for supervised adaptation...
Recent innovations in speechtotext transcription at sriicsiuw
 IEEE Transactions on Audio, Speech & Language Processing
, 2006
"... Abstract — We summarize recent progress in automatic speechtotext ..."
Abstract

Cited by 15 (5 self)
 Add to MetaCart
Abstract — We summarize recent progress in automatic speechtotext
Linear Gaussian models for speech recognition
 CAMBRIDGE UNIVERSITY
, 2004
"... Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete stat ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evo
SemiTied FullCovariance Matrices For Hidden Markov Models
, 1997
"... There is normally a simple choice made in the form of the covariance matrix to be used with HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or blockdiagonal matrix is used, where all or some of the corr ..."
Abstract

Cited by 14 (3 self)
 Add to MetaCart
There is normally a simple choice made in the form of the covariance matrix to be used with HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or blockdiagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately when using full or blockdiagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique fits within the standard maximumlikelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a largevocabulary...
Precision matrix modelling for large vocabulary continuous speech recognition
, 2004
"... Recently, structured precision matrix models were found to outperform the conventional diagonal covariance matrix models. Minimum phone error discriminative training of these models gave very good unadapted performance on large vocabulary continuous speech recognition systems. To obtain stateofthe ..."
Abstract

Cited by 14 (6 self)
 Add to MetaCart
Recently, structured precision matrix models were found to outperform the conventional diagonal covariance matrix models. Minimum phone error discriminative training of these models gave very good unadapted performance on large vocabulary continuous speech recognition systems. To obtain stateoftheart performance, it is important to apply adaptation techniques efficiently to these models. In this paper, simple rowbyrow iterative formulae are described for both MLLR mean and constrained MLLR transform estimations of these models. These update formulae are derived within the standard expectation maximisation framework and are guaranteed to increase the likelihood of the adaptation data. Efficient approximate schemes for these adaptation methods are also investigated to further reduce the computation. Experimental results are presented based on the MPE trained Subspace for Precision and Mean models, evaluated on both broadcast news and conversational telephone speech English tasks. 1.