Results 1 – 2 of 2
Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression
, 2000
Abstract

Cited by 5 (1 self)
In this thesis we improve speech recognition accuracy by obtaining better estimates of linear transformation functions from a small amount of adaptation data in speaker adaptation. The major contributions of this thesis are the development of two new adaptation algorithms that improve maximum likelihood linear regression. The first is called principal component MLLR (PC-MLLR), and it reduces the variance of the estimate of the MLLR matrix using principal component analysis. The second is called inter-class MLLR, and it utilizes relationships among different transformation functions to achieve more reliable estimates of MLLR parameters across multiple classes. The main idea of PC-MLLR is that if we estimate the MLLR matrix in the eigen-domain, the variances of the components of the estimates are inversely proportional to their eigenvalues. Therefore we can select the more reliable components to reduce the variances of the resulting estimates and to improve speech recognition accuracy. PC-MLLR eliminates highly variable components and chooses the principal components corresponding to the largest eigenvalues. If all the components are used, PC-MLLR becomes the same as conventional MLLR. Choosing fewer principal components increases the bias of the estimates, which can reduce recognition accuracy. To compensate for this problem, we developed weighted principal component MLLR (WPC-MLLR). Instead of eliminating some of the components, all the components in WPC-MLLR are used after applying weights that minimize the mean square error. The component corresponding to a larger eigenvalue has a larger weight than the component corresponding to a smaller eigenvalue. As more adaptation data become available, the benefits from these methods may become smaller because ...
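The eigen-domain idea behind PC-MLLR and WPC-MLLR can be sketched in a few lines of numpy. This is a toy illustration under stated assumptions, not the thesis implementation: the synthetic transform, data sizes, and the ridge-style weighting term `sigma2` are all hypothetical stand-ins (true MMSE weights depend on noise statistics the sketch does not estimate). It only shows the mechanism of decomposing a least-squares transform estimate along the eigenvectors of the data autocorrelation and then either truncating (PC-MLLR) or shrinking (WPC-MLLR) the small-eigenvalue components, whose estimates have variance inversely proportional to the eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a true transform W applied to mean vectors,
# observed with noise (a stand-in for real MLLR adaptation statistics).
d, n = 5, 40
W_true = rng.normal(size=(d, d))
X = rng.normal(size=(d, n))
Y = W_true @ X + 0.5 * rng.normal(size=(d, n))

# Conventional least-squares estimate of the transform.
C = X @ X.T                              # autocorrelation of the data
W_ls = (Y @ X.T) @ np.linalg.inv(C)

# Eigen-domain view: each eigenvector of C defines one component of the
# estimate, and its variance scales like 1/eigenvalue.
eigvals, V = np.linalg.eigh(C)           # eigenvalues in ascending order

# PC-MLLR: keep only the k components with the largest eigenvalues
# (a hard projection; using all d components recovers W_ls exactly).
k = 3
mask = np.zeros(d)
mask[-k:] = 1.0
W_pc = W_ls @ V @ np.diag(mask) @ V.T

# WPC-MLLR: instead of a hard cut, shrink every component with a weight
# in (0, 1); noisy small-eigenvalue components get smaller weights.
# sigma2 here is an illustrative assumed noise term, not the MMSE value.
sigma2 = 2.0
weights = eigvals / (eigvals + sigma2)
W_wpc = W_ls @ V @ np.diag(weights) @ V.T
```

Note that the hard projection makes `W_pc` rank-deficient (rank at most `k`), which is the source of the bias the abstract mentions; the weighted version keeps all components but damps the unreliable ones.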
A Gaussian Mixture Model Spectral Representation for Speech Recognition
, 2003
Abstract

Cited by 2 (0 self)
Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on formant features. Formants are the resonant frequencies in the vocal tract which form the characteristic shape of the speech spectrum. However, formants are difficult to estimate reliably and robustly from the speech signal, and in some cases may not be clearly present. Rather than estimating the resonant frequencies, formant-like features can be used instead. Formant-like features use the characteristics of the spectral peaks to represent the spectrum. In this work, novel features are developed based on estimating a Gaussian mixture model (GMM) from the speech spectrum. This approach has previously been used successfully as a speech codec. The EM algorithm is used to estimate the parameters of the GMM. The extracted parameters (the means, standard deviations, and component weights) can be related to the formant locations, bandwidths, and magnitudes. As the features directly represent the linear spectrum, it is possible to apply techniques for vocal tract length normalisation and additive noise ...
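The fitting step the abstract describes, running EM on a spectrum treated as a density over frequency, can be sketched as below. This is a minimal illustration, not the thesis implementation: the two-peak synthetic "spectrum", the frequency axis, the initialisation scheme, and the iteration count are all assumptions. Each frequency bin acts as a data point weighted by its normalised spectral magnitude, so the fitted means, standard deviations, and mixture weights track the peak locations, bandwidths, and relative magnitudes.

```python
import numpy as np

# Toy "spectrum" with two formant-like peaks over a frequency axis
# (all values here are assumed for illustration).
freqs = np.linspace(0.0, 4000.0, 256)
spectrum = (np.exp(-0.5 * ((freqs - 700.0) / 120.0) ** 2)
            + 0.6 * np.exp(-0.5 * ((freqs - 2200.0) / 200.0) ** 2))

# Normalise so the spectrum can be treated as a density over frequency.
p = spectrum / spectrum.sum()

def fit_gmm_to_spectrum(freqs, p, n_comp=2, n_iter=50):
    """EM for a 1-D GMM where each frequency bin is a data point
    weighted by its normalised spectral magnitude p."""
    # Crude initialisation: spread the means across the frequency axis.
    means = np.quantile(freqs, np.linspace(0.2, 0.8, n_comp))
    stds = np.full(n_comp, (freqs[-1] - freqs[0]) / (4 * n_comp))
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each bin.
        dens = (weights / (stds * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((freqs[:, None] - means) / stds) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: updates with bins weighted by their spectral mass p.
        wr = resp * p[:, None]
        mass = wr.sum(axis=0)
        means = (wr * freqs[:, None]).sum(axis=0) / mass
        stds = np.sqrt((wr * (freqs[:, None] - means) ** 2).sum(axis=0)
                       / mass)
        weights = mass
    return means, stds, weights

means, stds, weights = fit_gmm_to_spectrum(freqs, p)
```

On this synthetic input the recovered means sit near the two peak centres, the standard deviations track the peak widths, and the mixture weights reflect the relative spectral mass of each peak, which is the correspondence to formant location, bandwidth, and magnitude that the abstract exploits.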