Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
1995
Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures
IEEE Transactions on Speech and Audio Processing, 1995
Cited by 65 (2 self)
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both ...
Audio-Visual Synchronization and Fusion using Canonical Correlation Analysis
Cited by 8 (0 self)
It is well known that early integration (also called data fusion) is effective when the modalities are correlated, and late integration (also called decision or opinion fusion) is optimal when the modalities are uncorrelated. In this paper, we propose a new multimodal fusion strategy for open-set speaker identification that combines early and late integration following canonical correlation analysis (CCA) of speech and lip texture features. We also propose a method for high-precision synchronization of the speech and lip features using CCA prior to the proposed fusion. Experimental results show that i) the proposed fusion strategy yields the best equal error rates (EER), which are used to quantify the performance of the fusion strategy for open-set speaker identification, and ii) precise synchronization prior to fusion improves the EER; hence, the best EER is obtained when the proposed synchronization scheme is employed together with the proposed fusion strategy. We note that the proposed fusion strategy outperforms others because the features used in the late integration are truly uncorrelated, since they are outputs of the CCA.
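The abstract does not say how CCA is used to synchronize the two streams. As a rough illustration of correlation-based synchronization only, the sketch below swaps in a much simpler technique: it searches for the frame lag that maximizes the plain Pearson correlation between two one-dimensional feature streams. The function names `pearson` and `best_lag`, the `max_lag` parameter, and the toy sequences are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(audio, video, max_lag):
    """Return (lag, correlation) for the lag in [-max_lag, max_lag] with the
    highest correlation. A positive lag means the audio stream is delayed by
    that many frames relative to the video stream."""
    best = (None, -2.0)  # any real correlation is >= -1
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio[lag:], video[:len(video) - lag]
        else:
            a, v = audio[:len(audio) + lag], video[-lag:]
        n = min(len(a), len(v))
        if n < 2:
            continue  # too little overlap to estimate a correlation
        r = pearson(a[:n], v[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data (hypothetical): the audio stream is the video stream delayed by 2 frames.
video = [0, 1, 0, 2, 0, 3, 1, 0, 2, 1]
audio = [0, 0] + video[:8]
lag, corr = best_lag(audio, video, max_lag=3)
```

A CCA-based variant would replace the scalar correlation with the leading canonical correlation between the multidimensional speech and lip feature vectors at each candidate lag; that step is omitted here.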
Speaker Adaptation of Hidden Markov Models using Maximum Likelihood Linear Regression
1996
Cited by 1 (0 self)
The work presented in this report focuses on an essential problem in speaker adaptation, namely how effectively the speaker-specific information in the adaptation data is used. In the project, a system has been implemented for speaker adaptation of hidden Markov models (HMMs) using the Maximum Likelihood Linear Regression (MLLR) method. MLLR is a method that transforms the mixture components of HMMs by multiplying the mean vectors with a transformation matrix. It introduces the concept of regression classes: sets of mixture components that share the same transformation. The adaptation technique is implemented in C. The data used in the tests are taken from the Danish EUROM.1 database. All results are averaged over ten speakers. Three issues have been addressed: 1) the effect of varying the amount of adaptation material, 2) the effect of using different regression class divisions, and 3) the importance of the phonetic content in the adaptation material. Tests show that the MLLR tech...
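The mean transformation described in this abstract can be sketched as follows. This is a minimal illustration, not the report's C implementation: it assumes the common MLLR formulation in which each mean is extended with a leading 1 so that a single d x (d+1) matrix W applies both an offset and a linear map, and it takes the per-class matrices as given rather than estimating them from adaptation data. The names `adapt_means`, `class_of`, and `transforms`, and all numeric values, are hypothetical.

```python
def adapt_means(means, class_of, transforms):
    """Apply a per-regression-class linear transform to Gaussian mean vectors.

    means      : list of d-dimensional mean vectors, one per mixture component
    class_of   : regression class index of each component (same length as means)
    transforms : dict mapping class index -> d x (d+1) matrix W (list of rows)

    Each adapted mean is W @ [1, mu_1, ..., mu_d]; column 0 of W acts as an offset.
    """
    adapted = []
    for mean, cls in zip(means, class_of):
        W = transforms[cls]
        extended = [1.0] + list(mean)  # extended mean vector (assumed formulation)
        adapted.append([sum(w * x for w, x in zip(row, extended)) for row in W])
    return adapted


# Three mixture components in two regression classes (toy values).
means = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
class_of = [0, 0, 1]
transforms = {
    0: [[0.5, 1.0, 0.0],    # class 0: shift by (+0.5, -0.5), no scaling
        [-0.5, 0.0, 1.0]],
    1: [[0.0, 2.0, 0.0],    # class 1: scale both dimensions by 2, no shift
        [0.0, 0.0, 2.0]],
}
adapted = adapt_means(means, class_of, transforms)
```

Components in the same regression class share one matrix, so a class with little adaptation data can still be updated as long as the class as a whole has enough observations to estimate its transform.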

