Speaker Adaptation of Hidden Markov Models using Maximum Likelihood Linear Regression.
1996
Abstract
Cited by 1 (0 self)
The work presented in this report focuses on an essential problem when doing speaker adaptation; namely how effectively the speaker specific information in the adaptation data is used. In the project a system has been implemented for speaker adaptation of hidden Markov models (HMM's) using the Maximum Likelihood Linear Regression (MLLR) method. MLLR is a method that transforms mixture components of HMM's by multiplying the mean vectors with a transformation matrix. It introduces the concept of regression classes as a set of mixture components that are transformed similarly. The adaptation technique is implemented in C. The data used in the tests are taken from the Danish EUROM.1 database. All results are averaged over ten speakers. Three issues have been addressed: 1) the effect of varying the amount of adaptation material, 2) the effect of using different regression class divisions and 3) the importance of the phonetic content in the adaptation material. Tests show that the MLLR tech...
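The mean transformation described in this abstract can be illustrated with a minimal sketch. It is not the report's C implementation. It assumes the commonly used affine form of MLLR, in which each mean vector is extended with a leading 1 so that an n x (n+1) matrix applies both a bias and a linear map. The names `adapt_mean`, `adapt_components`, `class_of`, and `transforms`, and all numeric values, are hypothetical and chosen only for illustration. Estimating the matrices from adaptation data is not shown.

```python
def adapt_mean(W, mean):
    """Apply one transform to one mean vector.

    Assumption: W has n rows and n + 1 columns; the first column acts
    as a bias because the mean is extended with a leading 1.0.
    """
    extended = [1.0] + list(mean)
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


def adapt_components(means, class_of, transforms):
    """Transform every mixture-component mean with its class's matrix.

    class_of[i] is the (hypothetical) regression class of component i;
    all components in the same class share one transform.
    """
    return [adapt_mean(transforms[class_of[i]], m) for i, m in enumerate(means)]


# Three 2-dimensional component means split into two regression classes.
means = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]
class_of = [0, 0, 1]
transforms = {
    0: [[0.5, 1.0, 0.0], [0.0, 0.0, 2.0]],  # shift dim 0 by 0.5, scale dim 1 by 2
    1: [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # identity transform
}

adapted = adapt_components(means, class_of, transforms)
print(adapted)
```

Sharing one matrix per regression class is what lets a small amount of adaptation data move many component means at once, including components that were never observed in that data.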
COMPUTER
1998
Abstract
Hidden Markov Models (HMMs) have been used with considerable success in continuous speech recognition. It is well known that high accuracy can be obtained when the HMM system is trained and tested in a quiet environment and the speech signal is acquired from a close-talking microphone. However, mismatches between training and testing environment severely degrade performance. Two major sources of mismatches are speaker and environment variability. Speaker variation is typically caused by different speaking styles and other physiological differences between speakers, such as vocal tract lengths, etc. Environment variability includes channel distortion, such as that which affects telephone speech, additive noise, and reverberation which results when the microphone is far away from the speaker. The goal of this report is to explore different adaptation algorithms that mitigate the effects of speaker and environmental variability for speech recognition. The adaptation algorithm closely examined in this report is a Linguistic Tree based Maximum Likelihood Linear Regression (LT-MLLR). Speech Recognition experiments using the LT-MLLR for speaker and environment adaptation are given. It is shown that the LT-MLLR algorithm is superior to other adaptation algorithms discussed. For speaker adaptation, a 30% reduction is achieved over the baseline word error rate (WER) using this algorithm. In addition it is shown that the use of Matched Filter Array Processing (MFA) with LT-MLLR reduces the WER of distant-talking speech with high reverberation. In the case when the reverberation time is as high as 0.9s, the WER is reduced from 57.89% to 19.41%, a reduction of 66.47%. [This work was supported by DARPA Contract DABT63-93-C-0037.]
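The 66.47% figure quoted in this abstract is a relative reduction in WER, not a difference in percentage points. A short check, using only the two WER values stated above (the function name `relative_reduction` is a hypothetical helper for this sketch):

```python
def relative_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction in percent: 100 * (baseline - adapted) / baseline."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


reduction = relative_reduction(57.89, 19.41)
print(f"{reduction:.2f}%")
```

The absolute drop is 38.48 percentage points; dividing by the 57.89% baseline gives the relative figure reported in the abstract.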
Production knowledge in the recognition of dysarthric speech
2011
Abstract
Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual. This dissertation describes research into improving ASR for speakers with dysarthria by means of incorporated knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors, for example. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic

