Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
, 1995
Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition
- Computer Speech and Language
, 1998
Cited by 275 (44 self)
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-based linear transforms are considered. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform (sometimes referred to as feature-space transforms). Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient "full" variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed. (Footnote: The author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.)
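The two transform families named in this abstract can be written out as follows. This is a sketch in commonly used notation; the symbols (A, b, H, A', b', and the Gaussian mean and covariance) are assumptions introduced here and do not appear in the listing itself.

```latex
% Unconstrained model-space transform: mean and variance transformed independently
\hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = H \Sigma H^{\top}

% Constrained model-space transform: the variance transform shares the mean transform's matrix
\hat{\mu} = A'\mu - b', \qquad \hat{\Sigma} = A' \Sigma A'^{\top}

% Equivalent feature-space view of the constrained case, with A = A'^{-1} and b = A b'
\mathcal{N}(o;\, \hat{\mu}, \hat{\Sigma}) = |A| \; \mathcal{N}(A o + b;\, \mu, \Sigma)
```

The last line is why the constrained form is sometimes called a feature-space transform: the observation can be transformed once, instead of every Gaussian in the model.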
A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 1996
Cited by 86 (14 self)
Ananth Sankar and Chin-Hui Lee, Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974. Introduction: Recently there has been much interest in the problem of improving the performance of automatic speech recognition (ASR) systems in adverse environments. When there is a mismatch between the training and testing environments, ASR systems suffer a degradation in performance. The goal of robust speech recognition is to remove the effect of this mismatch so as to bring the recognition performance as close as possible to the matched conditions. In speech recognition, the speech is usually modeled by a set of hidden Markov models (HMMs) X. During recognition the observed utterance Y is decoded using these models. Due to the mismatch between training and testing conditions, this often results in a degradation in performance compared to the matched conditions. The mismatch b...
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
- Proc. IEEE
, 2000
Cited by 16 (3 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
A survey on automatic speech recognition with an illustrative example on continuous speech recognition
- Computat. Linguistics Chinese Language Processing
, 1996
Cited by 2 (0 self)
For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems on personal computers, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. In this paper we review some of the key advances in several areas of automatic speech recognition. We also illustrate, by examples, how these key advances can be used for continuous speech recognition of Mandarin. Finally we elaborate the requirements in designing successful real-world applications and address technical challenges that need to be harnessed in order to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.
Speaker Adaptation of Hidden Markov Models using Maximum Likelihood Linear Regression.
, 1996
Cited by 1 (0 self)
The work presented in this report focuses on an essential problem in speaker adaptation, namely how effectively the speaker-specific information in the adaptation data is used. In the project a system has been implemented for speaker adaptation of hidden Markov models (HMMs) using the Maximum Likelihood Linear Regression (MLLR) method. MLLR is a method that transforms mixture components of HMMs by multiplying the mean vectors with a transformation matrix. It introduces the concept of regression classes as a set of mixture components that are transformed similarly. The adaptation technique is implemented in C. The data used in the tests are taken from the Danish EUROM.1 database. All results are averaged over ten speakers. Three issues have been addressed: 1) the effect of varying the amount of adaptation material, 2) the effect of using different regression class divisions and 3) the importance of the phonetic content in the adaptation material. Tests show that the MLLR tech...
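The mean transformation and the regression-class idea described in this abstract can be illustrated with a small sketch. It is not the report's C implementation: the function name `apply_mllr_means`, the array layout, and the toy transforms are hypothetical, and the transforms are simply given here rather than estimated from adaptation data.

```python
import numpy as np

def apply_mllr_means(means, class_of, transforms):
    """Apply a per-regression-class affine transform to Gaussian mean vectors.

    means      : (M, d) array, one mean vector per mixture component
    class_of   : length-M integer array, regression class of each component
    transforms : dict mapping class id -> (A, b), with A of shape (d, d), b of shape (d,)
    (All names and shapes are illustrative assumptions.)
    """
    adapted = np.empty_like(means, dtype=float)
    for m, mu in enumerate(means):
        # Every component in the same regression class shares one transform.
        A, b = transforms[int(class_of[m])]
        adapted[m] = A @ mu + b
    return adapted

# Toy example: three components, two regression classes.
means = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 3.0]])
class_of = np.array([0, 0, 1])
transforms = {
    0: (np.eye(2), np.array([0.5, 0.5])),                 # pure bias shift
    1: (np.array([[2.0, 0.0], [0.0, 1.0]]), np.zeros(2)),  # scaling of the first dimension
}
adapted = apply_mllr_means(means, class_of, transforms)
```

Grouping components into regression classes lets components with little or no adaptation data borrow the transform estimated from other components in the same class.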
A Fast Algorithm For Unsupervised Incremental Speaker Adaptation
, 1997
Speaker adaptation algorithms often require a rather large amount of adaptation data in order to estimate the new parameters reliably. In this paper, we investigate how adaptation can be performed in real-time applications with only a few seconds of speech from each user. We propose a modified Bayesian codebook reestimation which does not need the computationally intensive evaluation of normal densities and thus speeds up the adaptation remarkably, e.g. by a factor of 18 for 24-dimensional feature vectors. We performed experiments in two real-time applications with very small amounts of adaptation data, and achieved a word error reduction of up to 11%. Introduction: Speaker adaptation has been a field of intensive research for several years. Great progress has been made in the development of theoretically well-founded algorithms as well as in the achieved experimental results. Approaches based on optimality criteria such as Maximum Likelihood (ML) and Maximum a posteriori (MAP) h...
ITERATIVE SPEAKER ADAPTATION USING MLLR
Speech recognition systems are usually speaker-independent, but they are not as good as speaker-dependent systems for specific speakers. An initial speaker-independent system can be adapted to improve recognition accuracy by transforming it into a speaker-dependent system. In this work, a new general acoustic model adaptation technology is presented, using the MLLR algorithm iteratively in a supervised manner. Experiments have been performed on the TT2 Spanish speech corpus. The initial acoustic models were trained from the Albayzin speech database. Their results, which were obtained for 10 speakers, show an improvement in speech recognition accuracy.

