Results 1  10
of
51
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
, 1995
"... ..."
Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
 IEEE Transactions on Speech and Audio Processing
, 1994
"... In this paper a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely the choice of prior distribution family, the specification of the parameters of prior densities and the evaluation of the MAP estimates, are addr ..."
Abstract

Cited by 491 (39 self)
 Add to MetaCart
In this paper a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely the choice of prior distribution family, the specification of the parameters of prior densities and the evaluation of the MAP estimates, are addressed. Using HMMs with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normalWishart densities. The classical maximum likelihood estimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications: parameter smoothing and model adaptation, and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications
2000. Rapid speaker adaptation in eigenvoice space
 IEEE Transations on Speech and Audio Processing 8
"... Abstract—This paper describes a new modelbased speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number ..."
Abstract

Cited by 95 (8 self)
 Add to MetaCart
Abstract—This paper describes a new modelbased speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These “eigenvoice ” basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a smallvocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26 % relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work. Index Terms—Eigenvoice approach, principal component analysis, speaker adaptation, speaker clustering. I.
Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures
 IEEE Transactions on Speech and Audio Processing
, 1995
"... A recent trend in automatic speech recognition systems is the use of continuous mixturedensity hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. P ..."
Abstract

Cited by 90 (2 self)
 Add to MetaCart
A recent trend in automatic speech recognition systems is the use of continuous mixturedensity hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixturedensity HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximumlikelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the largevocabulary Wall Street Journal corpus for both ...
Speaker Adaptation Using Combined Transformation and Bayesian Methods
, 1994
"... Adapting the parameters of a statistical speakerindependent continuousspeech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixturedensity hidden Markov models the number of component densities is typical ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
Adapting the parameters of a statistical speakerindependent continuousspeech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixturedensity hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximumlikelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the largevocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speakerindependent accuracy achieved for native speakers.
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
 Proc. IEEE
, 2000
"... Recent advances in automatic speech recognition are accomplished by designing a plugin maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
Recent advances in automatic speech recognition are accomplished by designing a plugin maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximumlikelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for highperformance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition
"... In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semicontinuous one with Gaussian mixture state observation densities is presented. Corresponding to the wellknown BaumWelch and segmental kmeans algorithms respectively for HMM training, formulations of MAP ..."
Abstract

Cited by 26 (4 self)
 Add to MetaCart
In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semicontinuous one with Gaussian mixture state observation densities is presented. Corresponding to the wellknown BaumWelch and segmental kmeans algorithms respectively for HMM training, formulations of MAP (maximum aposteriori) and segmental MAP estimation of HMM parameters are developed. Furthermore, a computationally efficient method of the segmental quasiBayes estimation for semicontinuous HMM is also presented. The important issue of prior density estimation is discussed and a simplified method of moment estimate is given. The method proposed in this paper will be applicable to some problems in HMM training for speech recognition such as sequential or batch training, model adaptation, and parameter smoothing, etc.
MAP Estimation of Continuous Density HMM: Theory and Applications
 In: Proceedings of DARPA Speech and Natural Language Workshop
, 1992
"... We discuss maximum a posteriori estimation of continuous density hidden Markovmodels(CDHMM).The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
We discuss maximum a posteriori estimation of continuous density hidden Markovmodels(CDHMM).The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimation of hidden Markov model (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM). The MAP ...
OnLine Adaptive Learning Of The Correlated Continuous Density Hidden Markov Models For Speech Recognition
 IEEE Trans. on Speech and Audio Processing
"... We extend our previously proposed quasiBayes adaptive learning framework to cope with the correlated continuous density hidden Markov models with Gaussian mixture state observation densities in which all mean vectors are assumed to be correlated and have a joint prior distribution. A successive app ..."
Abstract

Cited by 21 (4 self)
 Add to MetaCart
We extend our previously proposed quasiBayes adaptive learning framework to cope with the correlated continuous density hidden Markov models with Gaussian mixture state observation densities in which all mean vectors are assumed to be correlated and have a joint prior distribution. A successive approximation algorithm is proposed to implement the correlated mean vectors' updating. As an example, by applying the method to online speaker adaptation application, the algorithm is experimentally shown to be asymptotic convergent as well as being able to enhance the efficiency and the effectiveness of the Bayes learning by taking into account the correlation information between different models. The technique can be used to cope with the timevarying nature of some acoustic and environmental variabilities, including mismatches caused by changing speakers, channels, transducers, environments and so on.
Voice Characteristics Conversion For HmmBased Speech Synthesis System
, 1997
"... In this paper, we describe an approachtovoice characteristics conversion for an HMMbased texttospeech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved bychanging HMM parameters appropriately. To transform the voic ..."
Abstract

Cited by 20 (9 self)
 Add to MetaCart
In this paper, we describe an approachtovoice characteristics conversion for an HMMbased texttospeech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved bychanging HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied MAP/VFS algorithm to the phoneme HMMs. Using 5 or 8 sentences as adaptation data, speech samples synthesized from a set of adapted tied triphone HMMs, whichhave approximately 2,000 distributions, are judged to be closer to the target speaker by 79.7% or 90.6%, respectively, in an ABX listening test.