Results 1-10 of 11
F.A.: Predicting Unseen Triphones with Senones. IEEE Transactions on Speech and Audio Processing, 1996
"... ..."
(Show Context)
Bayesian Learning for Hidden Markov Model with Gaussian Mixture State Observation Densities
"... An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker ..."
Abstract

Cited by 41 (15 self)
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering, and corrective training are given.
Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
Proc. DARPA Speech and Natural Language Workshop, 1991
"... An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker cl ..."
Abstract

Cited by 35 (7 self)
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying it to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained compared to speaker-independent results.
MAP Estimation of Continuous Density HMM: Theory and Applications
In: Proceedings of DARPA Speech and Natural Language Workshop, 1992
"... We discuss maximum a posteriori estimation of continuous density hidden Markovmodels(CDHMM).The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation ..."
Abstract

Cited by 33 (6 self)
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications: parameter smoothing, speaker adaptation, speaker group modeling, and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION: Estimation of hidden Markov model (HMM) parameters is usually carried out by the method of maximum likelihood (ML) [1, 10, 6], assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimation of continuous density hidden Markov models (CDHMM). The MAP ...
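The core of the MAP approach described in this abstract is that each HMM Gaussian parameter is interpolated between a prior (e.g. a speaker-independent model) and the new data, weighted by occupancy counts. A minimal sketch of the well-known MAP update for a Gaussian mean with a conjugate normal prior (the function name and the fixed prior weight `tau` are illustrative, not from the paper):

```python
import numpy as np

def map_mean_update(mu0, tau, frames, gammas):
    """MAP estimate of a Gaussian mean under a conjugate normal prior.

    mu0    : prior mean (e.g. the speaker-independent mean)
    tau    : prior weight (pseudo-count controlling trust in mu0)
    frames : (T, d) observation vectors
    gammas : (T,) occupancy probabilities from the forward-backward pass
    """
    frames = np.asarray(frames, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    n = gammas.sum()                                   # effective data count
    xbar = (gammas[:, None] * frames).sum(axis=0) / n  # weighted sample mean
    # Posterior mean interpolates prior and data: with little adaptation
    # data it stays near mu0; with much data it approaches the ML estimate.
    return (tau * np.asarray(mu0, dtype=float) + n * xbar) / (tau + n)
```

With `tau = 0` this reduces to the ordinary ML reestimate, which matches the abstract's point that MAP estimation subsumes the classical algorithms while adding smoothing from the prior.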
IMPLEMENTATION ASPECTS OF LARGE VOCABULARY RECOGNITION BASED ON INTRAWORD AND INTERWORD PHONETIC UNITS, 1990
"... Most large vocabulary speech recognition systems essentially consist of a training algorithm and a recognition structure which is essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the trainin ..."
Abstract

Cited by 7 (3 self)
Most large vocabulary speech recognition systems consist of a training algorithm and a recognition structure, which is essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, it is absolutely essential that the recognition structure be efficient in terms of computation and memory, and accurate in terms of actually determining the best path through the lattice, so that a wide range of training (subword unit creation) strategies can be evaluated in a reasonable time period. We have considered an architecture in which we incorporate several well known procedures (beam search, compiled network, etc.) with some new ideas (stacks of active network nodes, likelihood computation on demand, guided search, etc.) to implement a search procedure which maintains the accuracy of the full search but which can decode a single sentence in about one minute of computing time (about 20 times real time) on a vectorized, concurrent processor. The ways in which we have realized this significant computational reduction are described in this paper.
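The combination this abstract names, beam pruning over a set of active network nodes with arc likelihoods computed only on demand, can be sketched in miniature as follows (a toy time-synchronous decoder under assumed data structures, not the paper's implementation):

```python
def beam_decode(network, start, finals, score_arc, T, beam=5.0):
    """Time-synchronous beam search through a compiled decoding network.

    network   : dict node -> list of (next_node, label) arcs
    start     : start node; finals: set of accepting nodes
    score_arc : function (frame_index, label) -> log-likelihood, evaluated
                only for arcs leaving an active node ("on demand")
    T         : number of frames; beam: prune below (frame best - beam)
    """
    # The set of active nodes plays the role of the paper's node stacks.
    active = {start: (0.0, [])}          # node -> (log score, label sequence)
    for t in range(T):
        best = max(s for s, _ in active.values())
        nxt = {}
        for node, (score, labels) in active.items():
            if score < best - beam:       # beam pruning
                continue
            for succ, label in network.get(node, []):
                s = score + score_arc(t, label)
                if succ not in nxt or s > nxt[succ][0]:
                    nxt[succ] = (s, labels + [label])
        if not nxt:
            return None
        active = nxt
    hyps = [v for n, v in active.items() if n in finals]
    return max(hyps)[1] if hyps else None
```

Narrowing `beam` trades accuracy for speed; the on-demand `score_arc` ensures likelihoods are never computed for pruned hypotheses, which is where most of the computational saving comes from.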
Factorization Of Language Constraints In Speech Recognition, 1991
"... Integration of language constraints into a large vocabulary speech recognition system often leads to prohibitive complexity. We propose to factor the constraints into two components. The first is characterized by a covering grammar which is small and easily integrated into existing speech recognizer ..."
Abstract

Cited by 2 (1 self)
Integration of language constraints into a large vocabulary speech recognition system often leads to prohibitive complexity. We propose to factor the constraints into two components. The first is characterized by a covering grammar which is small and easily integrated into existing speech recognizers. The recognized string is then decoded by means of an efficient language postprocessor in which the full set of constraints is imposed to correct possible errors introduced by the speech recognizer.
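The two-pass factorization this abstract proposes, a small covering grammar inside the recognizer and a post-processor imposing the full constraints, can be illustrated with a toy second pass that maps the recognized string to the closest legal sentence (scoring by word-level edit distance is an assumption for illustration; the paper's post-processor is more general):

```python
def correct_with_full_grammar(recognized, valid_sentences):
    """Second-pass correction of a recognizer output string.

    The first pass decodes with a small covering grammar, which
    over-generates; this pass maps its output to the nearest sentence
    allowed by the full constraint set, here by word edit distance.
    """
    def edit_distance(a, b):
        # Single-row dynamic program over word sequences.
        d = list(range(len(b) + 1))
        for i, wa in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, wb in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                       d[j - 1] + 1,      # insertion
                                       prev + (wa != wb)) # substitution
        return d[-1]

    return min(valid_sentences, key=lambda s: edit_distance(recognized, s))
```

Because only the small covering grammar is compiled into the decoding network, the search stays tractable; the expensive full constraints are applied just once, to a single recognized string.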
MAP Estimation of Continuous Density HMM: Theory and Applications
"... We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observat ..."
Abstract
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.
A Study on Speaker-Adaptive Speech Recognition
"... Speakerindependent system is desirable in many applications where speakerspecific data do not exist. However, if speakerdependent data are available, the system could be adapted to the specific speaker such that the error rate could be significantly reduced. In this paper, DARPA Resource Managem ..."
Abstract
A speaker-independent system is desirable in many applications where speaker-specific data do not exist. However, if speaker-dependent data are available, the system can be adapted to the specific speaker such that the error rate is significantly reduced. In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognition. Since adaptation is based on speaker-independent systems with only limited adaptation data, a good adaptation algorithm should be consistent with the speaker-independent parameter estimation criterion, and adapt those parameters that are less sensitive to the limited training data. Two parameter sets, the codebook mean vector and the output distribution, are regarded as most important. They are modified in the framework of the maximum likelihood estimation criterion according to the characteristics of each speaker. In order to reliably estimate those parameters, output distributions are shared with each other if they exhibit certain acoustic similarity. In addition to modifying these parameters, speaker normalization with neural networks is also studied in the hope that acoustic data normalization will not only rapidly adapt the system but also enhance the robustness of speaker-independent speech recognition. Preliminary results indicate that speaker differences can be well minimized. In comparison with speaker-independent speech recognition, the error rate has been reduced from 4.3% to 3.1% by using parameter adaptation techniques alone, with 40 adaptation sentences for each speaker. When the number of speaker adaptation sentences is comparable to that of speaker-dependent training, speaker-adaptive recognition works better than the best speaker-dependent recognition results on the same test set, which indicates the robustness of speaker-adaptive speech recognition.
Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
"... An investigation into the use of Bayesian learning of the parameters of a multivariate Gaassian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker c ..."
Abstract
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying it to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained compared to speaker-independent results. Using Bayesian learning for HMM parameter smoothing and sex-dependent modeling, a 21% error reduction was observed on the FEB91 test.