Results 1 - 8 of 8
Shared-Distribution Hidden Markov Models for Speech Recognition
, 1991
"... Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generaliz ..."
Abstract

Cited by 275 (7 self)
 Add to MetaCart
Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged together when only parts of the model output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, it also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will inc...
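The clustering the abstract describes operates on individual output distributions rather than whole triphone models. A minimal sketch of distribution-level tying, assuming discrete output distributions and a greedy merge by symmetric KL divergence (the paper's actual distance measure and merging algorithm may differ):

```python
import numpy as np

def sym_kl(p, q, eps=1e-10):
    # Symmetric Kullback-Leibler divergence between two discrete distributions.
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def cluster_distributions(dists, n_shared):
    """Greedily merge the most similar pair of output distributions until
    only n_shared shared distributions remain.  Returns a mapping from
    each original state index to its shared-distribution index, plus the
    shared (centroid) distributions themselves."""
    clusters = [[i] for i in range(len(dists))]
    centroids = [d.copy() for d in dists]
    while len(clusters) > n_shared:
        best = None
        for a in range(len(centroids)):
            for b in range(a + 1, len(centroids)):
                d = sym_kl(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into a; the new centroid is the weighted average.
        wa, wb = len(clusters[a]), len(clusters[b])
        centroids[a] = (wa * centroids[a] + wb * centroids[b]) / (wa + wb)
        clusters[a].extend(clusters[b])
        del clusters[b]
        del centroids[b]
    mapping = {}
    for idx, members in enumerate(clusters):
        for s in members:
            mapping[s] = idx
    return mapping, centroids
```

States whose distributions land in the same cluster share one set of output parameters, which is what allows many states per phone without a blow-up in trainable parameters.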
Predicting Unseen Triphones With Senones
, 1993
"... In largevocabulary speech recognition, the decoder often encounters triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context independent monophones. We propose to use decisiontree based senones to generate needed senon ..."
Abstract

Cited by 44 (10 self)
 Add to MetaCart
In large-vocabulary speech recognition, the decoder often encounters triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context-independent monophones. We propose to use decision-tree based senones to generate needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. To find the senone a Markov state of any triphone is associated with, we traverse the corresponding tree until we reach a leaf node, where a senone is represented. We used the DARPA 5,000-word speaker-independent Wall Street Journal dictation task to evaluate the proposed method. The word error rate was reduced by 11% when unseen triphones were modeled by the decision-tree based senones. When there were at least 5 unseen triphones in each test utterance, the error rate could be reduced by more than 20%. This research was spons...
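The lookup the abstract describes can be sketched as a walk down a phonetic decision tree for one Markov state. The node structure, question encoding, and senone ids below are hypothetical placeholders; only the traversal idea comes from the abstract:

```python
class Node:
    """Hypothetical tree node: internal nodes ask a yes/no phonetic
    question about the triphone's context; leaves hold a senone id."""
    def __init__(self, question=None, yes=None, no=None, senone=None):
        self.question = question  # e.g. ("left", {"m", "n"}): is the left context nasal?
        self.yes, self.no = yes, no
        self.senone = senone      # set only at leaves

def find_senone(root, left_ctx, right_ctx):
    """Walk the tree until a leaf is reached.  This works even for
    triphones never seen in training, because the questions depend
    only on the context phones, not on the triphone identity."""
    node = root
    while node.senone is None:
        side, phone_set = node.question
        ctx = left_ctx if side == "left" else right_ctx
        node = node.yes if ctx in phone_set else node.no
    return node.senone

# Usage: a toy tree for one state of some phone.
tree = Node(question=("left", {"m", "n", "ng"}),      # left context nasal?
            yes=Node(senone=17),
            no=Node(question=("right", {"s", "z"}),   # right context sibilant?
                    yes=Node(senone=42),
                    no=Node(senone=3)))
```

Building one such tree per state of each phone yields the senone codebook; an unseen triphone's baseform is assembled by looking up a senone for each of its states.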
Bayesian Learning for Hidden Markov Model with Gaussian Mixture State Observation Densities
"... An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker ..."
Abstract

Cited by 37 (15 self)
 Add to MetaCart
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering, and corrective training are given.
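The simplest instance of incorporating a prior density into training is the conjugate-prior (MAP) update of a single Gaussian mean. A sketch under that assumption, not the paper's full formulation, which also covers mixture weights and covariances:

```python
import numpy as np

def map_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.
    tau acts as a pseudo-count for the prior: with little adaptation
    data the estimate stays near the prior (e.g. speaker-independent)
    mean, and it moves toward the sample mean as data accumulates."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    return (tau * prior_mean + n * data.mean()) / (tau + n)
```

This interpolation between prior knowledge and observed data is what makes one mechanism serve smoothing, adaptation, clustering, and corrective training alike: only the choice of prior and of data changes.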
Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
 Proc. DARPA Speech and Natural Language Workshop
, 1991
"... An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker cl ..."
Abstract

Cited by 30 (7 self)
 Add to MetaCart
An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying it to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error r...
MAP Estimation of Continuous Density HMM: Theory and Applications
 In: Proceedings of DARPA Speech and Natural Language Workshop
, 1992
"... We discuss maximum a posteriori estimation of continuous density hidden Markovmodels(CDHMM).The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimates of hidden Markov model (HMM) parameters are usually obtained by the method of maximum likelihood (ML) [1, 10, 6], assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimates of continuous density hidden Markov models (CDHMM). The MAP ...
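The forward-backward algorithm named above computes, among other things, the likelihood of an observation sequence; the forward pass alone suffices for that. A log-domain sketch, assuming per-frame log output probabilities have already been computed (in a CDHMM they would come from the Gaussian mixture densities):

```python
import numpy as np

def forward_loglik(log_A, log_pi, log_B):
    """Forward-pass log-likelihood of an observation sequence under an HMM.
    log_A:  (S, S) log transition matrix
    log_pi: (S,)   log initial state probabilities
    log_B:  (T, S) per-frame log output probabilities"""
    T, S = log_B.shape
    alpha = log_pi + log_B[0]
    for t in range(1, T):
        # log-sum-exp over previous states, vectorized over current state,
        # with the max subtracted for numerical stability.
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[t]
    m = alpha.max()
    return m + np.log(np.sum(np.exp(alpha - m)))
```

The backward pass (and hence the full reestimation formulas) follows the same recursion run in reverse; it is omitted here for brevity.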
IMPLEMENTATION ASPECTS OF LARGE VOCABULARY RECOGNITION BASED ON INTRAWORD AND INTERWORD PHONETIC UNITS
, 1990
"... Most large vocabulary speech recognition systems essentially consist of a training algorithm and a recognition structure which is essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the trainin ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Most large vocabulary speech recognition systems essentially consist of a training algorithm and a recognition structure which is essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, it is absolutely essential that the recognition structure be efficient in terms of computation and memory, and accurate in terms of actually determining the best path through the lattice, so that a wide range of training (subword unit creation) strategies can be efficiently evaluated in a reasonable time period. We have considered an architecture in which we incorporate several well known procedures (beam search, compiled network, etc.) with some new ideas (stacks of active network nodes, likelihood computation on demand, guided search, etc.) to implement a search procedure which maintains the accuracy of the full search but which can decode a single sentence in about one minute of computing time (about 20 times real time) on a vectorized, concurrent processor. The ways in which we have realized this significant computational reduction are described in this paper.
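The beam-search component mentioned above can be sketched as time-synchronous Viterbi decoding that, at each frame, discards hypotheses scoring too far below the current best, keeping only a small stack of active nodes. The dictionary-based network layout and parameter names below are illustrative, not the paper's implementation:

```python
import math

def viterbi_beam(states, trans, emit, obs, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning.
    trans[(i, j)] and emit[(s, o)] are log probabilities; states that
    fall more than `beam` below the frame's best score are dropped.
    For simplicity every state starts active with log score 0."""
    active = {s: 0.0 for s in states}          # state -> best log score so far
    for o in obs:
        nxt = {}
        for s, score in active.items():
            for (i, j), lp in trans.items():
                if i != s:
                    continue                    # edge does not leave this state
                e = emit.get((j, o), -math.inf)
                cand = score + lp + e
                if cand > nxt.get(j, -math.inf):
                    nxt[j] = cand               # keep the best path into j
        if not nxt:
            return -math.inf                    # all hypotheses died
        best = max(nxt.values())
        # Beam pruning: only hypotheses near the best survive to the next frame.
        active = {s: v for s, v in nxt.items() if v >= best - beam}
    return max(active.values())
```

Widening `beam` trades computation for a smaller risk of pruning away the true best path; the paper's techniques (compiled networks, on-demand likelihood computation, guided search) attack the same cost from other directions.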
Factorization Of Language Constraints In Speech Recognition
, 1991
"... Integration of language constraints into a large vocabulary speech recognition system often leads to prohibitive complexity. We propose to factor the constraints into two components. The first is characterized by a covering grammar which is small and easily integrated into existing speech recognizer ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Integration of language constraints into a large vocabulary speech recognition system often leads to prohibitive complexity. We propose to factor the constraints into two components. The first is characterized by a covering grammar which is small and easily integrated into existing speech recognizers. The recognized string is then decoded by means of an efficient language postprocessor in which the full set of constraints is imposed to correct possible errors introduced by the speech recognizer.
MAP Estimation of Continuous Density HMM: Theory and Applications
"... We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forwardbackward algorithm and the segmental kmeans algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observat ..."
Abstract
 Add to MetaCart
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.