## MAP Estimation of Continuous Density HMM: Theory and Applications (1992)

Venue: Proceedings of the DARPA Speech and Natural Language Workshop

Citations: 26 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Gauvain92mapestimation,
  author    = {Jean-Luc Gauvain and Chin-Hui Lee},
  title     = {MAP Estimation of Continuous Density HMM: Theory and Applications},
  booktitle = {Proceedings of the DARPA Speech and Natural Language Workshop},
  year      = {1992},
  pages     = {185--190},
  publisher = {Morgan Kaufmann}
}
```


### Abstract

We discuss maximum a posteriori (MAP) estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded, and reestimation formulas are given for HMMs with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications: parameter smoothing, speaker adaptation, speaker group modeling, and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.

INTRODUCTION: Estimation of hidden Markov models (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6], assuming that the size of the training data is large enough to provide robust estimates. This paper investigates the maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM). The MAP ...
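The interpolation behavior that makes MAP estimation attractive for smoothing and adaptation can be sketched in its simplest setting: the mean of a single Gaussian under a conjugate (normal) prior. This is a generic illustration of the principle, not the paper's exact reestimation formulas; the function name and the hyperparameter `tau` (an equivalent prior sample count) are assumptions for this sketch.

```python
import numpy as np

def map_mean(data, prior_mean, tau):
    """MAP estimate of a Gaussian mean under a conjugate (normal) prior.

    tau acts as an equivalent prior sample count: with little data the
    estimate stays near prior_mean; with much data it approaches the
    ML estimate (the sample mean).
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    # Weighted interpolation between the prior mean and the sample mean.
    return (tau * prior_mean + n * data.mean()) / (tau + n)

# Three observations at 1.0 against a prior mean of 0.0 with equal
# prior weight (tau = 3) land halfway between the two.
print(map_mean([1.0, 1.0, 1.0], prior_mean=0.0, tau=3.0))  # 0.5
```

With sparse adaptation data the estimate is dominated by the prior (e.g. a speaker-independent model), which is exactly the smoothing/adaptation behavior the abstract describes.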

### Citations

8843 | Maximum Likelihood from Incomplete Data via the EM Algorithm - Dempster, Laird, et al. - 1977

Citation Context: "...nderlying hidden process, i.e. a multinomial model for the mixture and a Markov chain for an HMM. In these cases ML estimates are usually obtained by using the expectation-maximization (EM) algorithm [3, 1, 13]. This algorithm exploits the fact that the complete-data likelihood can be simpler to maximize than the likelihood of the incomplete data, as in the case where the complete-data model has sufficient ..."

4119 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: "...xed dimension t(x). In this case, the natural solution is to choose the prior density in a conjugate family, {k(·|φ); φ ∈ Φ}, which includes the kernel density of f(·|θ), i.e. ∀x, t(x) ∈ Φ [4, 2]. The MAP estimation is then reduced to the evaluation of the mode of k(θ|φ′) = k(θ|φ)k(θ|t(x)), a problem almost identical to the ML estimation problem. However, among the families of interest, onl..."

789 | Optimal Statistical Decisions - DeGroot - 1970

544 | An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequalities 3 - Baum - 1972

Citation Context: "...r applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimation of hidden Markov model (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM)..."

537 | Mixture Densities, Maximum Likelihood and the EM Algorithm - Redner, Walker - 1984

Citation Context: "...nderlying hidden process, i.e. a multinomial model for the mixture and a Markov chain for an HMM. In these cases ML estimates are usually obtained by using the expectation-maximization (EM) algorithm [3, 1, 13]. This algorithm exploits the fact that the complete-data likelihood can be simpler to maximize than the likelihood of the incomplete data, as in the case where the complete-data model has sufficient ..."

100 | Maximum likelihood estimation for multivariate observations of Markov sources - Liporace - 1982

Citation Context: "...r applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimation of hidden Markov model (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM)..."

67 | Maximum-Likelihood Estimation for Mixture Multivariate Stochastic Observations of Markov Chains - Juang - 1985

65 | A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models - Lee, Lin, et al. - 1991

Citation Context: "...he components of a given mixture. Variance clipping can also be viewed as a MAP estimation technique with a uniform prior density constrained by a maximum (positive) value for the precision parameters [9]. However, this does not have the appealing interpolation capability of the conjugate priors. We experimented with this p.d.f. smoothing approach on the TI ..."

57 | A segmental K-means training procedure for connected word recognition - Rabiner, Wilpon, et al. - 1986

Citation Context: "...xamine two ways of approximating θ_MAP by local maximization of f(x|θ)G(θ) and f(x, s|θ)G(θ). These two solutions are the MAP versions of the Baum-Welch algorithm [1] and of the segmental k-means algorithm [12], algorithms which were developed for ML estimation. Forward-Backward MAP Estimate: From (14) it is straightforward to show that the auxiliary function of the EM algorithm applied to MLE of θ, Q(θ, θ̄) = ..."
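The mixture-weight part of such a MAP reestimation step has a particularly simple closed form under a Dirichlet prior. The sketch below shows that standard Dirichlet MAP update as a generic illustration; the function and parameter names are assumptions, and the paper's own reestimation formulas may differ in detail.

```python
import numpy as np

def map_mixture_weights(counts, alpha):
    """MAP reestimate of mixture weights under a Dirichlet prior.

    counts: expected component occupancy counts from one EM pass
    alpha:  Dirichlet hyperparameters (alpha_k > 1 for a proper mode)

    The posterior mode is (counts_k + alpha_k - 1) normalized over
    components, interpolating between the prior and the data counts.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    num = counts + alpha - 1.0
    return num / num.sum()

# With a mild symmetric prior (alpha = 2), rarely used components keep
# a nonzero weight instead of collapsing to zero.
print(map_mixture_weights([6.0, 3.0, 1.0], [2.0, 2.0, 2.0]))
```

A flat prior (all alpha_k = 1) recovers the ML relative-frequency estimate, which is the same prior-versus-data trade-off the forward-backward MAP and segmental MAP variants exploit.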

42 | An improved MMIE training algorithm for speaker-independent, small vocabulary, continuous speech recognition - Normandin, Morgera - 1991

Citation Context: "...teration until convergence. If we use the forward-backward MAP algorithm we obtain a corrective training algorithm for CDHMMs very similar to the recently proposed corrective MMIE training algorithm [11]. Corrective training was evaluated on both the TI/NIST SI connected digit and the RM tasks. Only the Gaussian mean vectors and the mixture weights were corrected. For the TI digits a set of 21 phonet..."

40 | The empirical Bayes approach to statistical decision problems - Robbins - 1964

Citation Context: "...of this family of p.d.f.'s {G(·|φ); φ ∈ Φ} is also assumed known based on common or subjective knowledge about the stochastic process. Another solution is to adopt an empirical Bayesian approach [14] where the prior parameters are estimated directly from data. The estimation is then based on the marginal distribution of the data given the prior parameters. Adopting the empirical Bayes approach, it..."

30 | Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models - Gauvain - 1991

Citation Context: "...phonetic context. The HMM parameters for each class given the mixture component were then computed, and moment estimates were obtained for the tied prior parameters also subject to conditions (32-33) [5]. EXPERIMENTAL SETUP The experiments presented in this paper used various sets of context-independent (CI) and context-dependent (CD) phone models. Each model is a left-to-right HMM with Gaussian mixt..."

24 | On distributions admitting a sufficient statistic - Koopman - 1936

Citation Context: "...k(θ|φ′) = k(θ|φ)k(θ|t(x)), a problem almost identical to the ML estimation problem. However, among the families of interest, only exponential families have a sufficient statistic of fixed dimension [7]. When there is no sufficient statistic of fixed dimension, MAP estimation, like ML estimation, is a much more difficult problem because the posterior density is not expressible in terms of a fixed nu..."

10 | Improved Acoustic Modeling for Continuous Speech Recognition - Lee, Giachin, et al. - 1990

Citation Context: "...ach model is a left-to-right HMM with Gaussian mixture state observation densities. Diagonal covariance matrices are used and the transition probabilities are assumed fixed and known. As described in [8], a 38-dimensional feature vector composed of LPC-derived cepstrum coefficients, and first and second order time derivatives. Results are reported for the RM task with the standard word pair grammar a..."