## A tutorial on hidden Markov models and selected applications in speech recognition (1989)


Venue: PROCEEDINGS OF THE IEEE

Citations: 4598 (1 self)

### BibTeX

```bibtex
@ARTICLE{Rabiner89atutorial,
  author  = {Lawrence R. Rabiner},
  title   = {A tutorial on hidden Markov models and selected applications in speech recognition},
  journal = {Proceedings of the IEEE},
  volume  = {77},
  number  = {2},
  year    = {1989},
  pages   = {257--286}
}
```


### Abstract

Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.

### Citations

471 | Linear prediction: A tutorial review - Makhoul - 1975 |

175 |
Maximum mutual information estimation of hidden Markov model parameters for speech recognition
- Bahl, Brown, et al.
- 1986
Citation Context ...to alleviate this type of problem, there have been proposed at least two alternatives to the standard maximum likelihood (ML) optimization procedure for estimating HMM parameters. The first alternative [32] is based on the idea that several HMMs are to be designed and we wish to design them all at the same time in such a way so as to maximize the discrimination power of each model (i.e., each model's ab... |

155 |
Continuously variable duration hidden Markov models for automatic speech recognition
- Levinson
- 1986
Citation Context ...or variable duration HMMs than for the standard HMM. One proposal to alleviate some of these problems is to use a parametric state duration density instead of the nonparametric $p_j(d)$ used above [29], [30]. In particular, proposals include the Gaussian family, with $p_j(d) = \mathcal{N}(d, \mu_j, \sigma_j^2)$ and parameters $\mu_j$ and $\sigma_j^2$ (82), or the Gamma family, with $p_j(d) = \eta_j^{\nu_j} d^{\nu_j - 1} e^{-\eta_j d} / \Gamma(\nu_j)$ (83) and parameters $\nu_j$ and $\eta_j$, and with... |
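The Gamma-family duration density of eq. (83) in the context above can be sketched numerically. The parameter values and the unit-mass check below are illustrative assumptions, not code from the paper:

```python
import math

def gamma_duration(d, nu, eta):
    # Gamma-family state-duration density, eq. (83):
    # p(d) = eta^nu * d^(nu - 1) * exp(-eta * d) / Gamma(nu)
    return (eta ** nu) * (d ** (nu - 1)) * math.exp(-eta * d) / math.gamma(nu)

# Rough rectangle-rule check that the density has unit mass over d > 0
# (hypothetical parameters nu = 2.0, eta = 0.5, i.e., mean duration 4 frames).
step = 0.01
mass = sum(gamma_duration((k + 1) * step, 2.0, 0.5) * step for k in range(5000))
```

The appeal of the parametric form is that only two numbers per state (here `nu`, `eta`) must be estimated, instead of a full nonparametric histogram over durations.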

135 | Speech analysis and synthesis by linear prediction of the speech wave - Atal, Hanauer - 1971 |

121 |
A Probabilistic Distance Measure for Hidden Markov Models
- Juang, Rabiner
- 1985
Citation Context ...ppear in IEEE TRANSACTIONS ON INFORMATION THEORY. effectively being used. None of the approaches, however, assumes that the source has the probability distribution of the model. F. Comparison of HMMs [34] An interesting question associated with HMMs is the following: Given two HMMs, $\lambda_1$ and $\lambda_2$, what is a reasonable measure of the similarity of the two models? A key point here is the similarity criterio... |
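A Monte Carlo model-comparison measure in the spirit of the context above can be illustrated with toy discrete HMMs: generate observations from one model and compare per-frame log-likelihoods under both. The two-state models and helper names below are illustrative assumptions, not the Juang–Rabiner construction itself:

```python
import math
import random

def forward_loglik(obs, pi, A, B):
    # Scaled forward algorithm: returns log P(obs | lambda).
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(1, len(obs) + 1):
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
        if t < len(obs):
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
    return loglik

def sample(pi, A, B, T, rng):
    # Generate T discrete observations from the model.
    def draw(p):
        r, acc = rng.random(), 0.0
        for k, pk in enumerate(p):
            acc += pk
            if r < acc:
                return k
        return len(p) - 1
    obs, q = [], draw(pi)
    for _ in range(T):
        obs.append(draw(B[q]))
        q = draw(A[q])
    return obs

rng = random.Random(0)
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B1 = [[0.8, 0.2], [0.2, 0.8]]   # hypothetical model lambda_1
B2 = [[0.3, 0.7], [0.7, 0.3]]   # hypothetical model lambda_2
O2 = sample(pi, A, B2, 500, rng)
# Per-frame log-likelihood gap: negative on average, since O2 fits its
# own generator lambda_2 at least as well as it fits lambda_1, and more
# negative the more dissimilar the two models are.
D12 = (forward_loglik(O2, pi, A, B1) - forward_loglik(O2, pi, A, B2)) / len(O2)
```

Symmetrizing (averaging the gap with the one computed from data generated by the other model) gives a distance-like quantity that is zero when the two models are statistically equivalent.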

101 |
Maximum likelihood estimation for multivariate observations of Markov sources
- Liporace
- 1982
Citation Context ...sity function (pdf) to insure that the parameters of the pdf can be reestimated in a consistent way. The most general representation of the pdf, for which a reestimation procedure has been formulated [24]-[26], is a finite mixture of the form $b_j(O) = \sum_{m=1}^{M} c_{jm} \mathcal{N}[O, \mu_{jm}, U_{jm}], \; 1 \le j \le N$ (49), where $O$ is the vector being modeled, $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $j$, and $\mathcal{N}$ is any log-co... |
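The finite-mixture density of eq. (49) can be sketched in one dimension. The gains, means, and variances below are hypothetical per-state parameters chosen only for illustration:

```python
import math

def mixture_density(o, c, mu, var):
    # One-dimensional sketch of eq. (49): b_j(o) = sum_m c_jm * N(o; mu_jm, var_jm).
    # The mixture gains c must sum to one so that b_j integrates to one.
    return sum(cm * math.exp(-(o - mm) ** 2 / (2.0 * vm)) / math.sqrt(2.0 * math.pi * vm)
               for cm, mm, vm in zip(c, mu, var))

# Hypothetical two-component mixture for a single state.
gains, means, variances = [0.6, 0.4], [0.0, 3.0], [1.0, 2.0]
# Rectangle-rule check of the stochastic constraint: unit total mass.
mass = sum(mixture_density(-10 + k * 0.01, gains, means, variances) * 0.01
           for k in range(2600))
```

With enough components, such a mixture can approximate essentially any continuous observation density, which is why it is the standard choice for continuous-observation HMMs.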

95 |
Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system
- Lee
- 1988
Citation Context ... number of problems in speech recognition including the estimation of trigram word probabilities for language models [13], and the estimation of HMM output probabilities for trigram phone models [37], [38]. Another way of handling the effects of insufficient training data is to add extra constraints to the model parameters to insure that no model parameter estimate falls below a specified level. Thus, ... |
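The flooring constraint described above (no parameter estimate below a specified level) can be sketched as a post-estimation step. The function name, floor value, and renormalization scheme are illustrative assumptions, not the paper's exact procedure:

```python
def apply_floor(p, eps=1e-3):
    # Raise any probability below eps up to eps, then rescale the remaining
    # entries so the distribution still sums to one.
    low = [i for i, x in enumerate(p) if x < eps]
    high_mass = sum(x for i, x in enumerate(p) if i not in low)
    scale = (1.0 - eps * len(low)) / high_mass
    return [eps if i in low else p[i] * scale for i in range(len(p))]

# A row with two unseen/rare events gets a nonzero floor for each.
floored = apply_floor([0.0, 0.0005, 0.9995])
```

Flooring prevents probabilities that happen to be unobserved in a small training set from being frozen at zero by the reestimation formulas.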

79 |
Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition
- Russell, Moore
- 1985
Citation Context ...cult for variable duration HMMs than for the standard HMM. One proposal to alleviate some of these problems is to use a parametric state duration density instead of the nonparametric $p_j(d)$ used above [29], [30]. In particular, proposals include the Gaussian family, with $p_j(d) = \mathcal{N}(d, \mu_j, \sigma_j^2)$ and parameters $\mu_j$ and $\sigma_j^2$ (82), or the Gamma family, with $p_j(d) = \eta_j^{\nu_j} d^{\nu_j - 1} e^{-\eta_j d} / \Gamma(\nu_j)$ (83) and parameters $\nu_j$ and $\eta_j$, an... |

78 | Linear predictive hidden Markov models and the speech signal - Poritz - 1982 |

69 |
Maximum likelihood estimation for multivariate mixture observations of Markov chains
- Juang, Levinson, et al.
- 1986
Citation Context ...ion procedure. This is the case because any HMM parameter set to zero initially will remain at zero throughout the reestimation procedure (see (44)). A. Continuous Observation Densities in HMMs [24]-[26] All of our discussion, to this point, has considered only the case when the observations were characterized as discrete symbols chosen from a finite alphabet, and therefore we could use a discrete pr... |

67 | Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains - Juang - 1985 |

67 |
The Harpy speech understanding system
- Lowerre, Reddy
- 1980
Citation Context ...iterion is to use a separate training sequence of observations $O^{(v)}$ to derive model parameters for each model $\lambda_v$. Thus the standard ML optimization yields (84). The proposed alternative design criterion [31] is the maximum mutual information (MMI) criterion, in which the average mutual information $I$ between the observation sequence $O^{(v)}$ and the complete set of models $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_V)$ is maximized. One ... |
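The MMI criterion sketched in this context can be written down for a toy case where the per-model log-likelihoods are already given. The matrix layout and the example scores below are illustrative assumptions:

```python
import math

def mmi_objective(loglik):
    # loglik[v][w] = log P(O^(v) | lambda_w), hypothetical per-model scores.
    # MMI objective: sum over v of
    #   log P(O^(v) | lambda_v) - log sum_w P(O^(v) | lambda_w).
    total = 0.0
    for v, row in enumerate(loglik):
        total += row[v] - math.log(sum(math.exp(x) for x in row))
    return total

# Well-separated models score closer to the maximum (0) than confusable ones.
separated = mmi_objective([[0.0, -10.0], [-10.0, 0.0]])
confusable = mmi_objective([[0.0, 0.0], [0.0, 0.0]])
```

Unlike per-model ML training, this objective couples all the models: raising one model's score on another model's training data lowers the objective, which is what gives the criterion its discriminative character.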

64 |
Mixture autoregressive hidden Markov models for speech signals
- Juang, Rabiner
- 1985
Citation Context ...ortion of the observation vector accounted for by the $k$th mixture component. A similar interpretation can be given for the reestimation term for the covariance matrix $U_{jk}$. B. Autoregressive HMMs [27], [28] Although the general formulation of continuous density HMMs is applicable to a wide range of problems, there is one other very interesting class of HMMs that is particularly applicable to speech proc... |

57 |
A Segmental K-Means Training Procedure for Connected Word Recognition
- Rabiner, Wilpon, et al.
- 1986
Citation Context ...us density model are comparable. Finally, Table 1 shows that the autoregressive density HMM gives poorer performance than the standard mixture density model. VII. CONNECTED WORD RECOGNITION USING HMMs [59]-[63] A somewhat more complicated problem of speech recognition, to which HMMs have been successfully applied, is the problem of connected word recognition. The basic premise of connected word recogn... |

48 |
On the use of bandpass liftering in speech recognition
- Juang, Rabiner, Wilpon
- 1987
Citation Context ...ere Q > p and Q = 12 in the results to be described later in this section. 6) Cepstral Weighting: The Q-coefficient cepstral vector $c_\ell(m)$ at time frame $\ell$ is weighted by a window $W_c(m)$ of the form [55], [56] to give $W_c(m) = 1 + (Q/2) \sin(\pi m / Q), \; 1 \le m \le Q$ (115), and $\hat{c}_\ell(m) = c_\ell(m) \cdot W_c(m)$ (116). 7) Delta ... |
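The lifter window of eq. (115) and the weighting of eq. (116) are simple to compute. The function names below are illustrative, and eq. (116) is read here as elementwise multiplication of the cepstral vector by the window:

```python
import math

def lifter_window(Q=12):
    # Eq. (115): W_c(m) = 1 + (Q / 2) * sin(pi * m / Q), for 1 <= m <= Q.
    return [1.0 + (Q / 2.0) * math.sin(math.pi * m / Q) for m in range(1, Q + 1)]

def weight_cepstrum(c, w):
    # Eq. (116), read as elementwise weighting of the cepstral vector.
    return [cm * wm for cm, wm in zip(c, w)]

w = lifter_window(12)
```

The window de-emphasizes the low-order coefficients (sensitive to overall spectral slope) and the high-order ones (sensitive to analysis noise), peaking in the middle of the cepstral vector.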

42 |
Recognition of isolated digits using hidden Markov models with continuous mixture densities
- Rabiner, Juang, et al.
- 1985
Citation Context ...the mixture gains $c_{jm}$ as well as the diagonal covariance coefficients $U_{jm}(r, r)$ to be greater than or equal to some minimum values (we use fixed minimum values in all cases). F. Segmental k-Means Segmentation into States [42] We stated earlier that good initial estimates of the parameters of the $b_j(O_t)$ densities were essential for rapid and proper convergence of the reestimation formulas. Hence a procedure for providing g... |
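The segmental k-means idea named above can be sketched in one dimension: alternate a Viterbi-style segmentation of the observations into left-to-right states with per-state re-averaging. The squared-distance cost and the toy data below are illustrative assumptions, not the paper's full procedure:

```python
def segment(obs, means):
    # DP segmentation of obs into len(means) left-to-right states;
    # squared distance stands in for -log b_j(o_t).
    T, N = len(obs), len(means)
    INF = float("inf")
    cost = [[INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    cost[0][0] = (obs[0] - means[0]) ** 2
    for t in range(1, T):
        for j in range(N):
            best, arg = min((cost[t - 1][i], i) for i in (j - 1, j) if i >= 0)
            cost[t][j] = best + (obs[t] - means[j]) ** 2
            back[t][j] = arg
    path = [N - 1]                      # must end in the final state
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def segmental_kmeans(obs, means, iters=5):
    # Alternate segmentation and per-state mean re-estimation.
    for _ in range(iters):
        path = segment(obs, means)
        for j in range(len(means)):
            frames = [o for o, s in zip(obs, path) if s == j]
            if frames:
                means[j] = sum(frames) / len(frames)
    return means, segment(obs, means)

# Three clearly separated "states" in a toy observation sequence.
obs = [0.1, 0.0, -0.1, 5.0, 5.1, 4.9, 10.0, 10.2]
means, path = segmental_kmeans(obs, [0.0, 4.0, 9.0])
```

The per-state averages produced this way serve as the good initial estimates the context says the reestimation formulas need.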

37 |
A Minimum Discrimination Information Approach for Hidden Markov Modeling
- Ephraim, Dembo, et al.
- 1989
Citation Context ...ond alternative philosophy is to assume that the signal to be modeled was not necessarily generated by a Markov source, but does obey certain constraints (e.g., positive definite correlation function) [33]. The goal of the design procedure is therefore to choose HMM parameters which minimize the discrimination information (DI), or the cross entropy, between the set of valid (i.e., which satisfy the measu... |

22 | A speaker-independent, syntax-directed, connected word recognition system based on hidden Markov models and level building - Rabiner, Levinson - 1985 |

21 |
Some properties of continuous hidden Markov model representations
- Rabiner, Juang, et al.
- 1985
Citation Context ...the B parameters, experience has shown that good initial estimates are helpful in the discrete symbol case, and are essential (when dealing with multiple mixtures) in the continuous distribution case [35]. Such initial estimates can be obtained in a number of ways, including manual segmentation of the observation sequence(s) into states with averaging of observations within states, maximum likelihood s... |

18 |
Integration of acoustic information in a large vocabulary word recognizer
- Gupta, Lennig, et al.
- 1987
Citation Context ...ive to be as thorough or as complete in our descriptions as to what was done as we were in describing the theory of HMMs. The interested reader should read the material in [6], [10], [12], [13], [39]-[46] for more complete descriptions of individual systems. Our main goal here is to show how specific aspects of HMM theory get applied, not to make the reader an expert in speech recognition technology. A... |

17 | A speaker-stress resistant HMM isolated word recognizer - Paul - 1987 |

17 | An improved word-detection algorithm for telephone-quality speech incorporating both syntactic and semantic constraints - Wilpon, Rabiner, et al. - 1984 |

17 | BYBLOS: the BBN continuous speech recognition system - Chow, Dunham, et al. - 1987 |

14 | Speaker-Dependent Connected Speech Recognition Via Dynamic Programming and Statistical Methods - Bourlard, Kamp, et al. |

7 | Application of hidden Markov models to automatic speech endpoint detection - Wilpon, Rabiner - 1987 |

7 |
A weighted cepstral distance measure for speech recognition
- Tokhura
- 1987
Citation Context ...nt, where Q > p and Q = 12 in the results to be described later in this section. 6) Cepstral Weighting: The Q-coefficient cepstral vector $c_\ell(m)$ at time frame $\ell$ is weighted by a window $W_c(m)$ of the form [55], [56] to give $W_c(m) = 1 + (Q/2) \sin(\pi m / Q), \; 1 \le m \le Q$ (115), and $\hat{c}_\ell(m) = c_\ell(m) \cdot W_c(m)$ (116). 7) ... |

3 | Multistyle Training for Robust Isolated Word Speech Recognition - Lippmann, Martin - 1987 |

3 | et al., “Experiments with the TANGORA 20,000 word speech recognizer - Averbuch - 1987 |

3 | A model-based connected digit recognition system using either hidden Markov models or templates, Computer Speech and Language - Rabiner, Wilpon, et al. - 1986 |

3 | Context-dependent phonetic Markov models for large vocabulary speech recognition - Derouault - 1987 |

2 | Statistical inference for probabilisticfunctionsof finite state markovchains - Petrie - 1966 |

2 | Vector quantization and Markov source models applied to speech recognition - Billi - 1982 |

2 | Speech recognition with very large size dictionary - Merialdo - 1987 |

1 |
On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition
- Rabiner, Levinson, Sondhi
- 1983
Citation Context ...ing parameters are rescaled so that the densities obey the required stochastic constraints. Such post-processor techniques have been applied to several problems in speech processing with good success [39]. It can be seen from (112) that this procedure is essentially equivalent to a simple form of deleted interpolation in which the model $\lambda'$ is a uniform distribution model, and the interpolation value E... |

1 | On the use of hidden Markov models for speaker-independent recognition of isolated words from a medium-size vocabulary - 1984 |

1 | Isolated word recognition - Poritz, Richter - 1986 |

1 | Analysis-synthesis telephony based upon the maximum likelihood method - Saito - 1968 |

1 |
On the use of instantaneous and transitional spectral information in speaker recognition
- Soong, Rosenberg
- 1986
Citation Context ...derivative of the sequence of weighted cepstral vectors is approximated by a first-order orthogonal polynomial over a finite-length window of (2K + 1) frames, centered around the current vector [57], [58]. (K = 2 in the results to be presented; hence a 5-frame window is used for the computation of the derivative.) The cepstral derivative (i.e., the delta cepstrum vector) is computed as $\Delta c_\ell(m) = \big[\sum_{k=-K}^{K} k \, ...$ |
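The first-order orthogonal-polynomial approximation of the cepstral derivative described above reduces to a regression slope over the (2K + 1)-frame window. The function name and toy data below are illustrative assumptions, and the normalization used here is the plain least-squares one rather than the paper's gain constant:

```python
def delta_cepstrum(frames, t, K=2):
    # Slope of a first-order (regression) polynomial fitted over the
    # (2K + 1)-frame window centered at frame t; `frames` holds one
    # scalar cepstral coefficient per frame.
    num = sum(k * frames[t + k] for k in range(-K, K + 1))
    den = sum(k * k for k in range(-K, K + 1))
    return num / den

# A linearly rising coefficient track has constant slope 1.
track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
slope = delta_cepstrum(track, 3)
```

With K = 2 the window spans 5 frames, matching the setting quoted in the context.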

1 | Global connected digit recognition using Baum-Welch algorithm - Wellekens - 1986 |