Results 1  10
of
143
A tutorial on hidden markov models and selected applications in speech recognition
 Proceedings of the IEEE
, 1989
"... Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First the models are very rich in mathematical s ..."
Abstract

Cited by 4251 (1 self)
 Add to MetaCart
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech. I.
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 170 (3 self)
 Add to MetaCart
Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finitestate finitealphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximumlikelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms—Baum–Petrie algorithm, entropy ergodic theorems, finitestate channels, hidden Markov models, identifiability, Kalman filter, maximumlikelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality. I.
Recent advances in the automatic recognition of audiovisual speech
 PROC. IEEE
, 2003
"... Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audiovisual automatic speech r ..."
Abstract

Cited by 110 (14 self)
 Add to MetaCart
Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audiovisual automatic speech recognition and present novel contributions in two main areas: First, the visual front end design, based on a cascade of linear image transforms of an appropriate video regionofinterest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates to the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small to largevocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves automatic speech recognition over all conditions and data considered, though less so for visually challenging environments and large vocabulary tasks.
Connectionist Probability Estimation in HMM Speech Recognition
 IEEE Transactions on Speech and Audio Processing
, 1992
"... This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech ..."
Abstract

Cited by 61 (16 self)
 Add to MetaCart
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state of the art HMM system. ii Part I INTRODUCTION Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
MMIE training of large vocabulary recognition systems
, 1997
"... This paper describes a framework for optimising the structure and parameters of a continuous density HMMbased large Z. vocabulary recognition system using the Maximum Mutual Information Estimation MMIE criterion. To reduce the computational complexity of the MMIE training algorithm, confusable seg ..."
Abstract

Cited by 49 (3 self)
 Add to MetaCart
This paper describes a framework for optimising the structure and parameters of a continuous density HMMbased large Z. vocabulary recognition system using the Maximum Mutual Information Estimation MMIE criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture splitting procedure is also employed to adjust the number of mixture components in each state during training such that the optimal balance between the number of parameters and the available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using up to 66 hours of acoustic training data. These demonstrate that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, the experimental results show that MMIE optimisation of system structure and param...
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
"... The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The u ..."
Abstract

Cited by 45 (8 self)
 Add to MetaCart
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion together with the optimization methods gradient descent (GD) and extended Baum (EB) algorithm. A tree searchbased restricted recognition method using word graphs is presented, so as to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA wall street journal (WSJ) corpus with a vocabulary of 5k words and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits, and the SieTill corpus for telephone line recorded German digits. For the MMI criterion, neither analytical nor experimental results do indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
Why does unsupervised pretraining help deep learning?
, 2010
"... Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks ..."
Abstract

Cited by 45 (11 self)
 Add to MetaCart
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pretraining phase. The main question investigated here is the following: why does unsupervised pretraining work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pretraining appears to play predominantly a regularization role in subsequent supervised training. However our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pretraining effect.
A tutorial on energybased learning
 Predicting Structured Data
, 2006
"... EnergyBased Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in ..."
Abstract

Cited by 42 (6 self)
 Add to MetaCart
EnergyBased Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graphtransformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of nonprobabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches. 1
Hidden Markov Models for Labeled Sequences
 In Proceedings of the 12th IAPR ICPR'94
, 1994
"... A hidden Markov model for labeled observations, called a CHMM, is introduced and a maximum likelihood method is developed for estimating the parameters of the model. Instead of training it to model the statistics of the training sequences it is trained to optimize recognition. It resembles MMI train ..."
Abstract

Cited by 37 (12 self)
 Add to MetaCart
A hidden Markov model for labeled observations, called a CHMM, is introduced and a maximum likelihood method is developed for estimating the parameters of the model. Instead of training it to model the statistics of the training sequences it is trained to optimize recognition. It resembles MMI training, but is more general, and has MMI as a special case. The standard forwardbackward procedure for estimating the model cannot be generalized directly, but an "incremental EM" method is proposed. 1 Introduction Hidden Markov Models (HMMs) are often used to model the statistical structure of a set of observations like speech signals [12]. A model is estimated so as to maximize the likelihood of the observations or, in a Bayesian setting, the a posteriori probability of the model. Often a set of different models are estimated independently, for instance one model for each word in a small vocabulary speech application. After estimation they are used for discrimination, although they were not...
Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition
 IEEE Transactions on Speech and Audio Processing
, 1997
"... Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the shorttime properties of speech. Correlations between features can arise when the speech signal is nonstationary or corrupted by noise. We investigate how to model these correlatio ..."
Abstract

Cited by 36 (3 self)
 Add to MetaCart
Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the shorttime properties of speech. Correlations between features can arise when the speech signal is nonstationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters can be chosen in two ways: (i) to maximize the likelihood of observed speech signals, or (ii) to minimize the number of classification errors. We derive an ExpectationMaximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one smallsized vocabulary (connected alphadigits) and one mediumsized vocabulary (New Jersey town names). We find that modeling feature correlations...