Results 1–10 of 25
Hidden Markov processes
IEEE Trans. Inform. Theory, 2002
Cited by 93 (2 self)
Abstract—An overview of statistical and information-theoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discrete-time finite-state homogeneous Markov chain observed through a discrete-time memoryless invariant channel. In recent years, the work of Baum and Petrie on finite-state finite-alphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximum-likelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper.
Index Terms—Baum–Petrie algorithm, entropy ergodic theorems, finite-state channels, hidden Markov models, identifiability, Kalman filter, maximum-likelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality.
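To make the definition above concrete (a finite-state Markov chain observed through a memoryless channel), the likelihood of an observed sequence can be computed with the standard scaled forward recursion. This is a minimal Python/NumPy sketch, not code from the paper. It covers only the finite-state, finite-alphabet case, and the function name `hmp_log_likelihood` and the matrix conventions (`A` for transitions, `B` for the channel) are illustrative assumptions.

```python
import numpy as np

def hmp_log_likelihood(pi, A, B, obs):
    """Log-likelihood log p(y_1, ..., y_n) of a finite-state,
    finite-alphabet hidden Markov process (scaled forward recursion).

    pi  : (S,) initial state distribution
    A   : (S, S) transition matrix, A[i, j] = P(next state j | state i)
    B   : (S, K) memoryless channel, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices, length >= 1
    """
    alpha = pi * B[:, obs[0]]          # P(state_1, y_1)
    c = alpha.sum()                    # P(y_1)
    log_lik = np.log(c)
    alpha = alpha / c                  # filtered distribution P(state_1 | y_1)
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]  # P(state_t, y_t | y_1..y_{t-1})
        c = alpha.sum()                # P(y_t | y_1..y_{t-1})
        log_lik += np.log(c)
        alpha = alpha / c              # P(state_t | y_1..y_t)
    return float(log_lik)
```

Normalizing at every step keeps the recursion numerically stable for long sequences, and the sum of the log normalizers is exactly the log-likelihood.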
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
2001
Cited by 32 (6 self)
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criteria, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A tree search-based restricted recognition method using word graphs is presented to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. The best results were obtained using a unigram language model for MMI training. No significant co...
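The two criteria compared above can be sketched from the log joint scores g(W) = log p(X|W) + log P(W) of the correct and competing word sequences. This sketch follows common textbook formulations of MMI and of the sigmoid-smoothed MCE loss; it is not necessarily the exact unified, smoothed criterion of the paper, and the function names and the smoothing parameters `eta` and `gamma` are hypothetical.

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    v = np.asarray(v, dtype=float)
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def mmi_objective(g_correct, g_competing):
    """Log posterior of the correct word sequence (to be maximized).
    The denominator sums over the correct and all competing sequences."""
    return g_correct - logsumexp(np.append(g_competing, g_correct))

def mce_loss(g_correct, g_competing, eta=1.0, gamma=1.0):
    """Sigmoid-smoothed misclassification loss (to be minimized)."""
    g = np.asarray(g_competing, dtype=float)
    # Misclassification measure: soft maximum of competitor scores minus the correct score
    d = -g_correct + (logsumexp(eta * g) - np.log(len(g))) / eta
    return 1.0 / (1.0 + np.exp(-gamma * d))
```

When the correct sequence clearly outscores its competitors, the MMI term approaches 0 from below and the MCE loss approaches 0; a tie between correct and competing scores gives an MCE loss of 0.5.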
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
1997
Cited by 19 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. Training the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
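The LVQ paradigm mentioned above can be illustrated with Kohonen's basic LVQ1 rule. This is a sketch of the general technique under assumptions, not the paper's procedure for training Gaussian mixture components; `lvq1_step` and the learning rate `lr` are hypothetical names.

```python
import numpy as np

def lvq1_step(codebook, labels, x, x_label, lr=0.05):
    """One LVQ1 update (modifies `codebook` in place).

    codebook : (M, D) float array of prototype vectors
    labels   : class label of each prototype
    x        : (D,) training vector with class label x_label
    Returns the index of the prototype that was updated.
    """
    dists = np.linalg.norm(codebook - x, axis=1)
    c = int(np.argmin(dists))                     # nearest prototype
    sign = 1.0 if labels[c] == x_label else -1.0  # attract if correct, repel otherwise
    codebook[c] += sign * lr * (x - codebook[c])
    return c
```

Only the winning prototype moves, which sharpens class boundaries rather than fitting each class density independently.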
Deterministically Annealed Design of Hidden Markov Model Speech Recognizers
2001
Cited by 17 (4 self)
Many conventional speech recognition systems are based on the use of hidden Markov models (HMMs) within the context of discriminant-based pattern classification. While the speech recognition objective is a low rate of misclassification, HMM design has traditionally been approached via maximum likelihood (ML) modeling which is, in general, mismatched with the minimum error objective and hence suboptimal. Direct minimization of the error rate is difficult because of the complex nature of the cost surface, and has only been addressed recently by discriminative design methods such as generalized probabilistic descent (GPD). While existing discriminative methods offer significant benefits, they commonly rely on local optimization via gradient descent whose performance suffers from the prevalence of shallow local minima. As an alternative, we propose the deterministic annealing (DA) design method that directly minimizes the error rate while avoiding many poor local minima of the cost. DA is derived from fundamental principles of statistical physics and information theory. In DA, the HMM classifier's decision is randomized and its expected error rate is minimized subject to a constraint on the level of randomness which is measured by the Shannon entropy. The entropy constraint is gradually relaxed, leading in the limit of zero entropy to the design of regular nonrandom HMM classifiers. An efficient forward–backward algorithm is proposed for the DA method. Experiments on synthetic data and on a simplified recognizer for isolated English letters demonstrate that the DA design method can improve recognition error rates over both ML and GPD methods.
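The core idea of DA described above, a randomized classifier whose expected error is traded off against Shannon entropy, can be sketched with a Gibbs (softmax) distribution over class decisions. This is a simplified illustration under assumptions: the per-class `scores` stand in for HMM discriminant values, the sharpness parameter `gamma` and the Lagrangian form `err - T * ent` follow the generic DA formulation rather than the paper's exact derivation, and neither parameter optimization nor the forward–backward computation is shown.

```python
import numpy as np

def randomized_decision(scores, gamma):
    """Gibbs distribution over class decisions for each sample.
    scores: (N, K) discriminant scores; gamma >= 0 controls sharpness."""
    z = gamma * (scores - scores.max(axis=1, keepdims=True))
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def expected_error_and_entropy(scores, labels, gamma):
    p = randomized_decision(scores, gamma)
    n = len(labels)
    err = 1.0 - p[np.arange(n), labels].mean()    # expected error rate
    safe = np.where(p > 0, p, 1.0)                # avoid log(0); 0 * log(1) = 0
    ent = -(p * np.log(safe)).sum(axis=1).mean()  # mean Shannon entropy (nats)
    return err, ent

def da_lagrangian(scores, labels, gamma, T):
    """Expected error traded off against entropy at temperature T."""
    err, ent = expected_error_and_entropy(scores, labels, gamma)
    return err - T * ent
```

As `gamma` grows, the entropy goes to zero and the expected error approaches the error rate of the hard (argmax) classifier, mirroring the zero-entropy limit in the abstract; an annealing schedule would lower `T` while re-optimizing the model parameters at each temperature.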
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
Proc. IEEE, 2000
Cited by 16 (3 self)
Recent advances in automatic speech recognition have been accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine...
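Bayesian adaptive learning of the kind described above is often realized as MAP estimation of HMM parameters. A minimal sketch for the mean of a single Gaussian with a conjugate prior is below. It assumes the occupation probabilities of the adaptation frames are already available (for example from a forward–backward pass), and `map_adapt_mean` and the prior weight `tau` are illustrative names, not taken from the paper.

```python
import numpy as np

def map_adapt_mean(prior_mean, tau, frames, posteriors):
    """MAP estimate of a Gaussian mean under a conjugate prior.

    prior_mean : (D,) speaker-independent mean (mode of the prior)
    tau        : prior weight, acting as a pseudo-count
    frames     : (N, D) adaptation feature vectors
    posteriors : (N,) occupation probabilities of this Gaussian
    """
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    occupancy = posteriors.sum()       # soft count of adaptation frames
    first_order = posteriors @ frames  # posterior-weighted sum of frames
    return (tau * prior_mean + first_order) / (tau + occupancy)
```

With no adaptation data the estimate stays at the prior mean, and as the soft count grows relative to `tau` it moves toward the ML (sample) mean of the adaptation data.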
HMM-Based Handwritten Symbol Recognition Using On-Line And Off-Line Features
In International Conference on Acoustics, Speech and Signal Processing, 1996
Cited by 12 (3 self)
This paper addresses the problem of recognizing on-line sampled handwritten symbols. Within the proposed symbol recognition system, which is based on hidden Markov models, different kinds of feature extraction algorithms are used to analyse on-line as well as off-line features, and the classification results are combined. Writer-dependent recognition experiments demonstrate that both the recognition rates and the reliability of the results are improved by the proposed recognition system. Furthermore, when handwriting data that do not represent symbols from the given alphabet are presented, their rejection rate increases.
1. INTRODUCTION
This paper is concerned with the problem of recognizing on-line sampled handwritten symbols, which is one stage within the overall system presented at ICASSP'95 for understanding handwritten mathematical expressions [1][2]. Based on the soft-decision approach within the symbol segmentation and recognition stage of the ove...
Discriminative Training For Continuous Speech Recognition
Proc. 1995 Europ. Conf. on Speech Communication and Technology, 1995
Cited by 11 (0 self)
Discriminative training techniques for hidden Markov models were recently proposed and successfully applied to automatic speech recognition. In this paper, the Minimum Classification Error and the Maximum Mutual Information objectives are discussed. An extended reestimation formula is used for the HMM parameter update for both objective functions. The discriminative training methods were applied in speaker-independent phoneme recognition experiments, and both techniques improved the phoneme recognition rates.
1. INTRODUCTION
Recently, discriminative training techniques for Hidden Markov Models (HMMs) have been used successfully for automatic speech recognition. They provide better performance than Maximum Likelihood Estimation (MLE), since the training concentrates on estimating class boundaries rather than the parameters of assumed model distributions [1,12]. Although MLE and discriminative training are theoretically equivalent (if su...
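The abstract does not spell out the extended reestimation formula it uses. For reference, a commonly cited form of the extended Baum update for a discrete distribution, p_i' = p_i (dF/dp_i + D) / sum_j p_j (dF/dp_j + D), is sketched below; treat it as an assumption rather than the paper's exact formula. `grad` holds the partial derivatives of the objective F with respect to each probability, and `D` is a smoothing constant.

```python
import numpy as np

def extended_baum_update(p, grad, D):
    """Extended Baum reestimation for a discrete distribution p.

    grad : dF/dp_i for the objective F being maximized (e.g. an MMI criterion)
    D    : smoothing constant; must keep every p_i * (grad_i + D) non-negative
    """
    p = np.asarray(p, dtype=float)
    grad = np.asarray(grad, dtype=float)
    num = p * (grad + D)
    if np.any(num < 0):
        raise ValueError("D too small: update would give negative probabilities")
    return num / num.sum()  # renormalize so the result is a distribution
```

For an ML objective F = sum_i c_i log p_i with D = 0, the update reduces to the relative frequencies c_i / sum_j c_j, i.e. ordinary Baum-Welch reestimation; larger `D` gives smaller, more conservative steps, which matters when discriminative gradients can be negative.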
Comparison Of Optimization Methods For Discriminative Training Criteria
In Proc. EUROSPEECH'97, 1997
Cited by 8 (4 self)
In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities, we found special expressions for the step sizes in GPD, leading to reestimation formulae very similar to those derived for the EBW algorithm. Results were produced on both the TI digit string corpus and the SieTill corpus, for continuously spoken American English and German digit strings respectively. The results for the two techniques do not differ significantly. These experimental results support the strong link between EBW and GPD expected from the analytic comparison.
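The link between EBW and GPD described above can be seen for the mean of a one-dimensional Gaussian. The sketch assumes the usual MMI accumulators: numerator and denominator occupancies `gamma_num`, `gamma_den` and first-order statistics `theta_num`, `theta_den` (illustrative names). The EBW mean update coincides exactly with a gradient step on the MMI criterion whose step size is var / (gamma_num - gamma_den + D). This is one way to read the step-size expressions mentioned in the abstract, not necessarily the authors' derivation.

```python
def ebw_mean(mu, theta_num, theta_den, gamma_num, gamma_den, D):
    """Extended Baum-Welch update of a scalar Gaussian mean."""
    return (theta_num - theta_den + D * mu) / (gamma_num - gamma_den + D)

def mmi_mean_gradient(mu, var, theta_num, theta_den, gamma_num, gamma_den):
    """Derivative of the MMI criterion with respect to the mean."""
    return (theta_num - theta_den - (gamma_num - gamma_den) * mu) / var

def gpd_mean(mu, var, theta_num, theta_den, gamma_num, gamma_den, step):
    """One gradient-ascent (GPD-style) step on the mean."""
    return mu + step * mmi_mean_gradient(mu, var, theta_num, theta_den,
                                         gamma_num, gamma_den)
```

With `step = var / (gamma_num - gamma_den + D)`, expanding `gpd_mean` gives (theta_num - theta_den + D * mu) / (gamma_num - gamma_den + D), which is exactly `ebw_mean`.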
Utterance Verification in Continuous Speech Recognition: Decoding and Training Procedures
IEEE Trans. Speech Audio Process., 2000
Cited by 8 (1 self)
Abstract—This paper introduces a set of acoustic modeling and decoding techniques for utterance verification (UV) in hidden Markov model (HMM) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or out-of-vocabulary words that may appear in an utterance. This capability is implemented here as a likelihood ratio (LR) based hypothesis testing procedure for verifying individual words in a decoded string. Two UV techniques are presented. The first is a procedure for estimating the parameters of UV models during training according to an optimization criterion which is directly related to the LR measure used in UV. The second is a speech recognition decoding procedure in which the “best” decoded path is defined to be the one that optimizes an LR criterion. These techniques were evaluated in terms of their ability to improve UV performance on a speech dialog task over the public switched telephone network. The results of an experimental study presented in the paper show that LR based parameter estimation results in a significant improvement in UV performance for this task. The study also found that the LR based decoding procedure, when used in conjunction with models trained using the LR criterion, can provide as much as an 11% improvement in UV performance compared to existing UV procedures. Finally, the performance of the LR decoder was found to be highly dependent on the use of the LR criterion in training acoustic models. Several observations are made in the paper concerning the formation of confidence measures for UV and the interaction of these techniques with statistical language models used in ASR.
Index Terms—Acoustic modeling, confidence measures, discriminative training, large vocabulary continuous speech recognition, likelihood ratio, utterance verification.
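The LR-based hypothesis test used to verify each decoded word can be sketched as follows. It assumes the per-frame log-likelihoods of a word's segment under the target model and under an alternative model (such as an anti-model) are already computed; frame-averaging the log ratio and the `threshold` value are illustrative choices, and `verify_word` is a hypothetical name, not taken from the paper.

```python
import numpy as np

def verify_word(loglik_target, loglik_alternative, threshold=0.0):
    """Frame-averaged log likelihood ratio test for one decoded word.

    loglik_target      : per-frame log-likelihoods under the word's model
    loglik_alternative : per-frame log-likelihoods under the alternative model
    Returns (average log LR, accept flag).
    """
    target = np.asarray(loglik_target, dtype=float)
    alternative = np.asarray(loglik_alternative, dtype=float)
    llr = (target.sum() - alternative.sum()) / len(target)
    return float(llr), bool(llr > threshold)
```

Averaging over frames keeps long and short words on a comparable scale; raising `threshold` rejects more putative errors and out-of-vocabulary words at the cost of rejecting more correct words.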
Training algorithms for hidden conditional random fields
In Proc. ICASSP, 2006
Cited by 6 (3 self)
We investigate algorithms for training hidden conditional random fields (HCRFs), a class of direct models with hidden state sequences. We compare stochastic gradient ascent with the RProp algorithm, and investigate stochastic versions of RProp. We propose a new scheme for model flattening and compare it to the state of the art. Finally, we give experimental results on the TIMIT phone classification task showing how these training options interact, comparing HCRFs to HMMs trained using extended Baum-Welch as well as...
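RProp, which the abstract compares against stochastic gradient ascent, adapts a separate step size for each parameter from the signs of successive gradients. Below is a minimal sketch of one common variant (iRprop-, without weight backtracking), written for maximizing an objective such as an HCRF conditional log-likelihood. The constants are the usual defaults from the RProp literature and are assumptions here, not values from the paper.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One iRprop- update for maximizing an objective.

    Returns (new weights, gradient to pass as prev_grad next time, new step sizes).
    """
    prod = grad * prev_grad
    # Same sign as last time: grow the step; sign flip: shrink it.
    step = np.where(prod > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(prod < 0, np.maximum(step * eta_minus, step_min), step)
    # iRprop-: after a sign flip, skip this parameter's update and forget its gradient.
    grad = np.where(prod < 0, 0.0, grad)
    w = w + np.sign(grad) * step  # ascent: move along the gradient sign
    return w, grad, step
```

A toy usage on the concave objective f(w) = -(w - 3)^2, whose maximum is at w = 3:

```python
w, prev, step = np.zeros(1), np.zeros(1), np.full(1, 0.1)
for _ in range(300):
    grad = -2.0 * (w - 3.0)
    w, prev, step = rprop_step(w, grad, prev, step)
```

Because only the gradient sign is used, RProp is insensitive to the gradient's scale, which is one reason it is attractive for objectives whose gradient magnitudes vary widely across parameters.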

