Results 1 - 10
of
61
Large margin hidden Markov models for automatic speech recognition
- in Advances in Neural Information Processing Systems 19
, 2007
"... We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our ap ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus. 1
Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
- In Proceedings of ICASSP 2007
, 2007
"... In this paper we compare three frameworks for discriminative training of continuous-density hidden Markov models (CD-HMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximi ..."
Abstract
-
Cited by 19 (4 self)
- Add to MetaCart
In this paper we compare three frameworks for discriminative training of continuous-density hidden Markov models (CD-HMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximization. Unlike CML and MCE, our formulation of large margin training explicitly penalizes incorrect decodings by an amount proportional to the number of mislabeled hidden states. It also leads to a convex optimization over the parameter space of CD-HMMs, thus avoiding the problem of spurious local minima. We used discriminatively trained CD-HMMs from all three frameworks to build phonetic recognizers on the TIMIT speech corpus. The different recognizers employed exactly the same acoustic front end and hidden state space, thus enabling us to isolate the effect of different cost functions, parameterizations, and numerical optimizations. Experimentally, we find that our framework for large margin training yields significantly lower error rates than both CML and MCE training. Index Terms — speech recognition, discriminative training, MMI, MCE, large margin, phoneme recognition 1.
Improved discriminative training using phone lattices
, 2005
"... We present an efficient discriminative training procedure utilizing phone lattices. Different approaches to expediting lattice generation, statistics collection, and convergence were studied. We also propose a new discriminative training criterion, namely, minimum phone frame error (MPFE). When comb ..."
Abstract
-
Cited by 14 (6 self)
- Add to MetaCart
We present an efficient discriminative training procedure utilizing phone lattices. Different approaches to expediting lattice generation, statistics collection, and convergence were studied. We also propose a new discriminative training criterion, namely, minimum phone frame error (MPFE). When combined with the maximum mutual information (MMI) criterion using I-smoothing, replacing the standard minimum phone error (MPE) criterion with MPFE led to a small but consistent win in several applications. Phone-lattice-based discriminative training gave around 8 % to 12 % relative word error rate (WER) reduction in SRI’s latest English Conversational Telephone Speech and Broadcast News transcription systems developed for DARPA’s EARS project. 1.
Expectation Maximization Algorithms for Conditional Likelihoods
- Proceedings of the 22nd International Conference on Machine Learning (ICML-2005
, 2005
"... We introduce an expectation maximizationtype (EM) algorithm for maximum likelihood optimization of conditional densities. It is applicable to hidden variable models where the distributions are from the exponential family. The algorithm can alternatively be viewed as automatic step size selection for ..."
Abstract
-
Cited by 13 (5 self)
- Add to MetaCart
We introduce an expectation maximizationtype (EM) algorithm for maximum likelihood optimization of conditional densities. It is applicable to hidden variable models where the distributions are from the exponential family. The algorithm can alternatively be viewed as automatic step size selection for gradient ascent, where the amount of computation is traded off to guarantees that each step increases the likelihood. The tradeoff makes the algorithm computationally more feasible than the earlier conditional EM. The method gives a theoretical basis for extended Baum Welch algorithms used in discriminative hidden Markov models in speech recognition, and compares favourably with the current best method in the experiments.
Large Margin Training for Hidden Markov Models with Partially Observed States Trinh-Minh-Tri Do
"... Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the non-convexity of the optimization problem, previous works usually rely on severe approximations so that it is still an open ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the non-convexity of the optimization problem, previous works usually rely on severe approximations so that it is still an open problem. We propose a new learning algorithm that relies on non-convex optimization and bundle methods and allows tackling the original optimization problem as is. It is proved to converge to a solution with accuracy ɛ with a rate O(1/ɛ). We provide experimental results gained on speech and handwriting recognition that demonstrate the potential of the method. 1.
Improving broadcast news transcription by lightly supervised discriminative training
- IN PROC. IEEE INT. CONF. ACOUST., SPEECH, SIGNAL PROCESS
, 2004
"... In this paper, we present our experiments on lightly supervised discriminative training with large amounts of broadcast news data for which only closed caption transcriptions are available (TDT data). In particular, we use language models biased to the closedcaption transcripts to recognise the audi ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
In this paper, we present our experiments on lightly supervised discriminative training with large amounts of broadcast news data for which only closed caption transcriptions are available (TDT data). In particular, we use language models biased to the closedcaption transcripts to recognise the audio data, and the recognised transcripts are then used as the training transcriptions for acoustic model training. A range of experiments that use maximum likelihood (ML) training as well as discriminative training based on either maximum mutual information (MMI) or minimum phone error (MPE) are presented. In a 5xRT broadcast news transcription system that includes adaptation, it is shown that reductions in word error rate (WER) in the range of 1 % absolute can be achieved. Finally, some experiments on training data selection are presented to compare different methods of “filtering” the transcripts.
Precision matrix modelling for large vocabulary continuous speech recognition
, 2004
"... Recently, structured precision matrix models were found to outperform the conventional diagonal covariance matrix models. Minimum phone error discriminative training of these models gave very good unadapted performance on large vocabulary continuous speech recognition systems. To obtain state-of-the ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
Recently, structured precision matrix models were found to outperform the conventional diagonal covariance matrix models. Minimum phone error discriminative training of these models gave very good unadapted performance on large vocabulary continuous speech recognition systems. To obtain state-of-the-art performance, it is important to apply adaptation techniques efficiently to these models. In this paper, simple row-by-row iterative formulae are described for both MLLR mean and constrained MLLR transform estimations of these models. These update formulae are derived within the standard expectation maximisation framework and are guaranteed to increase the likelihood of the adaptation data. Efficient approximate schemes for these adaptation methods are also investigated to further reduce the computation. Experimental results are presented based on the MPE trained Subspace for Precision and Mean models, evaluated on both broadcast news and conversational telephone speech English tasks. 1.
Large margin hidden markov models for speech recognition
, 2005
"... In this work, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum muti-class separation margin. The approach is named as large margi ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
In this work, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum muti-class separation margin. The approach is named as large margin HMM. Firstly, we show this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Secondly, by imposing different constraints to the minimax problem, we propose three solutions to the large margin HMM estimation problem, namely the iterative localized optimization method, the constrained joint optimization method and the semidefinite pro-gramming (SDP) method. These new training methods are evaluated in the isolated E-set recognition task using ISOLET database and the TIDIGITS connected digit string recog-nition task. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods.
Recent Advances in Broadcast News Transcription
- in Proc. IEEE ASRU Workshop
, 2003
"... This paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previousl ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
This paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time. A number of new features have also been added. These include gender-dependent (GD) discriminative training; and modified discriminative training using lattice re-generation and combination. On the 2003 evaluation set the system gave an overall word error rate of 10.7% in less than 10 times real time (10RT).
Linear Gaussian models for speech recognition
- CAMBRIDGE UNIVERSITY
, 2004
"... Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete stat ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evo-

