Results 1 to 10 of 83
Large margin hidden Markov models for automatic speech recognition
 in Advances in Neural Information Processing Systems 19
, 2007
Abstract

Cited by 48 (6 self)
We study the problem of parameter estimation in continuous density hidden Markov models (CDHMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CDHMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
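As a rough illustration of how a Gaussian can be expressed through a positive semidefinite matrix, the sketch below folds a Gaussian's mean, a diagonal covariance, and a scalar offset `theta` into one symmetric matrix `Phi`, so that the Mahalanobis score becomes the quadratic form z^T Phi z with z = [x, 1]. The diagonal covariance, the nonnegative `theta`, and all function names are assumptions made for illustration, not the paper's exact parameterization.

```python
def build_phi(mu, var, theta=0.0):
    """Build the (d+1)x(d+1) symmetric matrix
    Phi = [[P, -P mu], [-mu^T P, mu^T P mu + theta]],
    where P = diag(1/var) is the precision (diagonal covariance assumed)."""
    d = len(mu)
    prec = [1.0 / v for v in var]
    phi = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(d):
        phi[i][i] = prec[i]
        phi[i][d] = -prec[i] * mu[i]
        phi[d][i] = -prec[i] * mu[i]
    phi[d][d] = sum(p * m * m for p, m in zip(prec, mu)) + theta
    return phi

def quad_score(phi, x):
    """Compute z^T Phi z for the augmented vector z = [x, 1]."""
    z = list(x) + [1.0]
    n = len(z)
    return sum(z[i] * phi[i][j] * z[j] for i in range(n) for j in range(n))

def mahalanobis_score(mu, var, x, theta=0.0):
    """Direct form: sum_i (x_i - mu_i)^2 / var_i + theta."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var)) + theta

# Hypothetical 2-D Gaussian and observation.
mu, var, x = [1.0, -2.0], [0.5, 2.0], [0.3, 0.7]
phi = build_phi(mu, var, theta=0.25)
print(quad_score(phi, x), mahalanobis_score(mu, var, x, theta=0.25))  # the two values agree
```

Because the score is linear in `Phi`, margin constraints written in terms of it are linear in the parameters. The Schur complement of the precision block in `Phi` equals `theta`, so `Phi` is positive semidefinite whenever `theta >= 0`.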
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Abstract

Cited by 35 (18 self)
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feedforward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks with many hidden layers that are trained using new methods have been shown to outperform Gaussian mixture models on a variety of speech recognition benchmarks, sometimes by a large margin. This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
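A minimal sketch of the alternative described above: a feedforward network that takes a window of several frames of coefficients and outputs a posterior distribution over HMM states. The layer sizes, sigmoid hidden units, edge clamping, and random untrained weights are all illustrative assumptions; none of the training methods the paper surveys are implemented here.

```python
import math
import random

def stack_frames(frames, t, context):
    """Concatenate frames t-context..t+context, clamping indices at utterance edges."""
    out = []
    for k in range(t - context, t + context + 1):
        k = min(max(k, 0), len(frames) - 1)
        out.extend(frames[k])
    return out

def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def posteriors(x, layers):
    """Sigmoid hidden layers followed by a softmax output over HMM states."""
    h = x
    for w, b in layers[:-1]:
        h = sigmoid(dense(h, w, b))
    w, b = layers[-1]
    return softmax(dense(h, w, b))

rng = random.Random(0)
feat_dim, context, hidden, n_states = 4, 2, 8, 5  # hypothetical sizes
frames = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(10)]
in_dim = feat_dim * (2 * context + 1)
layers = [init_layer(in_dim, hidden, rng),
          init_layer(hidden, hidden, rng),
          init_layer(hidden, n_states, rng)]
p = posteriors(stack_frames(frames, 0, context), layers)
print([round(v, 3) for v in p])
```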
Investigations on error minimizing training criteria for discriminative training in automatic speech recognition
 Proceedings of Eurospeech
, 2005
Abstract

Cited by 33 (7 self)
Discriminative training criteria have been shown to consistently outperform maximum likelihood trained speech recognition systems. In this paper we employ the Minimum Classification Error (MCE) criterion to optimize the parameters of the acoustic model of a large-scale speech recognition system. The statistics for both the correct and the competing model are collected solely on word lattices, without the use of N-best lists. Thus, particularly for long utterances, the number of sentence alternatives taken into account is significantly larger than with N-best lists. The MCE criterion is embedded in an extended unifying approach for a class of discriminative training criteria, which allows the performance gain obtained with MCE to be compared directly with the improvements from other commonly used criteria such as Maximum Mutual Information (MMI) and Minimum Word Error (MWE). Experiments conducted on large vocabulary tasks show a consistent performance gain for MCE over MMI. Moreover, the improvements obtained with MCE turn out to be of the same order of magnitude as the performance gains obtained with the MWE criterion.
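For readers unfamiliar with MCE, the sketch below uses the classic textbook form of the criterion (our assumption; the lattice-based statistics of this paper are not modeled): a misclassification measure compares the correct class's score with a soft maximum over competitors, and a sigmoid turns it into a smoothed 0/1 error. The parameters `eta` and `gamma` and the example scores are hypothetical.

```python
import math

def misclassification_measure(scores, correct, eta=1.0):
    """d = -g_c + (1/eta) * log(mean_{j != c} exp(eta * g_j)).
    Negative d means the correct class beats the soft maximum of its competitors."""
    competitors = [s for j, s in enumerate(scores) if j != correct]
    m = max(competitors)  # shift for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in competitors) / len(competitors)
    soft_max = m + math.log(mean_exp) / eta
    return -scores[correct] + soft_max

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed 0/1 error: a sigmoid of the misclassification measure."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical log-likelihood scores g_j for three classes.
scores = [-10.0, -12.0, -11.0]
print(mce_loss(scores, 0))  # below 0.5: class 0 would be recognized correctly
print(mce_loss(scores, 1))  # above 0.5: class 1 loses to class 0
```

As `eta` grows, the soft maximum approaches the single best competitor, and a larger `gamma` makes the sigmoid closer to a hard error count.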
Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
 In Proceedings of ICASSP 2007
, 2007
Abstract

Cited by 27 (4 self)
In this paper we compare three frameworks for discriminative training of continuous-density hidden Markov models (CDHMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximization. Unlike CML and MCE, our formulation of large margin training explicitly penalizes incorrect decodings by an amount proportional to the number of mislabeled hidden states. It also leads to a convex optimization over the parameter space of CDHMMs, thus avoiding the problem of spurious local minima. We used discriminatively trained CDHMMs from all three frameworks to build phonetic recognizers on the TIMIT speech corpus. The different recognizers employed exactly the same acoustic front end and hidden state space, thus enabling us to isolate the effect of different cost functions, parameterizations, and numerical optimizations. Experimentally, we find that our framework for large margin training yields significantly lower error rates than both CML and MCE training. Index Terms: speech recognition, discriminative training, MMI, MCE, large margin, phoneme recognition
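The penalty proportional to the number of mislabeled hidden states can be sketched as a hinge loss: the correct state sequence must outscore each competitor by at least `rho` times their Hamming distance. This toy version enumerates every sequence explicitly and uses hypothetical per-frame scores; a real recognizer would search over sequences with dynamic programming, and the paper's exact objective may differ.

```python
import itertools

def hamming(a, b):
    """Number of frames whose hidden-state labels differ."""
    return sum(x != y for x, y in zip(a, b))

def make_scorer(frame_scores):
    """Sequence score = sum of per-frame state scores (higher is better)."""
    def score(seq):
        return sum(frame_scores[t][s] for t, s in enumerate(seq))
    return score

def large_margin_hinge(score, correct, candidates, rho=1.0):
    """Largest violation of score(correct) - score(s) >= rho * hamming(correct, s)
    over all candidates s; 0.0 when every constraint is satisfied."""
    worst = 0.0
    for s in candidates:
        if s == correct:
            continue
        violation = rho * hamming(correct, s) - (score(correct) - score(s))
        worst = max(worst, violation)
    return worst

# Hypothetical scores for 3 frames x 2 hidden states.
frame_scores = [[2.0, 0.0], [0.5, 1.0], [0.0, 3.0]]
score = make_scorer(frame_scores)
correct = (0, 1, 1)
candidates = list(itertools.product(range(2), repeat=3))
print(large_margin_hinge(score, correct, candidates))           # 0.5
print(large_margin_hinge(score, correct, candidates, rho=0.4))  # 0.0
```

Raising `rho` widens the required gap, so sequences that disagree with the reference in many frames must be pushed further below it.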
Improved discriminative training using phone lattices
, 2005
Abstract

Cited by 24 (9 self)
We present an efficient discriminative training procedure utilizing phone lattices. Different approaches to expediting lattice generation, statistics collection, and convergence were studied. We also propose a new discriminative training criterion, namely, minimum phone frame error (MPFE). When combined with the maximum mutual information (MMI) criterion using I-smoothing, replacing the standard minimum phone error (MPE) criterion with MPFE led to a small but consistent win in several applications. Phone-lattice-based discriminative training gave around 8% to 12% relative word error rate (WER) reduction in SRI’s latest English Conversational Telephone Speech and Broadcast News transcription systems developed for DARPA’s EARS project.
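Relative and absolute reductions are easy to confuse when comparing the abstracts in this list. A small worked example with hypothetical error rates (not taken from any of the papers):

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

def absolute_wer_reduction(baseline_wer, new_wer):
    """Difference in WER, in percentage points."""
    return baseline_wer - new_wer

# Hypothetical WERs in percent: 25.0% baseline, 22.5% after training.
print(relative_wer_reduction(25.0, 22.5))  # 0.1, i.e. 10% relative
print(absolute_wer_reduction(25.0, 22.5))  # 2.5 points absolute
```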
Large Margin Training for Hidden Markov Models with Partially Observed States
 Trinh-Minh-Tri Do
Abstract

Cited by 22 (4 self)
Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the non-convexity of the optimization problem, previous works usually rely on severe approximations, so the problem is still open. We propose a new learning algorithm that relies on non-convex optimization and bundle methods and allows tackling the original optimization problem as is. It is proved to converge to a solution with accuracy ε at a rate of O(1/ε). We provide experimental results obtained on speech and handwriting recognition that demonstrate the potential of the method.
Improving broadcast news transcription by lightly supervised discriminative training
 In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, 2004
Abstract

Cited by 20 (2 self)
In this paper, we present our experiments on lightly supervised discriminative training with large amounts of broadcast news data for which only closed-caption transcriptions are available (TDT data). In particular, we use language models biased to the closed-caption transcripts to recognise the audio data, and the recognised transcripts are then used as the training transcriptions for acoustic model training. A range of experiments that use maximum likelihood (ML) training as well as discriminative training based on either maximum mutual information (MMI) or minimum phone error (MPE) are presented. In a 5×RT broadcast news transcription system that includes adaptation, it is shown that reductions in word error rate (WER) in the range of 1% absolute can be achieved. Finally, some experiments on training data selection are presented to compare different methods of “filtering” the transcripts.
Expectation Maximization Algorithms for Conditional Likelihoods
 Proceedings of the 22nd International Conference on Machine Learning (ICML 2005)
, 2005
Abstract

Cited by 19 (5 self)
We introduce an expectation maximization-type (EM) algorithm for maximum likelihood optimization of conditional densities. It is applicable to hidden variable models where the distributions are from the exponential family. The algorithm can alternatively be viewed as automatic step size selection for gradient ascent, where the amount of computation is traded off to guarantee that each step increases the likelihood. The tradeoff makes the algorithm computationally more feasible than the earlier conditional EM. The method gives a theoretical basis for extended Baum-Welch algorithms used in discriminative hidden Markov models in speech recognition, and compares favourably with the current best method in the experiments.
Large margin hidden Markov models for speech recognition
, 2005
Abstract

Cited by 18 (2 self)
In this work, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous density hidden Markov models (CDHMMs) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. The approach is named large margin HMM. First, we show that this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, by imposing different constraints on the minimax problem, we propose three solutions to the large margin HMM estimation problem, namely the iterative localized optimization method, the constrained joint optimization method and the semidefinite programming (SDP) method. These new training methods are evaluated on the isolated E-set recognition task using the ISOLET database and on the TIDIGITS connected digit string recognition task. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods.
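The minimax principle can be made concrete with a small sketch: each token's separation margin is its correct-model score minus the best competing score, and large margin estimation seeks parameters that maximize the smallest such margin. The log-likelihood scores below are hypothetical, and the constraints imposed by the three proposed solutions are not modeled.

```python
def separation_margin(scores, correct):
    """Correct-model score minus the best competing model's score."""
    best_other = max(s for j, s in enumerate(scores) if j != correct)
    return scores[correct] - best_other

def minimum_margin(samples):
    """Smallest separation margin over (scores, correct_index) samples;
    this is the quantity a minimax criterion tries to maximize."""
    return min(separation_margin(scores, c) for scores, c in samples)

# Hypothetical per-model log-likelihoods for three training tokens.
samples = [
    ([-5.0, -7.0, -6.5], 0),
    ([-9.0, -8.0, -8.5], 1),
    ([-4.0, -4.2, -3.0], 2),
]
print([separation_margin(s, c) for s, c in samples])  # [1.5, 0.5, 1.0]
print(minimum_margin(samples))  # 0.5
```

A negative minimum margin would mean at least one training token is misrecognized by the current models.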
Augmented Statistical Models for Classifying Sequence Data
, 2006
Abstract

Cited by 18 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.