Results 1-10 of 84
Discriminative, Generative and Imitative Learning
2002
"... I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain specif ..."
Abstract

Cited by 45 (1 self)
 Add to MetaCart
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
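The generative modeling described in this abstract rests on the standard Bayesian-network factorization, in which domain knowledge is encoded as a dependency structure over variables. As a sketch (the notation is generic, not taken from the paper):

```latex
% Joint distribution of a Bayesian network: each variable is
% conditioned only on its parents pa(x_i) in the graph.
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

Structure priors constrain which parent sets are allowed; parameter priors are placed on each conditional distribution and refined with data.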
Support vector machines for segmental minimum bayes risk decoding of continuous speech
In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003
"... Segmental Minimum Bayes Risk (SMBR) Decoding involves the refinement of the search space into sequences of small sets of confusable words. We describe the application of Support Vector Machines (SVMs) as discriminative models for the refined search spaces. We show that SVMs, which in their basic for ..."
Abstract

Cited by 34 (6 self)
 Add to MetaCart
(Show Context)
Segmental Minimum Bayes Risk (SMBR) decoding involves the refinement of the search space into sequences of small sets of confusable words. We describe the application of Support Vector Machines (SVMs) as discriminative models for the refined search spaces. We show that SVMs, which in their basic formulation are binary classifiers of fixed-dimensional observations, can be used for continuous speech recognition. We also study the use of GiniSVMs, a variant of the basic SVM. On a small-vocabulary task, we show this two-pass scheme outperforms MMI-trained HMMs. Using system combination we also obtain further improvements over discriminatively trained HMMs.
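The Minimum Bayes Risk decoding rule underlying SMBR can be written as follows (standard form, with symbols assumed rather than copied from the paper): the decoder picks the hypothesis that minimizes the expected loss under the posterior over competing word sequences.

```latex
% MBR decoding over a hypothesis space W (e.g. a lattice), with
% acoustics O, posterior P(W | O), and loss \ell (e.g. word error):
\hat{W} \;=\; \arg\min_{W' \in \mathcal{W}} \; \sum_{W \in \mathcal{W}} P(W \mid O)\, \ell(W, W')
```

Segmenting the lattice into small confusable sets makes this sum tractable and yields the binary/multi-class subproblems to which the SVMs are applied.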
Speech Recognition Using Augmented Conditional Random Fields
"... Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by stateoftheart stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
(Show Context)
Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT ...
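For reference, the conditional random field family that ACRFs extend models the label sequence directly given the observations, avoiding the HMM's generative independence assumptions. A standard linear-chain form (generic notation, not the paper's augmented variant):

```latex
% Linear-chain CRF: state sequence y given observations x, with
% feature functions f_k, weights \lambda_k, and partition function Z(x).
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Bigl( \sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Bigr),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Bigl( \sum_{t} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, x, t) \Bigr)
```

Because the features may condition on the whole observation sequence x, spectral context can enter the model directly rather than through per-state emission independence.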
Discriminative speaker adaptation with conditional maximum likelihood linear regression
In Eurospeech, 2001
"... We present a simplified derivation of the extended BaumWelch procedure, which shows that it can be used for Maximum Mutual Information (MMI) of a large class of continuous emission density hidden Markov models (HMMs). We use the extended BaumWelch procedure for discriminative estimation of MLLRty ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.
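The MMI criterion optimized by the extended Baum-Welch updates is, in its standard form (generic notation, assumed rather than quoted from the paper), the posterior log-probability of the correct transcriptions:

```latex
% MMI objective over training utterances r, with acoustics O_r,
% reference transcription W_r, HMM parameters \lambda, and LM prior P(W):
\mathcal{F}_{\mathrm{MMI}}(\lambda)
\;=\; \sum_{r} \log
\frac{p_\lambda(O_r \mid W_r)\, P(W_r)}
     {\sum_{W} p_\lambda(O_r \mid W)\, P(W)}
```

CMLLR as described here optimizes this criterion with respect to the adaptation transform parameters rather than the HMM parameters themselves.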
Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
2002
"... Linear transforms have been used extensively for training and adaptation of HMMbased ASR systems. Recently procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that ..."
Abstract

Cited by 27 (3 self)
 Add to MetaCart
(Show Context)
Linear transforms have been used extensively for training and adaptation of HMM-based ASR systems. Recently, procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that employ linear transforms for feature normalization and for speaker-adaptive training. We integrate these discriminative linear transforms into MMI estimation of HMM parameters for improvement of large-vocabulary conversational speech recognition systems.
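A feature-space linear transform of the kind this abstract refers to acts on the observation vectors themselves; in the common constrained form (standard notation, not quoted from the paper) the per-frame likelihood picks up a Jacobian term:

```latex
% Feature normalization: transform each observation o_t by (A, b);
% the log-determinant of A enters the transformed log-likelihood.
\hat{o}_t \;=\; A\, o_t + b,
\qquad
\log p(\hat{o}_t \mid s) \;+\; \log \lvert \det A \rvert
```

Estimating (A, b) under the MMI criterion, rather than maximum likelihood, is what makes the normalization discriminative.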
Improvements In Linear Transform Based Speaker Adaptation
2001
"... This paper presents three forms of linear transform based speaker adaptation that can give better performance than standard maximum likelihood linear regression (MLLR) adaptation. For unsupervised adaptation, a latticebased technique is introduced which is compared to MLLR using confidence scores. ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
This paper presents three forms of linear-transform-based speaker adaptation that can give better performance than standard maximum likelihood linear regression (MLLR) adaptation. For unsupervised adaptation, a lattice-based technique is introduced which is compared to MLLR using confidence scores. For supervised adaptation, estimation of the adaptation matrices using the maximum mutual information criterion is discussed, which leads to the MMILR approach. Recognition experiments show that lattice MLLR can reduce word error rates on a Switchboard task by 1.4% absolute. For recognition of non-native speech from the Wall Street Journal database, a reduction in word error rate of 10-16% relative was obtained using MMILR compared to standard MLLR.
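As a rough illustration of what an MLLR-style transform does, the sketch below adapts Gaussian means as mu_hat = A mu + b, written in the usual extended-vector form W xi with xi = [1, mu]. The function name and array shapes are assumptions for illustration only, not taken from the paper:

```python
import numpy as np

def mllr_adapt_means(means, W):
    """Apply one MLLR transform W of shape (d, d+1) to Gaussian means (n, d).

    W = [b | A] acts on extended mean vectors xi = [1, mu], so each
    adapted mean is A @ mu + b.
    """
    n, d = means.shape
    xi = np.hstack([np.ones((n, 1)), means])  # extended means, shape (n, d+1)
    return xi @ W.T                           # adapted means, shape (n, d)

# Sanity check with the identity transform (A = I, b = 0):
# adaptation should leave the means unchanged.
d = 3
W_identity = np.hstack([np.zeros((d, 1)), np.eye(d)])
means = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
adapted = mllr_adapt_means(means, W_identity)
```

In practice one such W is shared by a regression class of many Gaussians and estimated from adaptation data; the papers above differ in the criterion (ML vs. MMI) and supervision used to estimate it.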
Augmented Statistical Models for Classifying Sequence Data
2006
"... Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
Declaration: This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Lattice Segmentation and Minimum Bayes Risk Discriminative Training for Large . . .
In Proc. Eurospeech, 2005
"... Lattice segmentation techniques developed for Minimum Bayes Risk decoding in large vocabulary speech recognition tasks are used to compute the statistics for discriminative training algorithms that estimate HMM parameters so as to reduce the overall risk over the training data. New estimation proced ..."
Abstract

Cited by 20 (6 self)
 Add to MetaCart
Lattice segmentation techniques developed for Minimum Bayes Risk decoding in large vocabulary speech recognition tasks are used to compute the statistics for discriminative training algorithms that estimate HMM parameters so as to reduce the overall risk over the training data. New estimation procedures are developed and evaluated for small vocabulary and large vocabulary recognition tasks, and additive performance improvements are shown relative to maximum mutual information estimation. These relative gains are explained through a detailed analysis of individual word recognition errors.
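The training objective this abstract alludes to can be stated as minimizing the expected loss over the training data, the training-time counterpart of MBR decoding (generic notation, assumed rather than quoted):

```latex
% Risk-based discriminative training: choose HMM parameters \theta to
% minimize the expected loss of hypotheses W against references W_r.
\hat{\theta} \;=\; \arg\min_{\theta} \; \sum_{r} \sum_{W \in \mathcal{W}_r}
P_\theta(W \mid O_r)\, \ell(W, W_r)
```

The lattice-segmentation step restricts each sum over hypotheses to small confusable sets, which is what makes the statistics computable at large vocabulary.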
Structurally Discriminative Graphical Models for Automatic Speech Recognition: Results from the 2001 Johns Hopkins Summer Workshop
In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 2002
"... In recent years there has been growing interest in discriminative parameter training techniques, resulting from notable improvements in speech recognition performance on tasks ranging in size from digit recognition to Switchboard. Typified by Maximum Mutual Information training, these methods assume ..."
Abstract

Cited by 16 (7 self)
 Add to MetaCart
(Show Context)
In recent years there has been growing interest in discriminative parameter training techniques, resulting from notable improvements in speech recognition performance on tasks ranging in size from digit recognition to Switchboard. Typified by Maximum Mutual Information training, these methods assume a fixed statistical modeling structure, and then optimize only the associated numerical parameters (such as means, variances, and transition matrices). In this paper, we explore the significantly different methodology of discriminative structure learning. Here, the fundamental dependency relationships between random variables in a probabilistic model are learned in a discriminative fashion, and are learned separately from the numerical parameters. In order to apply the principles of structural discriminability, we adopt the framework of graphical models, which allows an arbitrary set of variables with arbitrary conditional independence relationships to be modeled at each time frame. We present results using a new graphical modeling toolkit (described in a companion paper) from the recent 2001 Johns Hopkins Summer Workshop. These results indicate that significant gains result from discriminative structural analysis of both conventional MFCC and novel AM-FM features on the Aurora continuous digits task.
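One way to state the structure-learning idea in the abstract: rather than fixing the dependency graph G and fitting parameters, the graph itself is scored by how well it discriminates the correct labels, e.g. by conditional likelihood (a generic formulation, not necessarily the one used at the workshop):

```latex
% Discriminative structure selection: among candidate graphs G,
% pick the one whose fitted model best predicts labels W_r from O_r.
\hat{G} \;=\; \arg\max_{G} \; \sum_{r} \log p_{G,\,\hat{\theta}_G}\!\bigl(W_r \mid O_r\bigr)
```

The parameters theta_G for each candidate structure are estimated separately, matching the paper's separation of structure learning from numerical parameter training.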