Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
Abstract

Cited by 60 (8 self)
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A tree-search-based restricted recognition method using word graphs is presented to reduce the computational complexity of large-vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line-recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large-vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
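The unified criterion this abstract describes, which yields MMI or MCE depending on the choice of competitors and smoothing, can be sketched numerically. This is a minimal illustration, not the paper's formulation; the toy log-scores and the smoothing constant `alpha` are made up:

```python
import math

def mmi_criterion(log_p_correct, log_p_competitors):
    """MMI: log-posterior of the correct word sequence, i.e. the
    correct log-score against a log-sum over correct + competing
    sequences (likelihood-ratio form of the criterion)."""
    all_logs = [log_p_correct] + log_p_competitors
    m = max(all_logs)
    # numerically stable log-sum-exp over all sequences
    log_denom = m + math.log(sum(math.exp(x - m) for x in all_logs))
    return log_p_correct - log_denom

def mce_criterion(log_p_correct, log_p_competitors, alpha=1.0):
    """MCE: sigmoid-smoothed misclassification measure
    d = (best competitor's log-score) - (correct log-score)."""
    d = max(log_p_competitors) - log_p_correct
    return 1.0 / (1.0 + math.exp(-alpha * d))  # smoothed 0/1 loss

# Toy acoustic log-scores for one utterance
f_mmi = mmi_criterion(-10.0, [-12.0, -15.0])
loss_mce = mce_criterion(-10.0, [-12.0, -15.0])
```

Here the correct sequence outscores every competitor, so the MMI objective is close to its maximum of zero and the smoothed MCE loss is below one half.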
Discriminative Training of Hidden Markov Models
, 1998
Abstract

Cited by 27 (0 self)
Contents: Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models (2.1 Definition, 2.2 HMM Modelling Assumptions, 2.3 HMM Topology, 2.4 Finding the Best Transcription, 2.5 Setting the Parameters, 2.6 Summary); 3 Objective Functions (3.1 Properties of Maximum Likelihood Estimators, 3.2 Maximum Likelihood, 3.3 Maximum Mutual Information, 3.4 Frame Discrimination ...)
Comparison Of Optimization Methods For Discriminative Training Criteria
 In Proc. EUROSPEECH '97
, 1997
Abstract

Cited by 11 (4 self)
In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities we found special expressions for the step sizes in GPD, leading to re-estimation formulae very similar to those derived for the EBW algorithm. Results were produced for both the TI digit string and the SieTill corpus for continuously spoken American English and German digit strings. The results for the two techniques do not show significant differences. These experimental results support the strong link between EBW and GPD expected from the analytic comparison.
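The EBW/GPD link mentioned above can be sketched for a single Gaussian mean: the EBW update interpolates between the old mean and the discriminative statistics, and a large smoothing constant D turns it into a small gradient step (the GPD view). The statistics and D values below are illustrative, not from the paper:

```python
def ebw_mean_update(num_stats, den_stats, mu_old, D):
    """Extended Baum-Welch re-estimation of a scalar Gaussian mean.
    num_stats/den_stats: (occupancy, first-order sum) accumulated from
    the numerator (correct) and denominator (competing) models.
    D: smoothing constant that keeps the update stable."""
    gamma_num, x_num = num_stats
    gamma_den, x_den = den_stats
    return (x_num - x_den + D * mu_old) / (gamma_num - gamma_den + D)

# With large D the new mean stays close to the old one, i.e. a
# small-step gradient move; with small D the step is much larger.
mu_small_step = ebw_mean_update((10.0, 25.0), (8.0, 18.0), mu_old=2.0, D=100.0)
mu_big_step = ebw_mean_update((10.0, 25.0), (8.0, 18.0), mu_old=2.0, D=5.0)
```

Both updates move the mean in the same direction; only the effective step size differs, which is the essence of the EBW/GPD correspondence the abstract reports.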
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Abstract

Cited by 8 (2 self)
Summary: In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data came from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, a single model set must account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
Structured Precision Matrix Modelling for Speech Recognition
, 2006
Abstract

Cited by 2 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or in part for a degree at any other university. The length of this thesis including footnotes and appendices is approximately 53,000 words.

Summary: The most extensively and successfully applied acoustic model for speech recognition is the Hidden Markov Model (HMM). In particular, a multivariate Gaussian Mixture Model (GMM) is typically used to represent the output density function of each HMM state. For reasons of efficiency, the covariance matrix associated with each Gaussian component is assumed diagonal, and successive observations are assumed independent given the HMM state sequence. Consequently, the spectral (intra-frame) and temporal (inter-frame) correlations are poorly modelled. This thesis investigates ways of improving these aspects by extending the standard HMM. Parameters for these extended models are estimated discriminatively using the
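The diagonal-covariance GMM state output density that this summary takes as its starting point factorises over feature dimensions, which is exactly the efficiency gain (and the modelling weakness) at issue. A minimal sketch with made-up component parameters:

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of observation x under a GMM whose components all
    use diagonal covariances, so each dimension contributes independently."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            # per-dimension Gaussian log-density (diagonal assumption)
            ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        comp_logs.append(ll)
    # numerically stable log-sum-exp over mixture components
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

ll = diag_gmm_loglik(
    x=[0.0, 1.0],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

A structured (non-diagonal) precision matrix, the thesis topic, would replace the per-dimension loop with a quadratic form capturing intra-frame correlation.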
Discriminative Adaptive Training and Bayesian Inference for Speech Recognition
, 2009
A Survey of Discriminative and Connectionist Methods for Speech Processing
, 2002
"... Discriminative speech processing techniques attempt to compute the maximum a posterior probability of some speech event, such as a particular phoneme being spoken, given the observed data. Nondiscriminative techniques compute the likelihood of the observed data assuming an event. Nondiscriminative ..."
Abstract
Discriminative speech processing techniques attempt to compute the maximum a posteriori probability of some speech event, such as a particular phoneme being spoken, given the observed data. Non-discriminative techniques compute the likelihood of the observed data assuming an event. Non-discriminative methods such as simple HMMs (hidden Markov models) achieved success despite their lack of discriminative modelling. This survey looks at enhancements to HMMs which have improved their discrimination ability and hence their overall performance. It also reviews alternative discriminative methods, namely connectionist methods such as ANNs (artificial neural networks). We also draw comparisons between discriminative HMMs and connectionist models, showing that connectionist models can be viewed as a generalisation of discriminative HMMs.
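The posterior-versus-likelihood distinction the survey draws is just Bayes' rule; a minimal numeric sketch (the likelihoods and priors below are made up):

```python
def posterior(likelihoods, priors):
    """Turn class-conditional likelihoods p(x|class) into posteriors
    p(class|x) via Bayes' rule. Generative methods model the inputs;
    discriminative methods model this posterior directly."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    z = sum(joint)  # evidence p(x)
    return [j / z for j in joint]

# Two phoneme classes: the class whose generative model assigns a
# lower likelihood can still win once a strong prior is applied.
post = posterior(likelihoods=[0.02, 0.05], priors=[0.8, 0.2])
```

The posteriors sum to one by construction, and here the prior flips the decision that the likelihoods alone would make.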
Linear Transforms in Automatic Speech Recognition: Estimation Procedures and Integration of Diverse Acoustic Data
Abstract
Linear transforms have been used extensively for both training and adaptation of Hidden Markov Model (HMM) based automatic speech recognition (ASR) systems. Two important applications of linear transforms in acoustic modeling are the decorrelation of the feature vector and the constrained adaptation of the acoustic models to the speaker, the channel, and the task. Our focus in the first part of this talk is the development of training methods based on the Maximum Mutual Information (MMI) and the Maximum A Posteriori (MAP) criterion that estimate the parameters of the linear transforms. We integrate the discriminative linear transforms into the MMI estimation of the HMM parameters in an attempt to capture the correlation between the feature vector components. The transforms obtained under the MMI criterion are termed Discriminative Likelihood Linear Transforms (DLLT). Experimental results show that DLLT provides a discriminative estimation framework for feature normalization in HMM training for large vocabulary continuous speech recognition tasks that outperforms its Maximum Likelihood counterpart. Then, we propose a structural MAP estimation framework
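A feature-space linear transform of the kind discussed (applied as x' = Ax + b, whether estimated by ML or discriminatively as in DLLT) can be sketched without any of the estimation machinery; the matrix A and bias b below are illustrative, not estimated:

```python
def apply_feature_transform(A, b, x):
    """Apply x' = A x + b to one feature vector (plain lists rather
    than numpy, to keep the sketch self-contained)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

# An illustrative 2x2 decorrelating/scaling transform with a bias
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.1, -0.2]
x_adapted = apply_feature_transform(A, b, [2.0, 4.0])
```

The estimation criterion (ML, MMI, MAP) only changes how A and b are chosen; applying them at decode time is always this single affine map per frame.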
Generation and Combination of Complementary Systems for Automatic Speech Recognition
, 2008
Abstract
It has been found that using a combination of systems for large vocabulary continuous speech recognition (LVCSR) can outperform the use of a single system. For the combination to yield gains, the individual models must be complementary, i.e. they must make different errors. Previous work in ASR has mainly relied on an ad hoc approach to finding complementary systems: multiple systems are built, and those that perform well in combination are selected. The multiple diverse systems can be built in many ways, including the use of different front-ends, injecting randomness, altering the model topology or using different training
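Why complementary errors help can be shown with a much-reduced, word-level voting sketch (real combination schemes such as ROVER must first align the hypotheses; here they are assumed pre-aligned, and the hypotheses are made up):

```python
from collections import Counter

def combine_by_voting(hypotheses):
    """Word-level majority vote across aligned system outputs.
    Assumes the hypotheses are already aligned word-for-word."""
    combined = []
    for words in zip(*hypotheses):
        # pick the word most systems agree on at this position
        combined.append(Counter(words).most_common(1)[0][0])
    return combined

# Three systems that err on *different* words out-vote each error.
hyps = [
    ["the", "cat", "sat"],
    ["the", "mat", "sat"],
    ["a",   "cat", "sat"],
]
result = combine_by_voting(hyps)
```

Each system makes one error, yet the vote recovers the correct string; had all three erred on the same word (non-complementary systems), combination could not help.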
Discriminative Optimisation Of Large Vocabulary Recognition Systems
, 1996
Abstract
This paper describes a framework for optimising the structure and parameters of a continuous-density HMM-based large vocabulary recognition system using the Maximum Mutual Information Estimation (MMIE) criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture-splitting procedure is also employed to adjust the number of mixture components in each state during training so that the optimal balance between the number of parameters and the available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using the full SI-284 training set. These show that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, experimental results demonstrate that MMIE optimisation of system structure and parameters can yield useful increases in recognition accuracy.
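The mixture-splitting step mentioned above is commonly realised by halving a component's weight and perturbing copies of its mean in opposite directions along each dimension's standard deviation; a hedged sketch (the perturbation factor `eps` and the parameters are illustrative, not the paper's settings):

```python
def split_component(weight, mean, var, eps=0.2):
    """Split one diagonal-covariance Gaussian component into two:
    halve the weight and offset the mean by +/- eps * stddev per
    dimension, keeping the variance unchanged."""
    offsets = [eps * v ** 0.5 for v in var]
    up = [m + o for m, o in zip(mean, offsets)]
    down = [m - o for m, o in zip(mean, offsets)]
    return [(weight / 2, up, var), (weight / 2, down, var)]

# Growing a one-component state into a two-component mixture
components = split_component(0.9, mean=[1.0, 2.0], var=[4.0, 1.0])
```

Subsequent re-estimation then pulls the two offspring components apart; repeating split-then-train grows each state until the data can no longer support more parameters, which is the balance the paper optimises.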