Large Scale Discriminative Training For Speech Recognition
, 2000
Cited by 58 (5 self)
This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). The paper concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware, and have led to significant reductions in word error rate for both triphone and quinphone HMMs compared to our best models trained using maximum likelihood estimation. The lattice-based MMIE implementation used, techniques for ensuring improved generalisation, and interactions with maximum likelihood based adaptation are all discussed. Furthermore, several variations to the MMIE training scheme are introduced with the a...
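For orientation, the MMIE criterion named in the abstract above is commonly written in the following form. The notation is a textbook convention assumed here and is not taken from the abstract: O_r is the r-th training utterance, w_r its reference transcription, M_w the composite HMM for word sequence w, and P(w) the language model probability.

```latex
\mathcal{F}_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
         {\sum_{\hat{w}} p_{\lambda}(O_r \mid M_{\hat{w}})\, P(\hat{w})}
```

In a lattice-based setting, the denominator sum is typically approximated over the word sequences encoded in a recognition lattice rather than over all possible word sequences.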
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
Cited by 32 (6 self)
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criteria, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A tree search-based restricted recognition method using word graphs is presented, so as to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
Discriminative Training Of HMM Stream Exponents For Audio-Visual Speech Recognition
- Proc. Int. Conf. Acoust. Speech Signal Process
, 2000
Cited by 31 (15 self)
We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to respectively train audio-only and visual-only single-stream HMMs of identical topology by maximum likelihood. A two-stream HMM is then obtained by combining the two single-stream HMMs and introducing exponents that weigh the log-likelihood of each stream. We present the GPD algorithm for stream exponent estimation, consider a possible initialization, and apply it to the single speaker connected letters task of the AT&T bimodal database. We demonstrate the superior performance of the resulting multi-stream HMM to the audio-only, visual-only, and audio-visual single-stream HMMs. 1. INTRODUCTION Recently, there has been increasing interest in enhancing automatic speech recognition (ASR) by using, in addition to audio, visual information de...
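The role of the stream exponents described above can be illustrated with a minimal Python sketch. It shows only how exponents weight the per-stream log-likelihoods when scoring a class; it does not implement GPD estimation. The function name, the example scores, and the choice of exponents that sum to one are assumptions made for illustration, not details from the paper.

```python
def two_stream_log_likelihood(log_p_audio, log_p_visual, exp_audio, exp_visual):
    """Combine per-stream log-likelihoods using stream exponents.

    Raising each stream likelihood to an exponent is the same as
    multiplying its log-likelihood by that exponent.
    """
    return exp_audio * log_p_audio + exp_visual * log_p_visual


# Hypothetical (audio, visual) log-likelihoods for two candidate classes.
candidates = {"A": (-10.0, -20.0), "B": (-12.0, -14.0)}

# Assumed exponents; here they sum to one, a common but not mandatory choice.
exp_audio, exp_visual = 0.7, 0.3

scores = {
    label: two_stream_log_likelihood(la, lv, exp_audio, exp_visual)
    for label, (la, lv) in candidates.items()
}
best = max(scores, key=scores.get)

# Audio alone (exponents 1 and 0) would pick a different class.
audio_only_best = max(candidates, key=lambda label: candidates[label][0])
```

With these made-up numbers the audio stream alone prefers class A, while the weighted combination prefers class B, which is the kind of decision change the exponents are meant to control.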
Frame Discrimination Training Of HMMs For Large Vocabulary Speech Recognition
- Proc. ICASSP’99
, 1999
Cited by 20 (4 self)
This paper describes the application of a discriminative HMM parameter estimation technique called Frame Discrimination (FD), to medium and large vocabulary continuous speech recognition. Previous work has shown that FD training can give better results than maximum mutual information (MMI) training for small tasks. The use of FD for much larger tasks required the development of a technique to be able to rapidly find the most likely set of Gaussians for each frame in the system. Experiments on the Resource Management and North American Business tasks show that FD training can give comparable improvements to MMI, but is less computationally intensive. 1. INTRODUCTION Previous research has shown that the accuracy of a speech recognition system trained using Maximum Likelihood Estimation (MLE) can often be improved further using discriminative training. All such techniques normally give much greater improvements in recognition accuracy on the training data than on the test set except wh...
Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance
- COMPUTER SCIENCE DEPARTMENT, CARNEGIE MELLON UNIVERSITY. HTTP://WWW.CS.CMU.EDU/~MSIEGLER/PUBLISH/PHD/THESIS.PS.GZ SINGHAL
, 1999
Cited by 15 (1 self)
Traditionally, indexing and searching of speech content in multimedia databases have been achieved through a combination of separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TFIDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from multiple hypotheses produced by existing speech recognition data structures. In the process a greater amount of evidence is extracted than is available in the most likely transcription hypothesis, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequenc...
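One way to picture how recognition uncertainty could enter a TF-IDF weight is sketched below in Python. This is an illustrative assumption, not the formula developed in the thesis: term counts are replaced by expected counts over several recognition hypotheses, each weighted by a hypothetical posterior probability. All names and numbers are invented for the example.

```python
import math
from collections import defaultdict


def expected_term_counts(hypotheses):
    """Expected count of each term over weighted hypotheses.

    hypotheses: list of (posterior, words) pairs, where the posteriors
    are assumed to sum to one across the hypotheses of a document.
    """
    counts = defaultdict(float)
    for posterior, words in hypotheses:
        for word in words:
            counts[word] += posterior
    return dict(counts)


def tfidf_weights(term_counts, doc_freq, num_docs):
    """Plain TF-IDF: term count times log(N / document frequency)."""
    return {
        term: count * math.log(num_docs / doc_freq[term])
        for term, count in term_counts.items()
        if doc_freq.get(term, 0) > 0
    }


# Two hypothetical transcriptions of the same spoken document.
hyps = [
    (0.6, ["speech", "recognition"]),
    (0.4, ["beach", "recognition"]),
]
counts = expected_term_counts(hyps)

# Hypothetical collection statistics.
weights = tfidf_weights(
    counts, {"speech": 10, "recognition": 50, "beach": 5}, num_docs=100
)
```

In this sketch a term that appears only in a less likely hypothesis ("beach") still contributes evidence, but with a reduced count, which reflects the idea of extracting more evidence than the single most likely transcription provides.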
Discriminative Training For Continuous Speech Recognition
- Proc. 1995 Europ. Conf. on Speech Communication and Technology
, 1995
Cited by 11 (0 self)
Discriminative training techniques for Hidden-Markov Models were recently proposed and successfully applied for automatic speech recognition. In this paper a discussion of the Minimum Classification Error and the Maximum Mutual Information objectives is presented. An extended reestimation formula is used for the HMM parameter update for both objective functions. The discriminative training methods were utilized in speaker independent phoneme recognition experiments and improved the phoneme recognition rates for both discriminative training techniques. 1. INTRODUCTION Recently discriminative training techniques for Hidden-Markov Models (HMM) were used successfully for automatic speech recognition. They provide better performance compared to Maximum Likelihood Estimation (MLE), since the training is concentrated on the estimation of class boundaries and not on parameters of assumed model distributions [1,12]. Although MLE and discriminative training are theoretically equivalent (if su...
Interdependence Of Language Models And Discriminative Training
, 1999
Cited by 7 (1 self)
In this paper, the interdependence of language models and discriminative training for large vocabulary speech recognition is investigated. In addition, a constrained recognition approach using word graphs is presented for the efficient determination of alternative word sequences for discriminative training. Experiments have been carried out on the ARPA Wall Street Journal corpus. The recognition results for MMI training show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant correlation has been observed between the language model choice for training and recognition.
Discriminative Mixture Weight Estimation For Large Gaussian Mixture Models
, 1999
Cited by 6 (0 self)
This paper describes a new approach to acoustic modeling for large vocabulary continuous speech recognition (LVCSR) systems. Each phone is modeled with a large Gaussian mixture model (GMM) whose context-dependent mixture weights are estimated with a sentence-level discriminative training criterion. The estimation problem is cast in a neural network framework, which enables the incorporation of the appropriate constraints on the mixture weight vectors, and allows a straightforward training procedure, based on steepest descent. Experiments conducted on the Callhome-English and Switchboard databases show a significant improvement of the acoustic model performance, and a somewhat lesser improvement with the combined acoustic and language models. 1. INTRODUCTION Many factors contribute to the relatively high error rates observed in LVCSR systems (e.g. diversity of speaking styles, pronunciation variants, variable degrees of articulation, noises, channel effects). By enlarging the set ...
Dictionary-Based Discriminative HMM Parameter Estimation For Continuous Speech Recognition Systems
- Proc. IEEE-ICASSP
, 1997
Cited by 4 (2 self)
The estimation of the HMM parameters has always been a major issue in the design of speech recognition systems. Discriminative objectives like Maximum Mutual Information (MMI) or Minimum Classification Error (MCE) have proved to be superior to the common Maximum Likelihood Estimation (MLE) in cases where a robust estimation of the probability density functions (pdfs) is not possible. The determination of the overall likelihood of an acoustic observation is the most crucial point of the MMI parameter estimation when applied to continuous speech systems. Contrary to the common approaches that estimate the overall likelihood of the training observations by evaluating the most confusing sentences or by applying global state frequencies, this paper proposes performing a dictionary analysis in order to get estimates for the dictionary-based risk of confusing each pair of HMM states. These estimates are used to estimate the observations' likelihood and to control the discriminative MMI traini...
Optimization Of Sub-Band Weights Using Simulated Noisy Speech In Multi-Band Speech Recognition
- In: Proc. ICSLP
, 2000
Cited by 2 (1 self)
Recently multi-band speech recognition has been proposed to improve robustness under environmental noises. One important issue is how to combine decisions from individual sub-band recognizers to arrive at a final decision. Under the hidden Markov modeling (HMM) framework, one common approach is combining sub-band likelihoods linearly in an optimal manner so that the more reliable sub-bands are emphasized and the corrupted sub-bands are de-emphasized. In our experience, estimating the weights from clean speech is not effective as the weights are not optimal under noisy environments. In this paper, we derive the optimal weights from simulated noisy speech using a discriminative training method with minimum classification errors (MCE) or maximum mutual information (MMI) as the cost function. The methods are evaluated on recognition of isolated TI digits. Compared with full-band recognition with noises at an SNR of 0 dB, multiband recognition with MCE-derived weights reduces word errors by 45.9%...

