Results 1 - 7 of 7
Minimum classification error training of landmark models for real-time continuous speech recognition
- in Proc. IEEE ICASSP, 2004
Cited by 7 (2 self)
Though many studies have shown the effectiveness of the Minimum Classification Error (MCE) approach to discriminative training of HMMs for speech recognition, few if any have reported MCE results for large (> 100 hours) training sets in the context of real-world, continuous speech recognition. Here we report large gains in performance for the MIT JUPITER weather information task as a result of MCE-based batch optimization of acoustic models. Investigation of word error rate vs. computation time showed that small MCE models significantly outperform the Maximum Likelihood (ML) baseline at all points of equal computation time, resulting in up to 20% word error rate reduction for in-vocabulary utterances. The overall MCE loss function was minimized using Quickprop, a simple but effective second-order optimization method suited to parallelization over large training sets.
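To make the two ingredients named in this abstract concrete, the following is a minimal Python sketch of a sigmoid MCE loss for a single token and a single-parameter Quickprop step. It is not the authors' implementation: the function names, the smoothing parameters `eta` and `alpha`, the learning rate, and the growth limit of 1.75 are all assumptions chosen for illustration, and the Quickprop rule shown is a simplified form of the secant update.

```python
import math


def mce_loss(scores, correct, eta=1.0, alpha=1.0):
    """Sigmoid MCE loss for one token, given per-class discriminant scores.

    eta and alpha are illustrative smoothing parameters (assumed values).
    """
    rivals = [s for j, s in enumerate(scores) if j != correct]
    # Soft maximum over the competing classes, computed stably;
    # as eta grows it approaches the score of the best rival.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    # Misclassification measure: positive when a rival beats the correct class.
    d = soft_max - scores[correct]
    # Smooth 0/1 error count.
    return 1.0 / (1.0 + math.exp(-alpha * d))


def quickprop_step(grad, prev_grad, prev_step, lr=0.1, max_growth=1.75):
    """One simplified Quickprop update for a single parameter.

    Uses the secant rule step = prev_step * grad / (prev_grad - grad),
    limited to max_growth times the previous step size.
    """
    if prev_step == 0.0 or prev_grad == grad:
        # No usable history: fall back to plain gradient descent.
        return -lr * grad
    step = prev_step * grad / (prev_grad - grad)
    limit = max_growth * abs(prev_step)
    return max(-limit, min(limit, step))


if __name__ == "__main__":
    # Correct class 0 wins, so the loss is below 0.5.
    print(mce_loss([2.0, 0.0], correct=0))
    # Minimize the toy objective f(w) = (w - 3)^2 with Quickprop.
    w, prev_grad, prev_step = 0.0, 0.0, 0.0
    for _ in range(10):
        grad = 2.0 * (w - 3.0)
        step = quickprop_step(grad, prev_grad, prev_step)
        w, prev_grad, prev_step = w + step, grad, step
    print(w)
```

Because each step needs only the current and previous gradients, the gradient itself can be summed over shards of a large training set before the update is applied, which is one reading of why the abstract calls the method suited to parallelization.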
A Parzen Window Based Derivation Of Minimum Classification Error From The Theoretical Bayes Classification Risk
- Proc. ICSLP, 2002
Cited by 6 (1 self)
This article shows that the Minimum Classification Error (MCE) criterion function commonly used for discriminative design of speech recognition systems is equivalent to a Parzen window based estimate of the theoretical Bayes classification risk. In this analysis, each training token is mapped to the center of a Parzen kernel in the domain of a suitably defined random variable. The kernels are summed to produce a density estimate; this estimate in turn can easily be integrated over the domain of incorrect classifications, yielding the risk estimate. The expression of risk for each kernel can be seen to correspond directly to the usual MCE loss function. The resulting risk estimate can be minimized by suitable adaptation of the recognition system parameters that determine the mapping from training token to kernel center. This analysis provides a novel link between the MCE empirical cost measured on a finite training set and the theoretical Bayes classification risk.
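The correspondence described in this abstract can be illustrated with a short worked equation. This is a sketch under stated assumptions rather than the paper's own notation: $d_i$ is taken to be the scalar misclassification measure of training token $i$ (positive when the token is misclassified), $h$ is the kernel bandwidth, $N$ is the number of training tokens, and the kernel $K$ is assumed to be logistic.

```latex
\hat{R}
  = \frac{1}{N}\sum_{i=1}^{N}
    \int_{0}^{\infty} \frac{1}{h}\,
    K\!\left(\frac{d - d_i}{h}\right)\mathrm{d}d ,
\qquad
K(u) = \frac{e^{-u}}{\left(1 + e^{-u}\right)^{2}}
\;\Longrightarrow\;
\int_{0}^{\infty} \frac{1}{h}\,
K\!\left(\frac{d - d_i}{h}\right)\mathrm{d}d
  = \frac{1}{1 + e^{-d_i/h}} .
```

Under these assumptions, each kernel's contribution to the risk estimate is a sigmoid of $d_i$ with slope $1/h$, which has the same form as the usual smoothed MCE loss.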
Feature-Based Pronunciation Modeling for Automatic Speech Recognition
- In Proc. HLT/NAACL, 2005
Cited by 4 (1 self)
Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to handling this variation consists of expanding the dictionary with phonetic substitution, insertion, and deletion rules. Common rule sets, however, typically leave many pronunciation variants unaccounted for and increase word confusability due to the coarse granularity of phone units. We present an alternative approach, in which many types of variation are explained by representing a pronunciation as multiple streams of linguistic features rather than a single stream of phones. Features may correspond to the positions of the speech articulators, such as the lips and tongue, or to acoustic or perceptual categories. By
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
Cited by 2 (0 self)
The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of MCE-based discriminative training applied to HMMs, in the context of three challenging large-vocabulary (up to 100k words) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT JUPITER telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues. Index Terms: discriminative training, pattern recognition, speech recognition.
Improved Name-Recognition with Meta-data dependent name networks
- 2004
Cited by 1 (0 self)
A transcription system that requires accurate general name transcription is faced with the problem of covering the large number of names it may encounter. Without any prior knowledge, this requires a large increase in the size and complexity of the system due to the expansion of the lexicon. Furthermore, this increase will adversely affect the system performance due to the increased confusability. Here we propose a method that uses meta-data available at runtime to ensure better name coverage without significantly increasing the system complexity. We tested this approach on a voicemail transcription task and assumed meta-data to be available in the form of a caller ID string (as it would show up on a caller ID enabled phone) and the name of the mailbox owner. Networks representing possible spoken realizations of those names are generated at runtime and included in the network of the decoder. The decoder network is built at training time using a class-dependent language model, with caller and mailbox name instances modeled as class tokens. The class tokens are replaced at test time with the name networks built from the meta-data. The proposed algorithm showed a reduction in the error rate of name tokens of 22.1%.
A New Formalization Of Minimum Classification Error Using A Parzen Estimate Of Classification Chance
- 2003
Cited by 1 (0 self)
In recent work, we showed that the Minimum Classification Error (MCE) criterion function commonly used for discriminative design of pattern recognition systems is equivalent to a Parzen window based estimate of the theoretical classification risk. In this analysis, each training token is mapped to the center of a Parzen kernel in the domain of a suitably defined random variable; the kernels are then summed and integrated over the domain of incorrect classifications, yielding the risk estimate. Here, we deepen this approach by applying Parzen estimation at an earlier stage of the overall definition of classification risk. Specifically, the new analysis uses all incorrect categories, not just the single best incorrect category, in deriving a "correctness" function that is a simple multiple integral of a Parzen kernel over the hyperregion of correct classifications. The width of the Parzen kernel determines how many competing categories to use in optimizing the resulting overall risk estimate. This analysis rigorously formalizes the notion that using multiple competing categories in discriminative training is a type of smoothing that enhances generalization to unseen data.
SPEECH AND
- 2008
This article appeared in a journal published by Elsevier.

