Results 1–10 of 66
Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
, 2000
Abstract

Cited by 240 (14 self)
We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach the recognizer outputs the string of words corresponding to the path with the highest posterior probability given the acoustics and a language model. However, even given optimal models, the MAP decoder does not necessarily minimize the commonly used performance metric, word error rate (WER). We describe a method for explicitly minimizing WER by extracting word hypotheses with the highest posterior probabilities from word lattices. We change the standard problem formulation by replacing global search over a large set of sentence hypotheses with local search over a small set of word candidates. In addition to improving the accuracy of the recognizer, our method produces a new representation of the set of candidate hypotheses that specifies ...
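The local-search idea in this abstract (pick the highest-posterior word from each confusion slot rather than searching over whole sentences) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the list-of-dicts confusion-network representation and the "-" deletion marker are assumptions.

```python
# Minimal sketch of consensus decoding over a confusion network.
# Assumed input: a list of "confusion slots", each a dict mapping a
# candidate word to its posterior probability; "-" marks a deletion
# (epsilon) candidate. Names are illustrative, not from the paper.

def consensus_decode(confusion_network):
    hypothesis = []
    for slot in confusion_network:
        best = max(slot, key=slot.get)  # local search: best word per slot
        if best != "-":                 # skip slots decoded as deletions
            hypothesis.append(best)
    return hypothesis

network = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.5, "hat": 0.3, "-": 0.2},
    {"sat": 0.9, "-": 0.1},
]
print(consensus_decode(network))  # ['the', 'cat', 'sat']
```

Note how the global search over sentence hypotheses is replaced by an independent argmax per slot, which is what makes the decoder both fast and directly aimed at word-level error.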
Minimum Bayes-risk decoding for statistical machine translation
 IN PROCEEDINGS OF HLT-NAACL
, 2004
Abstract

Cited by 174 (16 self)
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
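The MBR objective the abstract describes (choose the hypothesis with the lowest expected loss under the model's posterior) reduces to a simple double loop over an N-best list. A hedged sketch, with a toy word-mismatch loss standing in for the paper's linguistically informed loss functions:

```python
# Minimum Bayes-Risk decoding over an N-best list: pick the hypothesis
# whose posterior-weighted expected loss against all hypotheses is
# smallest. The loss below is a crude stand-in; all names are illustrative.

def mbr_decode(nbest, loss):
    """nbest: list of (sentence, posterior) pairs; loss(h, r) >= 0."""
    def risk(h):
        return sum(p * loss(h, r) for r, p in nbest)
    return min((h for h, _ in nbest), key=risk)

def position_loss(h, r):
    # per-position word mismatch count (stand-in for a real MT loss)
    return sum(a != b for a, b in zip(h.split(), r.split()))

nbest = [("he sat down", 0.40), ("he sat town", 0.35), ("we sat down", 0.25)]
print(mbr_decode(nbest, position_loss))  # he sat down
```

Swapping in a different `loss` is exactly how performance is tuned for a specific evaluation metric, which is the point of the hierarchy of loss functions in the paper.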
Phrase-Based Statistical Machine Translation
, 2002
Abstract

Cited by 141 (20 self)
This paper is based on the work carried out in the framework of the Verbmobil project, which is a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this ...
Segmental minimum Bayes-risk decoding for automatic speech recognition
 IEEE Transactions on Speech and Audio Processing
, 2003
Abstract

Cited by 35 (10 self)
Abstract—Minimum Bayes-Risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a posteriori probability (MAP) decoders through N-best list rescoring and search over word lattices. We present a Segmental Minimum Bayes-Risk decoding (SMBR) framework that simplifies the implementation of MBR recognizers through the segmentation of the N-best lists or lattices over which the recognition is to be performed. This paper presents lattice cutting procedures that underlie SMBR decoding. Two of these procedures are based on a risk minimization criterion while a third one is guided by word-level confidence scores. In conjunction with SMBR decoding, these lattice segmentation procedures give consistent improvements in recognition word error rate (WER) on the Switchboard corpus. We also discuss an application of risk-based lattice cutting to multiple-system SMBR decoding and show that it is related to other system combination techniques such as ROVER. This strategy combines lattices produced from multiple ASR systems and is found to give WER improvements in a Switchboard evaluation system. Index Terms—ASR system combination, extended-ROVER, lattice cutting, minimum Bayes-risk decoding, segmental minimum
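The segmental simplification this abstract describes can be illustrated in a few lines: once the lattice has been cut into independent segments, the global Bayes risk under a per-word 0/1 loss decomposes into a sum of per-segment risks, so each segment is decoded locally. The dict-per-segment representation below is an assumption for illustration, not the paper's lattice format.

```python
# Sketch of segmental MBR decoding over pre-cut, independent segments:
# each segment is a dict mapping a word to its posterior probability.
# Under a 0/1 word loss, the min-risk choice per segment is the
# max-posterior word, and the risks simply add up across segments.

def segmental_mbr(segments):
    hypothesis, total_risk = [], 0.0
    for seg in segments:
        word = max(seg, key=seg.get)   # min-risk word for this segment
        hypothesis.append(word)
        total_risk += 1.0 - seg[word]  # expected 0/1 loss in this segment
    return hypothesis, total_risk

segments = [{"veal": 0.55, "feel": 0.45}, {"fine": 0.8, "wine": 0.2}]
hyp, risk = segmental_mbr(segments)
print(hyp)  # ['veal', 'fine']
```

The hard part, which the paper addresses, is producing a segmentation that makes this independence assumption safe; the decoding step itself becomes trivial.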
Support vector machines for segmental minimum Bayes risk decoding of continuous speech
 In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
, 2003
Abstract

Cited by 34 (6 self)
Segmental Minimum Bayes Risk (SMBR) decoding involves the refinement of the search space into sequences of small sets of confusable words. We describe the application of Support Vector Machines (SVMs) as discriminative models for the refined search spaces. We show that SVMs, which in their basic formulation are binary classifiers of fixed-dimensional observations, can be used for continuous speech recognition. We also study the use of GiniSVMs, a variant of the basic SVM. On a small vocabulary task, we show that this two-pass scheme outperforms MMI-trained HMMs. Using system combination we also obtain further improvements over discriminatively trained HMMs.
Error Corrective Mechanisms For Speech Recognition
, 2001
Abstract

Cited by 25 (0 self)
In the standard MAP approach to speech recognition, the goal is to find the word sequence with the highest posterior probability given the acoustic observation. Recently, a number of alternate approaches have been proposed for directly optimizing the word error rate, the most commonly used evaluation criterion. One of them, the consensus decoding approach, converts a word lattice into a confusion network which specifies the word-level confusions at different time intervals, and outputs the word with the highest posterior probability from each word confusion set. This paper presents a method for discriminating between the correct and alternate hypotheses in a confusion set using additional knowledge sources extracted from the confusion networks. We use transformation-based learning for inducing a set of rules to guide a better decision between the top two candidates with the highest posterior probabilities in each confusion set. The choice of this learning method is motivated by the perspicuous representation of the rules induced, which can provide insight into the cause of the errors of a speech recognizer. In experiments on the Switchboard corpus, we show significant improvements over the consensus decoding approach.
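The decision step this abstract describes, letting learned rules overturn the top posterior candidate in favour of the runner-up, can be sketched as below. The `(top, runner_up, margin)` rule format is invented for illustration; the paper induces its rules with transformation-based learning over richer features.

```python
# Hypothetical sketch of rule-guided rescoring within one confusion set.
# A rule (top, runner_up, margin) says: when these two words are the top
# candidates and their posterior gap is below margin, prefer the runner-up.

def rescore_slot(slot, rules):
    ranked = sorted(slot.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return ranked[0][0]
    (top, p1), (runner, p2) = ranked[0], ranked[1]
    for rule_top, rule_runner, margin in rules:
        # flip to the runner-up when the rule matches and the race is close
        if (top, runner) == (rule_top, rule_runner) and p1 - p2 < margin:
            return runner
    return top

rules = [("their", "there", 0.15)]
print(rescore_slot({"their": 0.50, "there": 0.45}, rules))  # there
print(rescore_slot({"their": 0.70, "there": 0.30}, rules))  # their
```

Because each rule is a readable condition rather than an opaque weight vector, inspecting the induced rule list shows which confusions the recognizer systematically gets wrong, which is the interpretability argument the abstract makes.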
Corrective Language Modeling For Large Vocabulary ASR With The Perceptron Algorithm
 PROC. ICASSP
, 2004
Abstract

Cited by 25 (6 self)
This paper investigates error-corrective language modeling using the perceptron algorithm on word lattices. The resulting model is encoded as a weighted finite-state automaton, and is used by intersecting the model with word lattices, making it simple and inexpensive to apply during decoding. We present results for various training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. We demonstrate the importance of making the training conditions as close as possible to testing conditions. The best approach yields a 1.3 percent improvement in first-pass accuracy, which translates to 0.5 percent improvement after other rescoring passes.
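The perceptron training loop behind such a corrective model can be sketched over n-best lists (a simplification of the lattice/automaton setting; the bigram features and unit-step updates here are assumptions, not the paper's exact recipe):

```python
# Sketch of perceptron-style corrective language modeling on n-best lists:
# rescore hypotheses with baseline score + learned n-gram feature weights,
# and update weights toward the oracle (lowest-error) hypothesis.

def bigram_feats(sentence):
    w = sentence.split()
    return [tuple(w[i:i + 2]) for i in range(len(w) - 1)]

def rescore(nbest, base_scores, weights):
    def score(i):
        return base_scores[i] + sum(weights.get(f, 0.0)
                                    for f in bigram_feats(nbest[i]))
    return max(range(len(nbest)), key=score)

def perceptron_update(nbest, base_scores, oracle, weights):
    pred = rescore(nbest, base_scores, weights)
    if pred != oracle:  # push features toward the lower-error hypothesis
        for f in bigram_feats(nbest[oracle]):
            weights[f] = weights.get(f, 0.0) + 1.0
        for f in bigram_feats(nbest[pred]):
            weights[f] = weights.get(f, 0.0) - 1.0
    return pred

nbest = ["a baking dog", "a barking dog"]
base = [0.1, 0.0]  # the baseline recognizer prefers the wrong string
weights = {}
perceptron_update(nbest, base, oracle=1, weights=weights)
print(rescore(nbest, base, weights))  # 1 after one corrective update
```

In the paper the learned weights live on a weighted finite-state automaton so the same correction applies by lattice intersection at decode time, rather than by explicit n-best enumeration as in this sketch.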
Fast consensus decoding over translation forests
 In The Annual Conference of the Association for Computational Linguistics
, 2009
Abstract

Cited by 24 (2 self)
The minimum Bayes risk (MBR) decoding objective improves BLEU scores for machine translation output relative to the standard Viterbi objective of maximizing model score. However, MBR targeting BLEU is prohibitively slow to optimize over k-best lists for large k. In this paper, we introduce and analyze an alternative to MBR that is equally effective at improving performance, yet is asymptotically faster, running 80 times faster than MBR in experiments with 1000-best lists. Furthermore, our fast decoding procedure can select output sentences based on distributions over entire forests of translations, in addition to k-best lists. We evaluate our procedure on translation forests from two large-scale, state-of-the-art hierarchical machine translation systems. Our forest-based decoding objective consistently outperforms k-best list MBR, giving improvements of up to 1.0 BLEU.
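The source of the speedup can be illustrated concretely: instead of comparing every pair of hypotheses as pairwise MBR does (quadratic in k), accumulate expected n-gram counts in one pass and score each hypothesis linearly against them. The linear bigram similarity below is a stand-in for the paper's BLEU-derived objective; all names are illustrative.

```python
from collections import Counter

# Sketch of fast consensus decoding over a k-best list: one pass to
# collect posterior-weighted (expected) bigram counts, then a linear
# score per hypothesis, avoiding the O(k^2) pairwise loss computations.

def bigrams(sentence):
    w = sentence.split()
    return [tuple(w[i:i + 2]) for i in range(len(w) - 1)]

def fast_consensus(kbest):
    expected = Counter()
    for sent, p in kbest:        # single pass: expected bigram counts
        for g in bigrams(sent):
            expected[g] += p
    def similarity(sent):        # linear score against the expectations
        return sum(expected[g] for g in set(bigrams(sent)))
    return max((s for s, _ in kbest), key=similarity)

kbest = [("the cat sat", 0.4), ("a cat sat", 0.35), ("the cat spat", 0.25)]
print(fast_consensus(kbest))  # the cat sat
```

Because the expected counts are just sums over hypotheses, they can equally be computed over a packed translation forest by dynamic programming, which is what lets the paper's method consider distributions over entire forests rather than only k-best lists.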
Lattice Segmentation and Minimum Bayes Risk Discriminative Training for Large . . .
 IN PROC. EUROSPEECH
, 2005
Abstract

Cited by 20 (6 self)
Lattice segmentation techniques developed for Minimum Bayes Risk decoding in large vocabulary speech recognition tasks are used to compute the statistics for discriminative training algorithms that estimate HMM parameters so as to reduce the overall risk over the training data. New estimation procedures are developed and evaluated for small vocabulary and large vocabulary recognition tasks, and additive performance improvements are shown relative to maximum mutual information estimation. These relative gains are explained through a detailed analysis of individual word recognition errors.