## Maximum Entropy Confidence Estimation For Speech Recognition (2007)

Citations: 11 (2 self)

### BibTeX

@MISC{White07maximumentropy,

author = {Christopher White and Alex Acero and Julian Odell},

title = {Maximum Entropy Confidence Estimation For Speech Recognition},

year = {2007}

}

### Abstract

For many automatic speech recognition (ASR) applications, it is useful to predict the likelihood that the recognized string contains an error. This paper explores two modifications of a classic design. First, it replaces the standard maximum likelihood classifier with a maximum entropy classifier. The maximum entropy framework carries the dual advantages of discriminative training and reasonable generalization. Second, it includes a number of alternative features. Our ASR system is heavily pruned, and often produces recognition lattices with only a single path. These alternative features are meant to serve as a surrogate for the typical features that can be computed from a rich lattice. We show that the maximum entropy classifier easily outperforms the standard baseline system, and that the alternative features provide consistent gains for all of our test sets.
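The idea of a maximum entropy confidence classifier can be illustrated with a minimal sketch. For a binary outcome (hypothesis correct or in error), a conditional maximum entropy model has the same form as logistic regression, so the sketch below trains one by gradient ascent on the regularized conditional log-likelihood. The two feature columns, the toy data, the hyperparameters, and the function names are all assumptions made for illustration; the abstract does not specify the paper's actual features or training procedure.

```python
import math

# Hypothetical toy data: each row is [feature_a, feature_b], standing in for
# recognizer-derived scores scaled to [0, 1]. These are NOT the paper's features.
TOY_FEATURES = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],
                [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
TOY_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = hypothesis correct, 0 = contains an error


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def confidence(weights, x):
    """P(correct | x) under the binary MaxEnt model; weights[-1] is the bias."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))
    return sigmoid(z)


def train_maxent(features, labels, epochs=2000, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood.

    The hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    n_feats = len(features[0])
    n = len(features)
    weights = [0.0] * (n_feats + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feats + 1)
        for x, y in zip(features, labels):
            err = y - confidence(weights, x)  # observed minus expected
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(n_feats + 1):
            # Regularize feature weights only, not the bias term.
            penalty = l2 * weights[j] if j < n_feats else 0.0
            weights[j] += lr * (grad[j] / n - penalty)
    return weights


if __name__ == "__main__":
    w = train_maxent(TOY_FEATURES, TOY_LABELS)
    print(confidence(w, [0.85, 0.85]), confidence(w, [0.15, 0.15]))
```

Because the model is trained discriminatively, it estimates the posterior probability of correctness directly from the features instead of modeling the feature distribution of each class, which is the contrast with a maximum likelihood (generative) baseline that the abstract draws.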

### Citations

209 | Maximum Entropy Models for Natural Language Ambiguity Resolution
- Ratnaparkhi
- 1998
Citation Context ...mum entropy (MaxEnt) criterion. Conditional maximum entropy models were chosen based on their history of good performance for speech and language related tasks including language modeling[15], parsing[16], etc. They have been applied with mixed results to confidence estimation in information extraction[5] and machine translation[6]. Although MaxEnt models have been applied to estimating posterior phon...

183 | Adaptive Statistical Language Modeling: A Maximum Entropy Approach
- Rosenfeld
- 1994
Citation Context ...sing the maximum entropy (MaxEnt) criterion. Conditional maximum entropy models were chosen based on their history of good performance for speech and language related tasks including language modeling[15], parsing[16], etc. They have been applied with mixed results to confidence estimation in information extraction[5] and machine translation[6]. Although MaxEnt models have been applied to estimating p...

106 | Confidence measurement for large vocabulary continuous speech recognition
- Wessel
- 2001
Citation Context ...lity of error using several observations taken from the recognition lattice emitted by the ASR engine. If a rich lattice is available, it can be renormalized to provide a good confidence estimate (CE)[1, 2]. Alternately, an ASR engine can produce many types of scores which are used as observations to train a statistical model. In addition to a typical ASR observation such as acoustic score, decoders may...

83 | Confidence estimation for machine translation
- Blatz, Fitzgerald, et al.
- 2004
Citation Context ...s[4]. The framework of observing events from an output (lattice or otherwise) to train a model for estimating confidence is also used in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[...

68 | Recognition confidence scoring and its use in speech understanding systems, Computer Speech and Language
- Hazen, Seneff, et al.
- 2002
Citation Context ...cal model. In addition to a typical ASR observation such as acoustic score, decoders may produce a variety of observations based on the language model, articulatory observations[3] or discourse events[4]. The framework of observing events from an output (lattice or otherwise) to train a model for estimating confidence is also used in the fields of information extraction[5] and machine translation[6, ...

60 | Confidence estimation for information extraction
- Culotta
- 2004
Citation Context ...tions[3] or discourse events[4]. The framework of observing events from an output (lattice or otherwise) to train a model for estimating confidence is also used in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum en...

53 | Large vocabulary decoding and confidence estimation using word posterior probabilities
- Evermann, Woodland
- 2000
Citation Context ...lity of error using several observations taken from the recognition lattice emitted by the ASR engine. If a rich lattice is available, it can be renormalized to provide a good confidence estimate (CE)[1, 2]. Alternately, an ASR engine can produce many types of scores which are used as observations to train a statistical model. In addition to a typical ASR observation such as acoustic score, decoders may...

17 | Confidence estimation for text prediction
- Gandrabur, Foster
- 2003
Citation Context ...n mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discriminative systems[11, 12], many systems still train a generative model based on observations pulled from a lattice[8, 3]. This paper details two improveme...

16 | Evaluation of word confidence for speech recognition systems, Computer Speech and Language
- Siu, Gish
- 1999
Citation Context ...nfidence is also used in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discrimina...

15 | Training a Sentence-Level Machine Translation Confidence Measure
- Quirk
- 2004
Citation Context ...s[4]. The framework of observing events from an output (lattice or otherwise) to train a model for estimating confidence is also used in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[...

7 | Bayesian model combination (baycom) for improved recognition
- Sankar
- 2005
Citation Context ...ined have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discriminative systems[11, 12], many systems still train a generative model based on observations pulled from a lattice[8, 3]. ...

5 | Compensating for word posterior estimation bias in confusion networks
- Hillard, Ostendorf
- 2006
Citation Context ...on extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discriminative systems[11, 12], many systems still train a ...

1 | Articulatory-feature-based confidence measures, Computer Speech and Language
- Leung, Sui
- 2005
Citation Context ...ons to train a statistical model. In addition to a typical ASR observation such as acoustic score, decoders may produce a variety of observations based on the language model, articulatory observations[3] or discourse events[4]. The framework of observing events from an output (lattice or otherwise) to train a model for estimating confidence is also used in the fields of information extraction[5] and ...

1 | Bayesian confidence scoring and adaptation techniques for speech recognition
- Kim, Ko
Citation Context ...) to train a model for estimating confidence is also used in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the rec...

1 | Random forest-based confidence annotation using novel features from confusion networks
- Xue, Zhao
- 2006
Citation Context ...ed in the fields of information extraction[5] and machine translation[6, 7]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discriminative systems[11, 12]...

1 | Maximum entropy based normalization of word posteriors for phonetic and lvcsr lattice search
- Yu, Zhang, et al.
- 2006
Citation Context ...]. The models to be trained have included Gaussian mixture models (GMM)[8], generalized linear models (GLM)[9], decision trees[10], support vector machines[11], maximum entropy (MaxEnt) trained models[12], model combination[13], or a hybrid of these[6, 7, 14]. While the recent trend has been toward discriminative systems[11, 12], many systems still train a generative model based on observations pulled...