Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
, 2000
Cited by 115 (14 self)
We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach the recognizer outputs the string of words corresponding to the path with the highest posterior probability given the acoustics and a language model. However, even given optimal models, the MAP decoder does not necessarily minimize the commonly used performance metric, word error rate (WER). We describe a method for explicitly minimizing WER by extracting word hypotheses with the highest posterior probabilities from word lattices. We change the standard problem formulation by replacing global search over a large set of sentence hypotheses with local search over a small set of word candidates. In addition to improving the accuracy of the recognizer, our method produces a new representation of the set of candidate hypotheses that specifies ...
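The contrast between the sentence-level MAP decision and a word-level decision can be illustrated with a minimal sketch. It is not the method from the paper: the hypotheses and posteriors below are made up, and the sketch assumes equal-length hypotheses so that word position acts as the alignment, whereas real word lattices need an explicit alignment step.

```python
from collections import defaultdict

# Hypothetical sentence hypotheses with made-up posteriors (they sum to 1).
# All have the same length, so position i serves as the alignment slot;
# real lattices need an explicit alignment step that is not shown here.
hypotheses = [
    (("i", "move", "it"), 0.4),
    (("i", "have", "at"), 0.3),
    (("i", "have", "it"), 0.3),
]

def map_hypothesis(hyps):
    """Sentence-level decision: the single hypothesis with the highest posterior."""
    return max(hyps, key=lambda h: h[1])[0]

def word_level_hypothesis(hyps):
    """Word-level decision: per slot, sum posteriors per word and keep the best."""
    n_slots = len(hyps[0][0])
    slots = [defaultdict(float) for _ in range(n_slots)]
    for words, posterior in hyps:
        for i, word in enumerate(words):
            slots[i][word] += posterior
    return tuple(max(slot, key=slot.get) for slot in slots)

map_result = map_hypothesis(hypotheses)
word_result = word_level_hypothesis(hypotheses)
```

In this toy case the MAP choice is "i move it", while the word-level choice is "i have it": "have" and "it" each collect more posterior mass across hypotheses than their competitors, even though no single sentence containing both is the most probable one.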
Confidence Measures for Large Vocabulary Continuous Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 2001
Cited by 70 (7 self)
In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word graphs using a forward-backward algorithm. We also study the estimation of posterior probabilities on N-best lists instead of word graphs and compare both algorithms in detail. In addition, we compare the posterior probabilities with two alternative confidence measures, i.e., the acoustic stability and the hypothesis density. We present experimental results on five different corpora: the Dutch ARISE 1k evaluation corpus, the German Verbmobil '98 7k evaluation corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 65k evaluation corpus. We show that the posterior probabilities computed on word graphs outperform all other confidence measures. The relative reduction in confidence error rate ranges between 19% and 35% compared to the baseline confidence error rate.
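A forward-backward computation of edge posteriors on a word graph might look like the following sketch. The graph, the scores, and the function name `edge_posteriors` are hypothetical. The sketch works with plain probabilities and assumes node ids are topologically ordered; practical systems typically work in the log domain and scale the acoustic scores, which is not shown.

```python
from collections import defaultdict

# Hypothetical toy word graph. Each edge is (start_node, end_node, word, score),
# where score is a made-up combined acoustic and language model likelihood.
# Assumption: node ids are topologically ordered (every edge goes low -> high).
EDGES = [
    (0, 1, "the", 0.6),
    (0, 1, "a", 0.4),
    (1, 2, "cat", 0.7),
    (1, 2, "cap", 0.3),
    (0, 2, "that", 0.5),
]

def edge_posteriors(edges, first_node, last_node):
    """Return one posterior per edge using forward and backward path sums."""
    forward = defaultdict(float)   # total score of partial paths reaching a node
    backward = defaultdict(float)  # total score of partial paths leaving a node
    forward[first_node] = 1.0
    backward[last_node] = 1.0

    # Forward pass: visit edges in order of their start node.
    for start, end, _, score in sorted(edges, key=lambda e: e[0]):
        forward[end] += forward[start] * score
    # Backward pass: visit edges in reverse order of their end node.
    for start, end, _, score in sorted(edges, key=lambda e: e[1], reverse=True):
        backward[start] += score * backward[end]

    total = forward[last_node]  # summed score of all complete paths
    return [forward[s] * score * backward[e] / total for s, e, _, score in edges]

posteriors = dict(zip((w for _, _, w, _ in EDGES), edge_posteriors(EDGES, 0, 2)))
```

Each posterior is the share of total path score that passes through the edge, so the posteriors of the edges leaving the start node ("the", "a", "that") sum to one.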
Recognition confidence scoring and its use in speech understanding systems
- Computer Speech and Language
, 2002
Cited by 42 (4 self)
In this paper we present an approach to recognition confidence scoring and a method for integrating confidence scores into the understanding and dialogue components of a speech understanding system. The system uses a multi-tiered approach where confidence scores are computed at the phonetic, word, and utterance levels. The scores are produced by extracting confidence features from the computation of the recognition hypotheses and processing these features using an accept/reject classifier for word and utterance hypotheses. The output of the confidence classifiers can then be incorporated into the parsing mechanism of the language understanding component. To evaluate the system, experiments were conducted using the JUPITER weather information system. Evaluation was performed at the understanding level using key-value pair concept error rate as the evaluation metric. When confidence scores were integrated into the understanding component of the system, the concept error rate was reduced by over 35%.
Using Word Probabilities As Confidence Measures
- in Proc. ICASSP
, 1998
Cited by 38 (5 self)
Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a word graph and are computed using a simplified forward-backward algorithm. We present experimental results on the NORTH AMERICAN BUSINESS (NAB'94) and the German VERBMOBIL recognition task.
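The idea of summing the probabilities of hypotheses for the same word in roughly the same time segment can be sketched as follows. The hypothesis list, the frame times, and the simple interval-overlap test are assumptions made for illustration; the abstract does not specify how "more or less the same segment" is decided.

```python
# Hypothetical word hypotheses: (word, start_frame, end_frame, posterior).
# The values are made up and would normally come from a forward-backward pass.
HYPOTHESES = [
    ("seven", 10, 45, 0.30),
    ("seven", 12, 47, 0.25),
    ("eleven", 5, 46, 0.35),
    ("seven", 80, 110, 0.10),
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if the two frame intervals share at least one frame (assumed criterion)."""
    return a_start < b_end and b_start < a_end

def word_confidence(target, hypotheses):
    """Sum the posteriors of all hypotheses with the same word that overlap the target."""
    word, start, end, _ = target
    return sum(
        posterior
        for other_word, other_start, other_end, posterior in hypotheses
        if other_word == word and overlaps(start, end, other_start, other_end)
    )

confidence = word_confidence(HYPOTHESES[0], HYPOTHESES)
```

Here the first "seven" collects the mass of the second, overlapping "seven" but not of the later one at frames 80 to 110, giving a confidence of about 0.55.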
Posterior Probability Decoding, Confidence Estimation And System Combination
, 2000
Cited by 33 (2 self)
In this paper the estimation of word posterior probabilities is discussed and their application in the CU-HTK system used in the March 2000 Hub5 Conversational Telephone Speech evaluation is described. The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities. These confusion networks were used in a number of post-processing steps. The 1-best sentence hypotheses extracted directly from the networks are shown to be significantly more accurate than the baseline decoding results. The posterior probability estimates were used as the basis for the estimation of word-level confidence scores. A new system combination technique is presented that uses these confidence scores and the confusion networks and performs better than the well-known ROVER technique.
Utilizing Untranscribed Training Data To Improve Performance
- DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne
, 1998
Cited by 22 (0 self)
In the past few years, the Large Vocabulary Conversational Speech Recognition (LVCSR) community has attempted to address the problem of speech recognition on languages other than English. Work on the CallHome Corpora has verified that current technology is largely language independent, and that the dominant factor with regard to performance on a certain language is the amount of training data available ([1]). This brings forth the question of what is the appropriate course of action when we need to quickly bring a recognizer up in a new language, where little or no training is available. This is exactly the question we will address in this paper. We will assume that, while only a couple of hours of transcribed data is available, much more untranscribed data can be found, and we will explore ways to utilize it.
A Boosting Approach for Confidence Scoring
, 2001
Cited by 14 (0 self)
In this paper we present the application of a boosting classification algorithm to confidence scoring. We derive feature vectors from speech recognition lattices and feed them into a boosting classifier. This classifier combines hundreds of very simple 'weak learners' and derives classification rules that can reduce the confidence error rate by up to 34%. We compare our results to those obtained using two other standard classification techniques, Support Vector Machines (SVMs) and Classification and Regression Trees (CART), and show significant improvements. Furthermore, the nature of the boosting algorithm allows us to combine the best single classifier and improve its performance.
Estimating and Evaluating Confidence for Forensic Speaker Recognition
- presented at ICASSP
, 2005
Cited by 11 (1 self)
Estimating and evaluating confidence has become a key aspect of the speaker recognition problem because of the increased use of this technology in forensic applications. We discuss evaluation measures for speaker recognition and some of their properties. We then propose a framework for confidence estimation based upon scores and meta-information, such as utterance duration, channel type, and SNR. The framework uses regression techniques with multilayer perceptrons to estimate confidence with a data-driven methodology. As an application, we show the use of the framework in a speaker comparison task drawn from the NIST 2000 evaluation. A relative comparison of different types of meta-information is given. We demonstrate that the new framework can give substantial improvements over standard distribution methods of estimating confidence.
On the Use of Quality Measures for Text-Independent Speaker Recognition
- in The Speaker and Language Recognition Workshop (Odyssey)
, 2004
Cited by 8 (2 self)
The use of quality information in automatic recognition systems is studied. From an apparent definition of what constitutes a quality measure, a framework for the successful exploitation of the quality information is derived. Potential applications are also introduced at different phases of the recognition process, namely the enrollment, scoring, and multi-level fusion stages. The traditional likelihood scoring stage is further developed, providing guidelines for the practical application of the proposed ideas. Preliminary experiments corroborate the benefits of the proposed quality-guided recognition approach. In particular, a frame-level quality measure meeting a goodness criterion based on deviation from the fundamental frequency is used, obtaining encouraging initial results.
Robust Confidence Annotation and Rejection for Continuous Speech Recognition
- in Proceedings of ICASSP
Cited by 7 (2 self)
We are looking for confidence scoring techniques that perform well on a broad variety of tasks. Our main focus is on word-level error rejection, but most results apply to other scenarios as well. A variation of the Normalized Cross Entropy that is adapted to that purpose is introduced. It is successfully used to automatically select features and optimize the word-level confidence measure on several test sets. Sentence-level confidence geared toward the rejection of out-of-grammar utterances is also investigated. The combination of a word-graph-based technique and the acoustic score shows excellent performance across all the tasks we considered.

