Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
, 2000
Abstract

Cited by 228 (15 self)
We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach, the recognizer outputs the string of words corresponding to the path with the highest posterior probability given the acoustics and a language model. However, even given optimal models, the MAP decoder does not necessarily minimize the commonly used performance metric, word error rate (WER). We describe a method for explicitly minimizing WER by extracting word hypotheses with the highest posterior probabilities from word lattices. We change the standard problem formulation by replacing global search over a large set of sentence hypotheses with local search over a small set of word candidates. In addition to improving the accuracy of the recognizer, our method produces a new representation of the set of candidate hypotheses that specifies ...
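The local-search idea in this abstract reduces, in its simplest form, to picking the highest-posterior word in each aligned slot of a confusion network. The following is a minimal sketch under that assumption; the data structure (a list of bins mapping words to posteriors, with "-" as a hypothetical null/skip symbol) is illustrative, not the paper's actual representation.

```python
# Hypothetical sketch of consensus decoding over a confusion network.
# Each bin maps candidate words (or "-" for a skip hypothesis) to the
# posterior probability mass aligned to that time slot.

def consensus_decode(confusion_network):
    """Pick the highest-posterior word in each bin; drop null winners."""
    hypothesis = []
    for bin_posteriors in confusion_network:
        best_word = max(bin_posteriors, key=bin_posteriors.get)
        if best_word != "-":  # "-" marks the null/skip hypothesis
            hypothesis.append(best_word)
    return hypothesis

cn = [
    {"i": 0.6, "a": 0.4},
    {"veal": 0.3, "feel": 0.5, "-": 0.2},
    {"fine": 0.9, "find": 0.1},
]
print(consensus_decode(cn))  # ['i', 'feel', 'fine']
```

Unlike MAP decoding, each slot is decided independently, which is what minimizes expected word-level (rather than sentence-level) error.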
Partially observable markov decision processes with continuous observations for dialogue management
 Computer Speech and Language
, 2005
Abstract

Cited by 204 (49 self)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and a continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
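The belief monitoring described here follows the standard POMDP update, with the observation likelihood factored into a discrete term (the recognized word) and a continuous term (a density over the confidence score). The toy distributions below (states `want_a`/`want_b`, the crude `conf_pdf`) are invented for illustration and are not the paper's models.

```python
import math

def belief_update(belief, action, obs_word, conf, T, O_word, conf_pdf):
    """b'(s') ∝ p(word | s') * p(conf | s', word) * sum_s T(s, a, s') b(s)."""
    new_belief = {}
    for s2 in belief:
        predicted = sum(T[(s, action, s2)] * belief[s] for s in belief)
        new_belief[s2] = O_word[(s2, obs_word)] * conf_pdf(conf, s2, obs_word) * predicted
    z = sum(new_belief.values())
    return {s: p / z for s, p in new_belief.items()}

states = ["want_a", "want_b"]
T = {(s, "ask", s2): (1.0 if s == s2 else 0.0) for s in states for s2 in states}
O_word = {("want_a", "a"): 0.8, ("want_a", "b"): 0.2,
          ("want_b", "a"): 0.2, ("want_b", "b"): 0.8}

def conf_pdf(c, state, word):
    # crude assumption: confidence tends to be high when the word is correct
    mu = 0.8 if O_word[(state, word)] > 0.5 else 0.4
    return math.exp(-((c - mu) ** 2) / 0.02)

b0 = {"want_a": 0.5, "want_b": 0.5}
b1 = belief_update(b0, "ask", "a", 0.9, T, O_word, conf_pdf)
```

A high-confidence observation of "a" concentrates nearly all belief mass on `want_a`; with a low confidence score the same word would shift belief far less, which is exactly what a discrete-observation MDP cannot express.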
Confidence Estimation for Machine Translation
 in M. Rollins (Ed.), Mental Imagery
, 2004
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
Abstract

Cited by 60 (8 self)
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion together with the optimization methods gradient descent (GD) and extended Baum (EB) algorithm. A tree-search-based restricted recognition method using word graphs is presented, so as to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
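For reference, the MMI criterion named in this abstract can be written in its standard form (omitting the smoothing and scaling options of the unified criterion described above):

```latex
F_{\mathrm{MMI}}(\theta) \;=\; \sum_{r=1}^{R} \log
  \frac{p_\theta(X_r \mid W_r)\, P(W_r)}
       {\sum_{V} p_\theta(X_r \mid V)\, P(V)}
```

where $X_r$ are the acoustic observations of training utterance $r$, $W_r$ is its reference transcription, and the denominator sums over competing word sequences $V$, in practice restricted to a word graph. MCE replaces this log-ratio with a smoothed misclassification measure.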
Schlüter: ‘Using Word Probabilities as Confidence Measures’
 in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1998
Abstract

Cited by 50 (7 self)
Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a word graph and are computed using a simplified forward-backward algorithm. We present experimental results on the NORTH AMERICAN BUSINESS (NAB’94) and the German VERBMOBIL recognition tasks.
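The forward-backward computation of arc posteriors that this abstract relies on can be sketched on a toy word graph. This is a deliberately simplified illustration: real systems work in log space with separately scaled acoustic and language-model scores, whereas here each arc carries a single pre-combined likelihood and nodes are assumed to be topologically ordered integers.

```python
from collections import defaultdict

def arc_posteriors(arcs, start, final):
    """Posterior of each arc (u, v, word, likelihood) in a lattice DAG."""
    forward = defaultdict(float)
    forward[start] = 1.0
    backward = defaultdict(float)
    backward[final] = 1.0
    nodes = sorted({n for a in arcs for n in (a[0], a[1])})
    for n in nodes:                       # forward pass
        for (u, v, w, p) in arcs:
            if u == n:
                forward[v] += forward[u] * p
    for n in reversed(nodes):             # backward pass
        for (u, v, w, p) in arcs:
            if v == n:
                backward[u] += p * backward[v]
    total = forward[final]                # total lattice probability
    return {(u, v, w): forward[u] * p * backward[v] / total
            for (u, v, w, p) in arcs}

arcs = [(0, 1, "hi", 0.6), (0, 1, "high", 0.4), (1, 2, "there", 1.0)]
post = arc_posteriors(arcs, start=0, final=2)
```

The word-level confidence described in the abstract is then obtained by summing the posteriors of all arcs that carry the same word in roughly the same time segment.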
Unsupervised Training Of A Speech Recognizer: Recent Experiments
 in Proc. EUROSPEECH
Abstract

Cited by 49 (0 self)
Current speech recognition systems require large amounts of transcribed data for parameter estimation. The transcription, however, is tedious and expensive. In this work we describe our experiments which are aimed at training a speech recognizer with only a minimal amount (30 minutes) of transcriptions and a large portion (50 hours) of untranscribed data. A recognizer is bootstrapped on the transcribed part of the data and initial transcripts are generated with it for the remainder (the untranscribed part). Using a lattice-based confidence measure, the recognition errors are (partially) detected and the remainder of the hypotheses is used for training. Using this scheme, the word error rate on a broadcast news speech recognition task dropped from more than 32.0% to 21.4%. In a cheating experiment we show that this performance cannot be significantly improved by improving the measure of confidence. By combining the unsupervisedly trained system with our currently best recognizer which ...
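The confidence-filtered self-training loop described above can be sketched as follows. The functions `recognize` and `train` stand in for a real ASR toolkit and are assumptions, not the authors' code; `recognize` is assumed to return (word, confidence) pairs per utterance, and the 0.7 threshold is arbitrary.

```python
# Hedged sketch of confidence-filtered self-training on untranscribed audio.

def self_train(seed_model, untranscribed_audio, recognize, train, threshold=0.7):
    selected = []
    for utterance in untranscribed_audio:
        hyp = recognize(seed_model, utterance)       # [(word, confidence), ...]
        kept = [w for (w, conf) in hyp if conf >= threshold]
        if kept:
            selected.append((utterance, kept))       # keep only confident words
    return train(seed_model, selected)

# toy stand-ins to make the sketch runnable
def fake_recognize(model, utt):
    return [("hello", 0.9), ("world", 0.3)]

def fake_train(model, data):
    return ("updated", data)

model, data = self_train("seed", ["utt1"], fake_recognize, fake_train)
```

The cheating experiment in the abstract effectively replaces the confidence filter with an oracle that keeps exactly the correct words, bounding how much a better confidence measure could help.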
Confidence measures for speech recognition: A survey
, 2005
Abstract

Cited by 44 (1 self)
In speech recognition, confidence measures (CM) are used to evaluate the reliability of recognition results. A good confidence measure can largely benefit speech recognition systems in many practical applications. In this survey, I summarize most research work related to confidence measures that has been done during the past 10–12 years. I present these approaches in three major categories, namely CM as a combination of predictor features, CM as a posterior probability, and CM as utterance verification. I also introduce some recent advances in the area. Moreover, I discuss the capabilities and limitations of current CM techniques and comment generally on today's CM approaches. Based on the discussion, I conclude the paper with some directions for future work.
A Comparison Of Word Graph And N-Best List Based Confidence Measures
 in Proc. EUROSPEECH
, 1999
Abstract

Cited by 24 (5 self)
In this paper we present and compare several confidence measures for large vocabulary continuous speech recognition. We show that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures, e.g. the acoustic stability and the hypothesis density. In addition, we prove that the estimation of posterior word probabilities on word graphs yields better results than their estimation on N-best lists and discuss both methods in detail. We present experimental results on three different corpora: the English NAB '94 20k development corpus, the German VERBMOBIL '96 evaluation corpus, and a Dutch corpus which was recorded with a train timetable information system in the ARISE project.
A Boosting Approach for Confidence Scoring
, 2001
Abstract

Cited by 18 (0 self)
In this paper we present the application of a boosting classification algorithm to confidence scoring. We derive feature vectors from speech recognition lattices and feed them into a boosting classifier. This classifier combines hundreds of very simple 'weak learners' and derives classification rules that can reduce the confidence error rate by up to 34%. We compare our results to those obtained using two other standard classification techniques, Support Vector Machines (SVMs) and Classification and Regression Trees (CART), and show significant improvements. Furthermore, the nature of the boosting algorithm allows us to combine the best single classifier and improve its performance.
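A boosting classifier of the kind described above can be illustrated with a toy AdaBoost over single-feature threshold stumps. The feature (a single posterior-like score) and the data are invented for illustration; the paper's actual feature set is derived from recognition lattices.

```python
import math

def train_adaboost(X, y, rounds=5):
    """AdaBoost with threshold stumps; y in {+1, -1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []  # (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        for f in range(d):                       # exhaustive stump search
            for t in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] >= t else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, preds)
        err, f, t, pol, preds = best
        err = max(err, 1e-10)                    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weak learner's vote weight
        ensemble.append((alpha, f, t, pol))
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        z = sum(w)
        w = [wi / z for wi in w]                 # re-normalize example weights
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x[f] >= t else -pol) for (a, f, t, pol) in ensemble)
    return 1 if score >= 0 else -1

X = [[0.9], [0.8], [0.2], [0.1]]   # e.g. a posterior-probability feature
y = [1, 1, -1, -1]                 # 1 = word correct, -1 = recognition error
model = train_adaboost(X, y)
```

The final confidence score is the weighted vote of all stumps, which is what lets hundreds of weak rules combine into a strong classifier.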
Lattice-Based Unsupervised MLLR For Speaker Adaptation
 Proc. ISCA ITRW ASR2000
, 2000
Abstract

Cited by 16 (0 self)
In this paper we explore the use of lattice-based information for unsupervised speaker adaptation. As initially formulated, maximum likelihood linear regression (MLLR) aims to linearly transform the means of the Gaussian models in order to maximize the likelihood of the adaptation data given the correct hypothesis (supervised MLLR) or the decoded hypothesis (unsupervised MLLR). For the latter, if the first-pass decoded hypothesis is extremely erroneous (as is the case for large vocabulary telephony applications), MLLR will often find a transform that increases the likelihood for the incorrect models, and may even lower the likelihood of the correct hypothesis. Since the oracle word error rate of a lattice is much lower than that of the 1-best or N-best hypotheses, by performing adaptation against a word lattice, the correct models are more likely to be used in estimating the transform. Furthermore, the particular MAP lattice that we propose enables the use of a natural confidence mea...
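The key mechanism, weighting adaptation statistics by lattice posteriors rather than trusting a single 1-best alignment, can be shown in a greatly simplified form. Instead of a full MLLR regression matrix, the sketch below estimates a single global scalar bias on the Gaussian means, with each frame/state pair weighted by an occupancy gamma that would come from the lattice; all structures are toy assumptions.

```python
# Simplified lattice-weighted adaptation: a global mean bias instead of a
# full MLLR transform. gamma plays the role of the lattice posterior
# occupancy of each Gaussian at each frame.

def lattice_bias(frames):
    """frames: list of (observation, [(gaussian_mean, gamma), ...])."""
    num = 0.0
    den = 0.0
    for obs, alignments in frames:
        for mean, gamma in alignments:
            num += gamma * (obs - mean)   # posterior-weighted residual
            den += gamma                  # total occupancy mass
    return num / den                      # bias to add to every mean

frames = [(1.2, [(1.0, 1.0)]), (2.3, [(2.0, 1.0)])]
bias = lattice_bias(frames)
```

With 1-best adaptation each frame contributes with gamma fixed to 1 for a single, possibly wrong, Gaussian; lattice weighting spreads the mass over the competing hypotheses so incorrect alignments contribute less to the transform.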