## Confidence Measures From Local Posterior Probability Estimates (1999)

Venue: Computer Speech and Language

Citations: 27 (7 self)

### BibTeX

@ARTICLE{Williams99confidencemeasures,
    author = {Gethin Williams and Steve Renals},
    title = {Confidence Measures From Local Posterior Probability Estimates},
    journal = {Computer Speech and Language},
    year = {1999},
    volume = {13},
    pages = {395--411}
}

### Abstract

In this paper we introduce a set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model. In addition to their computational efficiency, these confidence measures are attractive as they may be applied at the state-, phone-, word- or utterance-levels, potentially enabling discrimination between different causes of low confidence recognizer output, such as unclear acoustics or mismatched pronunciation models. We have evaluated these confidence measures for utterance verification using a number of different metrics. Experiments reveal several trends in 'profitability of rejection', as measured by the unconditional error rate of a hypothesis test. These trends suggest that crude pronunciation models can mask the relatively subtle reductions in confidence caused by out-of-vocabulary (OOV) words and disfluencies, but not the gross model mismatches elicited by non-sp...

### Citations

266 | The DET curve in assessment of detection task performance
- Martin, Doddington, et al.
- 1997
Citation Context ...ributions are assumed to be Gaussian then the probabilities P(type I error) and P(type II error) may be plotted over the range of operating points of a test as a detection error tradeoff (DET) curve (Martin et al., 1997). In this case, the axes are warped according to the deviations of the tails, corresponding to the probabilities, from the mean of the Gaussian. This logarithmic warping of the axes has the effect of... |
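
As a rough illustration of the axis warping described in this context, the sketch below maps the two error probabilities at each operating point to standard normal deviates. The function name `det_points`, the decision rule (accept H0 when the score is at or above the threshold), and the sample scores are assumptions made for this example, not details taken from the paper.

```python
from statistics import NormalDist

def det_points(scores_h0_true, scores_h0_false, thresholds):
    """Return (deviate of P(type I), deviate of P(type II)) per threshold.

    Assumed decision rule: accept H0 when score >= threshold.
    """
    probit = NormalDist().inv_cdf  # inverse CDF of the standard normal
    points = []
    for t in thresholds:
        # Type I error: H0 is true but the test rejects it.
        p_type1 = sum(s < t for s in scores_h0_true) / len(scores_h0_true)
        # Type II error: H0 is false but the test accepts it.
        p_type2 = sum(s >= t for s in scores_h0_false) / len(scores_h0_false)
        # The inverse CDF is undefined at 0 and 1, so skip those points.
        if 0.0 < p_type1 < 1.0 and 0.0 < p_type2 < 1.0:
            points.append((probit(p_type1), probit(p_type2)))
    return points

# Hypothetical confidence scores, for illustration only.
example_points = det_points([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1], [0.0, 0.5])
```

Plotting these deviates instead of the raw probabilities gives a straight line when both score distributions are Gaussian, which is the property the context alludes to.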

239 | Signal detection theory and ROC analysis
- Egan
- 1975
Citation Context ... of the two states of nature (true($H_0$) and false($H_0$)). One method for plotting conditional probability statistics for a hypothesis test is to use an ROC (Receiver Operating Characteristic) curve (Egan, 1975). Such a curve is created by plotting the 'hit' rates (ordinates) against the 'false alarm' rates (abscissas) over the range of possible operating points on the test statistic. For example, an ROC cu... |
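
The construction of an ROC curve described here can be sketched as a sweep over thresholds. The function name `roc_points`, the decision rule, and the sample scores are illustrative assumptions rather than anything specified in the cited text.

```python
def roc_points(scores_h0_true, scores_h0_false, thresholds):
    """Return (false alarm rate, hit rate) pairs, one per threshold.

    Assumed decision rule: accept H0 when score >= threshold.
    """
    points = []
    for t in thresholds:
        # Hit: H0 is true and the test accepts it.
        hit = sum(s >= t for s in scores_h0_true) / len(scores_h0_true)
        # False alarm: H0 is false but the test accepts it.
        false_alarm = sum(s >= t for s in scores_h0_false) / len(scores_h0_false)
        points.append((false_alarm, hit))  # abscissa first, ordinate second
    return points

# Hypothetical confidence scores, for illustration only.
example_roc = roc_points([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1], [0.0, 0.5, 1.0])
```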

202 | An Application of Recurrent Nets to Phone Probability Estimation
- Robinson
Citation Context ... $P(q_n \mid q_{n-1}, X, Q)$. (14) This acoustic model probability can be estimated by an artificial neural network such as a multilayer perceptron (Bourlard and Morgan, 1994) or a recurrent network (Robinson, 1994), making an assumption about the dependence on the acoustic input. In the case of the recurrent network used in this work, we assume no dependence on the previous state or future acoustics (Robinson ... |

198 | Construction and Assessment of Classification Rules
- Hand
- 1997
Citation Context ...s testing. The separability of the two distributions may be estimated using a number of metrics, two of which are the Kolmogorov Variational Distance, $d_{\mathrm{Kol}}$, and the Bhattacharyya Distance, $d_{\mathrm{Bhatt}}$ (Hand, 1997). Another is the 'symmetric Kullback-Leibler distance', $d_{\mathrm{KL2}}$. Since the Kullback-Leibler distance between two distributions A and B is asymmetric, the symmetric version sums the divergence of the d... |
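
A minimal sketch of the three separability metrics named in this context, for discrete distributions given as lists of probabilities. The exact definitions used in the paper are not visible in the excerpt, so the forms below are common textbook versions chosen as assumptions: half the L1 distance for the Kolmogorov variational distance (without class priors), the negative log of the Bhattacharyya coefficient, and the sum of the two directed Kullback-Leibler divergences.

```python
import math

def kolmogorov_distance(p, q):
    """Half the L1 distance between two discrete distributions (assumed form)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def bhattacharyya_distance(p, q):
    """Negative log of the Bhattacharyya coefficient sum(sqrt(p_i * q_i))."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))

def kl_divergence(p, q):
    """Directed KL divergence; assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def symmetric_kl(p, q):
    """Sum of the two directed divergences, which makes the measure symmetric."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

All three values are zero for identical distributions and grow as the distributions separate.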

171 | Maximum mutual information estimation of hidden Markov model parameters for speech recognition - Bahl, Brown, et al. - 1986 |

70 | Global optimization of a neural network–Hidden Markov Model hybrid - Bengio, Mori, et al. - 1992 |

67 | Connectionist probability estimators in HMM speech recognition
- Renals, Morgan, et al.
- 1994
Citation Context ...led Likelihood The acceptor HMM system of section 3 is regarded as a "pseudo-generative" model, in which the likelihoods of a generative model are replaced by likelihood ratios or scaled likelihoods (Renals et al., 1994), which following (19) may be obtained by dividing the local posterior probability by the class prior estimated from the relative frequencies of the phone labels in the acoustic training data: p(X n ... |
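
The division of a local posterior by a class prior described in this context can be sketched as follows. The function names, the dictionary representation, and the toy phone labels are assumptions for illustration.

```python
from collections import Counter

def class_priors(training_labels):
    """Estimate phone priors as relative frequencies of the training labels."""
    counts = Counter(training_labels)
    total = sum(counts.values())
    return {phone: count / total for phone, count in counts.items()}

def scaled_likelihoods(posteriors, priors):
    """Divide each local posterior by the prior of the same phone class."""
    return {phone: p / priors[phone] for phone, p in posteriors.items()}

# Hypothetical labels and network outputs for a single frame.
priors = class_priors(["a", "a", "a", "b"])
scaled = scaled_likelihoods({"a": 0.6, "b": 0.4}, priors)
```

In this toy case the rarer phone "b" ends up with the larger scaled likelihood even though its posterior is smaller, which shows the effect of removing the prior.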

48 | Confidence Measures for the Switchboard Database
- Cox, Rose
- 1996
Citation Context ...qual values can be obtained for a particular level of hypothesis testing performance, irrespective of the task difficulty. This normalized mutual information metric $E(Z;A)$ is known as the efficiency (Cox and Rose, 1996) of a test: $E(Z;A) = \frac{I(Z;A)}{H(A)} = \frac{H(A) - H(A \mid Z)}{H(A)} = \frac{H(Z) - H(Z \mid A)}{H(A)}$. (5) The above evaluation metrics result in a set of curves covering a range of operating points for a hypothesis... |
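
To make the efficiency $E(Z;A) = I(Z;A)/H(A)$ concrete, here is a small sketch that computes it from a joint probability table over the test decision Z and the true state A. The table layout and function names are assumptions, and the mutual information is computed through the identity $I(Z;A) = H(Z) + H(A) - H(Z,A)$, which is equivalent to the forms in equation (5).

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def efficiency(joint):
    """Normalized mutual information I(Z;A) / H(A).

    joint[z][a] is the joint probability of decision z and true state a.
    Assumes H(A) > 0, i.e. both states of nature actually occur.
    """
    p_z = [sum(row) for row in joint]
    p_a = [sum(col) for col in zip(*joint)]
    h_a = entropy(p_a)
    h_joint = entropy([p for row in joint for p in row])
    mutual_info = entropy(p_z) + h_a - h_joint
    return mutual_info / h_a
```

A test whose decisions always match the true state has efficiency 1, and a test whose decisions are independent of the true state has efficiency 0.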

48 | Neural network based measures of confidence for word recognition
- Weintraub, Beaufays, et al.
- 1997
Citation Context ...put is correct or incorrect, the output was aligned with the transcript. In addition to considering errors due to substitutions and insertions, poor time alignment was also considered to be an error (Weintraub et al., 1997). Specifically, for a segment of the recognition output to be considered well time aligned, an identical reference segment was required with greater than 50% of its duration overlapping with that of ... |

45 | The use of recurrent networks in continuous speech recognition
- Robinson, Hochberg, et al.
- 1996
Citation Context ...fidence measure derived from both the acoustic and language models. The confidence measures have been applied to the output of the ABBOT large vocabulary continuous speech recognition (LVCSR) system (Robinson et al., 1996) for the task of utterance verification at the word- and phone-levels. Several probabilistic metrics were used for evaluation. In addition to their computational efficiency, an attractive property of... |

39 | Optimizing recognition and rejection performance in wordspotting systems
- Bourlard, D’hoore, et al.
- 1994
Citation Context ...caled likelihood of a phone hypothesis $q_k$: $SL(q_k) = \sum_{n=n_s}^{n_e} \log\left\{ \frac{P(q_k \mid X_1^n)}{P(q_k)} \right\} = PP(q_k) - D\log\{P(q_k)\}$. (22) Online Garbage The term "online garbage" (Boite et al., 1993; Bourlard et al., 1994) is used to refer to the normalization of the probability of the best decoding hypothesis by the average probability of the m-best decoding hypotheses. This average may be considered to be a form of ... |
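
Equation (22) in this context can be sketched directly: the log scaled likelihood of a phone hypothesis is the sum of per-frame log posteriors minus the segment duration times the log prior. Reading PP as the summed log posterior and D as the number of frames in the segment is an inference from the identity in (22), and the function names are assumptions.

```python
import math

def log_posterior_sum(frame_posteriors):
    """PP(q_k): sum of log local posteriors over the frames of the segment."""
    return sum(math.log(p) for p in frame_posteriors)

def log_scaled_likelihood(frame_posteriors, prior):
    """SL(q_k) = PP(q_k) - D * log P(q_k), with D the number of frames."""
    duration = len(frame_posteriors)
    return log_posterior_sum(frame_posteriors) - duration * math.log(prior)

# Hypothetical two-frame phone segment with posteriors 0.5 and 0.25, prior 0.125.
example_sl = log_scaled_likelihood([0.5, 0.25], 0.125)
```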

33 | Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, Morgan
- 1994
Citation Context ...$q_n \mid q_1, \ldots, q_{n-1}, X, Q)$ (13) $\simeq \prod_{n=1}^{N} P(q_n \mid q_{n-1}, X, Q)$. (14) This acoustic model probability can be estimated by an artificial neural network such as a multilayer perceptron (Bourlard and Morgan, 1994) or a recurrent network (Robinson, 1994), making an assumption about the dependence on the acoustic input. In the case of the recurrent network used in this work, we assume no dependence on the previ... |

32 | Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems - Hennebert, Ris, et al. - 1997 |

29 | Confidence measures for hybrid HMM/ANN speech recognition - Williams, Renals - 1997 |

23 | A probabilistic approach to confidence estimation and evaluation
- Gillick, Ito, et al.
- 1997
Citation Context ...easure is an ideal candidate for a test statistic in some hypothesis test regarding the output of a speech recognizer. A more restrictive definition of a confidence measure (Weintraub et al., 1997; Gillick et al., 1997) is the posterior probability of word correctness given a set of "confidence indicators" for the recognizer output, such as acoustic and language model probabilities, the duration of the word hypoth... |

17 | An overview of the SPRACH system for the transcription of broadcast news
- Cook, Christie, et al.
- 1999
Citation Context ...pproximately 39 hours of the Hub-4 1996 training set, were merged with the output of a 4000 hidden unit multilayer perceptron, trained on the same data using modulation-filtered spectrogram features (Cook et al., 1999). A backed-off trigram language model was used, trained on the 200 million word NAB text corpus in the NAB case and on the 132 million word BN text corpus for the BN system. Vocabularies of 60022 and... |

16 | A new approach towards keyword spotting
- Boite, Bourlard, et al.
- 1993
Citation Context ... $SL(q_k)$, the log scaled likelihood of a phone hypothesis $q_k$: $SL(q_k) = \sum_{n=n_s}^{n_e} \log\left\{ \frac{P(q_k \mid X_1^n)}{P(q_k)} \right\} = PP(q_k) - D\log\{P(q_k)\}$. (22) Online Garbage The term "online garbage" (Boite et al., 1993; Bourlard et al., 1994) is used to refer to the normalization of the probability of the best decoding hypothesis by the average probability of the m-best decoding hypotheses. This average may be cons... |
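
The online garbage normalization mentioned here can be illustrated per frame. The excerpt does not give the exact formula, so this is only one plausible reading: take the log ratio of the hypothesised phone's posterior to the mean of the m largest posteriors in the same frame. The function name and arguments are hypothetical.

```python
import math

def online_garbage_score(frame_posteriors, hypothesis_index, m):
    """Log ratio of the hypothesis posterior to the mean of the m best.

    This per-frame form is an assumption; the paper's definition may differ.
    """
    top_m = sorted(frame_posteriors, reverse=True)[:m]
    garbage = sum(top_m) / len(top_m)  # average of the m-best posteriors
    return math.log(frame_posteriors[hypothesis_index] / garbage)

# Hypothetical frame with three phone classes; the hypothesis is class 0.
example_olg = online_garbage_score([0.6, 0.3, 0.1], 0, 2)
```

With m = 1 the score of the top-ranked class is zero, and larger m pushes the score of a clear winner above zero.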

15 | Improving posterior confidence measures in hybrid HMM/ANN speech recognition system
- Bernardis, Bourlard
- 1998
Citation Context ...ed $nPP(q_k)$: $nPP(q_k) = \frac{1}{D} PP(q_k)$. (25) $SL(q_k)$, $PP(q_k)$ and $OLG(q_k)$ may be extended to the word-level by averaging their values over the phones that are constituent to the word hypotheses (Bernardis and Bourlard, 1998). $S(n_s, n_e)$ may be derived at the word-level by simply matching the period over which it is calculated to the duration of the word hypothesis. We have also investigated a combined confidence meas... |
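
Equation (25) and the word-level extension described in this context amount to two averages, sketched below. Treating D as the number of frames in the phone segment is an assumption, as are the function names and the sample values.

```python
def normalized_pp(pp, duration):
    """nPP(q_k) = PP(q_k) / D, a duration-normalized phone-level measure."""
    return pp / duration

def word_level(phone_scores):
    """Average a phone-level measure over the phones that make up a word."""
    return sum(phone_scores) / len(phone_scores)

# Hypothetical phone with PP = -6.0 over 3 frames, and a three-phone word.
example_npp = normalized_pp(-6.0, 3)
example_word = word_level([-2.0, -1.0, -3.0])
```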

10 | New words: Effect on recognition performance and incorporation issues - Hetherington - 1995 |

8 | A training algorithm for statistical sequence recognition with applications to transition-based speech recognition - Bourlard, Konig, et al. - 1996 |