## Advances in Confidence Measures for Large Vocabulary (1999)

Venue: Proc. ICSLP

Citations: 8 (0 self)

### BibTeX

@INPROCEEDINGS{Wendemuth99advancesin,
  author    = {A. Wendemuth and G. Rose and J. G. A. Dolfing},
  title     = {Advances in Confidence Measures for Large Vocabulary},
  booktitle = {Proc. ICSLP},
  year      = {1999},
  pages     = {70530--8}
}

### Abstract

This paper addresses the correct choice and combination of confidence measures in large-vocabulary speech recognition tasks. We classify single words within continuous as well as large-vocabulary utterances into two categories: words within the vocabulary which are recognized correctly, and all other words, namely misrecognized words or (less frequently) out-of-vocabulary (OOV) words.

### Citations

243 | Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms
- Rosenblatt
- 1962
Citation Context: ...ote that the term in parentheses lies in the range (−1, 1). In the case of complete misclassification it approaches the value −1, which makes (4) exactly equivalent to conventional Perceptron learning [9]. Equating (4) to 0 is a fixed-point equation for J which, however, cannot be solved analytically, justifying the Neural Network approach. 3 Fine-tuning the result Having trained the network in this Baye...

70 | Estimating confidence using word lattices
- Kemp, Schaaf
- 1997
Citation Context: ...h process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination str...

48 | Neural network based measures of confidence for word recognition
- Weintraub, Beaufays, et al.
- 1997
Citation Context: ...nd to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination strategies do not significantly outperform simpler linear feature combinations [8]. In Section 2 and 3, we introduce the procedure to arrive at the best classification g...

48 | Using word probabilities as confidence measures
- Wessel, Macherey, Schlüter
- 1998
Citation Context: ...h process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination str...

47 | Confidence Measures for the Switchboard database
- Cox, Rose
- 1996
Citation Context: ...ns related to the acoustic model, the search process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated ...

41 | Confidence Measures for Spontaneous Speech Recognition
- Schaaf, Kemp
- 1997
Citation Context: ... of confidence measure realizations related to the acoustic model, the search process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the deci...

37 | Optimizing recognition and rejection performance in wordspotting systems
- Bourlard, D’Hoore, et al.
- 1994
Citation Context: ... of confidence measure realizations related to the acoustic model, the search process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the deci...

35 | Learning algorithms and probability distributions in feedforward and feed-back networks
- Hopfield
- 1987
Citation Context: ... posterior distribution, we now look at a suitable error function that will be minimized. Following standard arguments [2], for binary classifications we minimize over all samples i the Cross Entropy [7] E = −Σ_i { c_i log(y_i) + (1 − c_i) log(1 − y_i) } (3). We find a J that minimizes (3) if we apply a stochastic sequence of additive modifications ΔJ. To this end, we choose a constant and, at each step, we choo...
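The cross-entropy (3) and its stochastic additive weight updates can be illustrated with a minimal sketch. This assumes a single linear unit with sigmoid output trained on toy data; all variable names and values here are our own, not the paper's:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cross_entropy(targets, outputs):
    # E = -sum_i { c_i log(y_i) + (1 - c_i) log(1 - y_i) }   (eq. 3)
    return -sum(c * math.log(y) + (1 - c) * math.log(1 - y)
                for c, y in zip(targets, outputs))

def sgd_step(J, x, c, eta=0.1):
    # One stochastic additive modification of the weight vector J.
    # With sigmoid output y = g(J . x), the cross-entropy gradient
    # w.r.t. J is (y - c) x, hence Delta J = eta (c - y) x.
    y = sigmoid(sum(j * xi for j, xi in zip(J, x)))
    return [j + eta * (c - y) * xi for j, xi in zip(J, x)]

X = [[1.0, 2.0], [2.0, -1.0]]   # toy feature vectors (hypothetical)
C = [1.0, 0.0]                  # binary class labels
J = [0.0, 0.0]
for _ in range(200):
    for x, c in zip(X, C):
        J = sgd_step(J, x, c)

outputs = [sigmoid(sum(j * xi for j, xi in zip(J, x))) for x in X]
E = cross_entropy(C, outputs)
```

After training, the outputs straddle 0.5 according to their labels and E is driven toward zero, mirroring the stochastic minimization of (3) described in the excerpt.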

15 | Obtaining confidence measures from sentence probabilities
- Rueber
- 1997
Citation Context: ...h process and the language model exist in the literature. Examples of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination str...

10 | Modelling and Decoding of Crossword Context Dependent
- Beyerlein, Ullrich, et al.
- 1997
Citation Context: ...ocabulary. The training of the triphone models was carried out gender-dependently on the WSJ0+1 corpus. The Philips system for large vocabulary continuous speech recognition used here is described in [1]. The classification error rate (CER), which is the number of incorrectly tagged words divided by the total number of words, is used to compare results. In our experiments, we employ five basic confiden...
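The classification error rate (CER) used for comparison above can be sketched in a few lines. The tag sequences below are hypothetical; only the errors-over-total definition comes from the excerpt:

```python
def classification_error_rate(reference_tags, hypothesis_tags):
    # CER = number of incorrectly tagged words / total number of words
    errors = sum(r != h for r, h in zip(reference_tags, hypothesis_tags))
    return errors / len(reference_tags)

# hypothetical tagging: 1 = "correct word", 0 = "misrecognized or OOV"
ref = [1, 1, 0, 1, 0]
hyp = [1, 0, 0, 1, 1]
cer = classification_error_rate(ref, hyp)   # 2 mismatches out of 5 words
```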

8 | Combination of Confidence Measures in Isolated Word Recognition
- Dolfing, Wendemuth
- 1998
Citation Context: ...ll present and compare results obtained both for small vocabulary and large vocabulary tasks. The experimental setup for both scenarios is described. Data for the small vocabulary task are taken from [6] and are used here for comparison. The employed database for small vocabulary command-and-control contains single word utterances by 50 individuals (25 male, 25 female) who each spoke four to six utter...

7 | On-line garbage modeling with discriminant analysis for utterance verification
- Caminero, Torre, et al.
Citation Context: ...nd to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination strategies do not significantly outperform simpler linear feature combinations [8]. In Section 2 and 3, we introduce the procedure to arrive at the best classification g...

6 | Learning the unlearnable
- Wendemuth
- 1995
Citation Context: ...alized ones [2] can be considered. This is outside the scope of this paper. Instead, we fine-tuned our result for J at the decision boundary. To this end, an algorithm developed by one of the authors [13] was used to include further data into the set of correctly classified patterns. The Gardner–Derrida error function in [13], measuring the number of correctly classified data, is maximized. By doing s...

2 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ... Bayes posterior decision boundary following from P(C|X). The following shows under which conditions the Bayes posterior distribution can be modelled as a function of a. Some of the outline follows [2]. Starting with Bayes theorem, it can be seen as follows that the Bayes posterior can be written in the sigmoid form y = P(c=1|X) = g(a′) := 1/(1 + e^(−a′)) (1) with a′ = ln[p(X|c=1)P(c=1) / p(X... (2)
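The identity in (1)–(2), that the two-class Bayes posterior equals a sigmoid of the log-odds a′, can be checked numerically. This sketch assumes Gaussian class-conditional likelihoods; all densities, priors, and parameter values are our own illustration, not from the paper:

```python
import math

def gauss(x, mu, sigma):
    # 1-d Gaussian density, used as an assumed class-conditional p(X|c)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p1, p0 = 0.6, 0.4          # assumed priors P(c=1), P(c=0)
x = 0.3                    # a single observation
lik1 = gauss(x, 1.0, 1.0)  # p(X|c=1)
lik0 = gauss(x, -1.0, 1.0) # p(X|c=0)

# direct Bayes posterior P(c=1|X)
posterior = lik1 * p1 / (lik1 * p1 + lik0 * p0)

# sigmoid form of eq. (1)-(2): y = g(a') with
# a' = ln[ p(X|c=1)P(c=1) / (p(X|c=0)P(c=0)) ]
a = math.log((lik1 * p1) / (lik0 * p0))
y = 1.0 / (1.0 + math.exp(-a))
```

Algebraically, dividing numerator and denominator of the posterior by p(X|c=1)P(c=1) yields exactly 1/(1 + e^(−a′)), so the two computations agree to floating-point precision.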