## Confidence Measures For Evaluating Pronunciation Models (1998)

Citations: 12 (1 self)

### BibTeX

@MISC{Williams98confidencemeasures,

author = {Gethin Williams and Steve Renals},

title = {Confidence Measures For Evaluating Pronunciation Models},

year = {1998}

}

### Abstract

In this paper, we investigate the use of confidence measures for the evaluation of pronunciation models. The confidence measures and pronunciation models are obtained from the ABBOT hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) Large Vocabulary Continuous Speech Recognition (LVCSR) system [4], and the experiments were carried out using the North American Business News (NAB) and ARPA Hub4 Broadcast News (BN) corpora.

2. CONFIDENCE MEASURES AND PRONUNCIATION MODELS

A confidence measure may be defined as a function which quantifies how well a model matches some acoustic data. More specifically, an acoustic confidence measure is one which is derived exclusively from an acoustic model. A pronunciation model for a word specifies how it is believed to be articulated in terms of a sequence of subword acoustic classes. The goal when evaluating a pronunciation model for some word is to determine how well the model matches acoustic realisations of that word. An acoustic confidence measure is therefore naturally suited to the task.

A common approach to evaluating pronunciation models, however, is to align the subword class sequence output by the recogniser, using full word level decoding constraints, against an alternative subword sequence obtained without any pronunciation model constraints. In this case, a poor pronunciation model is signalled by a portion of the alignment where the class labels do not match. This approach is undesirable for two reasons. Firstly, the alignment only signals pronunciation variants and does not give a direct measure of model match. Secondly, obtaining an accurate alternative decoding sequence is difficult. One method for obtaining such a decoding sequence is to transcribe the acoustic data with subword class labels by hand, e.g. ...
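The alignment-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes both decodings are available as lists of phone labels, and it uses Python's `difflib.SequenceMatcher` as a stand-in aligner; the paper does not say which alignment procedure is used. The function name and the example phone strings are hypothetical.

```python
from difflib import SequenceMatcher


def pronunciation_mismatches(constrained, unconstrained):
    """Return the regions where two phone label sequences disagree.

    constrained:   phones from decoding with word-level constraints
    unconstrained: phones from an alternative decoding or hand labelling
    """
    # autojunk=False stops difflib from discarding frequent labels on long inputs.
    matcher = SequenceMatcher(a=constrained, b=unconstrained, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Each non-equal region signals a possible pronunciation variant.
            mismatches.append((tag, constrained[i1:i2], unconstrained[j1:j2]))
    return mismatches


# Hypothetical example: a dictionary baseform against a reduced realisation.
dictionary_form = ["ae", "n", "d"]
observed_form = ["ax", "n"]
print(pronunciation_mismatches(dictionary_form, observed_form))
```

As the abstract notes, output of this kind only flags where the label sequences differ; it gives no direct score of how well the pronunciation model matches the acoustics.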

### Citations

483 |
Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, Morgan
- 1994
Citation Context ... suited to producing acoustic confidence measures. This is because the acoustic model (ANN) can directly estimate acoustic subword class posterior probabilities which are comparable across utterances [2]. The ABBOT [8] acoustic model is trained to estimate phone class posterior probabilities, based on a local portion of acoustics of the utterance, $p(q_k \mid x^n)$. In this paper we make use of an acou...

45 |
The use of recurrent networks in continuous speech recognition
- Robinson, Hochberg, et al.
- 1996
Citation Context ...e confidence measures and pronunciation models are obtained from the ABBOT hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) Large Vocabulary Continuous Speech Recognition (LVCSR) system [8]. Experiments were carried out for a number of baseform learning schemes using the ARPA North American Business News (NAB) and the Broadcast News (BN) corpora from which it was found that a confidence...

29 | Confidence measures for hybrid HMM/ANN speech recognition
- Williams, Renals
- 1997
Citation Context ...terior phone probabilities, $\mathrm{CM}_{npost}$. In previous studies, we have compared the performance of $\mathrm{CM}_{npost}$ to that of a number of other confidence measures for the task of decoding hypothesis verification [12, 13]. We have found $\mathrm{CM}_{npost}$ to perform better than the other confidence measures for the task of phone hypothesis verification and to be the least expensive to compute. A description of $\mathrm{CM}_{npost}$ and four o...

25 | Building Multiple Pronunciation Models for Novel Words using Exploratory
- Tajchman, Fosler, et al.
- 1995
Citation Context ... acoustics (on average). Such a process requires the proposal of alternative pronunciation models. A number of methods for automatically generating an alternative pronunciation model for a word exist [3, 4, 7, 11]. An acoustic confidence measure may be employed to determine whether an alternative model is an improvement upon an existing model. 3. CONFIDENCE MEASURES An acoustic class model created using a ‘tra...

24 |
Identification of contextual factors for pronunciation networks
- Chen
- 1990
Citation Context ... acoustics (on average). Such a process requires the proposal of alternative pronunciation models. A number of methods for automatically generating an alternative pronunciation model for a word exist [3, 4, 7, 11]. An acoustic confidence measure may be employed to determine whether an alternative model is an improvement upon an existing model. 3. CONFIDENCE MEASURES An acoustic class model created using a ‘tra...

17 |
A New Approach Towards Keyword Spotting
- Boite, Bourlard, et al.
- 1993
Citation Context ...n the search for the optimal state sequence.

$$\mathrm{CM}_{nsl}(q_k) = \frac{1}{n_e - n_s} \sum_{n=n_s}^{n_e} \log \frac{p(x^n \mid q_k)}{p(x^n)} = \frac{1}{n_e - n_s} \sum_{n=n_s}^{n_e} \log \frac{P(q_k \mid x^n)}{P(q_k)}$$

Online Garbage: The term ‘online garbage’ [1] refers to the normalisation of the probability of the best decoding hypothesis by the average probability of the n-best decoding hypotheses. This average may be considered to be a form of garbage mod...

16 | Automatic generation of context-dependent pronunciations
- Ravishankar, Eskenazi
- 1997
Citation Context ...nd [4]. This method is prohibitively labour intensive for large corpora, such as the BN corpus. Another method is to run the recogniser over the data using only phone level decoding constraints, e.g. [7]. Decoding sequences obtained using this method contain many errors, however (typically around 30% error rate for phone classification). Also such decoding sequences should not be attributed as much c...

11 |
Utterance verification of keyword strings using word-based minimum verification error (WBMVE) training
- Sukkar, Setlur, et al.
Citation Context ...iven the class model and an estimate of $p(X)$ given by a ‘garbage’ or ‘filler’ model. The use of such likelihood ratios has been reported in the keyword spotting and utterance verification literature [9, 10]. A problem with the use of such likelihood ratios is that it is very difficult to explicitly estimate $p(X)$ for a wide range of acoustic conditions. A second approach to deriving a confidence measure...

10 |
New words: Effect on recognition performance and incorporation issues
- Hetherington
- 1995
Citation Context ...eting decodings in an n-best lattice of decoding hypotheses and is computed by averaging the number of unique competing decoding hypotheses (NCH) which pass through a frame over the interval $n_s$ to $n_e$ [5]. $\mathrm{CM}_{lat}(n_s, n_e)$ is not an acoustic confidence measure, as the n-best lattices of decoding hypotheses from which it is derived are created using both acoustic and language model information. p(q_k = ...

9 | Study of the Use and Evaluation of Confidence measures in Automatic Speech Recognition
- Williams
- 1998
Citation Context ...terior phone probabilities, $\mathrm{CM}_{npost}$. In previous studies, we have compared the performance of $\mathrm{CM}_{npost}$ to that of a number of other confidence measures for the task of decoding hypothesis verification [12, 13]. We have found $\mathrm{CM}_{npost}$ to perform better than the other confidence measures for the task of phone hypothesis verification and to be the least expensive to compute. A description of $\mathrm{CM}_{npost}$ and four o...

8 |
Automatic learning of word pronunciation from data
- Fosler, Weintraub, et al.
- 1996
Citation Context ...and, secondly, obtaining an accurate alternative decoding sequence is difficult. One method for obtaining such a decoding sequence is to transcribe the acoustic data with subword class labels by hand [4]. This method is prohibitively labour intensive for large corpora, such as the BN corpus. Another method is to run the recogniser over the data using only phone level decoding constraints, e.g. [7]. D...

3 |
Lexical Tuning based on Triphone Confidence Estimation
- Markey, Ward
- 1997
Citation Context ... for that class, calculated over some data set. A low confidence is given to the match of an acoustic class model if its associated likelihood falls sufficiently far from the mean of the distribution [6]. An objection to this approach is that it is somewhat ad hoc as it does not explicitly accommodate different acoustic conditions. In contrast to likelihood based recognisers, hybrid HMM/ANN systems a...

3 |
Spotting from Continuous Speech Utterances
- Word
- 1996
Citation Context ...iven the class model and an estimate of $p(X)$ given by a ‘garbage’ or ‘filler’ model. The use of such likelihood ratios has been reported in the keyword spotting and utterance verification literature [9, 10]. A problem with the use of such likelihood ratios is that it is very difficult to explicitly estimate $p(X)$ for a wide range of acoustic conditions. A second approach to deriving a confidence measure...
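
The scaled-likelihood measure quoted in the citation context for [1] can be illustrated with a short sketch. It assumes that per-frame posteriors P(q_k | x^n) for one phone class and that class's prior P(q_k) are already available as plain numbers. The function name, argument names and example values are hypothetical. The normaliser n_e − n_s follows the quoted formula as far as it can be read from the source; a frame-count normaliser would use n_e − n_s + 1 instead.

```python
import math


def scaled_likelihood_confidence(posteriors, prior, n_s, n_e):
    """Duration-normalised log scaled likelihood for one phone segment.

    posteriors: per-frame values of P(q_k | x^n), indexed by frame n
    prior:      class prior P(q_k)
    n_s, n_e:   first and last frame of the segment (inclusive)
    """
    if n_e <= n_s:
        raise ValueError("segment must span more than one frame")
    # By Bayes' rule, P(q_k | x^n) / P(q_k) equals p(x^n | q_k) / p(x^n),
    # so the network posteriors can replace explicit likelihoods.
    total = sum(math.log(posteriors[n] / prior) for n in range(n_s, n_e + 1))
    # Assumption: normaliser n_e - n_s, as read from the quoted formula.
    return total / (n_e - n_s)


# Hypothetical posteriors for a three-frame segment of one phone class.
frame_posteriors = [0.5, 0.5, 0.5]
score = scaled_likelihood_confidence(frame_posteriors, prior=0.25, n_s=0, n_e=2)
print(score)
```

A positive score means the acoustics make the class more probable than its prior alone; a score near zero means the frames carry little evidence for the class.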