## Optimal acoustic and language model weights for minimizing word verification errors

Venue: Proc. ICSLP 2004

Citations: 8 (5 self)

### BibTeX

@INPROCEEDINGS{Soong_optimalacoustic,

author = {Frank K. Soong and Wai-Kit Lo and Satoshi Nakamura},

title = {Optimal acoustic and language model weights for minimizing word verification errors},

booktitle = {Proc. ICSLP2004},

year = {2004}

}

### Abstract

Generalized word posterior probability (GWPP), a confidence measure for verifying recognized words, requires equalizing and weighting the acoustic and language model likelihood contributions to minimize verification errors. In this study, we investigate the word verification error surface and use it to optimize these weights and the corresponding verification threshold on a development set. We test three search algorithms for finding the optimal parameters: a full grid search, a gradient-based steepest descent search, and a downhill simplex search. The three methods yield very similar solutions. The proper acoustic and language model weights, especially the ratio between them, change with the relative importance (reliability) of the two knowledge sources. For a narrow beam width, the role of the acoustic model is less critical than that of the language model in GWPP-based word verification, owing to the noisy acoustic information retained in a narrow beam. Using a large vocabulary continuous Japanese speech database (the Basic Travel Expression Corpus), the largest relative improvements obtained are 33.2% in confidence error rate and 38.7% in a modified word accuracy.
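The abstract's core procedure (combine acoustic and language model scores with exponential weights, then pick the weights and threshold that minimize verification errors on a development set) can be sketched with toy, made-up scores. Everything below, including the function names and the data, is illustrative, not taken from the paper:

```python
import itertools

# Toy development set: (acoustic log-score, LM log-score, word is correct?).
# In the paper these components would come from GWPP computed on recognized
# words of the BTEC development set; here they are invented.
dev = [
    (-1.0, -0.5, True), (-0.8, -0.4, True), (-1.2, -0.6, True),
    (-3.0, -2.5, False), (-2.8, -0.7, False), (-0.9, -2.9, False),
]

def confidence_error_rate(alpha, beta, theta, data):
    """Fraction of words whose accept/reject decision is wrong when the
    combined score alpha*acoustic + beta*lm is thresholded at theta."""
    errors = sum((alpha * a + beta * l >= theta) != correct
                 for a, l, correct in data)
    return errors / len(data)

def grid_search(data, steps=21):
    """Full grid search over (alpha, beta, theta), one of the three search
    methods the paper compares (alongside steepest descent and simplex)."""
    weights = [i / (steps - 1) for i in range(steps)]            # in [0, 1]
    thetas = [-4 + 8 * i / (steps - 1) for i in range(steps)]    # in [-4, 4]
    return min((confidence_error_rate(a, b, t, data), a, b, t)
               for a, b, t in itertools.product(weights, weights, thetas))

cer, alpha, beta, theta = grid_search(dev)
print(cer, alpha, beta, theta)
```

On this toy data several (α, β, θ) triples reach zero error; the paper's point is that on real data the optimal α/β ratio shifts with the relative reliability of the two knowledge sources.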

### Citations

1371 | A simplex method for function minimization — Nelder, Mead, 1965
Citation Context: ...α and β can be estimated from a development set. This is an optimization task with two parameters, α and β, to obtain minimum word verification errors [9]. Given an initial simplex (a triangle in a 2-D search), it iteratively updates the vertices of the simplex in the direction towards the minimum error. This algorithm terminates when the difference be...
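The simplex update quoted in this context can be sketched as a minimal 2-D Nelder-Mead loop on a toy quadratic error surface. This is an illustrative reconstruction of the 1965 algorithm, not the paper's actual implementation or error surface:

```python
def nelder_mead_2d(f, simplex, tol=1e-8, max_iter=500):
    """Minimal 2-D downhill simplex: reflect, expand, contract, or shrink a
    triangle of points; stop when the best and worst function values agree
    to within tol (the termination test described in the quoted context)."""
    pts = [list(p) for p in simplex]
    for _ in range(max_iter):
        pts.sort(key=f)
        best, mid, worst = pts
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all vertices except the worst.
        cx = (best[0] + mid[0]) / 2
        cy = (best[1] + mid[1]) / 2
        refl = [2 * cx - worst[0], 2 * cy - worst[1]]              # reflection
        if f(refl) < f(best):
            exp = [3 * cx - 2 * worst[0], 3 * cy - 2 * worst[1]]   # expansion
            pts[2] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(mid):
            pts[2] = refl
        else:
            contr = [(cx + worst[0]) / 2, (cy + worst[1]) / 2]     # contraction
            if f(contr) < f(worst):
                pts[2] = contr
            else:                                          # shrink toward best
                pts = [best] + [[(p[0] + best[0]) / 2, (p[1] + best[1]) / 2]
                                for p in (mid, worst)]
    return min(pts, key=f)

# Toy error surface with a minimum at (alpha, beta) = (0.3, 0.7).
err = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2
opt = nelder_mead_2d(err, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(opt)
```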

78 | Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversation in the Real World — Takezawa, Sumita, et al., 2002
Citation Context: ...errors. 3. Experimental Setups. 3.1. Speech corpus. The corpus used in our experiments is a large vocabulary, continuous, read Japanese speech database called the Basic Travel Expression Corpus (BTEC) [7]. It was compiled and collected for a travel domain speech-to-speech translation project. In particular, two data sets are used as development and test sets in this study, consisting of 508 and 510 ut...

70 | Estimating confidence using word lattices — Kemp, Schaaf, 1997
Citation Context: ...of speech recognition output. They can be roughly classified into three categories: i) feature based; ii) explicit (extra) model based; and iii) posterior probability based. Feature based approaches [1] try to assess the confidence according to selected features (e.g., word duration, part-of-speech, acoustic and language model back-off, word graph density, etc.) using some trained classifiers. Expli...

30 | Discriminative utterance verification for connected digits recognition — Lee, Juang, 1997
Citation Context: ...f-speech, acoustic and language model back-off, word graph density, etc.) using some trained classifiers. Explicit model based approaches employ a candidate model together with some competitor models [2,3] (e.g., anti-model or filler model, etc.) and a likelihood ratio test is usually applied. Finally, the posterior probability based approach tries to estimate the posterior probabilities of a recognized en...

12 | Spontaneous dialogue speech recognition using cross-word context constrained word graph — Shimizu, Yamamoto, et al.
Citation Context: ...in this study, consisting of 508 and 510 utterances, respectively. Each set has 10 speakers (gender balanced) reading different sentences in the travel domain. 3.2. LVCSR. The LVCSR used is the ATRASR [8], running in multi-pass with a word bigram language model and a 47k-word lexicon. Generated word graphs in the recognition process are then rescored using a word trigram language model to obtain the f...

9 | Generalized word posterior probability (GWPP) for measuring reliability of recognized words — Soong, Lo, et al.
Citation Context: ...ally applied. Finally, the posterior probability based approach tries to estimate the posterior probabilities of a recognized entity (e.g., subword, word, or sentence) given all the acoustic observations [4,5]. In this study we generalize the concept of word posterior probability (WPP) to take into account the practical limitations in computing the WPP. Specifically, how to optimize the exponential weights...
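The posterior-probability approach this context describes can be illustrated on an n-best list: the exponentially weighted probability mass of the hypotheses containing a word, normalized by the total mass, with weights α (acoustic) and β (language model). This toy sketch uses invented scores and ignores the time-alignment constraints of the full GWPP:

```python
import math

# Toy n-best list: (word sequence, acoustic log-likelihood, LM log-probability).
# Scores are made up for illustration.
nbest = [
    (["i", "want", "a", "ticket"], -120.0, -8.0),
    (["i", "want", "the", "ticket"], -121.0, -9.0),
    (["i", "won", "a", "ticket"], -125.0, -12.0),
]

def word_posterior(word, hyps, alpha=0.1, beta=1.0):
    """Approximate posterior of `word` given the acoustics: the exponentially
    weighted (alpha for acoustic, beta for LM) mass of hypotheses containing
    the word, divided by the total mass over the n-best list."""
    weights = [math.exp(alpha * ac + beta * lm) for _, ac, lm in hyps]
    total = sum(weights)
    mass = sum(w for (words, _, _), w in zip(hyps, weights) if word in words)
    return mass / total

print(round(word_posterior("want", nbest), 3))  # high: in the top hypotheses
print(round(word_posterior("won", nbest), 3))   # low: only in a weak hypothesis
```

A word shared by all well-scoring hypotheses ("ticket") gets posterior close to 1, while a word appearing only in a poorly scoring hypothesis ("won") gets a small posterior; thresholding this value is the verification decision whose errors the paper minimizes.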

5 | Likelihood Ratio Decoding and Confidence Measures for Continuous Speech Recognition — Lleida, Rose, 1996
Citation Context: ...f-speech, acoustic and language model back-off, word graph density, etc.) using some trained classifiers. Explicit model based approaches employ a candidate model together with some competitor models [2,3] (e.g., anti-model or filler model, etc.) and a likelihood ratio test is usually applied. Finally, the posterior probability based approach tries to estimate the posterior probabilities of a recognized en...

2 | Robust Verification of Recognized Words in Noise — Lo, Soong, et al., 2004
Citation Context: ...rrors is investigated in detail. In a separate paper, also presented in these proceedings, we investigate word verification in noisy environments using the generalized word posterior probability [6]. 2. Generalized Word Posterior Probability. In maximum a posteriori (MAP) based speech recognition, the best recognized word string w*_1^M is obtained by maximizing the corresponding string posterior...

1 | Confidence measures for large vocabulary continuous speech recognition — Schlüter, Macherey, et al.
Citation Context: ...ally applied. Finally, the posterior probability based approach tries to estimate the posterior probabilities of a recognized entity (e.g., subword, word, or sentence) given all the acoustic observations [4,5]. In this study we generalize the concept of word posterior probability (WPP) to take into account the practical limitations in computing the WPP. Specifically, how to optimize the exponential weights...