Results 1 - 6 of 6
Connectionist speech recognition of Broadcast News, 2002
Cited by 28 (10 self)
This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimation, pronunciation modelling and search. In addition we have investigated a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources. We have incorporated all of these techniques into a system for the transcription
Confidence Measures From Local Posterior Probability Estimates - Computer Speech and Language, 1999
Cited by 18 (6 self)
In this paper we introduce a set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model. In addition to their computational efficiency, these confidence measures are attractive as they may be applied at the state-, phone-, word- or utterance-levels, potentially enabling discrimination between different causes of low confidence recognizer output, such as unclear acoustics or mismatched pronunciation models. We have evaluated these confidence measures for utterance verification using a number of different metrics. Experiments reveal several trends in 'profitability of rejection', as measured by the unconditional error rate of a hypothesis test. These trends suggest that crude pronunciation models can mask the relatively subtle reductions in confidence caused by out-of-vocabulary (OOV) words and disfluencies, but not the gross model mismatches elicited by non-sp...
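As a rough illustration of a confidence measure built from local phone posteriors, the sketch below averages the log posterior of a hypothesized phone over the frames it spans. The function name `segment_confidence`, the probability floor and the toy frame data are assumptions made for this example; it is not the exact set of measures defined in the paper.

```python
import math


def segment_confidence(posteriors, phone, start, end):
    """Mean log posterior of `phone` over frames [start, end).

    posteriors: list of per-frame dicts mapping phone label -> P(phone | frame).
    A value closer to 0 means higher confidence. (Hypothetical helper.)
    """
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    floor = 1e-10  # assumed floor to avoid log(0)
    total = sum(
        math.log(max(posteriors[t][phone], floor)) for t in range(start, end)
    )
    # Dividing by the segment length keeps scores comparable across durations.
    return total / (end - start)


# Toy per-frame posteriors over two phone classes (made-up numbers).
frames = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.8, "b": 0.2},
    {"a": 0.3, "b": 0.7},
]
conf_a = segment_confidence(frames, "a", 0, 2)
conf_b = segment_confidence(frames, "b", 0, 2)
```

The same averaging could be applied over the frames of a word or a whole utterance; a rejection threshold on the score would have to be tuned on held-out data.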
Developing and enhancing posterior based speech recognition systems - IDIAP RR, 2005
Cited by 4 (4 self)
Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “Tandem”) to improve speech recognition systems. In this paper, we present initial results towards boosting these approaches by improving posterior estimates, using acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). In the present work, the enhanced posterior distribution is associated with the “gamma” distribution typically used in standard HMM training, and estimated from local likelihoods (GMM) or local posteriors (ANN). This approach results in a family of new HMM based systems, where only posterior probabilities are used, while also providing a new, principled approach to a hierarchical use/integration of these posteriors, from the frame level up to the phone and word levels, integrating the appropriate context and prior knowledge at each level. In the present work, we used the resulting posteriors as local scores in a Viterbi decoder. On the OGI Numbers’95 database, this resulted in improved recognition performance, compared to a state-of-the-art hybrid HMM/ANN system.
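The “gamma” quantity mentioned in the abstract is the state occupancy posterior computed by the forward-backward recursion in standard HMM training. A minimal sketch of that standard recursion, assuming a toy two-state left-to-right model with made-up local scores, might look as follows; the function name `state_gammas` and all numbers are assumptions for illustration, not the authors' implementation.

```python
def state_gammas(init, trans, emis):
    """Return gamma[t][i] = P(state i at frame t | all frames).

    init[i]     : initial state probability
    trans[i][j] : transition probability from state i to state j
    emis[t][i]  : local score of state i at frame t (e.g. a likelihood)
    """
    T, N = len(emis), len(init)

    # Forward pass, rescaled per frame to avoid numerical underflow.
    alpha = []
    for t in range(T):
        if t == 0:
            raw = [init[j] * emis[0][j] for j in range(N)]
        else:
            raw = [
                sum(alpha[t - 1][i] * trans[i][j] for i in range(N)) * emis[t][j]
                for j in range(N)
            ]
        scale = sum(raw)
        alpha.append([a / scale for a in raw])

    # Backward pass, also rescaled per frame.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [
            sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
        scale = sum(raw)
        beta[t] = [b / scale for b in raw]

    # Combine and normalize so each frame's gammas sum to one.
    gammas = []
    for t in range(T):
        prod = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(prod)
        gammas.append([p / total for p in prod])
    return gammas


# Toy left-to-right model: must start in state 0 and cannot go back from state 1.
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emis = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
gammas = state_gammas(init, trans, emis)
```

Because each gamma depends on every frame of the utterance and on the model topology, it carries more context than a frame-local estimate, which is the property the abstract exploits.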
Hierarchical multi-stream posterior based speech recognition system - In Proceedings MLMI workshop, 2005
Cited by 2 (0 self)
In this paper, we present initial results towards boosting posterior based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the “state gamma posterior” definition (typically used in standard HMM training) extended to the case of multi-stream HMMs. This approach provides a new, principled, theoretical framework for hierarchical estimation/use of posteriors, multi-stream feature combination, and integrating appropriate context and prior knowledge in posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in a significant performance improvement, compared to state-of-the-art Tandem systems.
USING MORE INFORMATIVE POSTERIOR PROBABILITIES FOR SPEECH RECOGNITION
In this paper, we present initial investigations towards boosting posterior probability based speech recognition systems by estimating more informative posteriors taking into account acoustic context (e.g., the whole utterance), as well as possible prior information (such as phonetic and lexical knowledge). These posteriors are estimated based on the HMM state posterior probability definition (typically used in standard HMM training). This approach provides a new, principled, theoretical framework for hierarchical estimation/use of more informative posteriors integrating appropriate context and prior knowledge. In the present work, we used the resulting posteriors as local scores for decoding. On the OGI numbers database, this resulted in a significant performance improvement, compared to using MLP estimated posteriors for decoding (hybrid HMM/ANN approach), for clean and especially for noisy speech. The system is also shown to be much less sensitive to tuning factors (such as phone deletion penalty and language model scaling) than the standard HMM/ANN and HMM/GMM systems, so in practice it does not need to be tuned to achieve the best possible performance.
IMPROVED PHONE POSTERIOR ESTIMATION THROUGH k-NN AND MLP-BASED SIMILARITY, 2008
thesis supervisor, who allowed me to do it at the Idiap Research Institute and who gave me a very interesting work subject: Improved Phone Posterior Estimation Through k-NN And MLP-based Similarity. I thank him, and also Dr Mathew Magimai Doss and Mrs Afsaneh Asaei for their kindness, their guidance, their availability and their support throughout this thesis. I am also thankful to Professor Thierry Dutoit, my supervisor at Faculté Polytechnique de Mons in Belgium, for giving me the opportunity to carry out my master’s thesis within the international scientific and cultural context of the Idiap Research Institute. I would also particularly like to thank Mrs Nadine Rousseau and Mrs Sylvie Millius for their kindness, their advice and their help with all the administrative procedures and with the search for an apartment in Martigny. Finally, I would like to thank all the people working at the Idiap Research Institute for their warm welcome, when I came there for the first time and throughout my master’s thesis, and

