Confidence Measures for Large Vocabulary Continuous Speech Recognition
- IEEE Transactions on Speech and Audio Processing, 2001
Cited by 70 (7 self)
In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word graphs using a forward-backward algorithm. We also study the estimation of posterior probabilities on N-best lists instead of word graphs and compare both algorithms in detail. In addition, we compare the posterior probabilities with two alternative confidence measures, i.e., the acoustic stability and the hypothesis density. We present experimental results on five different corpora: the Dutch ARISE 1k evaluation corpus, the German Verbmobil '98 7k evaluation corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 65k evaluation corpus. We show that the posterior probabilities computed on word graphs outperform all other confidence measures. The relative reduction in confidence error rate ranges between 19% and 35% compared to the baseline confidence error rate.
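The idea of computing word posteriors on a word graph with a forward-backward pass can be illustrated with a minimal sketch. It is not the authors' implementation: the edge format, the function name `edge_posteriors`, the toy scores, and the assumption that node numbers increase along every edge are all hypothetical, and refinements such as score scaling or merging overlapping hypotheses of the same word are left out.

```python
from collections import defaultdict

def edge_posteriors(edges, start, end):
    """Posterior probability of each edge in a word graph.

    edges: list of (from_node, to_node, word, score), where score is an
    (assumed) combined acoustic/language-model probability and every
    edge satisfies from_node < to_node, so node order is topological.
    """
    # Forward pass: total score of all partial paths from start to each node.
    fwd = defaultdict(float)
    fwd[start] = 1.0
    for u, v, _, p in sorted(edges, key=lambda e: e[0]):
        fwd[v] += fwd[u] * p

    # Backward pass: total score of all partial paths from each node to end.
    bwd = defaultdict(float)
    bwd[end] = 1.0
    for u, v, _, p in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[u] += p * bwd[v]

    total = fwd[end]  # score mass of all complete paths
    # Mass of the paths through an edge, divided by the total mass.
    return [fwd[u] * p * bwd[v] / total for u, v, _, p in edges]

# Toy graph with three nodes (0 = start, 2 = end); scores are made up.
edges = [
    (0, 1, "hello", 0.6),
    (0, 1, "yellow", 0.2),
    (1, 2, "world", 0.5),
    (0, 2, "helloworld", 0.1),
]
for (_, _, word, _), post in zip(edges, edge_posteriors(edges, 0, 2)):
    print(f"{word}: {post:.2f}")
```

On this toy graph the edge posteriors come out as 0.6, 0.2, 0.8 and 0.2. The three edges leaving the start node compete for the same stretch of the utterance, so their posteriors sum to one.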
Estimating Confidence Using Word Lattices
Cited by 52 (3 self)
For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word, i.e. to have an estimate of which words of the speech recognizer's output are likely to be correct and which are not reliable. Many of today's speech recognition systems use word lattices as a compact representation of a set of alternative hypotheses. We exploit the use of such word lattices as information sources for the measure-of-confidence tagger JANKA [1]. In experiments on spontaneous human-to-human speech data the use of word lattice related information significantly improves the tagging accuracy.
Neural-Network Based Measures Of Confidence For Word Recognition
- in Proc. ICASSP, 1997
Cited by 41 (4 self)
This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and algorithms are illustrated with results on the Switchboard Corpus. 1. INTRODUCTION In the last few years, a lot of research has been devoted to the development of confidence scores associated with the outputs of automatic speech recognition (ASR) systems. These scores were used mostly to help spot keywords in spontaneous or read texts, and to provide a basis for the rejection of out-of-vocabulary words (e.g. [4-11]). Many other ASR applications could also benefit from knowing the level of confidence in correct recognition. For example, text-dependent speaker recognition systems could put more emphasis on words recognized with h...
Confidence Measures For Spontaneous Speech Recognition
- in Proc. ICASSP, 1997
Cited by 33 (1 self)
For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word, i.e. to have an estimate of which words of the output of the speech recognizer are likely to be correct and which are not reliable. We describe the development of the measure of confidence tagger JANKA, which is able to provide confidence information for the words in the output of the speech recognizer JANUS-3-SR. On a spontaneous German human-to-human database, JANKA achieves a tagging accuracy of 90% at a baseline word accuracy of 82%. 1. INTRODUCTION Current speech recognition systems are far from perfect. Unfortunately, the number and location of the errors in their output are usually unknown. This information, however, could be used in a number of applications. Examples of such applications are word selection for unsupervised adaptation schemes like MLLR [1], automatic weighting of additional, non-speech knowledge sources like lip-reading, or ai...
A Phone-Dependent Confidence Measure For Utterance Rejection
- In Proceedings of the International Conference on Acoustics, Speech and Signal Processing
Cited by 13 (0 self)
An acoustic confidence measure for acceptance/rejection of recognition hypotheses for continuous speech utterances is proposed. This measure is useful for rejecting utterances that are out of domain, or contain out-of-vocabulary words or speech disfluencies. A phone-based approach is implemented so that a single global threshold can be applied to hypothesis rejection for any word sequence. Phone confidence is computed for each frame of speech as the posterior phone probability given the acoustic observation. Word sequence confidence is evaluated as the average phone confidence, either by weighting all frames equally or by normalizing by phone duration. The confidence measure is tested on a database of spoken company names. When normalized by phone duration, it achieves, in some cases with less computational expense, rejection performance comparable to a baseline system implementing a common filler-model approach. When all frames are equally weighted, performance is substantially poorer...
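The two averaging schemes described in this abstract can be contrasted in a short sketch. It rests on assumptions the abstract does not settle: per-frame phone posteriors are taken as plain probabilities (not log values), the phone segmentation is given, and the function names, the toy data, and the threshold of 0.6 are hypothetical.

```python
def frame_weighted_confidence(segments):
    """Average phone posterior with every frame weighted equally.

    segments: one list of per-frame posteriors per phone in the hypothesis.
    """
    frames = [p for seg in segments for p in seg]
    return sum(frames) / len(frames)

def duration_normalized_confidence(segments):
    """Average per phone first, then across phones, so that a long
    phone does not dominate the utterance-level score."""
    per_phone = [sum(seg) / len(seg) for seg in segments]
    return sum(per_phone) / len(per_phone)

def accept(confidence, threshold=0.6):
    """Apply a single global threshold (value chosen only for illustration)."""
    return confidence >= threshold

# Toy hypothesis: a long, well-matched phone and a short, poorly matched one.
segments = [[0.9, 0.9, 0.9, 0.9], [0.2]]
print(frame_weighted_confidence(segments), accept(frame_weighted_confidence(segments)))
print(duration_normalized_confidence(segments), accept(duration_normalized_confidence(segments)))
```

In this toy case the frame-weighted average (about 0.76) passes the threshold because the long phone dominates, while the duration-normalized average (about 0.55) does not. This shows how the two weightings can lead to different accept/reject decisions for the same hypothesis.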
Confidence Measures For HMM-Based Speech Recognition
- in ICSLP’98
Cited by 9 (1 self)
In this paper, we describe our work on the field of confidence measures for HMM-based speech recognition. Confidence measures are a means of estimating the recognition reliability for single words of the recognizer output. The possible applications of such measures are manifold. We present our experiments with well known approaches and propose some new ones. Particularly, we propose to combine the mere acoustical measures with language model-based ones for continuous speech recognition that involves a stochastic language model. This slightly improves the acoustical measures and preserves their advantage of being computationally very cheap. Experiments are carried out on a German isolated word recognition system and on continuous speech recognition systems for the Resource Management database and the Wall Street Journal WSJ0 task. 1. INTRODUCTION Word-based confidence measures for speech recognition based on hidden Markov models (HMMs) have for some years now been an important research top...
Detection and Transcription of OOV Words
- 1998
Cited by 3 (0 self)
This thesis deals with the problem of Out-Of-Vocabulary words in speech recognition. The standard response of speech recognition systems whenever they encounter such OOV words is to (silently) misrecognize them without issuing any warning to the user. In order to avoid this undesired behaviour, two different strategies are proposed. The first strategy consists in preventing the problem, i.e. the occurrence of OOV words, and this thesis presents two ways of doing that. First, the system vocabulary is optimized using information extracted from other corpora and application domains, such that the number of expected OOV words be minimized. Using this method, the vocabulary coverage was significantly improved, especially for small vocabularies. The second method of reducing the number of OOV words consists of redefining the concept of "word" based on morphological considerations. In particular, compound words are decomposed into their constituent parts, which are used as the lexical recogni...
Confidence and Rejection in Automatic Speech Recognition
- 1997
Cited by 1 (0 self)
1 Introduction
  1.1 Research Goals
  1.2 Male/Female Versus Last Names
  1.3 Scaling Up: 58 Phrases
  1.4 Vocabulary Independence
  1.5 Thesis Overview
  1.6 Tutorial on Automatic Speech Recognition
    1.6.1 A Setting for Automatic Speech Recognition
    1.6.2 Overview of Speech Recognition
    1.6.3 Artificial Neural Network
    1.6.4 Context-Dependent Modeling
...
Generation and Combination of Complementary Systems for Automatic Speech Recognition
- 2008
Declaration. This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [15, 16, 17]. The length of this thesis including appendices, references, footnotes, tables and equations is approximately 56,000 words and contains 42 figures and 40 tables.

Summary. It has been found that using a combination of systems for large vocabulary continuous speech recognition (LVCSR) can outperform the use of a single system. For the combination to yield gains, the individual models must be complementary, i.e. they must make different errors. Previous work in ASR has mainly relied on an ad-hoc approach to finding complementary systems. Multiple systems are built, and those that perform well in combination are selected. The multiple diverse systems can be built in many ways, including the use of different frontends, injecting randomness, altering the model topology or using different training

