Neural-Network Based Measures Of Confidence For Word Recognition
In Proc. ICASSP, 1997
Cited by 41 (4 self)
This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and algorithms are illustrated with results on the Switchboard Corpus.

1. INTRODUCTION

In the last few years, a lot of research has been devoted to the development of confidence scores associated with the outputs of automatic speech recognition (ASR) systems. These scores were used mostly to help spot keywords in spontaneous or read texts, and to provide a basis for the rejection of out-of-vocabulary words (e.g. [4-11]). Many other ASR applications could also benefit from knowing the level of confidence in correct recognition. For example, text-dependent speaker recognition systems could put more emphasis on words recognized with h...
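The abstract does not specify the network architecture or its inputs, so the following is only a minimal sketch of the idea of combining several knowledge sources into one confidence estimate. It uses the simplest possible network, a single sigmoid unit (equivalent to logistic regression), trained with cross-entropy so that its output approximates P(word is correct | features). The two features per word (for example a normalized acoustic score and an N-best agreement ratio) and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_confidence_combiner(features: List[Sequence[float]],
                              labels: List[int],
                              lr: float = 0.5,
                              epochs: int = 500) -> Tuple[List[float], float]:
    """Fit a single sigmoid unit with batch gradient descent on cross-entropy.

    features: one vector of (hypothetical) knowledge-source scores per word.
    labels:   1 if the recognized word was correct, else 0.
    """
    n_feat = len(features[0])
    w = [0.0] * n_feat
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of cross-entropy w.r.t. the pre-activation
            for i in range(n_feat):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def confidence(w: List[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the word with feature vector x is correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

A hidden layer would let the combiner model interactions between knowledge sources; the single unit is kept here only because it is the smallest network whose output can be read as a probability.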
Large Vocabulary Decoding And Confidence Estimation Using Word Posterior Probabilities
In Proc. ICASSP, 2000
Cited by 34 (1 self)
This paper investigates the estimation of word posterior probabilities based on word lattices and presents applications of these posteriors in a large vocabulary speech recognition system. A novel approach to integrating these word posterior probability distributions into a conventional Viterbi decoder is presented. The problem of the robust estimation of confidence scores from word posteriors is examined and a method based on decision trees is suggested. The effectiveness of these techniques is demonstrated on the broadcast news and the conversational telephone speech corpora where improvements both in terms of word error rate and normalised cross entropy were achieved compared to the baseline HTK evaluation systems.
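Normalized cross entropy (NCE), the confidence metric cited in this abstract, compares the log-likelihood of the correct/incorrect labels under the confidence scores with that of a baseline that assigns every word the corpus-wide probability of being correct. The sketch below follows the usual NIST-style definition; it is not code from the paper, and the clipping constant `eps` is an assumption added to keep the logarithms finite.

```python
import math
from typing import Sequence


def normalized_cross_entropy(confidences: Sequence[float],
                             correct: Sequence[bool],
                             eps: float = 1e-10) -> float:
    """NCE of word confidence scores against correctness labels.

    Approaches 1.0 for perfect confidences, is 0.0 for a system no better than
    always predicting the prior fraction of correct words, and is negative
    for a system worse than that baseline.
    """
    n = len(correct)
    n_correct = sum(1 for c in correct if c)
    if n_correct in (0, n):
        raise ValueError("NCE is undefined if all words are correct or all are wrong")
    p_c = n_correct / n  # prior probability that a word is correct
    # Entropy (in bits) of the labels under the prior-only baseline.
    h_max = -n_correct * math.log2(p_c) - (n - n_correct) * math.log2(1.0 - p_c)
    # Log-likelihood (in bits) of the labels under the confidence scores.
    log_lik = 0.0
    for p, c in zip(confidences, correct):
        p = min(max(p, eps), 1.0 - eps)  # keep log2 finite
        log_lik += math.log2(p) if c else math.log2(1.0 - p)
    return (h_max + log_lik) / h_max
```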
Posterior Probability Decoding, Confidence Estimation And System Combination
2000
Cited by 33 (2 self)
In this paper the estimation of word posterior probabilities is discussed and their application in the CU-HTK system used in the March 2000 Hub5 Conversational Telephone Speech evaluation is described. The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities. These confusion networks were used in a number of post-processing steps. The 1-best sentence hypotheses extracted directly from the networks are shown to be significantly more accurate than the baseline decoding results. The posterior probability estimates were used as the basis for the estimation of word-level confidence scores. A new system combination technique is presented that uses these confidence scores and the confusion networks and performs better than the well-known ROVER technique.
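A confusion network can be pictured as a sequence of slots, each holding competing words with their posteriors, including a null arc for "no word here". The sketch below shows the two simplest operations on that structure: reading off the 1-best hypothesis (the highest-posterior entry per slot, dropping null arcs) with its posterior as a word confidence, and a naive weighted merge of several networks. The null label `<eps>` and the function names are hypothetical, and `combine_aligned` assumes the networks are already aligned slot by slot; the alignment step and the paper's actual combination method are not shown.

```python
from typing import Dict, List, Sequence, Tuple

EPS = "<eps>"  # hypothetical label for the null (deletion) arc in a slot

Slot = Dict[str, float]  # word -> posterior probability within one slot


def consensus_hypothesis(network: List[Slot]) -> List[Tuple[str, float]]:
    """1-best path: highest-posterior entry per slot, skipping null arcs.

    Returns (word, posterior) pairs; the posterior doubles as a confidence.
    """
    words = []
    for slot in network:
        word, post = max(slot.items(), key=lambda kv: kv[1])
        if word != EPS:
            words.append((word, post))
    return words


def combine_aligned(networks: Sequence[List[Slot]],
                    weights: Sequence[float]) -> List[Slot]:
    """Weighted sum of posteriors, slot by slot (assumes pre-aligned networks)."""
    combined = []
    for slots in zip(*networks):
        merged: Slot = {}
        for weight, slot in zip(weights, slots):
            for word, p in slot.items():
                merged[word] = merged.get(word, 0.0) + weight * p
        combined.append(merged)
    return combined
```

For example, a slot `{"cat": 0.6, "cap": 0.3, "<eps>": 0.1}` contributes "cat" with confidence 0.6, while a slot dominated by `<eps>` contributes nothing.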
The Thoughtful Elephant: Strategies for Spoken Dialog Systems
IEEE Transactions on Speech and Audio Processing, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for a wide range of applications, including tasks from the travel domain, automatic switchboards, and large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow a natural and flexible dialog flow similar to human-human interaction. This poses the challenging task of recognizing and interpreting user input when the user may choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore emphasize strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe ...
Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities
In Proc. ICASSP, 2001
Cited by 13 (4 self)
In this paper, we introduce a new concept, the time frame error rate. We show that this error rate is closely correlated with the word error rate and use it to overcome the mismatch between Bayes' decision rule, which aims at minimizing the expected sentence error rate, and the word error rate, which is used to assess the performance of speech recognition systems. Based on the time frame errors, we derive a new decision rule and show that it consistently reduces the word error rate on various recognition tasks. All stochastic models are left completely unchanged. We present experimental results on five corpora: the Dutch Arise corpus, the German Verbmobil '98 corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 corpus. The relative reduction of the word error rate ranges from 2.3% to 5.1%.
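The abstract does not give the decision rule itself, so the following is only an illustrative sketch of the underlying quantity. It assumes that a time frame error counts a frame at which the hypothesized word differs from the spoken word, and that lattice word hypotheses come with start/end frames and posteriors. Summing the posteriors of all hypotheses of a word that cover frame t gives a frame-level word posterior; one minus that value is the expected error at t. Choosing, among some candidate sentences (e.g. an N-best list, an assumption here), the one with the fewest expected frame errors is one simple way to target frame errors instead of sentence errors. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A lattice word hypothesis: (word, first_frame, last_frame, posterior).
Arc = Tuple[str, int, int, float]
# A candidate sentence: list of (word, first_frame, last_frame).
Hyp = List[Tuple[str, int, int]]


def frame_posteriors(arcs: List[Arc], num_frames: int) -> List[Dict[str, float]]:
    """Per-frame word posteriors: sum over all arcs of a word that cover frame t."""
    post: List[Dict[str, float]] = [defaultdict(float) for _ in range(num_frames)]
    for word, first, last, p in arcs:
        for t in range(first, last + 1):
            post[t][word] += p
    return post


def expected_frame_errors(hyp: Hyp, post: List[Dict[str, float]]) -> float:
    """Expected number of frames at which the hypothesized word is wrong."""
    errors = 0.0
    for word, first, last in hyp:
        for t in range(first, last + 1):
            errors += 1.0 - post[t].get(word, 0.0)
    return errors


def min_frame_error_choice(candidates: List[Hyp],
                           post: List[Dict[str, float]]) -> Hyp:
    """Pick the candidate sentence with the fewest expected frame errors."""
    return min(candidates, key=lambda h: expected_frame_errors(h, post))
```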
A Phone-Dependent Confidence Measure For Utterance Rejection
In Proceedings of the International Conference on Acoustics, Speech and Signal Processing
Cited by 13 (0 self)
An acoustic confidence measure for acceptance/rejection of recognition hypotheses for continuous speech utterances is proposed. This measure is useful for rejecting utterances that are out of domain, or contain out-of-vocabulary words or speech disfluencies. A phone-based approach is implemented so that a single global threshold can be applied to hypothesis rejection for any word sequence. Phone confidence is computed for each frame of speech as the posterior phone probability given the acoustic observation. Word sequence confidence is evaluated as the average phone confidence, either by weighting all frames equally or by normalizing by phone duration. The confidence measure is tested on a database of spoken company names. When normalized by phone duration, it achieves, in some cases with less computational expense, rejection performance comparable to a baseline system implementing a common filler-model approach. When all frames are equally weighted, performance is substantially poorer...
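The two averaging schemes described above differ in how long phones are weighted. The sketch assumes we already have, for each frame, the posterior of the phone it was aligned to, plus the phone segmentation as (start, end) frame ranges with `end` exclusive. Averaging over frames lets long phones dominate; averaging per-phone means counts each phone once, which is what makes a single global threshold plausible across word sequences. Function names and the threshold value are hypothetical, and the paper may work with log posteriors rather than raw probabilities.

```python
from typing import List, Sequence, Tuple


def confidence_frame_average(frame_post: Sequence[float]) -> float:
    """Average phone posterior with every frame weighted equally."""
    return sum(frame_post) / len(frame_post)


def confidence_phone_normalized(frame_post: Sequence[float],
                                segments: List[Tuple[int, int]]) -> float:
    """Average of per-phone mean posteriors (normalized by phone duration).

    segments: (start, end) frame indices for each phone, end exclusive.
    """
    phone_means = [sum(frame_post[s:e]) / (e - s) for s, e in segments]
    return sum(phone_means) / len(phone_means)


def accept(frame_post: Sequence[float],
           segments: List[Tuple[int, int]],
           threshold: float) -> bool:
    """Accept the hypothesis if its duration-normalized confidence clears one global threshold."""
    return confidence_phone_normalized(frame_post, segments) >= threshold
```

With eight confident frames for one phone (posterior 0.9) and two poor frames for the next (0.1), the frame average is 0.74, while the duration-normalized score is 0.5, so the badly matched phone is no longer hidden by the long one.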
Discriminative keyword spotting
In Proc. of Workshop on Non-Linear Speech Processing, 2007
Cited by 8 (6 self)
This paper proposes a new approach for keyword spotting, which is not based on HMMs. The proposed method employs a new discriminative learning procedure, in which the learning phase aims at maximizing the area under the ROC curve, ...
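The area under the ROC curve (AUC) that the learning phase targets can be measured with the Wilcoxon-Mann-Whitney statistic: the fraction of (keyword-present, keyword-absent) pairs that the detector scores in the right order, counting ties as half. This is the standard empirical AUC, shown only to make the objective concrete; it is not the paper's training procedure, and the function name is hypothetical.

```python
from typing import Sequence


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because the statistic depends only on the ordering of scores, a learner that maximizes it is naturally trained on pairs of positive and negative examples rather than on individual detection decisions.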
On-line Garbage Modeling with Discriminant Analysis for Utterance Verification
In Proceedings of the International Conference on Spoken Language Processing, 1996
Cited by 7 (0 self)
Out-of-vocabulary (OOV) utterance detection and rejection are especially important and difficult problems in large-vocabulary and continuous speech recognition. In [1] we proposed an utterance verification procedure based on frame-by-frame best acoustic state scores instead of explicit garbage models. This procedure is usually referred to as on-line garbage modeling. In this contribution we extend our previous work in two major directions: a) we use Discriminant Analysis to analyze the possibility of using L-best local scores and N-best utterance hypothesis scores for utterance verification; b) we present experimental results not only for a spontaneously spoken natural number recognition task, as in [1], but also for a flexible large-vocabulary recognition task. All the results, based on a telephone database, show that the proposed on-line garbage modeling procedure outperforms approaches based on explicit garbage models in both performance and computational cost.
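On-line garbage modeling replaces an explicit filler model with the best local acoustic scores at each frame. A minimal sketch, assuming we have per-frame state log-likelihoods and the Viterbi log-likelihood of the recognized hypothesis: the garbage score sums, over frames, the mean of the L best state scores (L = 1 is the single best state), and the verification score is the per-frame difference between the hypothesis and garbage scores. The exact normalization and how the L-best and N-best scores feed the Discriminant Analysis are not given in the abstract; function names are hypothetical.

```python
from typing import List, Sequence


def online_garbage_loglik(frame_state_logliks: List[Sequence[float]],
                          l_best: int = 1) -> float:
    """Sum over frames of the mean of the l_best highest state log-likelihoods."""
    total = 0.0
    for scores in frame_state_logliks:
        best = sorted(scores, reverse=True)[:l_best]
        total += sum(best) / len(best)
    return total


def verification_score(hyp_loglik: float,
                       frame_state_logliks: List[Sequence[float]],
                       l_best: int = 1) -> float:
    """Per-frame log-likelihood ratio of the hypothesis against the on-line garbage score.

    With l_best=1 the garbage score is the best any state sequence could do frame by
    frame (ignoring transitions), so values near 0 indicate a well-matched hypothesis.
    """
    num_frames = len(frame_state_logliks)
    garbage = online_garbage_loglik(frame_state_logliks, l_best)
    return (hyp_loglik - garbage) / num_frames
```

An utterance would then be accepted when its score exceeds a tuned threshold; no extra model has to be decoded, which is where the computational saving over explicit garbage models comes from.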
Unsupervised Spoken Keyword Spotting via Segmental DTW on Gaussian Posteriorgrams
Cited by 7 (4 self)
In this paper, we present an unsupervised learning framework to address the problem of detecting spoken ...
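The title names dynamic time warping (DTW) over Gaussian posteriorgrams, i.e. sequences of per-frame posterior vectors over the components of a Gaussian mixture. The sketch below shows only plain whole-sequence DTW with the negative log inner product as the frame distance, a distance commonly used for posteriorgrams (an assumption here, not taken from the truncated abstract). The segmental variant in the title additionally constrains the warping path and searches over alignment segments, which is not shown; function names and the `floor` constant are hypothetical.

```python
import math
from typing import List, Sequence

Posteriorgram = List[Sequence[float]]  # one posterior vector per frame


def posteriorgram_distance(p: Sequence[float], q: Sequence[float],
                           floor: float = 1e-10) -> float:
    """-log of the inner product of two posterior vectors (floored to stay finite)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, floor))


def dtw_cost(query: Posteriorgram, target: Posteriorgram) -> float:
    """Minimum accumulated frame distance over monotonic alignments."""
    n, m = len(query), len(target)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = posteriorgram_distance(query[i - 1], target[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

A spoken keyword example would be compared against stretches of untranscribed speech this way, with low costs marking likely occurrences; time warping lets a slower or faster rendition of the same word still align at low cost.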
Using Information on Lexical Stress for Utterance Verification
2001
Cited by 6 (2 self)
ASR applications such as nationwide telephone directory assistance (DA) face the challenge of making a correct classification from only minimal amounts of acoustic data. For this reason, current systems still make too many errors to be useful. Following the idea that 'no recognition' is better than 'misrecognition', a feasible system should therefore detect and reject the least reliable hypotheses. This process is known as utterance verification.

