Recognition confidence scoring and its use in speech understanding systems
Computer Speech and Language, 2002
Cited by 42 (4 self)
In this paper we present an approach to recognition confidence scoring and a method for integrating confidence scores into the understanding and dialogue components of a speech understanding system. The system uses a multi-tiered approach where confidence scores are computed at the phonetic, word, and utterance levels. The scores are produced by extracting confidence features from the computation of the recognition hypotheses and processing these features using an accept/reject classifier for word and utterance hypotheses. The output of the confidence classifiers can then be incorporated into the parsing mechanism of the language understanding component. To evaluate the system, experiments were conducted using the JUPITER weather information system. Evaluation was performed at the understanding level using key-value pair concept error rate as the evaluation metric. When confidence scores were integrated into the understanding component of the system, the concept error rate was reduced by over 35%.
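As a rough illustration of how rejecting low-confidence hypotheses can lower a key-value concept error rate, here is a minimal sketch. The metric definition (substitutions, deletions, and insertions over the number of reference concepts), the function name `concept_error_rate`, and the example concepts are all assumptions made for illustration; the paper's exact scoring may differ.

```python
def concept_error_rate(reference, hypothesis):
    """Assumed definition: (subs + dels + ins) / number of reference concepts.

    Both arguments are dicts mapping a concept key to its value.
    """
    subs = sum(1 for k in reference
               if k in hypothesis and hypothesis[k] != reference[k])
    dels = sum(1 for k in reference if k not in hypothesis)
    ins = sum(1 for k in hypothesis if k not in reference)
    return (subs + dels + ins) / len(reference)


# Hypothetical reference concepts for one weather query.
reference = {"city": "boston", "date": "tomorrow"}

# Understanding output when every recognized word is trusted:
# one substituted value and one inserted concept.
without_rejection = {"city": "austin", "date": "tomorrow", "topic": "snow"}

# Understanding output after the low-confidence words behind "austin"
# and "snow" are rejected: one deletion remains.
with_rejection = {"date": "tomorrow"}

cer_before = concept_error_rate(reference, without_rejection)
cer_after = concept_error_rate(reference, with_rejection)
```

In this toy case, rejection trades a substitution and an insertion for a single deletion, which is the kind of effect a confidence-aware parser can exploit.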
Using Natural Language Processing and Discourse Features to Identify Understanding Errors in a Spoken Dialogue System
In Proceedings of the 17th International Conference on Machine Learning, 2000
Cited by 20 (1 self)
While it has recently become possible to build spoken dialogue systems that interact with users in real time in a range of domains, systems that support conversational natural language are still subject to a large number of spoken language understanding (SLU) errors. Endowing such systems with the ability to reliably distinguish SLU errors from correctly understood utterances might allow them to correct some errors automatically or to interact with users to repair them, thereby improving the system's overall performance. We report experiments on learning to automatically distinguish SLU errors in 11,787 spoken utterances collected in a field trial of AT&T's How May I Help You system interacting with live customer traffic. We apply the automatic classifier RIPPER (Cohen 96) to train an SLU classifier using features that are automatically obtainable in real time. The classifier achieves 86% accuracy on this task, an improvement of 23% over the majority class baseline....
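For readers unfamiliar with the term, a majority class baseline always predicts the most frequent label. The sketch below shows the computation on made-up labels; the 63/37 split is an assumption chosen so that the baseline matches what the abstract implies if the 23% improvement is read as absolute percentage points (86% minus 23% is 63%). It is not data from the paper.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical label distribution, NOT the real corpus.
labels = ["correct"] * 63 + ["slu_error"] * 37

baseline = majority_baseline_accuracy(labels)

# Assumed reading: improvement measured in absolute percentage points.
classifier_accuracy = 0.86
improvement = round(classifier_accuracy - baseline, 2)
```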
Spoken Language Understanding Within Dialogs Using A Graphical Model Of Task Structure
Proc. ICSLP 98, 1998
Cited by 9 (5 self)
We describe a procedure for contextual interpretation of spoken sentences within dialogs. Task structure is represented in a graphical form, enabling the interpreter algorithm to be efficient and task-independent. Recognized spoken input may consist either of a single sentence with utterance-verification scores, or of a word lattice with arc weights. A confidence model is used throughout and all inferences are probability-weighted. The interpretation consists of a probability for each class and for each auxiliary information label needed for task completion. Anaphoric references are permitted.
1. INTRODUCTION
We are interested in spoken dialog systems in which the caller responds using fluent natural language to the prompt "How may I help you?" (HMIHY). In previous work we have described the speech recognizer [1], automatic acquisition of salient phrase and grammar fragments, and call-type classification [2,3], the dialog manager [4], and the incorporation of utterance verification [...
Beyond ASR 1-best: Using word confusion networks in spoken language understanding
2006
Cited by 6 (0 self)
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
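To make the word confusion network idea concrete, the following is a minimal sketch of one possible in-memory representation: a list of aligned bins, each mapping candidate words to posterior probabilities. The `<eps>` symbol, the function names, and the example posteriors are assumptions for illustration and are not taken from the paper.

```python
EPS = "<eps>"  # hypothetical symbol for an empty (skip) arc

# Each bin holds competing words for one aligned position, with posteriors.
wcn = [
    {"flights": 0.6, "lights": 0.4},
    {"to": 0.9, EPS: 0.1},
    {"austin": 0.55, "boston": 0.45},
]


def one_best(network):
    """Pick the highest-posterior word in every bin, dropping empty arcs."""
    words = [max(bin_, key=bin_.get) for bin_ in network]
    return [w for w in words if w != EPS]


def expected_word_counts(network):
    """Posterior-weighted bag of words over all bins (empty arcs skipped)."""
    counts = {}
    for bin_ in network:
        for word, posterior in bin_.items():
            if word != EPS:
                counts[word] = counts.get(word, 0.0) + posterior
    return counts


best_path = one_best(wcn)
soft_counts = expected_word_counts(wcn)
```

The 1-best path discards "boston" entirely, while the posterior-weighted counts keep it with weight 0.45, so a downstream classifier or entity detector can still use the competing hypothesis.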
An Integrative and Discriminative Technique for Spoken Utterance Classification
Cited by 3 (1 self)
Traditional methods of spoken utterance classification (SUC) adopt two independently trained phases. In the first phase, an automatic speech recognition (ASR) module returns the most likely sentence for the observed acoustic signal. In the second phase, a semantic classifier transforms the resulting sentence into the most likely semantic class. Since the two phases are isolated from each other, such traditional SUC systems are suboptimal. In this paper, we present a novel integrative and discriminative learning technique for SUC to alleviate this problem, and thereby reduce the semantic classification error rate (CER). Our approach revolves around the effective use of the N-best lists generated by the ASR module to reduce semantic classification errors. The N-best list sentences are first rescored using all the available knowledge sources. Then, the sentences that are most likely to help reduce the CER are extracted from the N-best lists, as well as those sentences that are most likely to increase the CER. These sentences are used to discriminatively train the language and semantic-classifier models to minimize the overall semantic CER. Our experiments resulted in a reduction of CER from its initial value of 4.92% to 4.04% in the standard ATIS task. Index Terms: Automatic speech recognition (ASR), discriminative training, spoken language understanding (SLU), spoken utterance classification (SUC), statistical language modeling.
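The rescoring step described in this abstract can be pictured with a small sketch. The code below combines per-hypothesis log-scores from several knowledge sources with a weighted sum and re-sorts the list. The score names, weights, and example sentences are hypothetical, and the weighted sum is only one plausible combination rule, not necessarily the one the authors use.

```python
def rescore(nbest, weights):
    """Return hypotheses sorted by a weighted sum of their log-scores."""
    def total(hypothesis):
        return sum(weights[name] * hypothesis["scores"][name]
                   for name in weights)
    return sorted(nbest, key=total, reverse=True)


# Hypothetical 2-best list with made-up log-scores per knowledge source.
nbest = [
    {"text": "show flights to austin",
     "scores": {"acoustic": -10.0, "lm": -4.0, "semantic": -3.0}},
    {"text": "show flights to boston",
     "scores": {"acoustic": -10.5, "lm": -3.5, "semantic": -0.5}},
]

# Acoustic score alone keeps the original ASR ordering.
asr_top = rescore(nbest, {"acoustic": 1.0})[0]["text"]

# Adding language-model and semantic scores can promote a lower-ranked
# hypothesis (totals here: -17.0 versus -14.5).
all_sources = {"acoustic": 1.0, "lm": 1.0, "semantic": 1.0}
rescored_top = rescore(nbest, all_sources)[0]["text"]
```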

