Results 1 -
4 of
4
Beyond ASR 1-best: Using word confusion networks in spoken language understanding
, 2006
"... We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustnes ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10 % absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5 % and 10 % relative reduction in error rate using WCNs compared to ASR 1-best output.
Weighted Automata Kernels -- General Framework and Algorithms
- IN PROCEEDINGS OF THE 9TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY (EUROSPEECH ’03), SPECIAL SESSION ADVANCED MACHINE LEARNING ALGORITHMS FOR SPEECH AND LANGUAGE PROCESSING
, 2003
"... Kernel methods have found in recent years wide use in statistical learning techniques due to their good performance and their computational efficiency in high-dimensional feature space. However, text or speech data cannot always be represented by the fixed-length vectors that the traditional kernels ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
Kernel methods have found in recent years wide use in statistical learning techniques due to their good performance and their computational efficiency in high-dimensional feature space. However, text or speech data cannot always be represented by the fixed-length vectors that the traditional kernels handle. We recently introduced a general kernel framework based on weighted transducers, rational kernels, to extend kernel methods to the analysis of variable-length sequences and weighted automata [5] and described their application to spoken-dialog applications. We presented a constructive algorithm for ensuring that rational kernels are positive definite symmetric, a property which guarantees the convergence of discriminant classification algorithms such as Support Vector Machines, and showed that many string kernels previously introduced in the computational biology literature are special instances of such positive definite symmetric rational kernels [4]. This paper reviews the essential results given in [5, 3, 4] and presents them in the form of a short tutorial.
Code Breaking for Automatic Speech Recognition
"... Code Breaking is a divide and conquer approach for sequential pattern recognition tasks where we identify weaknesses of an existing system and then use specialized decoders to strengthen the overall system. We study the technique in the context of Automatic Speech Recogniton. Using the lattice cutti ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Code Breaking is a divide and conquer approach for sequential pattern recognition tasks where we identify weaknesses of an existing system and then use specialized decoders to strengthen the overall system. We study the technique in the context of Automatic Speech Recogniton. Using the lattice cutting algorithm, we first analyze lattices generated by a state-of-the-art speech recognizer to spot possible errors in its first-pass hypothesis. We then train specialized decoders for each of these problems and apply them to refine the first-pass hypothesis. We study the use of Support Vector Machines (SVMs) as discriminative models over each of these problems. The estimation of a posterior distribution over hypoth-esis in these regions of acoustic confusion is posed as a logistic regression problem. GiniSVMs, a variant of SVMs, can be used as an approximation technique to estimate the parameters of the logistic regression problem. We first validate our approach on a small vocabulary recognition task, namely, alphadigits. We show that the use of GiniSVMs can substantially improve the per-formance of a well trained MMI-HMM system. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We will then analyze lattice cutting in terms of its ability to reliably identify, and provide good alternatives for incorrectly hypothesized words in the Czech MALACH domain, a large vocabulary task. We describe a procedure to train and apply SVMs to strengthen the first pass system, resulting in small but statistically significant recog-nition improvements. We conclude with a discussion of methods including clustering for obtaining further improvements on large vocabulary tasks.
Mail Stop T-27A
"... (Received XXX; revised XXX) Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station, is as far as we know the first spoken dialog system in space. We describe the objectives of the Clarissa project and the system’s architecture. In ..."
Abstract
- Add to MetaCart
(Received XXX; revised XXX) Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station, is as far as we know the first spoken dialog system in space. We describe the objectives of the Clarissa project and the system’s architecture. In particular, we focus on three key problems: grammar-based speech recognition using the Regulus toolkit; methods for open mic speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations. We first describe the grammar-based recogniser we have build using Regulus, and report experiments where we compare it against a class N-gram recogniser trained off the same 3297 utterance dataset. We obtained a 15 % relative improvement in WER, and a 37% improvement in semantic error rate. The grammar-based recogniser moreover outperforms the class N-gram version for utterances of all lengths from 1 to 9 words inclusive. The central problem in building an open-mic speech recognition system is being able to distinguish between commands directed at the system, and other material (cross-talk),

