Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.
Subword-based Approaches for Spoken Document Retrieval
, 2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language, and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four
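The idea of indexing on subword units rather than whole words can be illustrated with a minimal sketch. It assumes overlapping character trigrams over text transcripts as a stand-in for the subword units a recognizer would produce; the unit type, the scoring rule, and the names `subword_units`, `build_index`, and `search` are all hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def subword_units(text, n=3):
    """Return overlapping character n-grams of the text.

    Assumption: character n-grams stand in for recognizer subword units.
    Non-letters are dropped, so n-grams may span word boundaries.
    """
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return [letters[i:i + n] for i in range(len(letters) - n + 1)]


def build_index(docs, n=3):
    """Map each subword unit to a Counter of the documents containing it."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for unit in subword_units(text, n):
            index[unit][doc_id] += 1
    return index


def search(index, query, n=3):
    """Rank documents by how many distinct query units they contain."""
    scores = Counter()
    for unit in set(subword_units(query, n)):
        for doc_id in index.get(unit, {}):
            scores[doc_id] += 1
    return scores.most_common()


docs = {
    "msg1": "heavy snowfall expected in Reykjavik tonight",
    "msg2": "stock markets closed higher today",
}
index = build_index(docs)
results = search(index, "Reykjavik")
print(results)
```

Because the index holds units rather than a fixed word list, a query term that was never anticipated as a keyword can still be matched, which is the property the abstract attributes to subword indexing terms.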
Jupiter: A Telephone-Based Conversational Interface for Weather Information
- IEEE Trans. on Speech and Audio Processing
, 2000
Cited by 32 (3 self)
In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modelling, and multilingual interfaces. Over a two-year period since coming on line in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totalling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We will also present some evaluation results on the system and its components.
Challenges For Spoken Dialogue Systems
- In Proceedings of 1999 IEEE ASRU Workshop
, 1999
Cited by 24 (0 self)
The past decade has seen the development of a large number of spoken dialogue systems around the world, both as research prototypes and commercial applications. These systems allow users to interact with a machine to retrieve information, conduct transactions, or perform other problem-solving tasks. In this paper we discuss some of the design issues which confront developers of spoken dialogue systems, provide some examples of research being undertaken in this area, and describe some of the ongoing challenges facing current spoken language technology.
Data Collection And Performance Evaluation Of Spoken Dialogue Systems: The MIT Experience
- IN: PROC. 6 TH INT
, 2000
Cited by 12 (1 self)
In this paper we report our efforts in data collection and performance evaluation in support of spoken dialogue system development. We describe two understanding metrics called query density and concept efficiency which can be interpreted on a per-utterance basis, but which are measured over the course of a dialogue. We also describe the evaluation infrastructure we have developed to support off-line data processing using our GALAXY client-server architecture [8]. We show how we have used these metrics and mechanisms as part of the development of a spoken dialogue system for air-travel information.
The MIT finite-state transducer toolkit for speech and language processing
- in Proc. ICSLP
, 2004
Cited by 12 (0 self)
We present the MIT Finite-State Transducer Toolkit and briefly describe research that it has benefitted. The toolkit is a collection of command-line tools and associated C++ API for manipulating finite-state transducers (FSTs) and acceptors (FSAs) and has been designed to enable research through its flexibility, yet remain efficient enough to aid real-world computationally demanding applications such as automatic speech recognition. The toolkit supports the construction, combination, optimization, and training of weighted FSTs and FSAs, and as such is useful in many areas of human language technology.
Recognizing Non-Native Speech: Characterizing and Adapting to Non-Native Usage in LVCSR
, 2001
Cited by 6 (1 self)
Low-proficiency non-native speakers represent a significant challenge for large-vocabulary continuous speech recognition (LVCSR). Acoustic models are confused by a heavy accent
Heterogeneous Lexical Units For Automatic Speech Recognition: Preliminary Investigations
, 2000
Cited by 4 (1 self)
This paper explores the use of the phone and syllable as primary units of representation in the first stage of a two-stage recognizer. A finite-state transducer speech recognizer is utilized to configure the recognition as a two-stage process, where either phone or syllable graphs are computed in the first stage, and passed to the second stage to determine the most likely word hypotheses. Preliminary experiments in a weather information speech understanding domain show that a syllable representation with either bigram or trigram language models provides more constraint than a phonetic representation with a higher-order n-gram language model (up to a 6-gram), and approaches the performance of a more conventional single-stage word-based configuration. 1. INTRODUCTION Most conventional speech recognition systems represent the search space as a directed graph of phone-like units. These graphs are typically determined by the allowable pronunciations of a given word vocabulary, with word (and th...
On Organic Interfaces
Cited by 3 (0 self)
For over four decades, our research community has taken remarkable strides in advancing human language technologies. This has resulted in the emergence of spoken dialogue interfaces that can communicate with humans on their own terms. For the most part, however, we have assumed that these interfaces are static; it knows what it knows and doesn’t know what it doesn’t. In my opinion, we are not likely to succeed until we can build interfaces that behave more like organisms that can learn, grow, reconfigure, and repair themselves, much like humans. In this paper, I will argue my case and outline some new research challenges. Index Terms: speech-based interfaces, dialogue systems
The Use Of Dynamic Reliability Scoring In Speech Recognition
- In Proceedings of the Sixth International Conference on Spoken Language Processing (ICSLP 2000)
, 2000
Cited by 2 (1 self)
Typically, along a recognizer's search path, some acoustic units are modeled more reliably than others, due to differences in their acoustic-phonetic features and many other factors. This paper presents a dynamic reliability scoring scheme which can help adjust the partial path scores while the recognizer searches through the composed lexical and acoustic-phonetic network. The reliability models are trained on the acoustic scores of the correct arc and its immediate competing arcs extending the current partial path. During recognition, if, according to the trained reliability models, an arc can be more easily distinguished from the competing alternatives, that arc is more likely to be in the right path, and the partial path score can be adjusted accordingly on the fly to have a more accurate path hypothesis. We have applied this reliability scoring mechanism in two weather-related domains, JUPITER [6] (for English) and PANDA (a predecessor of MUXING [5] for Mandarin Chinese). We get 9....

