The Thoughtful Elephant: Strategies for Spoken Dialog Systems
- IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for a wide range of applications. They include tasks from the travel domain and automatic switchboards as well as large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This imposes the challenging task of recognizing and interpreting user input when the user is allowed to choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe...
Natural Language Understanding Using Statistical Machine Translation
- In European Conf. on Speech Communication and Technology, 2001
Cited by 14 (3 self)
Over the past years, automatic dialogue systems and telephone-based machine inquiry systems have received increasing attention. In addition to an automatic speech recognizer and a dialogue manager, such systems consist of a natural language understanding (NLU) component. Some of the most investigated approaches to NLU are rule-based methods such as stochastic grammars, which are often written manually. However, relying solely on rule-based methods can turn out to be inflexible, and the problem of reusability arises. When extending the application scenario or changing the application's domain itself, a large part of the set of rules often must be rewritten. Therefore, techniques are desirable which help to reduce the manual effort when building up an NLU component for a new domain. In this paper we investigate an approach to NLU which is derived from the field of statistical machine translation. Starting from a conceptually annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal-language target sentence. In doing so, we mainly focus on the quality of different alignment models between source and target sentences. Even though the usage of grammars cannot be totally avoided in NLU systems, it is our goal to reduce their employment and learn the dependencies between words and their meaning automatically. Experiments were performed on the Philips in-house TABA corpus, which is a text corpus in the domain of a German train timetable information system.
Estimation of Language Models for New Spoken Language Applications
1996
Cited by 6 (0 self)
Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, a lot of training data is needed to develop a spoken language application. For example, we need training data to generate a language model that can be used by the recognizer to reduce the search space. In this paper, we address some issues arising from the small amount of training data available for a new spoken language application.
FeasPar - A Feature Structure Parser Learning to Parse Spoken Language
1996
Cited by 6 (0 self)
We describe and experimentally evaluate a system, FeasPar, that learns to parse spontaneous speech. To train and run FeasPar (Feature Structure Parser), only limited hand-modeled knowledge is required. The FeasPar architecture consists of neural networks and a search. The networks split the incoming sentence into chunks, which are labeled with feature values and chunk relations. Then, the search finds the most probable and consistent feature structure. FeasPar is trained...
Comparison of Alignment Templates and Maximum Entropy Models for Natural Language Understanding
- in EACL, 2003
Cited by 5 (3 self)
In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence.
Implementation Testing of a Hybrid Symbolic/Statistical Multimodal Architecture
2002
Cited by 3 (2 self)
The design and implementation of hybrid symbolic/statistical architectures is a major area of interest in current multimodal system development. Such an architecture attempts to improve multimodal recognition and disambiguation rates by using corpus-based statistics to weight the contributions from various input streams. This is in contrast to current architectures that assume independence between input streams, and combine un-weighted posterior probabilities simply by taking their cross product.
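The contrast drawn in this abstract can be illustrated with a minimal sketch. It assumes two input streams (here called speech and gesture, both hypothetical) and a log-linear weighting in which each posterior is raised to a per-stream exponent. The abstract does not say how the corpus-based weights are actually applied, so the weighting scheme, the function name `combine`, and all the numbers are assumptions made for illustration only.

```python
from itertools import product

def combine(speech, gesture, w_speech=1.0, w_gesture=1.0):
    """Score every (speech, gesture) hypothesis pair and renormalize.

    With both weights at 1.0 this is the plain, unweighted cross product
    of the two posterior distributions. Other weights are an assumed
    log-linear scheme, not the method of the paper.
    """
    scores = {
        (s, g): (ps ** w_speech) * (pg ** w_gesture)
        for (s, ps), (g, pg) in product(speech.items(), gesture.items())
    }
    total = sum(scores.values())
    return {pair: score / total for pair, score in scores.items()}

# Hypothetical posteriors for each stream.
speech = {"move": 0.6, "remove": 0.4}
gesture = {"point_a": 0.7, "point_b": 0.3}

unweighted = combine(speech, gesture)
speech_ignored = combine(speech, gesture, w_speech=0.0)
```

Setting a stream's weight to 0.0 removes its influence entirely, so the two speech hypotheses end up with equal scores; intermediate weights would let a less reliable stream contribute less.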
Integrating Multiple Cues for Spoken Language Understanding
1995
Cited by 2 (0 self)
As spoken language interfaces for real-world systems become a practical possibility, it has become apparent that such interfaces will need to draw on a variety of cues from diverse sources to achieve a robustness and naturalness approaching that of human performance [1]. However, our knowledge of how these cues behave in the aggregate is still tantalizingly sketchy. We lack a strong theoretical basis for predicting which cues will prove useful in practice and for specifying how these cues should be combined to signal or cancel out potential interpretations of the communicative signal. In the research program summarized here, we propose to develop and test an initial theory of cue integration for spoken language interfaces. By establishing a principled basis for integrating knowledge sources for such interfaces, we believe that we can develop systems that perform better from a computer-human interaction standpoint.
INTRODUCTION
Historically, spoken language understanding research deve...
Improving On Phrase Spotting For Spoken Dialogue Processing
Cited by 1 (0 self)
In a typical task-oriented dialogue system, the interpretation task consists of mapping from the acoustic input to a series of moves. In the simplest cases each move is just a pairing of slots and values, e.g. "destination = paris". In this paper we describe a system for language interpretation which is designed to work with lattice-based output from a class-based statistical language model. The work described enables phrase and keyword spotting to be incorporated within a uniform approach which also allows information from higher-level linguistic structure to be used if it is both available and likely to be helpful. The work described here is part of a larger collaborative effort being pursued as part of the EU projects D'Homme [1] and Siridus [2], where one thread of the research aims to provide better grounds for some of the choices that need to be made for recogniser language modelling and language processing for a variety of spoken dialogue scenarios.
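The slot-value view of a move can be made concrete with a minimal phrase-spotting sketch. It works on a single text string rather than on the lattice output the abstract describes, and the city vocabulary, the patterns, and the name `spot_moves` are all hypothetical choices for illustration, not part of the system in the paper.

```python
import re

# Hypothetical closed vocabulary of slot values (an assumption).
CITIES = {"london", "paris"}
_CITY = "|".join(sorted(CITIES))

# Hypothetical phrase patterns: one spotted phrase per slot.
PATTERNS = {
    "destination": re.compile(rf"\bto ({_CITY})\b"),
    "origin": re.compile(rf"\bfrom ({_CITY})\b"),
}

def spot_moves(utterance):
    """Return slot -> value for every phrase pattern found in the utterance."""
    text = utterance.lower()
    moves = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            moves[slot] = match.group(1)
    return moves

moves = spot_moves("I want to go from London to Paris")
```

Restricting the patterns to known city names keeps "to go" from being spotted as a destination; words outside the patterns are simply ignored, which is what makes spotting tolerant of unexpected formulations.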
Robust, Finite-State Parsing for Spoken Language Understanding
Cited by 1 (1 self)
Human understanding of spoken language appears to integrate the use of contextual expectations with acoustic level perception in a tightly-coupled, sequential fashion. Yet computer speech understanding systems typically pass the transcript produced by a speech recognizer into a natural language parser with no integration of acoustic and grammatical constraints. One reason for this is the complexity of implementing that integration. To address this issue we have created a robust, semantic parser as a single finite-state machine (FSM). As such, its run-time action is less complex than other robust parsers that are based on either chart or generalized left-right (GLR) architectures. Therefore, we believe it is ultimately more amenable to direct integration with a speech decoder.
Learning to Parse Spontaneous Speech
Cited by 1 (1 self)
We describe and experimentally evaluate a system, FeasPar, that learns to parse spontaneous speech. To train and run FeasPar (Feature Structure Parser), only limited hand-modeled knowledge is required. The FeasPar architecture consists of neural networks and a search. The networks split the incoming sentence into chunks, which are labeled with feature values and chunk relations. Then, the search finds the most probable and consistent feature structure. FeasPar is trained, tested and evaluated with the Spontaneous Scheduling Task, and compared with two samples of a hand-modeled GLR* parser, developed for 4 months and 2 years, respectively. The hand-modeling effort for FeasPar is 2 weeks. FeasPar performs better than the GLR* parser developed for 4 months in all six comparisons that are made, and its performance is similar to that of the GLR* parser developed for 2 years.

