A Form-Based Dialogue Manager For Spoken Language Applications
- In Proc. ICSLP
, 1996
Cited by 28 (1 self)
A popular approach to dialogue management is based on a finite-state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. This paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form." Each slot has associated prompts that guide the user through the dialogue, and a priority that determines the order in which the system tries to acquire information. These slots can be optional or mandatory. However, the user is not required to follow the system's lead, and is free to ignore the prompts and take the initiative in the dialogue. The E-form-based dialogue planner has been used in an application to search a database of used car advertisements. The goal is to assist the user in selecting, from this database, a small list of cars which match their constraints. For a large number of dialogues collected from over 600 naive users, we found over 70% compliance in answering specific system prompts.
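The slot-and-priority idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class names `Slot` and `EForm`, the convention that a lower priority number is asked first, and the example car-search slots are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    name: str
    prompt: str
    priority: int            # assumed convention: lower number is asked first
    mandatory: bool = True
    value: Optional[str] = None


class EForm:
    """Hypothetical E-form: a set of slots filled in any order."""

    def __init__(self, slots):
        self.slots = {s.name: s for s in slots}

    def fill(self, **values):
        # Accept any slot the user mentions, whether or not it was prompted,
        # so the user can take the initiative.
        for name, value in values.items():
            if name in self.slots:
                self.slots[name].value = value

    def next_prompt(self):
        # Prompt for the highest-priority slot that is still empty.
        empty = [s for s in self.slots.values() if s.value is None]
        if not empty:
            return None
        return min(empty, key=lambda s: s.priority).prompt

    def is_complete(self):
        # The form is usable once every mandatory slot has a value.
        return all(s.value is not None
                   for s in self.slots.values() if s.mandatory)
```

In this sketch a user who answers a question other than the one asked simply fills a different slot, and the next prompt is recomputed from whatever remains empty.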
Beyond Structured Dialogues: Factoring Out Grounding
- In Proceedings of the International Conference on Spoken Language Processing (ICSLP-98)
, 1998
Cited by 16 (2 self)
Structured dialogue models are currently the only tools for easily building spoken dialogue systems. This approach, however, requires the dialogue designer to completely specify all dialogue behavior between the user and system, including how information is grounded between the user and the system. In this paper, we advocate factoring out the grounding behavior from structured dialogue models by using a general purpose dialogue manager that accounts for this behavior. This not only simplifies the specification of dialogue, but also allows more powerful mechanisms of grounding to be employed, which cannot be implemented within the framework of structured dialogues.
1. INTRODUCTION
As speech recognition and speech synthesis continue to improve, spoken dialogue systems have started to emerge. However, significant barriers remain in building effective spoken dialogue systems. There will always be errors in speech recognition, and unfortunately, natural language understanding and dialogue...
Multimodal Discourse Modelling In A Multi-User Multi-Domain Environment
- PROC. ICSLP
Cited by 15 (4 self)
This paper describes the discourse component of GALAXY, a multidomain, multimodal conversational system. In designing this module, we are attempting to develop domain-independent mechanisms, controlled via declarative tables, to promote convenient instantiation of a discourse component for each new domain. Direct anaphoric reference as well as elliptical reference are dealt with appropriately. Users can also refer verbally to items selected via mouse clicks. Cross-domain references are particularly challenging, as is the ambiguity problem arising from different case roles for different subdomains. Users often utter fragments, sometimes in response to server-initiated dialogue exchanges, so an extensive fragment interpretation mechanism is supported.
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
Cited by 14 (3 self)
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes the pronunciation, spelling and meaning of each. These can be confirmed with the user, and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication
, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5% respectively. Of the remaining words our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
The Use Of Linguistic Hierarchies In Speech Understanding
- IN PROC. ICSLP
, 1998
Cited by 13 (6 self)
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
Virtual Modality: a Framework for Testing and Building Multimodal Applications
- HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems
, 2004
Cited by 2 (1 self)
This paper introduces a method that generates simulated multimodal input to be used in testing multimodal system implementations, as well as to build statistically motivated multimodal integration modules. The generation of such data is inspired by the fact that true multimodal data, recorded from real usage scenarios, is difficult and costly to obtain in large amounts. On the other hand, thanks to operational speech-only dialogue system applications, a wide selection of speech/text data (in the form of transcriptions, recognizer outputs, parse results, etc.) is available. Taking the textual transcriptions and converting them into multimodal inputs in order to assist multimodal system development is the underlying idea of the paper. A conceptual framework is established which utilizes two input channels: the original speech channel and an additional channel called Virtual Modality. This additional channel provides a certain level of abstraction to represent non-speech user inputs (e.g., gestures or sketches). From the transcriptions of the speech modality, pre-defined semantic items (e.g., nominal location references) are identified, removed, and replaced with deictic references (e.g., here, there). The deleted semantic items are then placed into the Virtual Modality channel and, according to external parameters (such as a pre-defined user population with various deviations), temporal shifts relative to the instant of each corresponding deictic reference are issued. The paper explains the procedure followed to create Virtual Modality data, the details of the speech-only database, and results based on a multimodal city information and navigation application.
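The conversion step described in this abstract can be sketched roughly as follows. This is an illustration only, not the paper's procedure: the function name `to_multimodal`, the `VMEvent` record, the single-word location matching, and the uniform distribution of temporal shifts are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class VMEvent:
    item: str       # semantic item moved out of the speech channel
    offset: float   # seconds relative to the deictic word (assumed unit)


def to_multimodal(words, locations, deictic="here", max_shift=0.5, rng=None):
    """Split a transcription into a speech channel and a Virtual Modality channel.

    Each word found in `locations` is replaced by a deictic reference in the
    speech channel and emitted as a VMEvent with a random temporal shift.
    """
    rng = rng or random.Random(0)   # seeded so the sketch is repeatable
    speech, events = [], []
    for word in words:
        if word in locations:
            speech.append(deictic)
            # Assumption: shifts are drawn uniformly around the deictic word.
            events.append(VMEvent(word, rng.uniform(-max_shift, max_shift)))
        else:
            speech.append(word)
    return speech, events
```

For example, the transcription "show me restaurants near Cambridge" with `locations={"Cambridge"}` yields the speech channel "show me restaurants near here" and one Virtual Modality event carrying "Cambridge".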
Testing Dialogue Systems By Means of Automatic Generation of Conversations
, 2002
Cited by 2 (0 self)
This paper presents a novel technique for testing spoken dialogue systems by automatically generating conversations. The technique makes it easy to test spoken dialogue systems under a variety of lab-simulated conditions, since the utterance corpus used to check the performance of the system is easy to vary or change. The technique is based on a module called a user simulator, whose purpose is to behave as real users do when they interact with dialogue systems. The behaviour of the simulator is determined by diverse scenarios that represent the goals of the users. The simulator's aim is to achieve the goals set in the scenarios during the interaction with the dialogue system. We have applied the technique to test a dialogue system developed in our lab. The test was carried out considering different levels of white and babble noise as well as a VTS noise compensation technique. The results show that the dialogue system performance is worse under the babble noise conditions. The VTS technique was effective when dealing with noisy utterances and led to better experimental results, particularly for the white noise. The technique made it possible to detect problems in the dialogue strategies employed to handle confirmation turns and recognition errors, suggesting that these strategies must be improved. © 2002 Elsevier Science B.V. All rights reserved.
Porting the Galaxy System to Mandarin Chinese
, 1997
Galaxy is a human-computer conversational system that provides a spoken language interface for accessing on-line information. It was initially implemented for English in travel-related domains, including air travel, local city navigation, and weather. Efforts to develop multilingual systems within the Galaxy framework began several years ago. This thesis focuses on developing the Mandarin Chinese version of the Galaxy system, including speech recognition, language understanding and language generation components. Large amounts of Mandarin speech data have been collected from native speakers to derive linguistic rules, acoustic models, language models and vocabularies for Chinese. Comparisons between the Chinese and English languages have been made in the context of system implementation. Some issues that are specific to Chinese have been addressed, to make the system core more language independent. Overall, the system produced reasonable responses nearly 70% of the time for...

