Results 1 - 10 of 39
Dialogue act modeling for automatic tagging and recognition of conversational speech
- COMPUTATIONAL LINGUISTICS
, 2000
Cited by 145 (13 self)
We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like ...
A Mind Model for Multimodal Communicative Creatures & Humanoids
- INTERNATIONAL JOURNAL OF APPLIED ARTIFICIAL INTELLIGENCE
, 1999
Cited by 30 (8 self)
This paper presents a computational model of real-time task-oriented dialogue skills. The architecture, termed Ymir, bridges between multimodal perception and multimodal action and supports the creation of autonomous computer characters that afford full-duplex, real-time face-to-face interaction with a human. Ymir has been prototyped in software, and a humanoid created, called Gandalf, capable of fluid multimodal dialogue. Ymir demonstrates several new ideas in the creation of communicative computer agents, including perceptual integration of multimodal events, distributed planning and decision making, an explicit handling of real-time, and layered input analysis and motor control with human characteristics. This paper describes the architecture and explains its main elements. Examples of implementation and performance are given, and the architecture's limitations and possibilities are discussed.
Conversations over video conferences: An evaluation of the spoken aspects of video-mediated communication
- Human Computer Interaction
, 1993
Cited by 18 (0 self)
Recent trends toward telecommuting, mobile work, and wider distribution of the work force, combined with reduced technology costs, have made video communications more attractive as a means of supporting informal remote interaction. In the past, however, video communications have never gained widespread acceptance. Here we identify possible reasons for this by examining how the spoken characteristics of video-mediated communication differ from face-to-face interaction, for a series of real meetings. We evaluate two wide-area systems. One uses readily available Integrated Services Digital Network (ISDN) lines but suffers the limitations of transmission lags, a ...
Navigating joint projects with dialogue
- Cognitive Science
, 2003
Cited by 12 (2 self)
Dialogue has its origins in joint activities, which it serves to coordinate. Joint activities, in turn, usually emerge in hierarchically nested projects and subprojects. We propose that participants use dialogue to coordinate two kinds of transitions in these joint projects: vertical transitions, or entering and exiting joint projects; and horizontal transitions, or continuing within joint projects. The participants help signal these transitions with project markers, words such as uh-huh, m-hm, yeah, okay, or all right. These words have been studied mainly as signals of listener feedback (back-channel signals) or turn-taking devices (acknowledgment tokens). We present evidence from several types of well-defined tasks that they are also part of a system of contrasts specialized for navigating joint projects. Uh-huh, m-hm and yeah are used for horizontal transitions, and okay and all right for vertical transitions.
Can I finish? Learning when to respond to incremental interpretation results in interactive dialogue
Cited by 10 (0 self)
We investigate novel approaches to responsive overlap behaviors in dialogue systems, opening possibilities for systems to interrupt, acknowledge or complete a user’s utterance while it is still in progress. Our specific contributions are a method for determining when a system has reached a point of maximal understanding of an ongoing user utterance, and a prototype implementation that shows how systems can use this ability to strategically initiate system completions of user utterances. More broadly, this framework facilitates the implementation of a range of overlap behaviors that are common in human dialogue, but have been largely absent in dialogue systems.
A Computational Memory And Processing Model For Prosody
- In Proceedings of the Intl. Conf. on Spoken Language Processing
, 1998
Cited by 9 (0 self)
This paper links prosody to the information in the text and how it is processed by the speaker. It describes the operation and output of Loq, a text-to-speech implementation that includes a model of limited attention and working memory. Attentional limitations are key. Varying the attentional parameter in the simulations varies in turn what counts as given and new in a text, and therefore, the intonational contours with which it is uttered. Currently, the system produces prosody in three different styles: child-like, adult expressive, and knowledgeable. This prosody also exhibits differences within each style -- no two simulations are alike. The limited resource approach captures some of the stylistic and individual variety found in natural prosody.
1. INTRODUCTION
Ask any lay person to imitate computer speech and you will be treated to an utterance delivered in melodic and rhythmic monotone, possibly accompanied by choppy articulation and a voice quality that is nasal and strained. ...
Perception of non-verbal emotional listener feedback
- PROC. SPEECH PROSODY 2006
, 2006
Cited by 9 (2 self)
This paper reports on a listening test assessing the perception of short non-verbal emotional vocalisations emitted by a listener as feedback to the speaker. We clarify the concepts of backchannel and feedback, and investigate the use of affect bursts as a means of giving emotional feedback via the backchannel. Experiments with German and Dutch subjects confirm that the recognition of emotion from affect bursts in a dialogical context is similar to their perception in isolation. We also investigate the acceptability of affect bursts when used as listener feedback. Acceptability appears to be linked to display rules for emotion expression. While many ratings were similar between Dutch and German listeners, a number of clear differences were found, suggesting language-specific affect bursts.
Employing Voice Back Channels to Facilitate Audio Document Retrieval
- Proceedings of ACM Conference on Office Information Systems (COIS)
, 1988
Cited by 7 (1 self)
Human listeners use voice back channels to indicate their comprehension of a talker’s remarks. This paper describes an attempt to build a user interface capable of employing these back channel responses for flow control purposes while presenting a variety of audio information to a listener. Acoustic evidence based on duration and prosody (rhythm and melody) of listeners’ utterances is employed as a means of discriminating responses by discourse function without using word recognition. Such an interface has been applied to three tasks: speech synthesis of driving directions, speech synthesis of electronic mail, and retrieval of recorded voice messages.
1 Audio document access
This paper describes research in progress to develop a user interface to facilitate voice retrieval of on-line information over a telephone connection. Information may be synthesized from text such as human authored electronic mail or a response to a database query, or it may be recorded, for example a telephone message or a dictated document. We need to control the rate and order of presentation of such audio information for an efficient interaction. We desire to exploit those aspects of human dialog behavior whereby the listener gives cues to the information provider indicating comprehension and ability to keep up. We are attempting to build an intuitive and robust user interface based on the duration and prosody (rhythm and melody) of the listener’s voice responses independent of any word recognition.
Navigating joint projects in telephone conversations
- Discourse Processes
, 2004
Cited by 6 (2 self)
Conversation coordinates joint activities and the joint projects that compose them. Participants coordinate (1) vertical transitions on entering and exiting joint projects; and (2) horizontal transitions in continuing within them. Transitions are coordinated using project markers such as uh-huh, yeah, right, and okay. In the authors’ proposal, participants use uh-huh, yeah, and right to continue within joint projects, and okay and all right to enter and exit them. This was examined in 2 telephone conversation corpora. Telephone conversations divide into an entry, body, and exit phase, each of which is a joint project. Okay and all right were used to transit from the entry to body and from body to exit, whereas uh-huh, yeah, and right were used within the body.
JOINT PROJECTS IN CONVERSATION
In conversation, the participants do not just speak -- they do things together. These joint actions are normally the reason for their encounter, and their talk is shaped by the need to coordinate them. To understand what people are doing in conversation, one must understand the joint activities they are engaged in. Outside of conversation, individual and joint activities have long been analyzed into hierarchies of projects and subprojects (Cranach, Kalbermatten, Indermühle, ...
Voice creation for conversational fairy-tale characters
- IN PROCEEDINGS OF THE 5TH ISCA SPEECH SYNTHESIS WORKSHOP
, 2004
Cited by 4 (3 self)
The NICE fairy-tale game system allows users to interact with conversational fairy-tale characters in a 3D world environment. Apart from engaging in conversation, the characters are able to perform physical actions in this simulated world. The goal is to create believable fairy-tale characters with distinct personalities. The personality of the characters will be conveyed by their appearance, their voices, how they express themselves and what they are doing. This paper describes the requirements a fairy-tale game domain poses on a spoken output generation system. The implementation of a unit selection synthesizer that meets these requirements is also described.

