BEAT: The Behavior Expression Animation Toolkit
2001
Cited by 174 (16 self)
The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from extensive research into human conversational behavior. The toolkit is extensible, so that new rules can be quickly added. It is designed to plug into larger systems that may also assign personality profiles, motion characteristics, scene constraints, or the animation styles of particular animators.
A Survey of Socially Interactive Robots
2002
Cited by 154 (24 self)
This paper reviews "socially interactive robots": robots for which social human-robot interaction is important. We begin by discussing the context for socially interactive robots, emphasizing the relationship to other research fields and the different forms of "social robots". We then present a taxonomy of design methods and system components used to build socially interactive robots. Finally, we describe the impact of these robots on humans and discuss open issues. An expanded version of this paper, which contains a survey and taxonomy of current applications, is available as a technical report [61].
Conversational Interfaces: Advances and Challenges
2000
Cited by 61 (4 self)
The last decade has witnessed the emergence of a new breed of human computer interfaces that combines several human language technologies to enable information access and transactional processing using spoken dialogue. In this paper, I discuss my view on the research issues involved in the development of such interfaces, describe the recent work done in this area at the MIT Laboratory for Computer Science, and outline some of the unmet research challenges, including the need to work in real domains, spoken language generation, and portability across domains and languages.
CUAVE: A new audio-visual database for multimodal human-computer interface research
In Proc. ICASSP, 2002
Cited by 44 (1 self)
Multimodal signal processing has become an important topic of research for overcoming certain problems of audio-only speech processing. Audio-visual speech recognition is one area with great potential. Difficulties due to background noise and multiple speakers are significantly reduced by the additional information provided by extra visual features. Despite a few efforts to create databases in this area, none has emerged as a standard for comparison for several possible reasons. This paper seeks to introduce a new audiovisual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The CUAVE database is a speaker-independent corpus of over 7,000 utterances of both connected and isolated digits. It is designed to meet several goals that are discussed in this paper. The most notable are availability of the database, flexibility for use of
Recent Developments In Facial Animation: An Inside View
1998
Cited by 33 (7 self)
We report on our recent facial animation work to improve the realism and accuracy of visual speech synthesis. The general approach is to use both static and dynamic observations of natural speech to guide the facial modeling. One current goal is to model the internal articulators of a highly realistic palate, teeth, and an improved tongue. Because our talking head can be made transparent, we can provide an anatomically valid and pedagogically useful display that can be used in speech training of children with hearing loss [1]. High-resolution models of palate and teeth [2] were reduced to a relatively small number of polygons for real-time animation [3]. For the improved tongue, we are using 3D ultrasound data and electropalatography (EPG) [4] with error minimization algorithms to educate our parametric B-spline based tongue model to simulate realistic speech. In addition, a high-speed algorithm has been developed for detection and correction of collisions, to prevent the tongue from p...
Developing and Evaluating Conversational Agents
2000
Cited by 29 (5 self)
Conversation agents present a challenging agenda for research and application. We describe the development, evaluation, and application of Baldi, a computer animated talking head. Baldi's existence is justified by the important contribution of the face in spoken dialog. His actions are evaluated and modified to mimic natural actions as much as possible. Baldi has the potential to enrich human-machine interactions and serve as a tutor in a wide variety of educational domains. We describe one current application of language tutoring with children with hearing loss.
Embodied Characters
The title of this conference is "Embodied Conversational Characters." Why not just "Conversational Characters?" What does embodiment add to our quest for a simulacrum of some human agent? Traditionally, the success of artificial intelligence was deemed to be contingent on creating a thinking machine, encompassing all of the rationality, logic, and abstract knowledge possessed by humans. The achie...
Moving-talker speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus
EURASIP Journal on Applied Signal Processing, 2002
Cited by 25 (0 self)
Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The CUAVE database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7,000 utterances. It contains a wide variety of speakers, and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. For information on obtaining CUAVE, please visit our webpage
Experimental assessment of the effectiveness of synthetic personae for multi-modal E-retail applications
In Proceedings 4th International Conference on Autonomous Agents (Agents’2000), 2000
Cited by 21 (2 self)
This paper details results of an experiment to empirically evaluate the effectiveness and user acceptability of human-like synthetic agents in a multi-modal electronic retail scenario. The synthetic personae played the roles of interactive conversational sales assistants. The range of life-like personae differed with respect to gender and technology. Participants took part in the controlled experiment, which involved them eavesdropping on spoken dialogues between a customer and each of the synthetic personae. They also completed questionnaires and took part in a debriefing interview designed to elicit information relating to the effectiveness, believability and perceived quality of each of the personae. Results show that participants expected a high level of realistic and human-like verbal and non-verbal communicative behaviour in the synthetic personae. This was demonstrated in the strong preference for personae that exhibited natural facial expressions, gestures and emotions. It was also found that disembodied voices were significantly preferred to many of the personae. In addition, results show participants had significantly different attitudes to the voices of the personae.
Perceptive animated interfaces: First steps toward a new paradigm for human-computer interaction
Proceedings of the IEEE, 2003
Cited by 20 (6 self)
This article presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote and gesture. These animated agents will converse with people much like people converse effectively with assistants in a variety of focused applications. Despite the research advances required to realize this vision, and the lack of strong experimental evidence that animated agents improve human-computer interaction, we argue that initial prototypes of perceptive animated interfaces can be developed today, and that the resulting systems will provide more effective and engaging communication experiences than existing systems. In support of this hypothesis, we first describe initial experiments using an animated character to teach speech and language skills to children with hearing problems, and classroom subject and social skills to children with autistic spectrum disorder. We then show how existing dialogue system architectures can be transformed into perceptive animated interfaces by integrating computer vision and animation capabilities. We conclude by describing the Colorado Literacy Tutor, a computer-based literacy program that provides an ideal test bed for research and development of perceptive animated interfaces, and consider next steps required to realize the vision.
Perceptual learning in speech
Cognitive Psychology, 2002
Cited by 19 (1 self)
This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WItlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?], unambiguous witlof). Listeners who had heard [?] in [f]-final words were subsequently more likely to categorize ambiguous sounds on an [f]–[s] continuum as [f] than those who heard [?] in [s]-final words. Control conditions ruled out alternative explanations based on selective adaptation and contrast. Lexical information can thus be used to train categorization of speech. This use of lexical information differs from the on-line lexical feedback embodied in interactive models of speech perception. In contrast to on-line feedback, lexical feedback for learning is of benefit to spoken word recognition (e.g., in

