Animation Of Talking Agents
- In Proceedings of AVSP'97, 1997
Cited by 45 (20 self)
It is envisioned that autonomous software agents that can communicate using speech and gesture will soon be on everybody's computer screen. This paper describes an architecture that can be used to design and animate characters capable of lip-synchronised synthetic speech as well as body gestures, for use in, for example, spoken dialogue systems. A general scheme for computationally efficient parametric deformation of facial surfaces is presented, as well as techniques for generation of bimodal speech, facial expressions and body gestures in a spoken dialogue system. Results indicating that an animated cartoon-like character can contribute significantly to speech intelligibility are also reported.
Developing a 3D-Agent for the August Dialogue System
Cited by 25 (2 self)
In our continuing work with multimodal text-to-speech synthesis with high quality for speechreading, a new talking head has been developed with the purpose of acting as an interactive agent in a dialogue system, set up in a public exhibition area in downtown Stockholm. The new agent conforms to the same set of basic control parameters as our earlier faces, allowing us to control it using existing rules for visual speech synthesis. To add to the realism and believability of the dialogue system, the agent has been given a rich repertoire of extra-linguistic gestures and expressions, including emotional cues, turn-taking signals and prosodic cues such as punctuators and emphasizers. Studies of user reactions indicated that people have a positive attitude towards our new agent.
Real-time Handling of Fragmented Utterances
- In Proceedings of the NAACL Workshop on Adaptation in Dialogue Systems, 2001
Cited by 25 (9 self)
In this paper, we discuss an adaptive method of handling fragmented user utterances to a speech-based multimodal dialogue system. Inserted silent pauses between fragments present the following problem: Does the current silence indicate that the user has completed her utterance, or is the silence just a pause between two fragments, so that the system should wait for more input? Our system incrementally classifies user utterances as either closing (more input is unlikely to come) or non-closing (more input is likely to come), partly depending on the current dialogue state. Utterances that are categorized as non-closing allow the dialogue system to await additional spoken or graphical input before responding.
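The closing/non-closing decision described in this abstract can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: the abstract does not give the features or the classifier, so the cue-word list, the `DialogueState` field, and the timeout values are assumptions chosen only to show the control flow.

```python
from dataclasses import dataclass

# Hypothetical cue words: a fragment ending in one of these is assumed to be
# unfinished. The paper's actual features are not given in the abstract.
NON_CLOSING_ENDINGS = {"and", "or", "with", "in", "the", "a"}


@dataclass
class DialogueState:
    # Hypothetical state flag: the system has just asked a yes/no question.
    expects_short_answer: bool


def classify_fragment(words, state):
    """Label the words heard so far as 'closing' or 'non-closing'."""
    if not words:
        return "non-closing"
    last = words[-1].lower()
    # Dialogue state can make a very short reply count as complete.
    if state.expects_short_answer and last in {"yes", "no"}:
        return "closing"
    # A dangling function word suggests more input is likely to come.
    if last in NON_CLOSING_ENDINGS:
        return "non-closing"
    return "closing"


def silence_timeout(label, short=0.5, long=2.0):
    """Seconds of silence to tolerate before responding (illustrative values)."""
    return short if label == "closing" else long
```

Called again each time a new fragment arrives, `classify_fragment` lets the system wait longer after a fragment such as "an apartment in" than after "an apartment in Stockholm".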
Olga -- a Conversational Agent With Gestures
Cited by 23 (1 self)
The Olga project has developed an animated agent interface for information services. The interface combines a graphical interface, spoken dialogue and an animated 3D 'human-like' character for multimodal interaction with users. The interaction is intelligently managed using techniques derived from spoken dialogue but extended for the graphical modality. The Olga agent is innovative in combining an interactive spoken dialogue system with a 3-D animated character using lip-synchronized synthetic speech and gesturing. Particular attention has been paid to ensuring that the behaviour of the agent is immediately comprehensible for the user. Synchronizing
Visual Speech Synthesis Based On Parameter Generation From HMM: Speech-Driven And Text-And-Speech-Driven Approaches
- In ICASSP, 1998
Cited by 22 (1 self)
This paper describes a technique for synthesizing synchronized lip movements from an auditory input speech signal. The technique is based on an algorithm for parameter generation from HMM with dynamic features, which has been successfully applied to text-to-speech synthesis. Audio-visual speech unit HMMs, namely syllable HMMs, are trained with parameter vector sequences that represent both auditory and visual speech features. Input speech is recognized using the syllable HMMs and converted into a transcription and a state sequence. A sentence HMM is constructed by concatenating the syllable HMMs corresponding to the transcription for the input speech. Then an optimum visual speech parameter sequence is generated from the sentence HMM in the maximum-likelihood (ML) sense. Since the generated parameter sequence reflects statistical information of both static and dynamic features of several phonemes before and after the current phonemes, synthetic lip motion becomes smooth and realistic. We show experimental results...
AdApt - a multimodal conversational dialogue system in an apartment domain
- In Proceedings of ICSLP 2000, 2000
Cited by 22 (5 self)
A general overview of the AdApt project and the research that is performed within the project is presented. In this project various aspects of human-computer interaction in multimodal conversational dialogue systems are investigated. The project will also include studies on the integration of user/system/dialogue dependent speech recognition and multimodal speech synthesis. A domain in which multimodal interaction is highly useful has been chosen, namely, finding available apartments in Stockholm. A Wizard-of-Oz data collection within this domain is also described.
Trainable articulatory control models for visual speech synthesis
- Journal of Speech Technology, 2004
Cited by 16 (4 self)
This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models ("Cohen-Massaro" and "Öhman") are based on coarticulation models from speech production theory and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score. Keywords: perceptual evaluation
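For orientation, the Cohen-Massaro coarticulation model named in this abstract is usually described as blending per-segment articulatory targets with dominance functions that decay exponentially away from each segment. The Python sketch below is a simplified illustration under stated assumptions, not the trained model from the paper: it uses one symmetric dominance function per segment (the full model distinguishes anticipatory and carry-over sides), and the values of `alpha`, `theta` and `c` are arbitrary.

```python
import math


def dominance(t, center, alpha=1.0, theta=10.0, c=1.0):
    """Negative-exponential dominance of a segment centred at `center` (seconds).

    alpha scales the peak, theta sets how fast dominance falls off, and c
    shapes the fall-off. All three values are illustrative assumptions.
    """
    return alpha * math.exp(-theta * abs(t - center) ** c)


def blended_parameter(t, segments):
    """Dominance-weighted average of segment targets at time t.

    `segments` is a list of (center_time, target_value) pairs for one
    articulatory parameter, for example lip rounding.
    """
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(weights)
    weighted = sum(w * target for w, (_, target) in zip(weights, segments))
    return weighted / total
```

With two segments centred at 0.0 s and 0.2 s and targets 0.0 and 1.0, the blended value moves smoothly from near the first target to near the second, passing through the midpoint of the targets halfway between the segment centres.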
SYNFACE – A Talking Head Telephone for the Hearing-impaired
- In Miesenberger, 2004
Cited by 14 (8 self)
SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other telephone synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages: Dutch, English and Swedish, and the first user trials have just started.
Olga - A Dialogue System With An Animated Talking Agent
- In Proceedings of Eurospeech '97, 1997
Cited by 13 (4 self)
The object of the Olga project is to develop an interactive 3D animated talking agent. A futuristic application scenario is interactive digital TV, where the Olga agent would guide naive users through the various services available on the network. The current application is a consumer information service for microwave ovens. Olga required the development of a system with components from many different fields: multimodal interfaces, dialogue management, speech recognition, speech synthesis, graphics, animation, facilities for direct manipulation and database handling. To integrate all knowledge sources, Olga is implemented with separate modules communicating with a central dialogue interaction manager. In this paper we mainly describe the talking animated agent and the dialogue manager. There is also a short description of the preliminary speech recogniser used in the project.
Expressive Facial Animation Synthesis by Learning Speech Co-Articulation and Expression Spaces
- IEEE Transactions on Visualization and Computer Graphics, 2006
Cited by 13 (8 self)
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A Phoneme-Independent Expression Eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and Principal Component Analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation. Index Terms: Facial animation, expressive speech, animation synthesis, speech coarticulation, texture synthesis, motion capture, data-driven.

