Image analogies
, 2001
Cited by 282 (8 self)
Figure 1: An image analogy. Our problem is to compute a new “analogous” image B′ that relates to B in “the same way” as A′ relates to A. Here, A, A′, and B are inputs to our algorithm, and B′ is the output. The full-size images are shown in Figures 10 and 11. This paper describes a new framework for processing images by example, called “image analogies.” The framework involves two stages: a design phase, in which a pair of images, with one image purported to be a “filtered” version of the other, is presented as “training data”; and an application phase, in which the learned filter is applied to some new target image in order to create an “analogous” filtered result. Image analogies are based on a simple multiscale autoregression, inspired primarily by recent results in texture synthesis. By choosing different types of source image pairs as input, the framework supports a wide variety of “image filter” effects, including traditional image filters, such as blurring or embossing; improved texture synthesis, in which some textures are synthesized with higher quality than by previous approaches; super-resolution, in which a higher-resolution image is inferred from a low-resolution source; texture transfer, in which images are “texturized” with some arbitrary source texture; artistic filters, in which various drawing and painting styles are synthesized based on scanned real-world examples; and texture-by-numbers, in which realistic scenes, composed of a variety of textures, are created using a simple painting interface.
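The design/application split described in this abstract can be illustrated with a deliberately reduced sketch. It is not the paper's multiscale autoregression: it works on 1D signals at a single scale and ignores already-synthesized output values. For each position in B it finds the position in A with the most similar neighborhood and copies the corresponding value from A′. The function name `analogy_1d` and the `radius` parameter are hypothetical, chosen only for this illustration.

```python
def analogy_1d(a, a_prime, b, radius=1):
    """Build b_prime by neighborhood matching (illustrative simplification).

    For each position i in b, find the position j in a whose local window
    is closest to the window around b[i], then copy a_prime[j].
    """
    if len(a) != len(a_prime):
        raise ValueError("a and a_prime must have the same length")

    def window(signal, i):
        # Clamp indices at the borders so every position has a full window.
        n = len(signal)
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-radius, radius + 1)]

    def squared_distance(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    a_windows = [window(a, j) for j in range(len(a))]

    b_prime = []
    for i in range(len(b)):
        target = window(b, i)
        # Brute-force nearest neighbor over all positions in a.
        best = min(range(len(a)),
                   key=lambda j: squared_distance(a_windows[j], target))
        b_prime.append(a_prime[best])
    return b_prime


if __name__ == "__main__":
    a = [0, 0, 1, 1]          # unfiltered example
    a_prime = [0, 0, 10, 10]  # "filtered" example (here: scaled by 10)
    b = [1, 1, 0, 0]          # new target
    print(analogy_1d(a, a_prime, b))
```

In this toy case the learned "filter" is a scaling by ten, and the target receives the same treatment without the scaling ever being stated explicitly.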
Voice puppetry
, 1999
Cited by 190 (0 self)
Frames from a voice-driven animation, computed from a single baby picture and an adult model of facial control. Note the changes in upper facial expression. See figures 5, 6 and 7 for more examples of predicted mouth shapes. We introduce a method for predicting a control signal from another related signal, and apply it to voice puppetry: generating full facial animation from expressive information in an audio track. The voice puppet learns a facial control model from computer vision of real facial behavior, automatically incorporating vocal and facial dynamics such as co-articulation. Animation is produced by using audio to drive the model, which induces a probability distribution over the manifold of possible facial motions. We present a linear-time closed-form solution for the most probable trajectory over this manifold. The output is a series of facial control parameters, suitable for driving many different kinds of animation ranging from video-realistic image warps to 3D cartoon characters.
Synthesizing Realistic Facial Expressions from Photographs
Cited by 186 (10 self)
We present new techniques for creating photorealistic textured 3D facial models from photographs of a human subject, and for creating smooth transitions between different facial expressions by morphing between these different models. Starting from several uncalibrated views of a human subject, we employ a user-assisted technique to recover the camera poses corresponding to the views as well as the 3D coordinates of a sparse set of chosen locations on the subject's face. A scattered data interpolation technique is then used to deform a generic face mesh to fit the particular geometry of the subject's face. Having recovered the camera poses and the facial geometry, we extract from the input images one or more texture maps for the model. This process is repeated for several facial expressions of a particular subject. To generate transitions between these facial expressions we use 3D shape morphing between the corresponding face models, while at the same time blending the corresponding tex...
Video Textures
, 2000
Cited by 180 (9 self)
This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous, infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, the video sequence as a whole is never repeated exactly. Video textures can be used in place of digital photos to infuse a static image with dynamic qualities and explicit action. We present techniques for analyzing a video clip to extract its structure, and for synthesizing a new, similar-looking video of arbitrary length. We combine video textures with view morphing techniques to obtain 3D video textures. We also introduce video-based animation, in which the synthesis of video textures can be guided by a user through high-level interactive controls. Applications of video textures and their extensions include the display of dynamic scenes on web pages, the creation of dynamic backdrops for sp...
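One way to read "analyzing a video clip to extract its structure" and "synthesizing a new, similar looking video of arbitrary length" is as a random walk over frames, where a jump from frame i to frame j is likely when frame j resembles the natural successor i+1. The sketch below makes that reading concrete under stated assumptions: frames are flat lists of numbers, similarity is Euclidean distance, and distances are mapped to probabilities with an exponential controlled by a hypothetical `sigma` parameter. None of these choices is given in the abstract, and the restart at a dead end is a simplification of this sketch.

```python
import math
import random


def frame_distance(f, g):
    """Euclidean distance between two frames given as flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)))


def transition_probabilities(frames, sigma=1.0):
    """Row i holds the probabilities of showing frame j after frame i.

    A jump i -> j looks smooth when frame j resembles frame i + 1,
    so the weight decays with the distance between those two frames.
    The last frame has no successor and therefore gets no row.
    """
    n = len(frames)
    rows = []
    for i in range(n - 1):
        weights = [math.exp(-frame_distance(frames[i + 1], frames[j]) / sigma)
                   for j in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def synthesize(frames, length, sigma=1.0, seed=0):
    """Return a list of frame indices of the requested length."""
    rng = random.Random(seed)
    rows = transition_probabilities(frames, sigma)
    n = len(frames)
    current = 0
    order = [current]
    while len(order) < length:
        if current >= n - 1:
            current = 0  # dead end: restart (simplification of this sketch)
        else:
            current = rng.choices(range(n), weights=rows[current])[0]
        order.append(current)
    return order


if __name__ == "__main__":
    clip = [[0.0], [1.0], [2.0], [0.0]]  # four one-pixel "frames"
    print(synthesize(clip, 12))
```

A smaller `sigma` keeps the walk close to the original frame order; a larger one permits more, and visually rougher, jumps.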
Beat: The behavior expression animation toolkit
, 2001
Cited by 174 (16 self)
The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from extensive research into human conversational behavior. The toolkit is extensible, so that new rules can be quickly added. It is designed to plug into larger systems that may also assign personality profiles, motion characteristics, scene constraints, or the animation styles of particular animators.
Making Faces
, 1998
Cited by 129 (2 self)
We have created a system for capturing both the three-dimensional geometry and color and shading information for human facial expressions. We use this data to reconstruct photorealistic, 3D animations of the captured expressions. The system uses a large set of sampling points on the face to accurately track the three-dimensional deformations of the face. Simultaneously with the tracking of the geometric data, we capture multiple high-resolution, registered video images of the face. These images are used to create a texture map sequence for a three-dimensional polygonal face model which can then be rendered on standard 3D graphics hardware. The resulting facial animation is surprisingly life-like and looks very much like the original live performance. Separating the capture of the geometry from the texture images eliminates much of the variance in the image data due to motion, which increases compression ratios. Although the primary emphasis of our work is not compression, we have investigated the use of a novel method to compress the geometric data based on principal components analysis. The texture sequence is compressed using an MPEG4 video codec. Animations reconstructed from 512x512 pixel textures look good at data rates as low as 240 Kbits per second.
Trainable Videorealistic Speech Animation
- PROCEEDINGS OF SIGGRAPH 2002, SAN ANTONIO TEXAS
, 2002
Cited by 110 (5 self)
We describe how to create with machine learning techniques a generative, videorealistic, speech animation module. A human subject is first recorded using a videocamera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence which contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned. The two key
Face Transfer with Multilinear Models
- TO APPEAR IN SIGGRAPH 2005
, 2005
Cited by 64 (1 self)
Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another. It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage. These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage. The underlying face model automatically adjusts for how the target performs facial expressions and visemes. The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target—the attributes are separably controllable. This supports
Resynthesizing Facial Animation through 3D Model-Based Tracking
, 1999
Cited by 62 (4 self)
Given video footage of a person's face, we present new techniques to automatically recover the face position and the facial expression from each frame in the video sequence. A 3D face model is fitted to each frame using a continuous optimization technique. Our model is based on a set of 3D face models that are linearly combined using 3D morphing. Our method has the advantages over previous techniques of fitting directly a realistic 3-dimensional face model and of recovering parameters that can be used directly in an animation system. We also explore many applications, including performance-driven animation (applying the recovered position and expression of the face to a synthetic character to produce an animation that mimics the input video), relighting the face, varying the camera position, and adding facial ornaments such as tattoos and scars. 1 Introduction There are many techniques and tools that can be used to create facial animations. These tools can be as simple as a pencil an...
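The statement that the model is "a set of 3D face models that are linearly combined" can be illustrated with a weighted sum of vertex positions over meshes that share one topology. This is a minimal sketch under that assumption; the function name `blend_faces` and the toy two-vertex meshes are hypothetical, and the fitting step that recovers the weights from video is not shown.

```python
def blend_faces(basis_meshes, weights):
    """Linearly combine meshes that share the same vertex ordering.

    basis_meshes: list of meshes, each a list of (x, y, z) tuples.
    weights: one scalar per mesh.
    Returns the blended mesh as a list of (x, y, z) tuples.
    """
    if not basis_meshes:
        raise ValueError("at least one basis mesh is required")
    if len(basis_meshes) != len(weights):
        raise ValueError("need exactly one weight per basis mesh")
    n_vertices = len(basis_meshes[0])
    if any(len(mesh) != n_vertices for mesh in basis_meshes):
        raise ValueError("all meshes must have the same number of vertices")

    blended = []
    for v in range(n_vertices):
        # Weighted sum of the v-th vertex across all basis meshes, per axis.
        blended.append(tuple(
            sum(w * mesh[v][axis] for mesh, w in zip(basis_meshes, weights))
            for axis in range(3)
        ))
    return blended


if __name__ == "__main__":
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    smile = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    # Halfway between the two expressions.
    print(blend_faces([neutral, smile], [0.5, 0.5]))
```

Because the face is described entirely by the weight vector, the same few numbers recovered per frame could drive a different character built on corresponding basis meshes, which is the kind of reuse the abstract mentions under performance-driven animation.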
Audio-Visual Integration In Multimodal Communication
- Proc. IEEE
, 1998
Cited by 54 (5 self)
In this paper, we review recent research that examines audio-visual integration in multimodal communication. The topics include bimodality in human speech, human and automated lip-reading, facial animation, lip synchronization, joint audio-video coding, and bimodal speaker verification. We also study the enabling technologies for these research topics, including automatic facial feature tracking and audio-to-visual mapping. Recent progress in audio-visual research shows that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently. Keywords: Multimedia communication, Speech processing, Speech communication, Video signal processing, Image analysis 1. Introduction Multimedia is more than simply the combination of various forms of data: text, speech, audio, music, images, graphics, and video. When we discuss multimedia signal processing, it is the integration and interaction among these different media types t...

