## Voice puppetry (1999)

Citations: 296 (0 self)

### Citations

630 | An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process
- Baum
- 1972

Citation Context ...ly observe the states; we must infer them from the signal. For this we have the Viterbi algorithm [13], which identifies the most likely state sequence given a signal. The related Baum-Welch algorithm [2] estimates parameter values, given training data and a prior specification of the model’s finite state machine. Both algorithms are based on dynamic programming and give locally optimal results in lin...
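The Viterbi recursion described in this context can be sketched for a toy two-state HMM. The state names, observations, and probabilities below are illustrative assumptions for the sketch, not the paper's model:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence via dynamic programming."""
    # V[t][s]: probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # best predecessor of s at time t
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], V[-1][last]

# Toy example: infer a mouth state ("closed"/"open") from sound levels.
states = ("closed", "open")
start = {"closed": 0.6, "open": 0.4}
trans = {"closed": {"closed": 0.7, "open": 0.3},
         "open": {"closed": 0.4, "open": 0.6}}
emit = {"closed": {"quiet": 0.8, "loud": 0.2},
        "open": {"quiet": 0.3, "loud": 0.7}}
path, score = viterbi(("quiet", "loud", "loud"), states, start, trans, emit)
# path == ["closed", "open", "open"]
```

Each step keeps only the single best predecessor per state, which is what makes the search linear in the sequence length.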

323 | Video Rewrite: Driving Visual Speech with Audio
- Bregler, Covell, et al.
- 1997

Citation Context ...y all lip-syncing systems are based on an intermediate phonemic representation, whether obtained by hand [23, 24], from text [9, 12, 1, 17] or, with varying degrees of success, via speech recognition [18, 28, 7, 8, 29]. Typically, phonemic or visemic tokens are mapped directly to lip poses, ignoring dynamical factors. Efforts toward dynamical realism have been heuristic and use limited contextual information (e.g.,...

51 | Pattern discovery via entropy minimization
- Brand
- 1999

Citation Context ...y express the content of a frame as a mixture of states, making it impossible to say that the system was in any one state. We briefly review the entropic training framework here, and refer readers to [5, 4] for details and derivations. We begin with a dataset X and a model whose parameters and structure are specified by the vector θ. In conventional training, one guesses the sparsity structure of θ in a...
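The entropic prior reviewed in this context, P(θ) ∝ e^{−H(θ)}, can be illustrated with a small sketch: for a two-outcome multinomial, the entropic MAP estimate is pulled toward a lower-entropy (sparser) distribution than the maximum-likelihood estimate. The counts and the grid search below are illustrative assumptions, not the paper's estimator:

```python
import math

def log_posterior(theta, counts):
    """Log-likelihood plus the log of the entropic prior e^{-H(theta)}:
    sum n_i log theta_i + sum theta_i log theta_i (up to normalization)."""
    ll = sum(n * math.log(t) for n, t in zip(counts, theta))
    neg_entropy = sum(t * math.log(t) for t in theta)
    return ll + neg_entropy

# Toy data: 8 observations of outcome A, 2 of outcome B.
counts = [8, 2]
grid = [i / 1000 for i in range(1, 1000)]  # candidate values for theta_A

# Maximum-likelihood estimate: theta_A = 8 / 10 = 0.8
ml = max(grid, key=lambda t: sum(n * math.log(p)
                                 for n, p in zip(counts, [t, 1 - t])))
# Entropic MAP estimate: the e^{-H} prior favors lower entropy,
# pushing theta_A above the ML value of 0.8.
map_est = max(grid, key=lambda t: log_posterior([t, 1 - t], counts))
```

The same preference for low-entropy parameters is what concentrates probability mass on a single Viterbi sequence in the trained models discussed above.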

22 | Structure discovery in conditional probability models via an entropic prior and parameter extinction
- Brand
- 1997

Citation Context ...y express the content of a frame as a mixture of states, making it impossible to say that the system was in any one state. We briefly review the entropic training framework here, and refer readers to [5, 4] for details and derivations. We begin with a dataset X and a model whose parameters and structure are specified by the vector θ. In conventional training, one guesses the sparsity structure of θ in a...

6 | Spoken language processing in the persona conversational assistant
- Ball, Ling
- 1995

Citation Context ... lip-syncing has been the focus of many attempts at quasi-automation. Nearly all lip-syncing systems are based on an intermediate phonemic representation, whether obtained by hand [23, 24], from text [9, 12, 1, 17] or, with varying degrees of success, via speech recognition [18, 28, 7, 8, 29]. Typically, phonemic or visemic tokens are mapped directly to lip poses, ignoring dynamical factors. Efforts toward dyna...

2 | Read my lips: Where? How? When? And so... What
- Benoit, Abry, et al.
- 1995

Citation Context ...llers alike have observed that there is a good deal of mutual information between vocal and facial gesture [27]. Facial information can add significantly to the observer’s comprehension of the formal [3] and emotional content of speech, and is considered by some a necessary ingredient of successful speech-based interfaces. Conversely, the difficulty of synthesizing believable faces is a widely-noted ...

1 | Shadow puppetry. Submitted to Int
- Brand
- 1999

Citation Context ...irtually banished with entropically estimated models because entropy minimization concentrates the probability mass on the optimal Viterbi sequence (see §5, paragraph 2 for an empirical example). In [6] we present a full Bayesian MAP solution which considers all possible state sequences and show that it and the maximum likelihood solution are only valid for low-entropy models, where they give almost...

1 | Audio-visual interaction in multimedia communication
- Chen, Rao
- 1997

Citation Context ...y all lip-syncing systems are based on an intermediate phonemic representation, whether obtained by hand [23, 24], from text [9, 12, 1, 17] or, with varying degrees of success, via speech recognition [18, 28, 7, 8, 29]. Typically, phonemic or visemic tokens are mapped directly to lip poses, ignoring dynamical factors. Efforts toward dynamical realism have been heuristic and use limited contextual information (e.g.,...