## Discriminative Density Propagation for 3D Human Motion Estimation (2005)

### Download Links

- [paul.rutgers.edu]
- [www.cs.toronto.edu]
- [sminchisescu.ins.uni-bonn.de]
- DBLP

### Other Repositories/Bibliography

Venue: CVPR

Citations: 92 (14 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Sminchisescu05discriminativedensity,
  author    = {Cristian Sminchisescu and Atul Kanaujia and Zhiguo Li and Dimitris Metaxas},
  title     = {Discriminative Density Propagation for 3D Human Motion Estimation},
  booktitle = {CVPR},
  year      = {2005},
  pages     = {390--397}
}
```

### Abstract

We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, and therefore does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model in order to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking, thus avoiding the use of the generative 3D model for inference (the learned discriminative predictors can also be used, complementarily, as importance samplers to improve mixing or to initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent the complex multivalued mappings common in inverse, uncertain perception problems. Our paper makes three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion capture-based test sequences and compare against nearest-neighbor and regression methods.
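The abstract's central object is a conditional Bayesian mixture of experts, p(x | r) = Σ_k g_k(r) N(x; W_k r, Σ_k), whose observation-dependent gates g_k(r) let the model represent the multivalued silhouette-to-pose mapping; the discriminative propagation rule then chains such conditionals along the temporal chain. A minimal sketch of how such a conditional mixture is evaluated, assuming linear experts and softmax gates (the paper's experts are sparse Bayesian regressors; all names and values here are illustrative):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def moe_conditional(r, gate_weights, expert_weights):
    """Evaluate a conditional mixture of experts p(x | r).

    gate_weights   : (K, d) array; gate score for expert k is gate_weights[k] @ r
    expert_weights : (K, m, d) array; expert k predicts the mean expert_weights[k] @ r
    Returns the gate probabilities g (K,) and the per-expert means (K, m).
    """
    g = softmax(gate_weights @ r)                      # observation-sensitive gates
    means = np.einsum('kmd,d->km', expert_weights, r)  # per-expert state predictions
    return g, means

# Toy 1D example: two experts represent the two branches of a
# multivalued inverse mapping (e.g. a forward/backward pose ambiguity).
gate_weights = np.array([[1.0], [-1.0]])
expert_weights = np.array([[[2.0]], [[-2.0]]])
g, means = moe_conditional(np.array([0.5]), gate_weights, expert_weights)
```

For an ambiguous input the gates split the probability mass between the two expert means rather than averaging them, which is precisely what a single regressor cannot do for a multivalued inverse mapping.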

### Citations

1251 | Shape matching and object recognition using shape contexts
- Belongie, Malik, et al.
- 2002
Citation Context: ...empirically how ambiguous a sample of our training data is. This is shown and discussed in fig. 3. Our choice of image features is based on previously developed methods for shape and texture modeling [8, 17, 5]. We work with silhouettes and we assume that in real settings these can be obtained using a statistical background subtraction method (we use one based on separately built foreground and background m...

724 | Hierarchical mixtures of experts and EM algorithm
- Jordan, Jacobs
- 1994
Citation Context: ...an be applied to solve (c). (2) We describe conditional Bayesian mixture of experts representations that allow flexible discriminative modeling. These are based on hierarchical mixture of experts [14, 29, 6], an elaborated version of clusterwise or switching regression [9, 18], where the expert mixture proportions (called gates) are themselves observation-sensitive predictors, synchronized across experts...

553 | Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research
- Tipping
- 2001
Citation Context: ...ditionals. Our learning algorithm is different from the one of [29] in that we use sparse greedy approximations, and differs from [6] in that we use type-II maximum likelihood Bayesian approximations [15, 27], and not structured variational ones. (3) We demonstrate the proposed algorithms on real and motion capture-based test sequences and present comparisons with nearest neighbor and regression methods...

522 | Bayesian interpolation
- MacKay
- 1992
Citation Context: ...ditionals. Our learning algorithm is different from the one of [29] in that we use sparse greedy approximations, and differs from [6] in that we use type-II maximum likelihood Bayesian approximations [15, 27], and not structured variational ones. (3) We demonstrate the proposed algorithms on real and motion capture-based test sequences and present comparisons with nearest neighbor and regression methods...

439 | Maximum entropy Markov models for information extraction and segmentation
- McCallum, Freitag, et al.
- 2000

401 | Articulated Body Motion Capture by Annealed Particle Filtering
- Deutscher, Blake, et al.
- 2000
Citation Context: ...ng. Bayes’ rule is then used to compute the state conditional from the observation conditional and the state prior. Learning can be both supervised and unsupervised. This includes priors on the state [10, 12, 21], dimensionality reduction [22] or estimating the parameters of the observation model (e.g. texture, ridge or edge distributions) using problem-dependent, natural image statistics [19]. Temporal infer...

210 | Nonparametric belief propagation
- Sudderth, Ihler, et al.
- 2003
Citation Context: ...istributions) using problem-dependent, natural image statistics [19]. Temporal inference (tracking) is framed in a clear probabilistic and computational framework based on mixture or particle filters [13, 10, 25, 21, 26]. It has been argued that generative models can flexibly reconstruct complex unknown motions and can naturally handle problem constraints. It has been counter-argued that both flexibility and modeling...

152 | 3D human pose from silhouettes by relevance vector regression
- Agarwal, Triggs
- 2004
Citation Context: ...t indirect with respect to the task, that requires conditional state estimation and not conditional observation modeling. These arguments motivate the complementary study of discriminative algorithms [7, 17, 20, 18, 2] that model and predict the state conditional directly in order to simplify inference. Prediction however involves missing (state) data, unlike learning that is supervised. But learning is also diffic...

146 | Estimating human body configurations using shape context matching
- Mori, Malik
- 2002
Citation Context: ...t indirect with respect to the task, that requires conditional state estimation and not conditional observation modeling. These arguments motivate the complementary study of discriminative algorithms [7, 17, 20, 18, 2] that model and predict the state conditional directly in order to simplify inference. Prediction however involves missing (state) data, unlike learning that is supervised. But learning is also diffic...

144 | Tracking loose-limbed people
- Sigal, Bhatia, et al.

133 | Estimating 3D hand pose from a cluttered image
- Athitsos, Sclaroff
Citation Context: ...implies that, strictly, the inverse mapping from observations to states is multi-valued and cannot be functionally (and globally) approximated, several authors made initial progress by treating it so [20, 4, 17, 28, 2]. Some approaches constructed data structures for fast nearest-neighbor retrieval [20, 4, 28, 17] or learned regression parameters [2]. Inference involved either indexing for the nearest-neighbors of ...

131 | Bayesian reconstruction of 3d human motion from single-camera video
- Howe, Leventon, et al.
Citation Context: ...ng. Bayes’ rule is then used to compute the state conditional from the observation conditional and the state prior. Learning can be both supervised and unsupervised. This includes priors on the state [10, 12, 21], dimensionality reduction [22] or estimating the parameters of the observation model (e.g. texture, ridge or edge distributions) using problem-dependent, natural image statistics [19]. Temporal infer...

102 | Fast Pose Estimation with Parameter Sensitive Hashing
- Shakhnarovich, Viola, et al.
- 2003
Citation Context: ...t indirect with respect to the task, that requires conditional state estimation and not conditional observation modeling. These arguments motivate the complementary study of discriminative algorithms [7, 17, 20, 18, 2] that model and predict the state conditional directly in order to simplify inference. Prediction however involves missing (state) data, unlike learning that is supervised. But learning is also diffic...

101 | Kinematic jump processes for monocular 3d human tracking
- Sminchisescu, Sminchisescu, et al.
- 2003
Citation Context: ...lack of observability of some of the d.o.f., e.g. ambiguities in the global azimuthal orientation for frontal views. These are multiplied by intrinsic forward / backward monocular ambiguities [25] that are common in many human interaction scenarios. (While no image descriptor set is likely to easily help discriminate them, this further motivates our probabilistic, multiple hypothesis approach...

82 | Inferring 3D structure with a statistical image-based Shape model
- Shakhnarovich, Darrell
- 2003
Citation Context: ...proposal mechanism, e.g. during generative inference based on quadrature-style Monte-Carlo approximations and indeed this is how it has primarily been used [18]. A related method has been proposed by [11], where a mixture of probabilistic PCA is fitted to the joint distribution of multiview silhouettes and corresponding 3D pose, and reconstruction is based on MAP estimates. In this multi-image setting...

74 | Generative modeling for continuous non-linearly embedded visual inference
- Sminchisescu, Jepson
- 2004
Citation Context: ...the state conditional from the observation conditional and the state prior. Learning can be both supervised and unsupervised. This includes priors on the state [10, 12, 21], dimensionality reduction [22] or estimating the parameters of the observation model (e.g. texture, ridge or edge distributions) using problem-dependent, natural image statistics [19]. Temporal inference (tracking) is framed in a ...

60 | Bayesian methods for mixtures of experts
- Waterhouse, MacKay, Robinson
- 1996
Citation Context: ...an be applied to solve (c). (2) We describe conditional Bayesian mixture of experts representations that allow flexible discriminative modeling. These are based on hierarchical mixture of experts [14, 29, 6], an elaborated version of clusterwise or switching regression [9, 18], where the expert mixture proportions (called gates) are themselves observation-sensitive predictors, synchronized across experts...

46 | 3D texture recognition using bidirectional feature histograms
- Cula, Dana
Citation Context: ...empirically how ambiguous a sample of our training data is. This is shown and discussed in fig. 3. Our choice of image features is based on previously developed methods for shape and texture modeling [8, 17, 5]. We work with silhouettes and we assume that in real settings these can be obtained using a statistical background subtraction method (we use one based on separately built foreground and background m...

40 | Learning Body Pose Via Specialized Maps
- Rosales, Sclaroff
- 2002

38 | A maximum likelihood methodology for clusterwise linear regression
- DeSarbo, Cron
- 1988
Citation Context: ...e methods, a notable exception is [18], who clustered their dataset into soft partitions and learned functional approximations (perceptrons) within each. However, clusterwise functional approximation [9, 18] is only going halfway towards a multivalued inversion because inference is not straightforward. For new inputs, cluster membership probabilities cannot be computed as during (supervised) learning, be...

28 | 3D tracking = classification + interpolation
- Tomasi, Petrov, et al.
- 2003
Citation Context: ...implies that, strictly, the inverse mapping from observations to states is multi-valued and cannot be functionally (and globally) approximated, several authors made initial progress by treating it so [20, 4, 17, 28, 2]. Some approaches constructed data structures for fast nearest-neighbor retrieval [20, 4, 28, 17] or learned regression parameters [2]. Inference involved either indexing for the nearest-neighbors of ...

19 | Variational Mixture Smoothing for Non-Linear Dynamical Systems
- Sminchisescu, Jepson
- 2004
Citation Context: ...bly reconstruct complex unknown motions and can naturally handle problem constraints. It has been counter-argued that both flexibility and modeling difficulties lead to expensive, uncertain inference [10, 25, 23, 21], and that a constructive form of the observer is somewhat indirect with respect to the task, that requires conditional state estimation and not conditional observation modeling. These arguments motiv...

15 | Gibbs Likelihoods for Bayesian Tracking
- Roth, Sigal, et al.
- 2004
Citation Context: ...e state [10, 12, 21], dimensionality reduction [22] or estimating the parameters of the observation model (e.g. texture, ridge or edge distributions) using problem-dependent, natural image statistics [19]. Temporal inference (tracking) is framed in a clear probabilistic and computational framework based on mixture or particle filters [13, 10, 25, 21, 26]. It has been argued that generative models can ...

13 | Optimal pairwise geometric histograms
- Aherne, Thacker, et al.
- 1997
Citation Context: ...text features extracted on the silhouette [5, 17, 2] (5 radial bins, 12 angular bins, with bin size range 1 / 8 to 3 on log scale). We also experiment with pairwise edge angle and distance histograms [3] collected inside the silhouette. The features are computed at a variety of scales and sizes for points sampled on the silhouette. To work in a common coordinate system, we cluster all features in the...

11 | Bayesian mixtures of experts
- Bishop, Svensen
- 2003
Citation Context: ...an be applied to solve (c). (2) We describe conditional Bayesian mixture of experts representations that allow flexible discriminative modeling. These are based on hierarchical mixture of experts [14, 29, 6], an elaborated version of clusterwise or switching regression [9, 18], where the expert mixture proportions (called gates) are themselves observation-sensitive predictors, synchronized across experts...

10 | Human Motion Capture Database. Available online at http://mocap.cs.cmu.edu/search.html
- 2003
Citation Context: ...that we animate using human motion capture [1]. Our human representation is based on an articulated skeleton with spherical joints, and has 56 d.o.f. including global translation. Our database consists of about 3000 samples that involve a va... (Footnote 6: Prediction based on the input only is essential for inference, where membership probabilities cannot be computed because the output is missing.)

10 | Learning to reconstruct 3D human motion from Bayesian mixtures of experts. A probabilistic discriminative approach
- Sminchisescu, Kanaujia, et al.
- 2004
Citation Context: ...tion is based on MAP estimates. In this multi-image setting the state conditional could be unimodal, but conditional computation requires, in principle, application of Bayes’ rule and marginalization [24]. To summarize, it has been argued that discriminative models provide fast inference and interpolate flexibly in the trained region. But they can fail on novel inputs, especially if trained using smal...