Learning body pose via specialized maps (2002)

by R. Rosales, S. Sclaroff

Results 1 - 10 of 24

Discriminative Density Propagation for 3D Human Motion Estimation

by Cristian Sminchisescu, Atul Kanaujia, Zhiguo Li, Dimitris Metaxas - In CVPR, 2005
Cited by 65 (10 self)
We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model in order to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking and thus avoid using the generative 3D model for inference (the learned discriminative predictors can also be used, complementary, as importance samplers in order to improve mixing or initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion capture-based test sequences and compare against nearest-neighbor and regression methods.
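
One common way to write the density propagation rule mentioned in contribution (1) is sketched below. The notation is an assumption made for illustration and does not come from this listing: x_t is the 3D pose state at time t, r_t is the silhouette descriptor observed at time t, and R_t = (r_1, ..., r_t) is the observation history.

```latex
p(x_t \mid R_t) = \int p(x_t \mid x_{t-1}, r_t)\, p(x_{t-1} \mid R_{t-1})\, dx_{t-1}
```

Under this reading, the local conditional p(x_t | x_{t-1}, r_t) is the learned discriminative predictor (a conditional mixture of experts in the abstract's terms), so the filtered density is propagated without inverting a generative observation model.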

Multimodal human computer interaction: A survey

by Alejandro Jaimes, Nicu Sebe, 2005
Cited by 38 (2 self)
In this paper we review the major approaches to Multimodal Human Computer Interaction, giving an overview of the field from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.

Monocular Human Motion Capture with a Mixture of Regressors

by Ankur Agarwal, Bill Triggs - IEEE Workshop on Vision for Human-Computer Interaction, 2005
Cited by 25 (1 self)
We address 3D human motion capture from monocular images, taking a learning based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multi-valuedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical model based tracker that automatically detects tracking failures and re-initializes in a probabilistically correct manner. The system is trained on conventional motion capture data, using both the corresponding real human silhouettes and silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic re-initializations.

Recovering 3D Human Body Configurations Using Shape Contexts

by Greg Mori, Jitendra Malik - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 23 (1 self)
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently—tracking just becomes repeated recognition. We present results on a variety of data sets.

Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction

by Atul Kanaujia
Cited by 11 (1 self)
Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings with improved stability to geometric transformations, with metric learning and semi-supervised manifold regularization methods in order to further profile them for task invariance – resistance to background clutter and within the same human pose class variance. We quantitatively analyze the effectiveness of both descriptors and learning methods and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.

Twin Gaussian Processes for Structured Prediction

by Liefeng Bo, Cristian Sminchisescu, 2010
Cited by 11 (3 self)
... generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GP modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3d human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3d marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3d body model associated with human subjects used for training or testing.

Nonparametric Density Estimation with Adaptive Anisotropic Kernels for Human Motion Tracking

by Thomas Brox, Bodo Rosenhahn, Daniel Cremers, Hans-Peter Seidel - Proc. Second Workshop Human Motion, 2007
Cited by 9 (1 self)
In this paper, we suggest to model priors on human motion by means of nonparametric kernel densities. Kernel densities avoid assumptions on the shape of the underlying distribution and let the data speak for themselves. In general, kernel density estimators suffer from the problem known as the curse of dimensionality, i.e., the amount of data required to cover the whole input space grows exponentially with the dimension of this space. In many applications, such as human motion tracking, though, this problem turns out to be less severe, since the relevant data concentrate in a much smaller subspace than the original high-dimensional space. As we demonstrate in this paper, the concentration of human motion data on lower-dimensional manifolds, approves kernel density estimation as a transparent tool that is able to model priors on arbitrary mixtures of human motions. Further, we propose to support the ability of kernel estimators to capture distributions on low-dimensional manifolds by replacing the standard isotropic kernel by an adaptive, anisotropic one.
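
The contrast between isotropic and anisotropic kernels in the abstract above can be illustrated with a minimal sketch. This is not the authors' adaptive scheme: the function names, the toy 2D samples, and the single fixed covariance shared by all samples are assumptions chosen only to show why a kernel stretched along the data manifold concentrates density there.

```python
import math


def gaussian_kernel_2d(diff, cov):
    """Density of a zero-mean 2D Gaussian with covariance `cov`, evaluated at `diff`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = diff
    # Squared Mahalanobis distance diff^T cov^{-1} diff, with the 2x2 inverse written out.
    m = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * m) / (2.0 * math.pi * math.sqrt(det))


def kde(query, samples, cov):
    """Kernel density estimate at `query`: the average of one kernel centred on each sample."""
    total = 0.0
    for sx, sy in samples:
        total += gaussian_kernel_2d((query[0] - sx, query[1] - sy), cov)
    return total / len(samples)


# Hypothetical toy data: samples lying on a 1D manifold (the x-axis) inside a 2D space.
SAMPLES = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

# Two kernels with the same determinant (same "volume") but different shapes.
ISOTROPIC = ((0.25, 0.0), (0.0, 0.25))
ANISOTROPIC = ((1.0, 0.0), (0.0, 0.0625))  # wide along the manifold, narrow across it

ON_MANIFOLD = (0.5, 0.0)   # between two samples, on the x-axis
OFF_MANIFOLD = (0.0, 0.5)  # directly above a sample, away from the x-axis

if __name__ == "__main__":
    for name, cov in (("isotropic", ISOTROPIC), ("anisotropic", ANISOTROPIC)):
        print(name, kde(ON_MANIFOLD, SAMPLES, cov), kde(OFF_MANIFOLD, SAMPLES, cov))
```

With these toy values the anisotropic kernel assigns more density to the on-manifold query and less to the off-manifold one than the isotropic kernel does. An adaptive variant would estimate a separate covariance per sample from its neighbours, which this sketch leaves out.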

Conditional Visual Tracking in Kernel Space

by Cristian Sminchisescu, Atul Kanaujia, Zhiguo Li, Dimitris Metaxas - In Proc. NIPS, 2005
Cited by 8 (3 self)
We present a conditional temporal probabilistic framework for reconstructing 3D human motion in monocular video based on descriptors encoding image silhouette observations. For computational efficiency we restrict visual inference to low-dimensional kernel induced non-linear state spaces. Our methodology (kBME) combines kernel PCA-based non-linear dimensionality reduction (kPCA) and Conditional Bayesian Mixture of Experts (BME) in order to learn complex multivalued predictors between observations and model hidden states. This is necessary for accurate, inverse, visual perception inferences, where several probable, distant 3D solutions exist due to noise or the uncertainty of monocular perspective projection. Low-dimensional models are appropriate because many visual processes exhibit strong non-linear correlations in both the image observations and the target, hidden state variables. The learned predictors are temporally combined within a conditional graphical model in order to allow a principled propagation of uncertainty. We study several predictors and empirically show that the proposed algorithm positively compares with techniques based on regression, Kernel Dependency Estimation (KDE) or PCA alone, and gives results competitive to those of high-dimensional mixture predictors at a fraction of their computational cost. We show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences.

BM³E: Discriminative Density Propagation for Visual Tracking

by Cristian Sminchisescu, Atul Kanaujia, Dimitris N. Metaxas - In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 7 (1 self)
We introduce BM³E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run-time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations – typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) We establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) We propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) We validate the framework empirically for the reconstruction of 3d human motion in monocular video sequences. Our tests on both real and motion capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods.

Regression-based human motion capture from voxel data

by Y. Sun, M. Bray, A. Thayananthan, B. Yuan, P. H. S. Torr - BMVC, 2006
Cited by 7 (0 self)
A regression based method is proposed to recover human body pose from 3D voxel data. In order to do this we need to convert the voxel data into a feature vector. This is done using a Bayesian approach based on Mixture of Probabilistic PCA that transforms a collection of 3D shape context descriptors, extracted from the voxels, to a compact feature vector. For the regression, the newly-proposed Multi-Variate Relevance Vector Machine is explored to learn a single mapping from this feature vector to a low-dimensional representation of full body pose. We demonstrate the effectiveness and robustness of our method with experiments on both synthetic data and real sequences.