Results 1  10
of
148
Recovering 3D Human Pose from Monocular Images
"... We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descrip ..."
Abstract

Cited by 261 (0 self)
 Add to MetaCart
(Show Context)
We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogramofshapecontexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture of regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54parameter full body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4–5 degrees are obtained — a factor of 3 better than the current state of the art for the much simpler upper body problem.
StyleBased Inverse Kinematics
, 2004
"... This paper presents an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in realtime. Training the model on different input data leads to different styles of IK. The model is repres ..."
Abstract

Cited by 211 (8 self)
 Add to MetaCart
This paper presents an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in realtime. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles. Our stylebased
3d people tracking with gaussian process dynamical models
 In CVPR
, 2006
"... We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a lowdimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. W ..."
Abstract

Cited by 201 (18 self)
 Add to MetaCart
(Show Context)
We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a lowdimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles, despite weak and noisy image measurements and significant occlusions. 1.
3D Human Pose from Silhouettes by Relevance Vector Regression
 In CVPR
, 2004
"... We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descript ..."
Abstract

Cited by 199 (8 self)
 Add to MetaCart
(Show Context)
We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogramofshapecontexts descriptors. For the main regression, we evaluate both regularized least squares and Relevance Vector Machine (RVM) regressors over both linear and kernel bases. The RVM’s provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. For realism and good generalization with respect to viewpoints, we train the regressors on images resynthesized from real human motion capture data, and test it both quantitatively on similar independent test data, and qualitatively on a real image sequence. Mean angular errors of 6–7 degrees are obtained — a factor of 3 better than the current state of the art for the much simpler upper body problem. 1.
Inferring 3d body pose from silhouettes using activity manifold learning
 In CVPR
, 2004
"... We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the view point, reconstruct the input and detect any spatial or temporal outliers. In order to recover intrinsic body configuration ( ..."
Abstract

Cited by 193 (12 self)
 Add to MetaCart
(Show Context)
We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the view point, reconstruct the input and detect any spatial or temporal outliers. In order to recover intrinsic body configuration (pose) from the visual input (silhouette), we explicitly learn viewbased representations of activity manifolds as well as learn mapping functions between such central representations and both the visual input space and the 3D body pose space. The body pose can be recovered in a closed form in two steps by projecting the visual input to the learned representations of the activity manifold, i.e., finding the point on the learned manifold representation corresponding to the visual input, followed by interpolating 3D pose. 1.
Gaussian process dynamical models for human motion
 IEEE TRANS. PATTERN ANAL. MACHINE INTELL
, 2008
"... We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from highdimensional motion capture data. A GPDM is a latent variable model. It comprises a lowdimensional latent space with associated dynamics, ..."
Abstract

Cited by 158 (5 self)
 Add to MetaCart
(Show Context)
We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from highdimensional motion capture data. A GPDM is a latent variable model. It comprises a lowdimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
Covariance scaled sampling for monocular 3D body tracking
 CVPR
, 2001
"... We present a method for recovering 3D human body motion from monocular video sequences using robust image matching, joint limits and nonselfintersection constraints, and a new sampleandrefine search strategy guided by rescaled costfunction covariances. Monocular 3D body tracking is challenging: ..."
Abstract

Cited by 156 (3 self)
 Add to MetaCart
(Show Context)
We present a method for recovering 3D human body motion from monocular video sequences using robust image matching, joint limits and nonselfintersection constraints, and a new sampleandrefine search strategy guided by rescaled costfunction covariances. Monocular 3D body tracking is challenging: for reliable tracking at least 30 joint parameters need to be estimated, subject to highly nonlinear physical constraints; the problem is chronically illconditioned as about 1/3 of the d.o.f. (the depthrelated ones) are almost unobservable in any given monocular image; and matching an imperfect, highly flexible, selfoccluding model to cluttered image features is intrinsically hard. To reduce correspondence ambiguities we use a carefully designed robust matchingcost metric that combines robust optical flow, edge energy, and motion boundaries. Even so, the ambiguity, nonlinearity and nonobservability make the parameterspace cost surface multimodal, unpredictable and illconditioned, so minimizing it is difficult. We discuss the limitations of CONDENSATIONlike samplers, and introduce a novel hybrid search algorithm that combines inflatedcovariancescaled sampling and continuous optimization subject to physical constraints. Experiments on some challenging monocular sequences show that robust cost modelling, joint and selfintersection constraints, and informed sampling are all essential for reliable monocular 3D body tracking.
Learning Switching Linear Models of Human Motion
, 2000
"... The human figure exhibits complex and rich dynamic behavior that is both nonlinear and timevarying. Effective models of human dynamics can be learned from motion capture data using switching linear dynamic system (SLDS) models. We present results for human motion synthesis, classification, and v ..."
Abstract

Cited by 136 (2 self)
 Add to MetaCart
The human figure exhibits complex and rich dynamic behavior that is both nonlinear and timevarying. Effective models of human dynamics can be learned from motion capture data using switching linear dynamic system (SLDS) models. We present results for human motion synthesis, classification, and visual tracking using learned SLDS models. Since exact inference in SLDS is intractable, we present three approximate inference algorithms and compare their performance. In particular, a new variational inference algorithm is obtained by casting the SLDS model as a Dynamic Bayesian Network. Classification experiments show the superiority of SLDS over conventional HMM's for our problem domain. 1 Introduction The human figure exhibits complex and rich dynamic behavior. Dynamics are essential to the classification of human motion (e.g. gesture recognition) as well as to the synthesis of realistic figure motion for computer graphics. In visual tracking applications, dynamics can provide a p...
Performance Animation from Lowdimensional Control Signals
 ACM Transactions on Graphics
, 2005
"... This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a lowcost, easytouse system that might someday be practical for home use. The lowdimensional control signals from the user's performance are supplement ..."
Abstract

Cited by 129 (18 self)
 Add to MetaCart
This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a lowcost, easytouse system that might someday be practical for home use. The lowdimensional control signals from the user's performance are supplemented by a database of prerecorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a fullbody animation. We demonstrate the power of this approach with realtime control of six different behaviors using two video cameras and a small set of retroreflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.