Results 1–10 of 137
Recovering 3D Human Pose from Monocular Images
Abstract

Cited by 261 (0 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture of regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54-parameter full-body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4–5 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
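The core of the regression approach above is a direct mapping from a silhouette descriptor vector to a pose vector. The following is a minimal toy sketch of kernel ridge regression (the "ridge regression over a kernel basis" variant), assuming numpy is available; the function names, the RBF kernel width, and the synthetic 2-D "descriptor" / 3-D "pose" data are illustrative and not from the paper.

```python
import numpy as np

def kernel_ridge_fit(X, Y, gamma=1.0, lam=1e-3):
    """Fit kernel ridge regression from descriptors X (n, d) to poses Y (n, p).

    Returns dual coefficients A so that a new pose is predicted as
    k(x, X) @ A, using an RBF kernel of width gamma.
    """
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    A = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    return A

def kernel_ridge_predict(Xtrain, A, x, gamma=1.0):
    # Evaluate the kernel between the query and every training descriptor.
    k = np.exp(-gamma * np.sum((Xtrain - x)**2, axis=1))
    return k @ A

# Toy data: 2-D "descriptors" mapped to a smooth 3-D "pose" vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1]), X[:, 0] * X[:, 1]])
A = kernel_ridge_fit(X, Y)
pred = kernel_ridge_predict(X, A, X[0])
```

An RVM would yield the same kind of predictor but with most rows of `A` driven to zero, which is what makes the regressor sparse.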
MCMC-based particle filtering for tracking a variable number of interacting targets
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Abstract

Cited by 204 (6 self)
We describe a particle filter that effectively deals with interacting targets: targets that are influenced by the proximity and/or behavior of other targets. The particle filter includes a Markov random field (MRF) motion prior that helps maintain the identity of targets throughout an interaction, significantly reducing tracker failures. We show that this MRF prior can be easily implemented by including an additional interaction factor in the importance weights of the particle filter. However, the computational requirements of the resulting multi-target filter render it unusable for large numbers of targets. Consequently, we replace the traditional importance sampling step in the particle filter with a novel Markov chain Monte Carlo (MCMC) sampling step to obtain a more efficient MCMC-based multi-target filter. We also show how to extend this MCMC-based filter to address a variable number of interacting targets. Finally, we present both qualitative and quantitative experimental results, demonstrating that the resulting particle filters deal efficiently and effectively with complicated target interactions.
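The "additional interaction factor in the importance weights" can be illustrated with a small sketch: a pairwise penalty that down-weights joint-state particles in which two targets overlap. This is a toy stand-in, not the paper's implementation; the circular-target model, the exponential penalty, and all parameter values are assumptions for illustration.

```python
import math

def interaction_factor(positions, radius=1.0, strength=2.0):
    """Pairwise MRF prior: particles whose targets overlap are down-weighted.

    positions: list of (x, y) target centres in one joint-state particle.
    Returns a factor in (0, 1]; exactly 1.0 when no pair is closer
    than 2 * radius.
    """
    f = 1.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            overlap = max(0.0, 2 * radius - math.hypot(dx, dy))
            f *= math.exp(-strength * overlap)
    return f

def weight_particle(likelihood, positions):
    # Importance weight = appearance likelihood * MRF interaction prior.
    return likelihood * interaction_factor(positions)

w_far  = weight_particle(0.8, [(0, 0), (5, 0)])    # well separated
w_near = weight_particle(0.8, [(0, 0), (0.5, 0)])  # overlapping
```

Separated targets keep their full appearance likelihood, while overlapping ones are penalized, which is what helps the filter preserve target identities through interactions.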
3D Human Pose from Silhouettes by Relevance Vector Regression
 In CVPR
, 2004
Abstract

Cited by 199 (8 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. For the main regression, we evaluate both regularized least squares and Relevance Vector Machine (RVM) regressors over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. For realism and good generalization with respect to viewpoints, we train the regressors on images resynthesized from real human motion capture data, and test them both quantitatively on similar independent test data, and qualitatively on a real image sequence. Mean angular errors of 6–7 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
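The shape-context building block behind the descriptors in this abstract is a log-polar histogram of boundary points relative to a reference point; the full histogram-of-shape-contexts descriptor then vector-quantizes many such contexts over a codebook. The sketch below computes a single context only, with assumed bin counts and radius cutoff chosen for illustration.

```python
import math

def shape_context(points, ref, r_bins=3, a_bins=4, r_max=10.0):
    """Log-polar histogram of boundary points relative to a reference point.

    points: list of (x, y) silhouette boundary samples.
    ref:    the reference point the context is centred on.
    Returns a flat histogram of length r_bins * a_bins.
    """
    hist = [0] * (r_bins * a_bins)
    for p in points:
        if p == ref:
            continue
        dx, dy = p[0] - ref[0], p[1] - ref[1]
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue
        # Logarithmic radius bin in [0, r_bins).
        rb = min(r_bins - 1, int(r_bins * math.log1p(r) / math.log1p(r_max)))
        # Uniform angle bin in [0, a_bins).
        ab = int(((math.atan2(dy, dx) + math.pi) / (2 * math.pi)) * a_bins) % a_bins
        hist[rb * a_bins + ab] += 1
    return hist

# Four boundary points placed symmetrically around a reference at the origin.
h = shape_context([(1, 0), (0, 1), (-1, 0), (0, -1)], (0, 0))
```

The log-radius binning is what makes the context tolerant of small local boundary errors far from the reference point.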
Estimating Articulated Human Motion With Covariance Scaled Sampling
 International Journal of Robotics Research
, 2003
Abstract

Cited by 125 (10 self)
We present a method for recovering 3D human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: besides the difficulty of matching an imperfect, highly flexible, self-occluding model to cluttered image features, realistic body models have at least 30 joint parameters subject to highly nonlinear physical constraints, and at least a third of these degrees of freedom are nearly unobservable in any given monocular image. For image matching we use a carefully designed robust cost metric combining robust optical flow, edge energy, and motion boundaries. The nonlinearities and matching ambiguities make the parameter-space cost surface multimodal, ill-conditioned and highly nonlinear, so searching it is difficult. We discuss the limitations of CONDENSATION-like samplers, and describe a novel hybrid search algorithm that combines inflated-covariance-scaled sampling and robust continuous optimization subject to physical constraints and model priors. Our experiments on challenging monocular sequences show that robust cost modeling, joint and self-intersection constraints, and informed sampling are all essential for reliable monocular 3D motion estimation.
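The sample-and-refine idea above can be sketched on a toy problem: draw candidates from an inflated (scaled) covariance, then locally refine each by continuous optimization of the cost, and keep the best. This is a minimal sketch assuming numpy; the quadratic stand-in cost, its hand-coded gradient, and all parameters are illustrative, whereas the real cost is multimodal and constrained.

```python
import numpy as np

TARGET = np.array([2.0, -1.0])  # hidden minimum of the toy cost

def cost(x):
    # Stand-in for the robust image-matching cost.
    return 0.5 * np.sum((x - TARGET) ** 2)

def covariance_scaled_search(mean, cov, scale=4.0, n_samples=20,
                             steps=25, lr=0.2, seed=0):
    """Sample from an inflated covariance, then refine each sample.

    Inflation (scale > 1) spreads samples along the uncertain directions
    of the cost valley; refinement here is plain gradient descent on the
    toy quadratic cost.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, scale * cov, size=n_samples)
    best, best_c = None, float("inf")
    for x in samples:
        x = x.copy()
        for _ in range(steps):
            x -= lr * (x - TARGET)  # gradient of the toy cost
        c = cost(x)
        if c < best_c:
            best, best_c = x, c
    return best, best_c

est, c = covariance_scaled_search(np.zeros(2), np.eye(2))
```

With a multimodal cost, the inflated sampling is what lets different samples fall into different basins, so several refined minima can be retained instead of one.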
Discriminative Density Propagation for 3D Human Motion Estimation
 In CVPR
, 2005
Abstract

Cited by 113 (16 self)
We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, so it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model in order to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking and thus avoid using the generative 3D model for inference (the learned discriminative predictors can also be used, complementarily, as importance samplers to improve mixing or to initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion-capture-based test sequences and compare against nearest-neighbor and regression methods.
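A conditional mixture of experts of the kind named in contribution (2) predicts, for an input, a set of component means with input-dependent gating probabilities, giving a multimodal conditional distribution. The sketch below is a scalar toy version; the linear gates and experts and their parameter values are assumptions for illustration, not learned quantities from the paper.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mixture_of_experts_predict(x, gates, experts):
    """Conditional mixture: p(pose | x) ~ sum_k g_k(x) * N(pose; w_k x + b_k, .).

    gates:   list of (w, b) linear gating scores, normalized by a softmax.
    experts: list of (w, b) linear regressors, one mean per component.
    Returns [(probability, mean), ...], i.e. a multimodal prediction.
    """
    probs = softmax([w * x + b for w, b in gates])
    means = [w * x + b for w, b in experts]
    return list(zip(probs, means))

# Two experts modelling a two-valued (ambiguous) inverse mapping.
hyps = mixture_of_experts_predict(
    1.0,
    gates=[(2.0, 0.0), (-2.0, 0.0)],
    experts=[(1.0, 0.5), (-1.0, -0.5)],
)
```

Keeping all components, rather than the single best mean, is what allows density propagation to carry several pose hypotheses forward through time.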
Fast multiple object tracking via a hierarchical particle filter
 In The IEEE International Conference on Computer Vision (ICCV)
, 2005
Abstract

Cited by 91 (4 self)
A very efficient and robust visual object tracking algorithm based on the particle filter is presented. The method characterizes the tracked objects using color and edge orientation histogram features. While the use of more features and samples can improve the robustness, the computational load required by the particle filter increases. To accelerate the algorithm while retaining robustness we adopt several enhancements. The first is the use of integral images [34] for efficiently computing the color features and edge orientation histograms, which allows a large number of particles and a better description of the targets. Next, the observation likelihood based on multiple features is computed in a coarse-to-fine manner, which allows the computation to quickly focus on the more promising regions. Quasi-random sampling of the particles allows the filter to achieve a higher convergence rate. The resulting tracking algorithm maintains multiple hypotheses and offers robustness against clutter or short-period occlusions. Experimental results demonstrate the efficiency and effectiveness of the algorithm for single and multiple object tracking.
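The integral-image trick referenced above makes any rectangular region sum a constant-time operation (four table lookups), which is what makes per-particle feature evaluation cheap. A minimal sketch on plain Python lists:

```python
def integral_image(img):
    """Build a (h+1) x (w+1) table: ii[y][x] = sum of img over rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] via four lookups, independent of region size."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
s = region_sum(ii, 1, 1, 3, 3)  # sums 5 + 6 + 8 + 9
```

For histogram features one such table is kept per histogram bin, so a particle's color or edge-orientation histogram over any candidate rectangle costs a fixed number of lookups.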
Twin Gaussian Processes for Structured Prediction
, 2010
Abstract

Cited by 61 (4 self)
... generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GPs modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3D human poses from monocular and multi-camera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3D marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3D body model associated with human subjects used for training or testing.
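The divergence being minimized is the standard closed-form KL between multivariate normals, here evaluated in isolation as a sketch (assuming numpy). In TGP the two normals are induced by the input-side and output-side GP kernels over the training set plus a candidate output, and the candidate is optimized; this snippet only shows the divergence itself.

```python
import numpy as np

def kl_gaussians(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate normals."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(S1_inv @ S0)          # covariance mismatch
        + diff @ S1_inv @ diff         # mean mismatch
        - d
        + np.log(np.linalg.det(S1) / np.linalg.det(S0))
    )

mu = np.zeros(2)
S = np.eye(2)
kl_same = kl_gaussians(mu, S, mu, S)                     # identical: 0
kl_diff = kl_gaussians(mu, S, np.array([1.0, 0.0]), S)   # shifted mean
```

With identity covariances the divergence reduces to half the squared mean distance, which makes the "similar inputs should produce similar percepts" objective concrete: the output is chosen so the two distributions agree.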
Monocular Human Motion Capture with a Mixture of Regressors
 IEEE Workshop on Vision for Human-Computer Interaction
, 2005
Abstract

Cited by 54 (3 self)
We address 3D human motion capture from monocular images, taking a learning-based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multivaluedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical-model-based tracker that automatically detects tracking failures and reinitializes in a probabilistically correct manner. The system is trained on conventional motion capture data, using both the corresponding real human silhouettes and silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic reinitializations.
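One way a dynamical-model-based tracker can consume weighted pose hypotheses like these is to re-weight each by a dynamics prior centred on the predicted pose and renormalize. The sketch below is a scalar toy of that combination step, not the paper's tracker; the Gaussian dynamics prior, the zero-mass fallback, and all values are illustrative assumptions.

```python
import math

def rescore_hypotheses(hyps, predicted, sigma=1.0):
    """Combine mixture-of-regressors hypotheses with a dynamical prediction.

    hyps: list of (prior_prob, pose) from the mixture (scalar poses for
    brevity). Each hypothesis is re-weighted by a Gaussian dynamics
    prior centred on the predicted pose, then renormalized.
    """
    scored = [
        (p * math.exp(-0.5 * ((pose - predicted) / sigma) ** 2), pose)
        for p, pose in hyps
    ]
    z = sum(w for w, _ in scored)
    if z == 0.0:  # no hypothesis near the prediction: keep the raw mixture
        return hyps
    return [(w / z, pose) for w, pose in scored]

# Two equally likely static hypotheses; the dynamics favours the first.
post = rescore_hypotheses([(0.5, 0.0), (0.5, 3.0)], predicted=0.2)
```

When every hypothesis is far from the prediction, the weights collapse, which is the kind of signal a tracker can use to detect failure and reinitialize from the raw mixture.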
Efficient mean-shift tracking via a new similarity measure
 in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05)
, 2005
Abstract

Cited by 51 (4 self)
The mean shift algorithm has achieved considerable success in object tracking due to its simplicity and robustness. It finds local maxima of a similarity measure between the color histograms or kernel density estimates of the model and target image. The most typically used similarity measures are the Bhattacharyya coefficient or the Kullback-Leibler divergence. In practice, these approaches face three difficulties. First, the spatial information of the target is lost when the color histogram is employed, which precludes the application of more elaborate motion models. Second, the classical similarity measures are not very discriminative. Third, the sample-based classical similarity measures require a calculation that is quadratic in the number of samples, making real-time performance difficult. To deal with these difficulties we propose a new, simple-to-compute and more discriminative similarity measure in spatial-feature spaces. The new similarity measure allows the mean shift algorithm to track more general motion models in an integrated way. To reduce the complexity of the computation to linear order we employ the recently proposed improved fast Gauss transform. This leads to a very efficient and robust nonparametric spatial-feature tracking algorithm. The algorithm is tested on several image sequences and shown to achieve robust and reliable frame-rate tracking.
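The underlying mean shift iteration is simple: repeatedly move the estimate to the kernel-weighted mean of the samples, climbing the density (or similarity) surface to a local maximum. A minimal 1-D sketch with a Gaussian kernel, using toy data rather than image features:

```python
import math

def mean_shift(points, start, bandwidth=1.0, iters=30):
    """Gaussian-kernel mean shift: move to the weighted mean of nearby
    samples until the estimate settles at a mode of the density."""
    x = start
    for _ in range(iters):
        weights = [math.exp(-((p - x) ** 2) / (2 * bandwidth ** 2))
                   for p in points]
        x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return x

# 1-D samples clustered around 2.0; mean shift converges to the mode.
pts = [1.8, 1.9, 2.0, 2.1, 2.2]
mode = mean_shift(pts, start=0.0)
```

The quadratic cost the abstract mentions comes from evaluating every sample against every kernel centre; the improved fast Gauss transform replaces that double loop with a linear-time approximation.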
3D human body tracking using deterministic temporal motion models
 In ECCV
, 2004
Abstract

Cited by 44 (9 self)
There has been much effort invested in increasing the robustness of human body tracking by incorporating motion models. Most approaches are probabilistic in nature and seek to avoid becoming trapped in local minima by considering multiple hypotheses, which typically requires exponentially large amounts of computation as the number of degrees of freedom increases. By contrast, in this paper, we use temporal motion models based on Principal Component Analysis to formulate the tracking problem as one of minimizing differentiable objective functions. The differential structure of these functions is rich enough to yield good convergence properties using a deterministic optimization scheme at a much reduced computational cost. Furthermore, by using a multi-activity database, we can partially overcome one of the major limitations of approaches that rely on motion models, namely that they are limited to a single type of motion. We demonstrate the effectiveness of the proposed approach by using it to fit full-body models to stereo data of people walking and running whose quality is too low to yield satisfactory results without motion models.
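A PCA temporal motion model of the kind described above represents each joint-angle trajectory as a mean trajectory plus a few principal modes, so tracking reduces to optimizing a handful of mode coefficients instead of every frame's pose. A minimal sketch assuming numpy, with synthetic phase-shifted sinusoids standing in for motion capture trajectories:

```python
import numpy as np

def pca_motion_model(M, k=2):
    """Fit a k-component linear motion model with PCA (via SVD).

    M: (n_sequences, T) matrix, each row one joint-angle trajectory.
    Returns (mean, basis) so a trajectory is approximated as
    mean + basis.T @ alpha for a k-vector of coefficients alpha.
    """
    mean = M.mean(axis=0)
    U, s, Vt = np.linalg.svd(M - mean, full_matrices=False)
    return mean, Vt[:k]

t = np.linspace(0, 2 * np.pi, 40)
# Synthetic trajectories: phase-shifted sinusoids span a 2-D subspace.
M = np.array([np.sin(t + phi) for phi in np.linspace(0, 1, 10)])
mean, basis = pca_motion_model(M, k=2)

# Project one trajectory into the model and reconstruct it.
alpha = basis @ (M[0] - mean)
recon = mean + basis.T @ alpha
err = np.linalg.norm(recon - M[0])
```

Because the objective is differentiable in `alpha`, a deterministic optimizer can fit the low-dimensional coefficients directly, which is the computational advantage the abstract claims over multi-hypothesis sampling.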