Results 1–10 of 110
3D People Tracking with Gaussian Process Dynamical Models
In CVPR, 2006
Cited by 195 (20 self)
We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a low-dimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles, despite weak and noisy image measurements and significant occlusions.
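The core GPDM property described above, a density that assigns higher probability to poses and motions near the training data, can be illustrated with a toy kernel score over a low-dimensional latent space. This sketch is not a GPDM; it only mimics that "close to the training data" behavior, and all names and values are invented:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def latent_plausibility(x, X_train, length_scale=1.0):
    # Unnormalised score that is high near the training latents,
    # mimicking the GPDM property that poses close to the training
    # data get higher probability.  (Toy stand-in, not a real GPDM.)
    return rbf_kernel(x[None, :], X_train, length_scale)[0].mean()

X_train = np.array([[0.0, 0.0], [1.0, 0.0]])      # invented latent points
near = latent_plausibility(np.array([0.1, 0.0]), X_train)
far = latent_plausibility(np.array([5.0, 5.0]), X_train)
```

Scoring candidate poses this way is what lets a tracker prefer motions resembling the training set.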
Data fusion for visual tracking with particles
Proc. IEEE, 2004
Cited by 160 (2 self)
The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data.
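The fusion idea, multiplying independent cue likelihoods inside a particle filter's weighting step, can be sketched in one dimension. This is a minimal illustration rather than the paper's importance-sampling mechanisms; the cue models and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_lik(z, x, sigma):
    # Unnormalised likelihood of measurement z given state x.
    return np.exp(-0.5 * ((z - x) / sigma) ** 2)

# Hypothetical 1-D target near position 2.0, seen by two noisy cues.
z_color, z_motion = 2.1, 1.9          # invented cue measurements

particles = rng.normal(0.0, 3.0, size=5000)          # diffuse prior
# Principled fusion: independent cue likelihoods simply multiply.
weights = gaussian_lik(z_color, particles, 0.5) * \
          gaussian_lik(z_motion, particles, 0.5)
weights /= weights.sum()
estimate = float((weights * particles).sum())        # posterior mean
```

The posterior mean lands between the two cue measurements, weighted by their (here equal) reliabilities.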
Kinematic Jump Processes For Monocular 3D Human Tracking
In Int. Conf. Computer Vision &amp; Pattern Recognition, 2003
Cited by 133 (17 self)
A major difficulty for 3D human body tracking from monocular image sequences is the near non-observability of kinematic degrees of freedom that generate motion in depth. For known link (body segment) lengths, the strict non-observabilities reduce to twofold 'forwards/backwards flipping' ambiguities for each link. These imply 2^(#links) formal inverse kinematics solutions for the full model, and hence linked groups of O(2^(#links)) local minima in the model-image matching cost function. Choosing the wrong minimum leads to rapid mistracking, so for reliable tracking, rapid methods of investigating alternative minima within a group are needed. Previous approaches to this have used generic search methods that do not exploit the specific problem structure. Here, we complement these by using simple kinematic reasoning to enumerate the tree of possible forwards/backwards flips, thus greatly speeding the search within each linked group of minima. Our methods can be used either deterministically, or within stochastic 'jump-diffusion' style search processes. We give experimental results on some challenging monocular human tracking sequences, showing how the new kinematic-flipping based sampling method improves and complements existing ones.
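The 2^(#links) ambiguity structure is easy to see in code: enumerating every forwards/backwards sign pattern over the links yields the set of flip hypotheses to be searched. A minimal sketch (the function name is invented):

```python
from itertools import product

def flip_hypotheses(n_links):
    # Each link's motion in depth is ambiguous under monocular viewing
    # (forwards or backwards), so every sign pattern over the links is
    # a distinct formal inverse-kinematics solution.
    return list(product((+1, -1), repeat=n_links))

hyps = flip_hypotheses(3)
```

For a realistic body model with dozens of links, this exponential count is exactly why structured enumeration of the flip tree beats generic search.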
Estimating Articulated Human Motion With Covariance Scaled Sampling
International Journal of Robotics Research, 2003
Cited by 119 (10 self)
We present a method for recovering 3D human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: besides the difficulty of matching an imperfect, highly flexible, self-occluding model to cluttered image features, realistic body models have at least 30 joint parameters subject to highly nonlinear physical constraints, and at least a third of these degrees of freedom are nearly unobservable in any given monocular image. For image matching we use a carefully designed robust cost metric combining robust optical flow, edge energy, and motion boundaries. The nonlinearities and matching ambiguities make the parameter-space cost surface multimodal, ill-conditioned and highly nonlinear, so searching it is difficult. We discuss the limitations of CONDENSATION-like samplers, and describe a novel hybrid search algorithm that combines inflated-covariance-scaled sampling and robust continuous optimization subject to physical constraints and model priors. Our experiments on challenging monocular sequences show that robust cost modeling, joint and self-intersection constraints, and informed sampling are all essential for reliable monocular 3D motion estimation.
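The sample-and-refine idea behind covariance scaled sampling can be sketched on an invented 1-D multimodal cost: inflate the local covariance so samples reach neighboring minima, then refine. Here "refinement" is just picking the lowest-cost sample, not the paper's constrained continuous optimization:

```python
import numpy as np

rng = np.random.default_rng(1)

def cost(x):
    # Invented 1-D multimodal cost with minima at -2 and +2.
    return np.minimum((x + 2.0) ** 2, (x - 2.0) ** 2)

mu, sigma = 0.5, 0.3     # current estimate and its local spread
inflation = 4.0          # rescale the covariance to reach other basins

samples = rng.normal(mu, inflation * sigma, size=200)
refined = float(samples[np.argmin(cost(samples))])
```

With the uninflated spread the sampler would stay stuck near mu; after inflation the best sample falls into one of the true basins. In the paper each such sample would seed a constrained local optimizer.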
Model-Based Hand Tracking Using a Hierarchical Bayesian Filter
2004
Cited by 98 (3 self)
This thesis focuses on the automatic recovery of three-dimensional hand motion from one or more views. A 3D geometric hand model is constructed from truncated cones, cylinders and ellipsoids and is used to generate contours, which can be compared with edge contours and skin colour in images. The hand tracking problem is formulated as state estimation, where the model parameters define the internal state, which is to be estimated from image observations.
Generative Modeling for Continuous Non-Linearly Embedded Visual Inference
In ICML, 2004
Cited by 90 (12 self)
Many difficult visual perception problems, like 3D human motion estimation, can be formulated in terms of inference using complex generative models, defined over high-dimensional state spaces. Despite progress, optimizing such models is difficult because prior knowledge cannot be flexibly integrated in order to reshape an initially designed representation space. Nonlinearities, the inherent sparsity of high-dimensional training sets, and lack of global continuity make dimensionality reduction challenging and low-dimensional search inefficient. To address these problems, we present a learning and inference algorithm that restricts visual tracking to automatically extracted, nonlinearly embedded, low-dimensional spaces. This formulation produces a layered generative model with a reduced state representation that can be estimated using efficient continuous optimization methods. Our prior flattening method allows a simple analytic treatment of low-dimensional intrinsic curvature constraints, and allows consistent interpolation operations.
Recovering 3D Human Body Configurations Using Shape Contexts
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 77 (2 self)
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently: tracking just becomes repeated recognition. We present results on a variety of data sets.
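The exemplar-transfer step can be sketched as a nearest-neighbor lookup followed by copying the marked joint labels. Plain Euclidean distance stands in for shape context matching with the deformation model, and the descriptors and joint names are invented:

```python
import numpy as np

# Invented exemplar store: each entry pairs a 2-D view descriptor with
# manually marked joint positions (names and numbers are hypothetical).
exemplars = [
    {"descriptor": np.array([1.0, 0.0]), "joints": {"l_elbow": (10, 20)}},
    {"descriptor": np.array([0.0, 1.0]), "joints": {"l_elbow": (30, 40)}},
]

def transfer_joints(query_descriptor, store):
    # Match the query to the closest stored view and copy its joint
    # labels over to the test shape.
    best = min(store,
               key=lambda e: np.linalg.norm(e["descriptor"] - query_descriptor))
    return best["joints"]

joints = transfer_joints(np.array([0.9, 0.1]), exemplars)
```

Running this per frame is the "tracking as repeated recognition" idea: no temporal state is carried over.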
A Quantitative Evaluation of Video-based 3D Person Tracking
In International Workshop on Performance Evaluation of Tracking and Surveillance, 2005
Cited by 60 (6 self)
The Bayesian estimation of 3D human motion from video sequences is quantitatively evaluated using synchronized, multi-camera, calibrated video and 3D ground truth poses acquired with a commercial motion capture system. While many methods for human pose estimation and tracking have been proposed, to date there has been no quantitative comparison. Our goal is to evaluate how different design choices influence tracking performance. Toward that end, we independently implemented two fairly standard Bayesian person trackers using two variants of particle filtering and propose an evaluation measure appropriate for assessing the quality of probabilistic tracking methods. In the Bayesian framework we compare various image likelihood functions and prior models of human motion that have been proposed in the literature. Our results suggest that in constrained laboratory environments, current methods perform quite well. Multiple cameras and background subtraction, however, are required to achieve reliable tracking, suggesting that many current methods may be inappropriate in more natural settings. We discuss the implications of the study and the directions for future research that it entails.
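A basic pose-error measure of the kind such evaluations build on is the mean Euclidean distance between predicted and ground-truth 3D joint positions. This is a sketch only; the paper's proposed measure for probabilistic trackers additionally accounts for the posterior distribution rather than a single estimate:

```python
import numpy as np

def mean_joint_error(pred, gt):
    # Average Euclidean distance between predicted and ground-truth
    # 3-D joint positions for one pose (joints as rows).
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

err = mean_joint_error([[0, 0, 0], [1, 0, 0]],
                       [[0, 0, 3], [1, 4, 0]])   # distances 3 and 4
```

Averaging this per-frame error over a sequence gives a single comparable number per tracker configuration.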
Tracking Articulated Body by Dynamic Markov Network
Proc. IEEE Int'l Conf. on Computer Vision, Nice, France, 2003
Cited by 59 (9 self)
A new method for visual tracking of articulated objects is presented. Analyzing articulated motion is challenging because the increase in dimensionality potentially demands a tremendous increase in computation. To ease this problem, we propose an approach that analyzes subparts locally while reinforcing the structural constraints at the same time. The computational model of the proposed approach is based on a dynamic Markov network, a generative model which characterizes the dynamics and the image observations of each individual subpart as well as the motion constraints among different subparts. Probabilistic variational analysis of the model reveals a mean field approximation to the posterior densities of each subpart given visual evidence, and provides a computationally efficient way to approach such a difficult Bayesian inference problem. In addition, we design mean field Monte Carlo (MFMC) algorithms, in which a set of low-dimensional particle filters interact with each other and solve the high-dimensional problem collaboratively. Extensive experiments on tracking human body parts demonstrate the effectiveness, significance and computational efficiency of the proposed method.
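The MFMC idea, low-dimensional particle filters that exchange mean-field "messages" through the structural constraints, can be sketched with two 1-D parts. This is a toy version with invented observations and a Gaussian closeness constraint, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def lik(z, x, s):
    # Unnormalised Gaussian likelihood of z given state x.
    return np.exp(-0.5 * ((z - x) / s) ** 2)

# Two hypothetical 1-D "parts" linked by a soft closeness constraint.
zA, zB = 1.1, 1.3                      # invented per-part observations
pA = rng.normal(0.0, 2.0, 1000)        # particle set for part A
pB = rng.normal(0.0, 2.0, 1000)        # particle set for part B

for _ in range(3):                     # mean-field iterations
    mA, mB = pA.mean(), pB.mean()
    # Each low-dimensional filter weights by its own data likelihood
    # times a message from the other part's current belief.
    wA = lik(zA, pA, 0.5) * lik(mB, pA, 1.0)
    wB = lik(zB, pB, 0.5) * lik(mA, pB, 1.0)
    pA = rng.choice(pA, 1000, p=wA / wA.sum())
    pB = rng.choice(pB, 1000, p=wB / wB.sum())
```

Each filter stays one-dimensional, yet the iterated message exchange pulls both beliefs toward a jointly consistent solution, which is how the collaboration avoids sampling the full high-dimensional state.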
Articulated Soft Objects for MultiView Shape and Motion Capture
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 58 (9 self)
We develop a framework for 3D shape and motion recovery of articulated deformable objects. We propose a formalism that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated structures. We demonstrate its effectiveness for human body modeling from synchronized video sequences. Our method is both robust and generic, and could easily be applied to other shape and motion recovery problems.