Recovering 3D Human Pose from Monocular Images
Cited by 182 (0 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture of regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54-parameter full body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4–5 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
3D Human Pose from Silhouettes by Relevance Vector Regression
In CVPR, 2004
Cited by 166 (6 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. For the main regression, we evaluate both regularized least squares and Relevance Vector Machine (RVM) regressors over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. For realism and good generalization with respect to viewpoints, we train the regressors on images resynthesized from real human motion capture data, and test them both quantitatively on similar independent test data, and qualitatively on a real image sequence. Mean angular errors of 6–7 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
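To make "regularized least squares over a kernel basis" concrete, here is a minimal sketch of kernel ridge regression from a descriptor vector to a pose vector. It is not the authors' implementation: the Gaussian kernel, its width, the ridge value, the function names and the toy data are all assumptions chosen for illustration, and the RVM variant with its sparsity is not shown.

```python
import math


def gaussian_kernel(x, z, width=1.0):
    """Gaussian kernel between two descriptor vectors (kernel choice is an assumption)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    augmented = [list(row) + list(target) for row, target in zip(matrix, rhs)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(augmented[r][col]))
        augmented[col], augmented[pivot_row] = augmented[pivot_row], augmented[col]
        pivot = augmented[col][col]
        augmented[col] = [value / pivot for value in augmented[col]]
        for r in range(n):
            if r != col:
                factor = augmented[r][col]
                augmented[r] = [v - factor * p
                                for v, p in zip(augmented[r], augmented[col])]
    return [row[n:] for row in augmented]


def fit_kernel_ridge(descriptors, poses, ridge=1e-6, width=1.0):
    """Return one weight row per training example: (K + ridge * I)^-1 Y."""
    n = len(descriptors)
    gram = [
        [gaussian_kernel(descriptors[i], descriptors[j], width)
         + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear_system(gram, poses)


def predict_pose(descriptor, descriptors, weights, width=1.0):
    """Predict a pose vector as a kernel-weighted sum of the learned weight rows."""
    kernel_values = [gaussian_kernel(descriptor, z, width) for z in descriptors]
    pose_dims = len(weights[0])
    return [
        sum(k * w[d] for k, w in zip(kernel_values, weights))
        for d in range(pose_dims)
    ]


# Toy data: 2-D "descriptors" and 2-D "poses", both made up for illustration.
train_descriptors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_poses = [[0.0, 10.0], [5.0, 20.0], [-5.0, 30.0], [2.0, 40.0]]
weights = fit_kernel_ridge(train_descriptors, train_poses)
estimate = predict_pose([1.0, 0.0], train_descriptors, weights)
```

With a very small ridge term the regressor nearly interpolates the training poses; a larger value trades training accuracy for smoothness.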
Incremental Learning for Robust Visual Tracking
2008
Cited by 138 (13 self)
Visual tracking, in essence, deals with nonstationary image streams that change over time. While most existing algorithms are able to track objects well in controlled environments, they usually fail in the presence of significant variation of the object’s appearance or surrounding illumination. One reason for such failures is that many algorithms employ fixed appearance models of the target. Such models are trained using only appearance data available before tracking begins, which in practice limits the range of appearances that are modeled, and ignores the large volume of information (such as shape changes or specific lighting conditions) that becomes available during tracking. In this paper, we present a tracking method that incrementally learns a low-dimensional subspace representation, efficiently adapting online to changes in the appearance of the target. The model update, based on incremental algorithms for principal component analysis, includes two important features: a method for correctly updating the sample mean, and a for
Model-Based Hand Tracking Using a Hierarchical Bayesian Filter
2004
Cited by 73 (2 self)
This thesis focuses on the automatic recovery of three-dimensional hand motion from one or more views. A 3D geometric hand model is constructed from truncated cones, cylinders and ellipsoids and is used to generate contours, which can be compared with edge contours and skin colour in images. The hand tracking problem is formulated as state estimation, where the model parameters define the internal state, which is to be estimated from image observations. In the first
Fast multiple object tracking via a hierarchical particle filter
In: International Conference on Computer Vision, 2005
Cited by 71 (3 self)
A very efficient and robust visual object tracking algorithm based on the particle filter is presented. The method characterizes the tracked objects using color and edge orientation histogram features. While the use of more features and samples can improve the robustness, the computational load required by the particle filter increases. To accelerate the algorithm while retaining robustness we adopt several enhancements in the algorithm. The first is the use of integral images [34] for efficiently computing the color features and edge orientation histograms, which allows a large number of particles and a better description of the targets. Next, the observation likelihood based on multiple features is computed in a coarse-to-fine manner, which allows the computation to quickly focus on the more promising regions. Quasi-random sampling of the particles allows the filter to achieve a higher convergence rate. The resulting tracking algorithm maintains multiple hypotheses and offers robustness against clutter or short period occlusions. Experimental results demonstrate the efficiency and effectiveness of the algorithm for single and multiple object tracking.
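The integral image mentioned in this abstract can be illustrated with a short sketch. This is the standard summed-area table, not the paper's code; the function names and the toy image are assumptions. For histogram features one would plausibly keep one such table per histogram bin, so that every particle's rectangular region costs four lookups per bin.

```python
def integral_image(image):
    """Build a summed-area table with one extra zero row and column.

    table[r][c] holds the sum of image[0..r-1][0..c-1].
    """
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running_row_sum = 0
        for c in range(cols):
            running_row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + running_row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Toy 3x3 "image", made up for illustration.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right_block = box_sum(table, 1, 1, 3, 3)  # pixels 5, 6, 8, 9
```

Building the table is linear in the number of pixels, after which any rectangle sum is constant time regardless of its size.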
Efficient mean-shift tracking via a new similarity measure
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), 2005
Cited by 44 (4 self)
The mean shift algorithm has achieved considerable success in object tracking due to its simplicity and robustness. It finds local minima of a similarity measure between the color histograms or kernel density estimates of the model and target image. The most typically used similarity measures are the Bhattacharyya coefficient or the Kullback-Leibler divergence. In practice, these approaches face three difficulties. First, the spatial information of the target is lost when the color histogram is employed, which precludes the application of more elaborate motion models. Second, the classical similarity measures are not very discriminative. Third, the sample-based classical similarity measures require a calculation that is quadratic in the number of samples, making real-time performance difficult. To deal with these difficulties we propose a new, simple-to-compute and more discriminative similarity measure in spatial-feature spaces. The new similarity measure allows the mean shift algorithm to track more general motion models in an integrated way. To reduce the complexity of the computation to linear order we employ the recently proposed improved fast Gauss transform. This leads to a very efficient and robust nonparametric spatial-feature tracking algorithm. The algorithm is tested on several image sequences and shown to achieve robust and reliable frame-rate tracking.
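For reference, the classical Bhattacharyya coefficient between two color histograms, one of the baseline measures this abstract names, can be sketched as follows. This illustrates only the baseline, not the new measure the paper proposes; the function names and the toy histograms are assumptions.

```python
import math


def normalize(histogram):
    """Scale non-negative bin counts so that they sum to one."""
    total = sum(histogram)
    if total <= 0:
        raise ValueError("histogram must contain positive mass")
    return [count / total for count in histogram]


def bhattacharyya_coefficient(model_hist, target_hist):
    """Sum over bins of sqrt(p_i * q_i); 1 for identical shapes, 0 for disjoint ones."""
    p = normalize(model_hist)
    q = normalize(target_hist)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Toy 4-bin color histograms, made up for illustration.
model = [10, 20, 30, 40]
similar_target = [12, 18, 33, 37]
disjoint_target = [0, 0, 0, 0, 5][:4] if False else [5, 0, 0, 0]
similarity = bhattacharyya_coefficient(model, similar_target)
```

Because the coefficient depends only on bin counts, two regions with the same colors in different spatial arrangements score identically, which is the loss of spatial information the abstract describes.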
Monocular Human Motion Capture with a Mixture of Regressors
IEEE Workshop on Vision for Human-Computer Interaction, 2005
Cited by 41 (1 self)
We address 3D human motion capture from monocular images, taking a learning-based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multivaluedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical-model based tracker that automatically detects tracking failures and reinitializes in a probabilistically correct manner. The system is trained on conventional motion capture data, using both the corresponding real human silhouettes and silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic reinitializations.
A Nonlinear Discriminative Approach to AAM Fitting
Cited by 26 (4 self)
The Active Appearance Model (AAM) is a powerful generative method for modeling and registering deformable visual objects. Most methods for AAM fitting utilize a linear parameter update model in an iterative framework. Despite its popularity, the scope of this approach is severely restricted, both in fitting accuracy and capture range, due to the simplicity of the linear update models used. In this paper, we present a new AAM fitting formulation, which utilizes a nonlinear update model. To motivate our approach, we compare its performance against two popular fitting methods on two publicly available face databases, in which this formulation boasts significant performance improvements.
Hand Pose Estimation Using Hierarchical Detection
In Intl. Workshop on Human-Computer Interaction, 2004
Cited by 22 (2 self)
This paper presents an analysis of the design of classifiers for use in a hierarchical object recognition approach. In this approach, a cascade of classifiers is arranged in a tree in order to recognize multiple object classes. We are interested in the problem of recognizing multiple patterns as it is closely related to the problem of locating an articulated object. Each different pattern class corresponds to the hand in a different pose, or set of poses. For this problem, obtaining labelled training data of the hand in a given pose can be problematic. Given a parametric 3D model, generating training data in the form of example images is cheap, and we demonstrate that it can be used to design classifiers almost as good as those trained using non-synthetic data. We compare a variety of different template-based classifiers and discuss their merits.