3D human pose from silhouettes by relevance vector regression (2004)

by A. Agarwal, B. Triggs
Venue: In CVPR

Sorted by: Results 1 - 10 of 199

Real-time human pose recognition in parts from single depth images

by Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake - In CVPR, 2011
"... We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler p ..."
Abstract - Cited by 568 (17 self) - Add to MetaCart
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
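
The final step described above (reprojecting the per-pixel classification and finding local modes) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the per-pixel probabilities, camera intrinsics and bandwidth below are hypothetical inputs, and scikit-learn's MeanShift stands in for the paper's mode-finding step.

```python
# Minimal sketch (not the authors' code): turn a per-pixel body-part
# labelling of a depth image into confidence-scored 3D joint proposals by
# reprojecting the classified pixels and finding local modes.
import numpy as np
from sklearn.cluster import MeanShift

def joint_proposals(part_probs, depth, part_id, fx=575.0, fy=575.0,
                    cx=320.0, cy=240.0, min_prob=0.5, bandwidth=0.06):
    """Return candidate 3D positions (metres) and scores for one body part.

    part_probs : (H, W, n_parts) per-pixel class probabilities (hypothetical)
    depth      : (H, W) depth map in metres
    """
    ys, xs = np.where(part_probs[..., part_id] > min_prob)
    if len(xs) == 0:
        return np.empty((0, 3)), np.empty(0)
    z = depth[ys, xs]
    # reproject pixels into 3D camera coordinates
    pts = np.stack([(xs - cx) * z / fx, (ys - cy) * z / fy, z], axis=1)
    ms = MeanShift(bandwidth=bandwidth).fit(pts)
    centers = ms.cluster_centers_
    # score each mode by the summed probability of the pixels assigned to it
    scores = np.array([part_probs[ys, xs, part_id][ms.labels_ == k].sum()
                       for k in range(len(centers))])
    return centers, scores
```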

Citation Context

...as pairs of parallel lines, clustering appearances across frames. Shakhnarovich et al. [33] estimate upper body pose, interpolating k-NN poses matched by parameter sensitive hashing. Agarwal & Triggs [1] learn a regression from kernelized image silhouette features to pose. Sigal et al. [39] use eigen-appearance template detectors for head, upper arms and lower legs proposals. Felzenszwalb & Huttenlo...

HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion

by Leonid Sigal, Alexandru O. Balan, Michael J. Black, 2006
"... While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture s ..."
Abstract - Cited by 266 (15 self) - Add to MetaCart
While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture synchronized video and ground-truth 3D motion. The resulting HUMANEVA datasets contain multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 40,000 frames of synchronized motion capture and multi-view video (resulting in over one quarter million image frames in total) were collected at 60 Hz with an additional 37,000 time instants of pure motion capture data. A standard set of error measures is defined for evaluating both 2D and 3D pose estimation and tracking algorithms. We also describe a baseline algorithm for 3D articulated tracking that uses a relatively standard Bayesian framework with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering. In the context of this baseline algorithm we explore a variety of likelihood functions, prior models of human motion and the effects of algorithm parameters. Our experiments suggest that image observation models and motion priors play important roles in performance, and that in a multi-view laboratory environment, where initialization is available, Bayesian filtering tends to perform well. The datasets and the software are made available to the research community. This infrastructure will support the development of new articulated motion and pose estimation algorithms, will provide a baseline for the evaluation and comparison of new methods, and will help establish the current state of the art in human pose estimation and tracking.
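
As an illustration of the kind of error measure such a benchmark defines (the exact HumanEva definitions are given in the paper), a common 3D measure is the mean Euclidean distance between estimated and ground-truth joint positions, averaged over joints and frames:

```python
# Generic illustration, not necessarily HumanEva's exact definition:
# mean Euclidean distance between estimated and ground-truth 3D joints.
import numpy as np

def mean_joint_position_error(pred, gt):
    """pred, gt : (n_frames, n_joints, 3) joint positions (e.g. in mm)."""
    per_joint = np.linalg.norm(pred - gt, axis=-1)   # (n_frames, n_joints)
    per_frame = per_joint.mean(axis=1)               # (n_frames,)
    return per_frame.mean(), per_frame
```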

Recovering 3D Human Pose from Monocular Images

by Ankur Agarwal, Bill Triggs
"... We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descrip ..."
Abstract - Cited by 261 (0 self) - Add to MetaCart
We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture of regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54-parameter full body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4–5 degrees are obtained — a factor of 3 better than the current state of the art for the much simpler upper body problem.
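
A minimal sketch of the regression step described above, assuming histogram-of-shape-contexts descriptors have already been extracted from the silhouettes. Since scikit-learn has no RVM, kernel ridge regression is used as a dense stand-in; an RVM would retain only a sparse subset of the training examples as relevance vectors. The file names are hypothetical.

```python
# Sketch under stated assumptions: precomputed silhouette shape descriptors
# are regressed against 54-D pose vectors with a kernel basis.  Kernel ridge
# regression is a readily available (dense) stand-in for the kernel RVM.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# X: (n_frames, n_bins) shape-context histograms, y: (n_frames, 54) poses
X_train = np.load("shape_context_descriptors.npy")   # hypothetical file
y_train = np.load("pose_vectors.npy")                # hypothetical file

reg = KernelRidge(kernel="rbf", gamma=1e-2, alpha=1e-3)
reg.fit(X_train, y_train)

x_test = X_train[:1]                  # one descriptor, shape (1, n_bins)
pose_estimate = reg.predict(x_test)   # (1, 54) vector of joint angles
```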

Citation Context

...gression (inverse) models to eliminate the need for an explicit body model that is projected to predict image observations. A brief description of our single image regression-based scheme is given in [1] and the extension that resolves ambiguities using dynamics first appeared in [2]. 1.2 Overview of the Approach We represent 3D body pose by 55D vectors x including three joint angles for each of the ...

Progressive search space reduction for human pose estimation

by Vittorio Ferrari, Manuel Marín-Jiménez, Andrew Zisserman - In CVPR, 2008
"... The objective of this paper is to estimate 2D human pose as a spatial configuration of body parts in TV and movie video shots. Such video material is uncontrolled and extremely challenging. We propose an approach that progressively reduces the search space for body parts, to greatly improve the chan ..."
Abstract - Cited by 226 (30 self) - Add to MetaCart
The objective of this paper is to estimate 2D human pose as a spatial configuration of body parts in TV and movie video shots. Such video material is uncontrolled and extremely challenging. We propose an approach that progressively reduces the search space for body parts, to greatly improve the chances that pose estimation will succeed. This involves two contributions: (i) a generic detector using a weak model of pose to substantially reduce the full pose search space; and (ii) employing ‘grabcut’ initialized on detected regions proposed by the weak model, to further prune the search space. Moreover, we also propose (iii) an integrated spatiotemporal model covering multiple frames to refine pose estimates from individual frames, with inference using belief propagation. The method is fully automatic and self-initializing, and explains the spatio-temporal volume covered by a person moving in a shot, by soft-labeling every pixel as belonging to a particular body part or to the background. We demonstrate upper-body pose estimation by an extensive evaluation over 70,000 frames from four episodes of the TV series Buffy the Vampire Slayer, and present an application to full-body action recognition on the Weizmann dataset.
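
Contribution (ii), ‘grabcut’ initialised on a region proposed by the weak pose model, can be sketched with OpenCV's grabCut; the frame path and detection rectangle below are hypothetical stand-ins for the paper's upper-body detector output.

```python
# Minimal sketch: GrabCut initialised from a detector-proposed rectangle,
# producing a foreground mask that prunes the body-part search space.
import cv2
import numpy as np

img = cv2.imread("frame.png")            # hypothetical video frame
det_rect = (80, 40, 200, 260)            # (x, y, w, h) from a weak detector

mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, det_rect, bgd_model, fgd_model, 5,
            cv2.GC_INIT_WITH_RECT)

# pixels labelled (probable) foreground restrict where parts are searched
fg = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
search_region = img * fg[:, :, None]
```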

Citation Context

...vident, with applications ranging from video understanding and search through to surveillance. Indeed 2D human segmentation is often the first step in determining 3D human pose from individual frames [2]. We illustrate the use of the extracted poses with an application to action recognition on the Weizmann dataset. 1.1. Approach overview We overview the method here for the upper-body case, where ther...

The pyramid match kernel: Efficient learning with sets of features

by Kristen Grauman, Trevor Darrell, Pietro Perona - Journal of Machine Learning Research, 2007
"... In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kern ..."
Abstract - Cited by 136 (10 self) - Add to MetaCart
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
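
A from-scratch sketch of the pyramid match described above, for feature sets whose elements lie in [0, 1)^d: histograms are intersected at increasingly coarse resolutions and the new matches found at each level receive geometrically decreasing weight. The exact weighting and normalisation used in the paper may differ from this simplified version.

```python
# Simplified pyramid match: multi-resolution histograms, weighted
# histogram intersection, counting only matches new to each level.
import numpy as np

def pyramid_match(X, Y, d=2, levels=4):
    """X, Y : (n, d) and (m, d) feature sets with coordinates in [0, 1)."""
    prev_intersection = 0.0
    score = 0.0
    for level in range(levels):
        bins = 2 ** (levels - 1 - level)      # finest grid first
        cell = 1.0 / bins
        hist_x = np.zeros([bins] * d)
        hist_y = np.zeros([bins] * d)
        for h, pts in ((hist_x, X), (hist_y, Y)):
            idx = np.minimum((pts / cell).astype(int), bins - 1)
            np.add.at(h, tuple(idx.T), 1)
        intersection = np.minimum(hist_x, hist_y).sum()
        new_matches = intersection - prev_intersection
        score += new_matches / (2 ** level)   # coarser matches count less
        prev_intersection = intersection
    return score
```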

Citation Context

...ody pose. Many vision researchers have addressed the difficult problem of articulated pose estimation; recent approaches have attempted to directly learn the relationship between images and 3-D pose (Agarwal and Triggs, 2004; Shakhnarovich et al., 2003; Grauman et al., 2003). Like these techniques, we learn a function that maps observable image features to 3-D poses. However, whereas ordered, fixed-length feature sets ar...

Priors for people tracking from small training sets

by Raquel Urtasun, David J. Fleet - In ICCV, 2005
"... We advocate the use of Scaled Gaussian Process Latent Variable Models (SGPLVM) to learn prior models of 3D human pose for 3D people tracking. The SGPLVM simultaneously optimizes a low-dimensional embedding of the high-dimensional pose data and a density function that both gives higher probability to ..."
Abstract - Cited by 122 (23 self) - Add to MetaCart
We advocate the use of Scaled Gaussian Process Latent Variable Models (SGPLVM) to learn prior models of 3D human pose for 3D people tracking. The SGPLVM simultaneously optimizes a low-dimensional embedding of the high-dimensional pose data and a density function that both gives higher probability to points close to training data and provides a nonlinear probabilistic mapping from the low-dimensional latent space to the full-dimensional pose space. The SGPLVM is a natural choice when only small amounts of training data are available. We demonstrate our approach with two distinct motions, golfing and walking. We show that the SGPLVM sufficiently constrains the problem such that tracking can be accomplished with straightforward deterministic optimization.

Citation Context

...e tracked, namely, the ankles, knees, chest, head, left shoulder, elbow and hand. This entire process only required a few mouse clicks and could easily be automated using posture detection techniques [1, 5]. The initial states for the dynamical model were chosen to be those in the training database that best projected onto the first two frames. Figure 6 shows the estimated 3D model projected ...

Discriminative Density Propagation for 3D Human Motion Estimation

by Cristian Sminchisescu, Atul Kanaujia, Zhiguo Li, Dimitris Metaxas - In CVPR, 2005
"... We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predi ..."
Abstract - Cited by 114 (16 self) - Add to MetaCart
We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model in order to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking and thus avoid using the generative 3D model for inference (the learned discriminative predictors can also be used, complementarily, as importance samplers in order to improve mixing or initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion capture-based test sequences and compare against nearest-neighbor and regression methods.
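
Contribution (2), a conditional mixture of experts, can be illustrated at prediction time as follows: a gate turns the observation into mixture weights over a set of (here linear) experts, and the expert means serve as multiple pose hypotheses. This is a generic sketch, not the paper's exact model; the gate and expert parameters are hypothetical and assumed already learned.

```python
# Generic sketch of evaluating a conditional mixture-of-experts model:
# p(x | r) = sum_k g_k(r) N(x; W_k r, Sigma_k).  Component means act as
# multiple 3D pose hypotheses for the observation r.
import numpy as np

def predict_modes(r, gate_W, expert_W):
    """r        : (d_obs,) observation (e.g. silhouette descriptor)
    gate_W   : (n_experts, d_obs) softmax gating weights (hypothetical)
    expert_W : (n_experts, d_pose, d_obs) linear expert regressors
    Returns gate probabilities and one pose hypothesis per expert."""
    logits = gate_W @ r
    g = np.exp(logits - logits.max())
    g /= g.sum()                                 # gate probabilities g_k(r)
    modes = np.einsum("kpo,o->kp", expert_W, r)  # expert means W_k r
    order = np.argsort(-g)
    return g[order], modes[order]                # most probable first
```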

Citation Context

...t indirect with respect to the task, that requires conditional state estimation and not conditional observation modeling. These arguments motivate the complementary study of discriminative algorithms [7, 17, 20, 18, 2] that model and predict the state conditional directly in order to simplify inference. Prediction however involves missing (state) data, unlike learning that is supervised. But learning is also diffic...

Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition

by A. Gupta, A. Kembhavi, L. S. Davis - TPAMI, 2009
"... ..."
Abstract - Cited by 112 (6 self) - Add to MetaCart
Abstract not found

Citation Context

...ntation. Once we have a possible human segmentation, we extract shape context features (5 radial bins and 12 orientation bins) from the silhouette of the human. We then cluster shape context features [1] from the training database to build a dictionary of “shape context words.” A detected human in an image is then characterized by the histogram of shape context words. The number of words/clusters det...
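
The dictionary-building step quoted above can be sketched as a standard bag-of-words pipeline: cluster training shape contexts into “shape context words” with k-means, then describe each detected person by a normalised word histogram. The feature files and dictionary size below are hypothetical.

```python
# Bag-of-words sketch for the step quoted above (hypothetical inputs):
# cluster shape context features into a dictionary, then histogram them.
import numpy as np
from sklearn.cluster import KMeans

# (N, 60) training shape contexts: 5 radial x 12 orientation bins
train_features = np.load("train_shape_contexts.npy")
n_words = 100
codebook = KMeans(n_clusters=n_words, n_init=10).fit(train_features)

def shape_context_word_histogram(features):
    """features : (n, 60) shape contexts from one detection's silhouette."""
    words = codebook.predict(features)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)
```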

Single view human action recognition using key pose matching and Viterbi path searching

by Fengjun Lv, Ramakant Nevatia - In Computer Vision and Pattern Recognition (CVPR), 2007
"... 3D human pose recovery is considered as a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view usually is slow due to the large number of parameters that need to be estimated and recovered poses are often ambiguous due to the perspective project ..."
Abstract - Cited by 95 (7 self) - Add to MetaCart
3D human pose recovery is considered a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view is usually slow due to the large number of parameters that need to be estimated, and recovered poses are often ambiguous due to the perspective projection. We present an approach that does not explicitly infer 3D pose at each frame. Instead, from existing action models we search for a series of actions that best match the input sequence. In our approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transitions between the synthetic poses are represented by a graph model called Action Net. Given the input, silhouette matching between the input frames and the key poses is performed first using an enhanced Pyramid Match Kernel algorithm. The best matched sequence of actions is then tracked using the Viterbi algorithm. We demonstrate this approach on a challenging video set consisting of 15 complex action classes.
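
The path search described above is a standard Viterbi decode once the Action Net is reduced to a transition structure and the silhouette-matching scores are treated as emission scores. A generic sketch (with hypothetical inputs) follows.

```python
# Standard Viterbi decode: the Action Net is reduced to a log-transition
# matrix between key poses, and silhouette-match scores act as (log)
# emission scores.  Inputs are hypothetical.
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """log_emit  : (T, S) log match score of each key pose at each frame
    log_trans : (S, S) log transition scores from the Action Net
    log_prior : (S,)   log prior over the starting key pose
    Returns the best-scoring key-pose index sequence."""
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans    # (S, S): previous -> current
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```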

Citation Context

...w is an important task for many applications such as video surveillance, human computer interaction and video content retrieval. Recently, many research efforts have focused on recovering human poses [1, 6, 12], which is considered as a necessary step for view-invariant human action recognition. However, 3D pose reconstruction from a single viewpoint is a well known difficult problem in itself because of th...

PoseCut: Simultaneous segmentation and 3D pose estimation of humans using dynamic graph-cuts

by Matthieu Bray, Pushmeet Kohli, Philip H. S. Torr - In ECCV, 2006
"... Abstract. We present a novel algorithm for performing integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other related state of the art techniques which focus on either segmentation or pose estimation individually, our approach tackles these two tasks together ..."
Abstract - Cited by 79 (6 self) - Add to MetaCart
We present a novel algorithm for performing integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other related state of the art techniques which focus on either segmentation or pose estimation individually, our approach tackles these two tasks together. Normally, when optimizing for pose, it is traditional to use some fixed set of features, e.g. edges or chamfer maps. In contrast, our novel approach consists of optimizing a cost function based on a Markov Random Field (MRF). This has the advantage that we can use all the information in the image: edges, background and foreground appearances, as well as the prior information on the shape and pose of the subject and combine them in a Bayesian framework. Previously, optimizing such a cost function would have been computationally infeasible. However, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Note that although we target the human pose inference problem in the paper, our method is completely generic and can be used to segment and infer the pose of any specified rigid, deformable or articulated object.
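
As a rough guide to the shape such a cost function takes (the paper defines the exact terms and weights), the energy combines per-pixel appearance and pose-dependent shape terms with an edge-sensitive smoothness term, and is minimised over the segmentation by graph cuts for each candidate pose:

```latex
% Schematic only; the paper's exact formulation differs in its details.
% Pixel labels x_i \in \{\mathrm{fg},\mathrm{bg}\}, pose parameters \Theta:
E(\mathbf{x},\Theta) =
    \sum_i \phi_{\mathrm{app}}(x_i)
  + \sum_i \phi_{\mathrm{shape}}(x_i \mid \Theta)
  + \sum_{(i,j)\in\mathcal{N}} \psi_{ij}(x_i, x_j),
\qquad
\Theta^{*} = \arg\min_{\Theta}\ \min_{\mathbf{x}} E(\mathbf{x},\Theta)
```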

Citation Context

... [1–6]. In the last few years, several techniques have been proposed for tackling the pose inference problem, some of which have obtained decent results. In particular, the work of Agarwal and Triggs [1] using relevance vector machines and that of Shakhnarovich et al. [3] based on parameter-sensitive hashing induced a lot of interest and have been shown to give good results.
