Results 1 - 10 of 22
Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction
Cited by 11 (1 self)
Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings that have improved stability to geometric transformations with metric learning and semi-supervised manifold regularization methods in order to further profile them for task invariance: resistance to background clutter and to variance within the same human pose class. We quantitatively analyze the effectiveness of both descriptors and learning methods and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.
Monocular 3D Pose Estimation and Tracking by Detection
Cited by 7 (0 self)
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.
Relevant feature selection for human pose estimation and localization in cluttered images
In Proc. ECCV, 2008
Cited by 5 (0 self)
We address the problem of estimating human body pose from a single image with a cluttered background. We train multiple local linear regressors for estimating the 3D pose from a feature vector of gradient orientation histograms. Each linear regressor is capable of selecting relevant components of the feature vector depending on pose, because it is trained on a pose cluster, which is a subset of the training samples with similar pose. For discriminating the pose clusters, we use kernel Support Vector Machines (SVMs) with pose-dependent feature selection. We achieve feature selection for kernel SVMs by estimating the scale parameters of the RBF kernel through minimization of the radius/margin bound, which is an upper bound on the expected generalization error, with efficient gradient descent. Human detection is also possible with these SVMs. Quantitative experiments show the effectiveness of pose-dependent feature selection for both human detection and pose estimation.
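As a rough illustration of how per-feature scale parameters in an RBF kernel can act as feature selection, here is a minimal sketch. The parameterization k(x, y) = exp(-sum_d s_d (x_d - y_d)^2) and the name `scaled_rbf` are assumptions made for this example; the abstract does not state the exact kernel form, and the radius/margin optimization itself is not shown.

```python
import math

def scaled_rbf(x, y, scales):
    """RBF kernel value with one non-negative scale per feature dimension.

    Assumed form: k(x, y) = exp(-sum_d scales[d] * (x[d] - y[d])**2).
    """
    if not (len(x) == len(y) == len(scales)):
        raise ValueError("x, y and scales must have the same length")
    # Weighted squared distance: a scale of 0 removes that feature entirely.
    d2 = sum(s * (a - b) ** 2 for a, b, s in zip(x, y, scales))
    return math.exp(-d2)

x = [0.0, 0.0]
y = [1.0, 5.0]
k_all = scaled_rbf(x, y, [1.0, 1.0])    # both features contribute
k_first = scaled_rbf(x, y, [1.0, 0.0])  # second feature switched off
print(k_all, k_first)
```

Driving a scale toward zero makes the kernel ignore that feature, which is one way a learned set of scales can end up selecting the relevant components.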
Evaluating Recognition-Based Motion Capture on HumanEva II Test Data
2008
Cited by 4 (1 self)
The advent of the HumanEva standardized motion capture data sets has enabled quantitative evaluation of motion capture algorithms on comparable terms. This paper measures the performance of an existing monocular recognition-based pose recovery algorithm on select HumanEva data, including all the HumanEva II clips. The method uses a physically-motivated Markov process to connect adjacent frames and achieves a 3D relative mean error of 8.9 cm per joint, better than recently reported results. It further investigates factors contributing to the error, and finds that research into better pose retrieval methods offers promise for improvement of this technique and those related to it. Finally, it investigates the effects of local search optimization with the same recognition-based algorithm and finds no significant deterioration in the results, indicating that processing speed can be largely independent of the size of the recognition library for this approach.
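To make the error figure above concrete, the sketch below computes a mean per-joint 3D error. It assumes that "relative" means joint positions are expressed relative to a root joint before comparison; that reading, the function name `mean_joint_error`, and the toy coordinates are all assumptions for illustration, not the paper's evaluation code.

```python
import math

def mean_joint_error(pred, truth, root=0):
    """Mean Euclidean distance per joint after subtracting the root joint.

    pred and truth are equal-length lists of (x, y, z) tuples in the same unit.
    The root joint itself always contributes zero error and is included in the mean.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")

    def relative(joints):
        rx, ry, rz = joints[root]
        return [(x - rx, y - ry, z - rz) for x, y, z in joints]

    p, t = relative(pred), relative(truth)
    return sum(math.dist(a, b) for a, b in zip(p, t)) / len(p)

# Toy skeleton with three joints, coordinates in cm.
truth = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 20.0, 0.0)]
# Prediction is globally shifted by (5, 5, 5); only the last joint is misplaced.
pred = [(5.0, 5.0, 5.0), (5.0, 15.0, 5.0), (8.0, 29.0, 5.0)]
err = mean_joint_error(pred, truth)
print(err)
```

Because positions are taken relative to the root, the global shift does not count as error; only the misplaced joint does.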
Recognizing Activities with Multiple Cues
Cited by 4 (0 self)
In this paper, we introduce a first-order probabilistic model that combines multiple cues to classify human activities from video data accurately and robustly. Our system works in a realistic office setting with background clutter, natural illumination, different people, and partial occlusion. The model we present is compact, requires only fifteen sentences of first-order logic grouped as a Dynamic Markov Logic Network (DMLN) to implement the probabilistic model, and leverages existing state-of-the-art work in pose detection and object recognition.
Exploiting within-clique factorizations in junction-tree algorithms
In AISTATS, 2010
Cited by 3 (2 self)
It is well known that exact inference in tree-structured graphical models can be accomplished efficiently by message-passing operations following a simple protocol making use of the distributive law [Aji and McEliece, 2000, Kschischang et al., 2001], and that exact inference in arbitrary graphical models can be solved by the Junction-Tree Algorithm; its efficiency is determined by the size
Contour people: A parameterized model of 2D articulated human shape
In CVPR, 2010
Cited by 2 (1 self)
We define a new “contour person” model of the human body that has the expressive power of a detailed 3D model and the computational benefits of a simple 2D part-based model. The contour person (CP) model is learned from a 3D SCAPE model of the human body that captures natural shape and pose variations; the projected contours of this model, along with their segmentation into parts, form the training set. The CP model factors deformations of the body into three components: shape variation, viewpoint change and part rotation. This latter model also incorporates a learned non-rigid deformation model. The result is a 2D articulated model that is compact to represent, simple to compute with and more expressive than previous models. We demonstrate the value of such a model in 2D pose estimation and segmentation. Given an initial pose from a standard pictorial-structures method, we refine the pose and shape using an objective function that segments the scene into foreground and background regions. The result is a parametric, human-specific image segmentation.
On the sustained tracking of human motion
In 8th IEEE International Conference on Automatic Face and Gesture Recognition, 2008
Cited by 2 (1 self)
In this paper, we propose an algorithm for sustained tracking of humans, where we combine frame-to-frame articulated motion estimation with a per-frame body detection algorithm. The proposed approach can automatically recover from tracking error and drift. The frame-to-frame motion estimation algorithm replaces traditional dynamic models within a filtering framework. Stable and accurate per-frame motion is estimated via an image-gradient based algorithm that solves a linear constrained least squares system. The per-frame detector learns the appearance of different body parts and ‘sketches’ expected gradient maps to detect discriminant pose configurations in images. The resulting online algorithm is computationally efficient and has been widely tested on a large dataset of sequences of drivers in vehicles. It shows stability and sustained accuracy over thousands of frames.
Exact Inference in Graphical Models: is There More to it?
Cited by 1 (1 self)
In general, the Junction-Tree Algorithm is ‘the solution’ to exact inference in graphical models. It has running time O(AN^C), where
- A is the number of nodes
- N is the domain size for each node
- C is the size of the maximal cliques in the triangulated graph
Factor Graphs
O(AN^C) can be pretty bad, since triangulating the graph often increases its maximal clique size. Instead, people often resort to inference in loopy factor graphs, whose running time is O(AN^F), where F is the size of the factors. However, this is generally inexact.
Some Examples
- models for pose reconstruction, e.g. [Sigal and Black, 2006]
- pairwise factors allow for some ‘elasticity’ of the joints
- maximal cliques of size three
Some Examples
- models with loops, e.g. [Coughlan and Ferreira, 2002]
- after triangulation, maximal cliques have size three
- loopy belief-propagation can be shown to converge to the correct solution
Some Examples
- skip-chain CRFs, e.g. [Sutton and McCallum, 2006,
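A small worked comparison may help here. The sketch below takes the running times quoted above to mean A times N to the power C for the junction tree and A times N to the power F for the loopy factor graph (reading the flattened superscripts as exponents is an assumption), and ignores constant factors. The function name `inference_cost` and the example sizes are hypothetical.

```python
def inference_cost(num_nodes, domain_size, exponent):
    """Operation count A * N**k, ignoring constant factors.

    Use k = C (maximal clique size) for the junction tree,
    or k = F (factor size) for a loopy factor graph.
    """
    if min(num_nodes, domain_size, exponent) < 1:
        raise ValueError("all arguments must be positive integers")
    return num_nodes * domain_size ** exponent

# Hypothetical model: 10 nodes, 20 states per node,
# pairwise factors (F = 2), maximal cliques of size three (C = 3).
exact = inference_cost(10, 20, 3)   # junction tree
loopy = inference_cost(10, 20, 2)   # loopy factor graph
print(exact, loopy, exact // loopy)
```

With these sizes the exact method costs a factor of N more than the loopy one, which is the gap the abstract describes when triangulation raises the clique size from two to three.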
Seeing 3D Objects in a Single 2D Image
2009
Cited by 1 (1 self)
A general framework simultaneously addressing pose estimation, 2D segmentation, object recognition, and 3D reconstruction from a single image is introduced in this paper. The proposed approach partitions 3D space into voxels and estimates the voxel states that maximize a likelihood integrating two components: the object fidelity, that is, the probability that an object occupies the given voxels, here encoded as a 3D shape prior learned from 3D samples of objects in a class; and the image fidelity, meaning the probability that the given voxels would produce the input image when properly projected to the image plane. We derive a loop-less graphical model for this likelihood and propose a computationally efficient optimization algorithm that is guaranteed to produce the global likelihood maximum. Furthermore, we derive a multi-resolution implementation of this algorithm that permits trading reconstruction and estimation accuracy for computation. The presentation of the proposed framework is complemented with experiments on real data demonstrating the accuracy of the proposed approach.

