T.: Sparse probabilistic regression for activity-independent human pose inference
- In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008
Cited by 19 (6 self)
Discriminative approaches to human pose inference involve mapping visual observations to articulated body configurations. Current probabilistic approaches to learn this mapping have been limited in their ability to handle domains with a large number of activities that require very large training sets. We propose an online probabilistic regression scheme for efficient inference of complex, high-dimensional, and multimodal mappings. Our technique is based on a local mixture of Gaussian Processes, where locality is defined based on both appearance and pose, and where the mapping hyperparameters can vary across local neighborhoods to better adapt to specific regions in the pose space. The mixture components are defined online in very small neighborhoods, so learning and inference are extremely efficient. When the mapping is one-to-one, we derive a bound on the approximation error of local regression (vs. global regression) for monotonically decreasing covariance functions. Our method can determine when training examples are redundant given the rest of the database, and use this criterion for pruning. We report results on synthetic (Poser) and real (HumanEva) pose databases, obtaining fast and accurate pose estimates using training set sizes up to 10^5.
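The idea of fitting a Gaussian Process only in a small neighborhood of the query can be sketched as follows. This is a minimal illustration, not the authors' method: it uses a single local expert rather than a mixture, defines locality by input distance only (not by pose), and fixes the hyperparameters. The function names and the parameters `k`, `length_scale`, and `noise` are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def local_gp_predict(X, Y, x_query, k=10, length_scale=1.0, noise=1e-3):
    """GP posterior mean at x_query, computed from its k nearest training inputs only.

    Hypothetical helper: locality here is plain Euclidean distance in input space.
    """
    dists = np.linalg.norm(X - x_query, axis=1)
    idx = np.argsort(dists)[:k]                      # indices of the local neighborhood
    Xn, Yn = X[idx], Y[idx]
    K = rbf_kernel(Xn, Xn, length_scale) + noise * np.eye(len(idx))
    k_star = rbf_kernel(x_query[None, :], Xn, length_scale)
    # Solving a k-by-k system keeps the cost independent of the full training set size.
    return (k_star @ np.linalg.solve(K, Yn))[0]
```

Because only a k-by-k system is solved per query, the cost of a prediction (after the neighbor search) does not grow with the size of the database, which is the property the abstract relies on for efficiency.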
The Joint Manifold Model for Semi-supervised Multi-valued Regression
Cited by 18 (3 self)
Many computer vision tasks may be expressed as the problem of learning a mapping between image space and a parameter space. For example, in human body pose estimation, recent research has directly modelled the mapping from image features (z) to joint angles (θ). Fitting such models requires training data in the form of labelled (z, θ) pairs, from which the conditional densities p(θ|z) are learned. Inference is then simple: given test image features z, the conditional p(θ|z) is immediately computed. However, large amounts of training data are required to fit the models, particularly when the spaces are high-dimensional. We show how unlabelled data (samples from the marginal distributions p(z) and p(θ)) may be used to improve fitting. This is valuable because it is often significantly easier to obtain unlabelled than labelled samples. We use a Gaussian process latent variable model to learn the mapping from a shared latent low-dimensional manifold to the feature and parameter spaces. This extends existing approaches to (a) use unlabelled data, and (b) represent one-to-many mappings. Experiments on synthetic and real problems demonstrate how the use of unlabelled data improves over existing techniques. In our comparisons, we include existing approaches that are explicitly semi-supervised as well as those which implicitly make use of unlabelled examples.
Conditional random people: Tracking humans with CRFs and grid filters
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006
Cited by 17 (1 self)
We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are learned from data. We find functions that embed both state and observation into a space where similarity corresponds to L1 distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and, in conjunction with a grid-filtering framework, can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences.
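A discrete grid filter with an L1-distance observation potential can be sketched as below. This is an illustration under stated assumptions, not the paper's system: the embeddings are taken as given rather than learned, the potential is assumed to have the form exp(-d / scale), and the transition matrix is dense for brevity where the abstract describes a sparse one. All names are hypothetical.

```python
import numpy as np

def l1_potential(state_embeddings, obs_embedding, scale=1.0):
    """Observation potential per grid state from L1 distance in the embedding space.

    The exponential form and the `scale` parameter are assumptions for this sketch.
    """
    d = np.abs(state_embeddings - obs_embedding).sum(axis=1)
    return np.exp(-d / scale)

def grid_filter_step(belief, transition, obs_potential):
    """One forward update over a discrete state grid (HMM-style).

    transition[i, j] is the probability of moving from state i to state j.
    """
    predicted = transition.T @ belief          # temporal prior over the next state
    posterior = predicted * obs_potential      # weight by the observation potential
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("observation potential is zero on all reachable states")
    return posterior / total                   # renormalize to a distribution
```

With a sparse transition matrix, the prediction step touches only the nonzero entries, which is what makes the HMM-like update cheap on large grids.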
Gaussian Process Latent Variable Models for Human Pose Estimation
Cited by 17 (1 self)
We describe a generative approach to recover 3D human pose from image silhouettes. Our method is based on learning a shared low-dimensional latent representation capable of generating both human pose and image observations through the GP-LVM [1]. We learn a dynamical model over the latent space, which allows us to resolve ambiguous silhouettes through temporal consistency. The model has only two free parameters and requires no manual initialization.
Impact of dynamics on subspace embedding and tracking of sequences
- In CVPR, 2006
Cited by 14 (1 self)
In this paper we study the role of dynamics in dimensionality reduction problems applied to sequences. We propose a new family of marginal auto-regressive (MAR) models that describe the space of all stable auto-regressive sequences, regardless of their specific dynamics. We apply the MAR class of models as sequence priors in probabilistic sequence subspace embedding problems. In particular, we consider a Gaussian process latent variable approach to dimensionality reduction and show that the use of MAR priors may lead to better estimates of sequence subspaces than the ones obtained by traditional non-sequential priors. We then propose a learning method for estimating nonlinear dynamic system (NDS) models that utilizes the new MAR priors. The utility of the proposed methods is demonstrated on several
Regression-based Hand Pose Estimation from Multiple Cameras
- 2006
Cited by 12 (1 self)
The RVM-based learning method for whole body pose estimation proposed by Agarwal and Triggs is adapted to hand pose recovery. To help overcome the difficulties presented by the greater degree of self-occlusion and the wider range of poses exhibited in hand imagery, the adaptation proposes a method for combining multiple views. Comparisons of performance using single versus multiple views are reported for both synthesized and real imagery, and the effects of the number of image measurements and the number of training samples on performance are explored.
Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction
Cited by 11 (1 self)
Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings that have improved stability to geometric transformations with metric learning and semi-supervised manifold regularization methods in order to further profile them for task invariance, that is, resistance to background clutter and to variance within the same human pose class. We quantitatively analyze the effectiveness of both descriptors and learning methods and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.
Conditional Visual Tracking in Kernel Space
- In Proc. NIPS, 2005
Cited by 8 (3 self)
We present a conditional temporal probabilistic framework for reconstructing 3D human motion in monocular video based on descriptors encoding image silhouette observations. For computational efficiency we restrict visual inference to low-dimensional kernel-induced non-linear state spaces. Our methodology (kBME) combines kernel PCA-based non-linear dimensionality reduction (kPCA) and Conditional Bayesian Mixture of Experts (BME) in order to learn complex multivalued predictors between observations and model hidden states. This is necessary for accurate, inverse, visual perception inferences, where several probable, distant 3D solutions exist due to noise or the uncertainty of monocular perspective projection. Low-dimensional models are appropriate because many visual processes exhibit strong non-linear correlations in both the image observations and the target, hidden state variables. The learned predictors are temporally combined within a conditional graphical model in order to allow a principled propagation of uncertainty. We study several predictors and empirically show that the proposed algorithm compares favorably with techniques based on regression, Kernel Dependency Estimation (KDE) or PCA alone, and gives results competitive with those of high-dimensional mixture predictors at a fraction of their computational cost. We show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences.
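The kernel PCA step named in the abstract can be sketched in a few lines. This is a generic textbook version, not the authors' kBME pipeline: it assumes an RBF kernel with a hypothetical `gamma` parameter, embeds only the training points, and omits the mixture-of-experts predictor entirely.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Embed the rows of X into n_components dimensions with RBF kernel PCA.

    The kernel choice and `gamma` are assumptions made for this sketch.
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)

    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.maximum(eigvals[order], 1e-12)

    # Projection of training point i on component k is sqrt(lam_k) * eigvec_k[i].
    return eigvecs[:, order] * np.sqrt(lam)
```

A predictor such as a mixture of experts would then be trained on these low-dimensional coordinates instead of the raw descriptors, which is where the computational saving described in the abstract comes from.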
D.: Searching video for complex activities with finite state models
- In: IEEE Conf. on Computer Vision and Pattern Recognition, 2007
Cited by 7 (2 self)
We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body that can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. We show results for a large range of queries applied to a collection of complex motion and activity. Our models of short time scale limb behaviour are built using a labelled motion capture set. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by changes of clothing.
BM³E: Discriminative Density Propagation for Visual Tracking
- In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 7 (1 self)
We introduce BM³E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run-time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations, typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) We establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) We propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) We validate the framework empirically for the reconstruction of 3D human motion in monocular video sequences. Our tests on both real and motion capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods.

