Results 1 - 10
of
80
3d people tracking with gaussian process dynamical models
- In CVPR
, 2006
"... We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a lowdimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. W ..."
Abstract
-
Cited by 64 (15 self)
- Add to MetaCart
We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a lowdimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles, despite weak and noisy image measurements and significant occlusions. 1.
T.: Sparse probabilistic regression for activity-independent human pose inference
- In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR
, 2008
"... Discriminative approaches to human pose inference involve mapping visual observations to articulated body configurations. Current probabilistic approaches to learn this mapping have been limited in their ability to handle domains with a large number of activities that require very large training set ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
Discriminative approaches to human pose inference involve mapping visual observations to articulated body configurations. Current probabilistic approaches to learn this mapping have been limited in their ability to handle domains with a large number of activities that require very large training sets. We propose an online probabilistic regression scheme for efficient inference of complex, highdimensional, and multimodal mappings. Our technique is based on a local mixture of Gaussian Processes, where locality is defined based on both appearance and pose, and where the mapping hyperparameters can vary across local neighborhoods to better adapt to specific regions in the pose space. The mixture components are defined online in very small neighborhoods, so learning and inference is extremely efficient. When the mapping is one-to-one, we derive a bound on the approximation error of local regression (vs. global regression) for monotonically decreasing covariance functions. Our method can determine when training examples are redundant given the rest of the database, and use this criteria for pruning. We report results on synthetic (Poser) and real (Humaneva) pose databases, obtaining fast and accurate pose estimates using training set sizes up to 105. 1.
Physics-Based Person Tracking Using Simplified Lower-Body Dynamics
"... We introduce a physics-based model for 3D person tracking. Based on a biomechanical characterization of lower-body dynamics, the model captures important physical properties of bipedal locomotion such as balance and ground contact, generalizes naturally to variations in style due to changes in speed ..."
Abstract
-
Cited by 17 (4 self)
- Add to MetaCart
We introduce a physics-based model for 3D person tracking. Based on a biomechanical characterization of lower-body dynamics, the model captures important physical properties of bipedal locomotion such as balance and ground contact, generalizes naturally to variations in style due to changes in speed, step-length, and mass, and avoids common problems such as footskate that arise with existing trackers. The model dynamics comprises a two degree-offreedom representation of human locomotion with inelastic ground contact. A stochastic controller generates impulsive forces during the toe-off stage of walking and spring-like forces between the legs. A higher-dimensional kinematic observation model is then conditioned on the underlying dynamics. We use the model for tracking walking people from video, including examples with turning, occlusion, and varying gait. 1.
Gaussian Process Latent Variable Models for Human Pose Estimation
"... We describe a generative approach to recover 3D human pose from image silhouettes. Our method is based on learning a shared low dimensional latent representation capable of generating both human pose and image observations through the GP-LVM [1]. We learn a dynamical model over the latent space whic ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We describe a generative approach to recover 3D human pose from image silhouettes. Our method is based on learning a shared low dimensional latent representation capable of generating both human pose and image observations through the GP-LVM [1]. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and requires no manual initialization. 1.
Combined discriminative and generative articulated pose and non-rigid shape estimation
"... Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initia ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. This predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g. height, weight, etc.) in both calibrated and uncalibrated camera environments. 1
A Nonlinear Discriminative Approach to AAM Fitting
"... The Active Appearance Model (AAM) is a powerful generative method for modeling and registering deformable visual objects. Most methods for AAM fitting utilize a linear parameter update model in an iterative framework. Despite its popularity, the scope of this approach is severely restricted, both in ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
The Active Appearance Model (AAM) is a powerful generative method for modeling and registering deformable visual objects. Most methods for AAM fitting utilize a linear parameter update model in an iterative framework. Despite its popularity, the scope of this approach is severely restricted, both in fitting accuracy and capture range, due to the simplicity of the linear update models used. In this paper, we present an new AAM fitting formulation, which utilizes a nonlinear update model. To motivate our approach, we compare its performance against two popular fitting methods on two publicly available face databases, in which this formulation boasts significant performance improvements. 1.
Scaled motion dynamics for markerless motion capture
- in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR
, 2007
"... This work proposes a way to use a-priori knowledge on motion dynamics for markerless human motion capture (MoCap). Specifically, we match tracked motion patterns to training patterns in order to predict states in successive frames. Thereby, modeling the motion by means of twists allows for a proper ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
This work proposes a way to use a-priori knowledge on motion dynamics for markerless human motion capture (MoCap). Specifically, we match tracked motion patterns to training patterns in order to predict states in successive frames. Thereby, modeling the motion by means of twists allows for a proper scaling of the prior. Consequently, there is no need for training data of different frame rates or velocities. Moreover, the method allows to combine very different motion patterns. Experiments in indoor and outdoor scenarios demonstrate the continuous tracking of familiar motion patterns in case of artificial frame drops or in situations insufficiently constrained by the image data. 1.
Speech recognition techniques for a sign language recognition system
- In Interspeech 2007 - Eurospeech
, 2007
"... One of the most significant differences between automatic sign language recognition (ASLR) and automatic speech recognition (ASR) is due to the computer vision problems, whereas the corresponding problems in speech signal processing have been solved due to intensive research in the last 30 years. We ..."
Abstract
-
Cited by 12 (9 self)
- Add to MetaCart
One of the most significant differences between automatic sign language recognition (ASLR) and automatic speech recognition (ASR) is due to the computer vision problems, whereas the corresponding problems in speech signal processing have been solved due to intensive research in the last 30 years. We present our approach where we start from a large vocabulary speech recognition system to profit from the insights that have been obtained in ASR research. The system developed is able to recognize sentences of continuous sign language independent of the speaker. The features used are obtained from standard video cameras without any special data acquisition devices. In particular, we focus on feature and model combination techniques applied in ASR, and the usage of pronunciation and language models (LM) in sign language. These techniques can be used for all kind of sign language recognition systems, and for many video analysis problems where the temporal context is important, e.g. for action or gesture recognition. On a publicly available benchmark database consisting of 201 sentences and 3 signers, we can achieve a 17 % WER.
Regression-based Hand Pose Estimation from Multiple Cameras
, 2006
"... The RVM-based learning method for whole body pose estimation proposed by Agarwal and Triggs is adapted to hand pose recovery. To help overcome the difficulties presented by the greater degree of self-occlusion and the wider range of poses exhibited in hand imagery, the adaptation proposes a method f ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
The RVM-based learning method for whole body pose estimation proposed by Agarwal and Triggs is adapted to hand pose recovery. To help overcome the difficulties presented by the greater degree of self-occlusion and the wider range of poses exhibited in hand imagery, the adaptation proposes a method for combining multiple views. Comparisons of performance using single versus multiple views are reported for both synthesized and real imagery, and the effects of the number of image measurements and the number of training samples on performance are explored.
Twin Gaussian Processes for Structured Prediction
, 2010
"... ... generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GP modeled as normal distributions over finite index sets of training and testing examp ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
... generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GP modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3d human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3d marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3d body model associated with human subjects used for training or testing.

