Results 1 - 10 of 193
Machine recognition of human activities: A survey, 2008. Cited by 218 (0 self).
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing—robustness against errors in low-level processing, view and rate-invariant representations at midlevel processing and semantic representation of human activities at higher level processing—make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher level representation of complex activities.
3D people tracking with Gaussian process dynamical models, in CVPR, 2006. Cited by 201 (18 self).
We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a low-dimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles, despite weak and noisy image measurements and significant occlusions.
Gaussian process dynamical models for human motion, IEEE Trans. Pattern Anal. Machine Intell., 2008. Cited by 158 (5 self).
We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
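The closed-form marginalization the abstract mentions has a compact form. As a rough sketch in the notation common to the GPDM literature (omitting the per-dimension scaling weights and kernel hyperparameter priors), the observation likelihood and the dynamics prior over a latent trajectory are:

```latex
% Y = [y_1, ..., y_N]^T observed D-dim poses, X = [x_1, ..., x_N]^T latent d-dim states;
% K_Y is the kernel matrix built on X, K_X the kernel matrix built on X_{1:N-1}.
p(Y \mid X) = \frac{1}{\sqrt{(2\pi)^{ND}\,\lvert K_Y \rvert^{D}}}
              \exp\!\Big(-\tfrac{1}{2}\,\mathrm{tr}\big(K_Y^{-1} Y Y^{\top}\big)\Big)

p(X) = \frac{p(x_1)}{\sqrt{(2\pi)^{(N-1)d}\,\lvert K_X \rvert^{d}}}
       \exp\!\Big(-\tfrac{1}{2}\,\mathrm{tr}\big(K_X^{-1} X_{2:N} X_{2:N}^{\top}\big)\Big)
```

Both factors are Gaussian-process marginals: the observation mapping and the first-order dynamics mapping each have their parameters integrated out, leaving only kernel matrices over the latent points.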
Single view human action recognition using key pose matching and Viterbi path searching, in Computer Vision and Pattern Recognition (CVPR), 2007. Cited by 95 (7 self).
3D human pose recovery is considered a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view is usually slow due to the large number of parameters that need to be estimated, and recovered poses are often ambiguous due to the perspective projection. We present an approach that does not explicitly infer 3D pose at each frame. Instead, from existing action models we search for a series of actions that best match the input sequence. In our approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transitions of the synthetic poses are represented by a graph model called the Action Net. Given the input, silhouette matching between the input frames and the key poses is performed first using an enhanced Pyramid Match Kernel algorithm. The best matched sequence of actions is then tracked using the Viterbi algorithm. We demonstrate this approach on a challenging video set consisting of 15 complex action classes.
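The decoding step the abstract describes (Viterbi search over allowed key-pose transitions) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the transition graph and per-frame scores below are toy stand-ins for the Action Net and the Pyramid Match Kernel silhouette scores.

```python
# Minimal Viterbi decoder over a pose-transition graph.
# scores[t] maps each key-pose state to its (log) matching score at frame t;
# trans maps each state to the set of states it may transition to.
def viterbi(scores, trans):
    # best[s] = (log-score of best path ending in s, that path)
    best = {s: (sc, [s]) for s, sc in scores[0].items()}
    for frame in scores[1:]:
        new = {}
        for s_prev, (lp, path) in best.items():
            for s in trans.get(s_prev, ()):
                if s not in frame:
                    continue  # pose s was not matched at this frame
                cand = lp + frame[s]
                if s not in new or cand > new[s][0]:
                    new[s] = (cand, path + [s])
        best = new
    return max(best.values())  # (best log-score, best state sequence)

# Toy example: three frames, three key poses A -> B -> C.
scores = [{'A': 0.0, 'B': -1.0}, {'B': 0.0, 'C': -0.5}, {'C': 0.0}]
trans = {'A': ['B'], 'B': ['B', 'C'], 'C': ['C']}
lp, path = viterbi(scores, trans)  # path is ['A', 'B', 'C']
```

Restricting successors via `trans` is what makes the search respect the action graph rather than treating every pose pair as a legal transition.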
Twin Gaussian Processes for Structured Prediction, 2010. Cited by 62 (4 self).
Twin Gaussian processes (TGP) is a generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GPs modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts, and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3D human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3D marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3D body model associated with the human subjects used for training or testing.
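The objective hinges on the Kullback-Leibler divergence between Gaussians. As a minimal sketch of the quantity being minimized, here is the closed form for the univariate case (TGP itself works with multivariate Gaussians whose covariances are kernel matrices over training and test examples, but the structure of the formula is the same):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# Identical distributions have zero divergence; shifting the mean costs
# (mu1 - mu2)^2 / (2 * var2).
kl_gauss(0.0, 1.0, 0.0, 1.0)  # -> 0.0
kl_gauss(1.0, 1.0, 0.0, 1.0)  # -> 0.5
```

Note that KL is asymmetric (`kl_gauss(0, 2, 0, 1) != kl_gauss(0, 1, 0, 2)`), which is why the direction of the divergence is a meaningful modeling choice in methods like TGP.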
Monocular Human Motion Capture with a Mixture of Regressors, IEEE Workshop on Vision for Human-Computer Interaction, 2005. Cited by 54 (3 self).
We address 3D human motion capture from monocular images, taking a learning-based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multi-valuedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical-model-based tracker that automatically detects tracking failures and re-initializes in a probabilistically correct manner. The system is trained on conventional motion capture data, using both the corresponding real human silhouettes and silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic re-initializations.
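The multi-hypothesis output described above can be illustrated with a toy one-dimensional mixture of linear regressors: each expert proposes a pose, and a gating function assigns it a probability. Everything here (the 1-D setting, Gaussian gates, the function names) is our own illustration, not the paper's model.

```python
import math

def pose_hypotheses(x, experts):
    """Return a list of (probability, predicted pose) pairs.

    experts: list of (gate_center, gate_width, slope, intercept); each expert
    is a linear regressor w*x + b gated by a Gaussian bump around gate_center.
    """
    gates = [math.exp(-0.5 * ((x - c) / s) ** 2) for c, s, _, _ in experts]
    z = sum(gates)  # normalize responsibilities so they sum to one
    return [(g / z, w * x + b) for g, (_, _, w, b) in zip(gates, experts)]

# Two experts encoding an ambiguous silhouette-to-pose mapping: near x = 0 the
# first expert dominates, near x = 5 the second does, and in between both
# hypotheses receive non-negligible probability.
experts = [(0.0, 1.0, 1.0, 0.0), (5.0, 1.0, -1.0, 10.0)]
hyps = pose_hypotheses(0.0, experts)  # first hypothesis gets ~all the mass
```

Keeping all weighted hypotheses, rather than a single best guess, is what lets a downstream tracker recover when the top-ranked pose turns out to be wrong.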
Learning Joint Top-down and Bottom-up Processes for 3D Visual Inference, in IEEE International Conference on Computer Vision and Pattern Recognition, 2006. Cited by 39 (7 self).
We present an algorithm for jointly learning a consistent bidirectional generative-recognition model that combines top-down and bottom-up processing for monocular 3D human motion reconstruction. Learning progresses in alternating stages of self-training that optimize the probability of the image evidence: the recognition model is tuned using samples from the generative model, and the generative model is optimized to produce inferences close to the ones predicted by the current recognition model. At equilibrium, the two models are consistent. During on-line inference, we scan the image at multiple locations and predict 3D human poses using the recognition model. But this implicitly includes one-shot generative consistency feedback. The framework provides a uniform treatment of human detection, 3D initialization and 3D recovery from transient failure. Our experimental results show that this procedure is promising for the automatic reconstruction of human motion in more natural scene settings with background clutter and occlusion.
Fast human pose estimation using appearance and motion via multi-dimensional boosting regression, in CVPR, 2007. Cited by 39 (1 self).
We address the problem of estimating human pose in video sequences, where rough location has been determined. We exploit both appearance and motion information by defining suitable features of an image and its temporal neighbors, and learning a regression map to the parameters of a model of the human body using boosting techniques. Our algorithm can be viewed as a fast initialization step for human body trackers, or as a tracker itself. We extend gradient boosting techniques to learn a multi-dimensional map from (rotated and scaled) Haar features to the entire set of joint angles representing the full body pose. We test our approach by learning a map from image patches to body joint angles from synchronized video and motion capture walking data. We show how our technique enables learning an efficient real-time pose estimator, validated on publicly available datasets.
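The core idea of multi-dimensional L2 gradient boosting can be sketched compactly: each round fits a weak learner to the vector-valued residuals, so all output dimensions (joint angles) are regressed at once. The sketch below uses a 1-D input and threshold stumps as simplified stand-ins for the paper's rotated/scaled Haar features; names and details are illustrative.

```python
def fit_stump(xs, residuals):
    """Find the threshold stump minimizing squared error on vector residuals."""
    best = None
    for t in xs:  # candidate thresholds at the data points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = [sum(c) / len(left) for c in zip(*left)]   # per-dim mean, left
        rmean = [sum(c) / len(right) for c in zip(*right)]  # per-dim mean, right
        err = sum(sum((ri - m) ** 2 for ri, m in zip(r, lmean)) for r in left)
        err += sum(sum((ri - m) ** 2 for ri, m in zip(r, rmean)) for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    return best[1:]

def boost(xs, ys, rounds=30, lr=0.5):
    """L2 gradient boosting: each round fits a stump to the current residuals."""
    dim = len(ys[0])
    preds = [[0.0] * dim for _ in ys]
    stumps = []
    for _ in range(rounds):
        residuals = [[yi - pi for yi, pi in zip(y, p)] for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, residuals)
        stumps.append((t, lmean, rmean))
        for i, x in enumerate(xs):
            m = lmean if x <= t else rmean
            preds[i] = [p + lr * mi for p, mi in zip(preds[i], m)]
    return stumps

def predict(x, stumps, dim, lr=0.5):
    out = [0.0] * dim
    for t, lmean, rmean in stumps:
        m = lmean if x <= t else rmean
        out = [o + lr * mi for o, mi in zip(out, m)]
    return out
```

Fitting one stump per round to the full residual vector, instead of boosting each joint angle separately, is what makes the map "multi-dimensional" and keeps prediction time independent of the number of output dimensions.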
Topologically-constrained latent variable models, in ICML ’08: Proceedings of the 25th International Conference on Machine Learning, 2008.
Learning and matching of dynamic shape manifolds for human action recognition, TIP, 2007. Cited by 38 (0 self).
In this paper, we learn explicit representations for dynamic shape manifolds of moving humans for the task of action recognition. We exploit locality preserving projections (LPP) for dimensionality reduction, leading to a low-dimensional embedding of human movements. Given a sequence of moving silhouettes associated with an action video, by LPP we project them into a low-dimensional space to characterize the spatiotemporal property of the action, as well as to preserve much of the geometric structure. To match the embedded action trajectories, the median Hausdorff distance or normalized spatiotemporal correlation is used as the similarity measure. Action classification is then achieved in a nearest-neighbor framework. To evaluate the proposed method, extensive experiments have been carried out on a recent dataset including ten actions performed by nine different subjects. The experimental results show that the proposed method is able not only to recognize human actions effectively, but also to considerably tolerate some challenging conditions, e.g., partial occlusion, low-quality videos, changes in viewpoints, scales, and clothes; within-class variations caused by different subjects with different physical build; styles of motion; etc. Index Terms—Action recognition, dimensionality reduction, human motion analysis, locality preserving projections (LPP).
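Of the two similarity measures the abstract names, the median Hausdorff distance is easy to state concretely: for each point of one trajectory, take its distance to the nearest point of the other, and use the median (rather than the usual maximum) of those values for robustness to outliers. A minimal sketch, assuming Euclidean distance between embedding coordinates (the function names are ours):

```python
import math
from statistics import median

def directed_median_hausdorff(A, B):
    """Median over points a in A of the distance from a to its nearest b in B."""
    return median(min(math.dist(a, b) for b in B) for a in A)

def median_hausdorff(A, B):
    """Symmetrize by taking the larger of the two directed distances."""
    return max(directed_median_hausdorff(A, B), directed_median_hausdorff(B, A))

# Two short trajectories in a 2-D embedding space, offset vertically by 1.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0)]
median_hausdorff(A, B)  # -> 1.0
```

Because trajectories of the same action can have different lengths and sampling rates, a set-to-set distance like this avoids any need for explicit temporal alignment before matching.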