Results 1 - 10 of 37
Estimating Human Body Configurations using Shape Context Matching
2002
Cited by 104 (9 self)
The problem we consider in this paper is to take a single two-dimensional image containing a human body, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labelled for future use. The test shape is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the joint locations, the 3D body configuration and pose are then estimated.
Finding and Tracking People from the Bottom Up
2003
Cited by 87 (4 self)
We describe a tracker that can track moving people in long sequences without manual initialization. Moving people are modeled with the assumption that, while configuration can vary quite substantially from frame to frame, appearance does not. This leads to an algorithm that first builds a model of the appearance of the body of each individual by clustering candidate body segments, and then uses this model to find all individuals in each frame. Unusually, the tracker does not rely on a model of human dynamics to identify possible instances of people; such models are unreliable, because human motion is fast and large accelerations are common. We show our tracking algorithm can be interpreted as a loopy inference procedure on an underlying Bayes net. Experiments on video of real scenes demonstrate that this tracker (a) can count distinct individuals; (b) can identify and track them; (c) can recover when it loses track, for example, if individuals are occluded or briefly leave the view; (d) can identify the configuration of the body largely correctly; and (e) is not dependent on particular models of human motion.
Human Body Model Acquisition and Tracking Using Voxel Data
2003
Cited by 69 (6 self)
We present an integrated system for automatic acquisition of the human body model and motion tracking using input from multiple synchronized video streams. The video frames are segmented and the 3D voxel reconstructions of the human body shape in each frame are computed from the foreground silhouettes. These reconstructions are then used as input to the model acquisition and tracking algorithms.
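The silhouette-to-voxel step described above can be illustrated with a toy shape-from-silhouette sketch. This is not the authors' system: it assumes three axis-aligned orthographic views and a tiny cubic grid, and the names `silhouette`, `pixel`, and `carve` are hypothetical. A voxel is kept only if it projects inside the foreground silhouette in every view.

```python
from itertools import product


def pixel(voxel, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    u, v = [voxel[i] for i in range(3) if i != axis]
    return u, v


def silhouette(voxels, size, axis):
    """Binary foreground image of a voxel set viewed along one axis (0=x, 1=y, 2=z)."""
    image = [[0] * size for _ in range(size)]
    for voxel in voxels:
        u, v = pixel(voxel, axis)
        image[u][v] = 1
    return image


def carve(size, images):
    """Keep every voxel whose projection is foreground in all views."""
    hull = set()
    for voxel in product(range(size), repeat=3):
        inside_all = True
        for axis in range(3):
            u, v = pixel(voxel, axis)
            if not images[axis][u][v]:
                inside_all = False
                break
        if inside_all:
            hull.add(voxel)
    return hull


# Toy example (assumed data): a 2x2x2 box inside a 4x4x4 grid.
size = 4
true_shape = set(product(range(1, 3), repeat=3))
images = [silhouette(true_shape, size, axis) for axis in range(3)]
hull = carve(size, images)
```

For a convex, axis-aligned box the carved hull matches the shape exactly; in general the result is only a superset of the true shape, because concavities are invisible in silhouettes.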
PAMPAS: Real-Valued Graphical Models for Computer Vision
2003
Cited by 64 (2 self)
Probabilistic models have been adopted for many computer vision applications; however, inference in high-dimensional spaces remains problematic. As the state space of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many common computer vision problems naturally map onto the graphical model framework; the representation is a graph where each node contains a portion of the state space and there is an edge between two nodes only if they are not independent conditional on the other nodes in the graph. When this graph is sparsely connected, belief propagation algorithms can turn an exponential inference computation into one which is linear in the size of the graph. However, belief propagation is only applicable when the variables in the nodes are discrete-valued or jointly represented by a single multivariate Gaussian distribution, and this rules out many computer vision applications.
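The claim that belief propagation replaces an exponential computation with a linear one can be checked on the simplest sparse graph, a chain of discrete variables. The sketch below is an illustration under stated assumptions, not the PAMPAS algorithm: the potentials are made-up numbers, and `chain_marginals` and `brute_force_marginals` are hypothetical helper names. The first function passes sum-product messages in time linear in the chain length; the second enumerates every joint state.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain x0 - x1 - ... - x(n-1).

    unary[i][a] is the potential for x_i = a;
    pairwise[i][a][b] is the potential for (x_i = a, x_{i+1} = b).
    """
    n, k = len(unary), len(unary[0])
    # fwd[i][b]: message arriving at node i from its left neighbour.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                for a in range(k)
            )
    # bwd[i][a]: message arriving at node i from its right neighbour.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(
                pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b]
                for b in range(k)
            )
    marginals = []
    for i in range(n):
        belief = [fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(k)]
        z = sum(belief)
        marginals.append([value / z for value in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference answer by enumerating all k**n joint states."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += weight
    z = sum(totals[0])
    return [[value / z for value in row] for row in totals]


# Assumed toy potentials: three binary variables that prefer to agree.
unary = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]
agree = [[0.9, 0.1], [0.1, 0.9]]
pairwise = [agree, agree]
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

Both functions return the same marginals, but the message-passing version costs on the order of n·k² operations, while enumeration costs on the order of kⁿ. The sketch covers only the discrete case that the abstract names as a limitation.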
Unsupervised learning of human motion
IEEE Trans. PAMI, 2003
Cited by 54 (1 self)
An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts (a moving human body in our examples) automatically from unlabeled training data is presented. The training data include both useful “foreground” features and features that arise from irrelevant background clutter; the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs, which allow for fast detection. To learn the model structure as well as the model parameters, an EM-like algorithm is developed where the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm are demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and testing the learned models on a variety of sequences. Index Terms: unsupervised learning, human motion, decomposable triangulated graph, probabilistic models, greedy search, EM algorithm, mixture models.
Tracking people by learning their appearance
IEEE Trans. Pattern Anal. Mach. Intell.
Cited by 36 (3 self)
An open vision problem is to automatically track the articulations of people from a video sequence. This problem is difficult because one needs to determine both the number of people in each frame and estimate their configurations. However, finding people and localizing their limbs is hard because people can move fast and unpredictably, can appear in a variety of poses and clothes, and are often surrounded by limb-like clutter. We develop a completely automatic system that works in two stages: it first builds a model of appearance of each person in a video and then tracks by detecting those models in each frame (“tracking by model-building and detection”). We develop two algorithms that build models. The first, a bottom-up approach, groups together candidate body parts found throughout a sequence. The second, a top-down approach, automatically builds people-models by detecting convenient key poses within a sequence. Finally, we show that building a discriminative model of appearance is quite helpful since it exploits structure in a background (without background subtraction). We demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity, a feature-length film (“Run Lola Run”), and legacy sports footage (from the 2002 World Series and 1998 Winter Olympics). Experiments suggest that our system 1) can count distinct individuals, 2) can identify and track them, 3) can recover when it loses track, for example, if individuals are occluded or briefly leave the view, 4) can identify body configuration accurately, and 5) is not dependent on particular models of human motion. Index Terms: people tracking, motion capture, surveillance.
Polynomial-Time Metrics for Attributed Trees
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 23 (1 self)
We address the problem of comparing attributed trees and propose four novel distance measures centered around the notion of a maximal similarity common subtree. The proposed measures are general and defined on trees endowed with either symbolic or continuous-valued attributes, and can be equally applied to ordered and unordered, rooted and unrooted trees. We prove that our measures satisfy the metric constraints and provide a polynomial-time algorithm to compute them. This is a remarkable and attractive property, since the computation of traditional edit-distance-based metrics is NP-complete, except for ordered structures. We experimentally validate the usefulness of our metrics on shape matching tasks, and compare them with edit-distance measures.
Recovering 3D Human Body Configurations Using Shape Contexts
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 23 (1 self)
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently—tracking just becomes repeated recognition. We present results on a variety of data sets.
Evaluating Video-Based Motion Capture
In Proceedings of Computer Animation, 2002
Cited by 15 (0 self)
Motion capture can be an effective method of creating realistic human motion for animation. Unfortunately, the quality requirements of animation place challenging demands on a capture system. To date, capture solutions that meet these demands have required specialized hardware that is invasive and expensive. Computer vision could make animation data much easier to obtain. Unfortunately, current techniques fall short of the demands of animation applications. In this paper, we will explore why the demands of animation lead to a particularly difficult challenge for capture techniques. We present a constraint-based methodology for reconstructing the 3D motion given image observations, and use this as a tool for understanding the problem. Synthetic experiments confirm that these situations would arise in practice. The experiments show how even simple visual tracking information can be used to create human motion, but also that, even with perfect tracking, incorrect reconstructions are not only possible but inevitable.
Silhouette Lookup for Automatic Pose Tracking
2004
Cited by 14 (4 self)
Computers should be able to detect and track the articulated 3-D pose of a human being moving through a video sequence. Current tracking methods often prove slow and unreliable, and many must be initialized by a human operator before they can track a sequence. This paper introduces a simple yet effective algorithm for tracking articulated pose, based upon looking up observed silhouettes in a collection of known poses. The new algorithm runs quickly, can initialize itself without human intervention, and can automatically recover from critical tracking errors made while tracking previous frames in a video sequence.
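The lookup idea in this abstract can be sketched as a nearest-neighbour search over stored silhouettes. The code below is a minimal illustration under assumptions that are not taken from the paper: silhouettes are tiny binary grids, similarity is plain Hamming distance, and the pose labels `arms_down` and `arms_up` are invented for the example.

```python
def hamming(a, b):
    """Number of pixels at which two equally sized binary silhouettes differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def lookup_pose(observed, table):
    """Return (label, distance) for the stored silhouette closest to `observed`.

    `table` is a list of (pose_label, silhouette) pairs.
    """
    return min(
        ((label, hamming(observed, stored)) for label, stored in table),
        key=lambda entry: entry[1],
    )


# Assumed toy table of known poses (3x3 binary silhouettes).
table = [
    ("arms_down", [[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]),
    ("arms_up",   [[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0]]),
]

# A noisy observation: "arms_up" with one foreground pixel missing.
observed = [[1, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]

best_label, best_distance = lookup_pose(observed, table)
```

Because each frame is matched against the table independently, a lookup of this kind needs no manual initialization and is not tied to an error made in an earlier frame, which is consistent with the self-initialization and recovery properties the abstract describes.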

