Results 1–10 of 95
Pictorial Structures for Object Recognition
IJCV, 2003
Abstract

Cited by 524 (15 self)
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
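The matching problem the abstract describes (an appearance cost per part plus spring-like deformation costs between connected parts, minimized over part placements) can be illustrated for the simplest case of a star-shaped model, where each child part connects only to a root part. The following is a minimal toy sketch, not the authors' algorithm; the quadratic spring form and all names are assumptions:

```python
import numpy as np

def match_star_model(root_cands, child_cands, app_root, app_child, rest_offsets, k=1.0):
    """Toy pictorial-structure matching for a star-shaped model.

    root_cands:   (R, 2) candidate root positions
    child_cands:  list of (C_i, 2) candidate positions per child part
    app_root:     (R,) appearance costs for the root candidates
    app_child:    list of (C_i,) appearance costs per child part
    rest_offsets: ideal (rest) offset of each child relative to the root
    k:            spring stiffness for the quadratic deformation cost
    """
    best_total, best_cfg = np.inf, None
    for r in range(len(root_cands)):
        total = app_root[r]
        cfg = [tuple(root_cands[r])]
        for c_pos, c_app, off in zip(child_cands, app_child, rest_offsets):
            # spring cost: squared deviation of the child from its rest offset
            spring = k * np.sum((c_pos - root_cands[r] - off) ** 2, axis=1)
            # children are independent given the root, so each is minimized separately
            j = int(np.argmin(c_app + spring))
            total += c_app[j] + spring[j]
            cfg.append(tuple(c_pos[j]))
        if total < best_total:
            best_total, best_cfg = total, cfg
    return best_total, best_cfg
```

For tree-structured models with more depth, the same decomposition yields an efficient dynamic program over the tree rather than this brute-force loop.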
Real-time human pose recognition in parts from single depth images
In CVPR, 2011
Abstract

Cited by 160 (10 self)
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
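The per-pixel classifiers in this line of work are driven by simple depth-comparison features: the depth map is probed at two offsets around a pixel, with the offsets scaled by the inverse depth at that pixel so the feature is invariant to how far the body stands from the camera. A minimal sketch of one such feature, assuming a dense depth map indexed as `depth[y, x]`; the names and the out-of-bounds constant are illustrative assumptions, not the paper's code:

```python
import numpy as np

BIG = 1e6  # large value returned for probes falling outside the image

def depth_feature(depth, x, y, u, v):
    """Depth-comparison feature: difference of two depth probes at offsets
    u and v, each scaled by the inverse depth at (x, y) so the response is
    approximately invariant to the subject's distance from the camera."""
    d = depth[y, x]

    def probe(off):
        px = int(round(x + off[0] / d))
        py = int(round(y + off[1] / d))
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return BIG  # out-of-bounds probes read as "very far away"

    return probe(u) - probe(v)
```

In the full system, thousands of such (u, v) pairs are used as split candidates in randomized decision forests that classify each pixel into a body part.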
Tracking people by learning their appearance
IEEE Trans. Pattern Anal. Mach. Intell.
Abstract

Cited by 74 (3 self)
An open vision problem is to automatically track the articulations of people from a video sequence. This problem is difficult because one needs to both determine the number of people in each frame and estimate their configurations. But finding people and localizing their limbs is hard because people can move fast and unpredictably, can appear in a variety of poses and clothes, and are often surrounded by limb-like clutter. We develop a completely automatic system that works in two stages: it first builds a model of the appearance of each person in a video, and then it tracks by detecting those models in each frame (“tracking by model-building and detection”). We develop two algorithms that build models: a bottom-up approach that groups together candidate body parts found throughout a sequence, and a top-down approach that automatically builds people-models by detecting convenient key poses within a sequence. We finally show that building a discriminative model of appearance is quite helpful, since it exploits structure in the background (without background subtraction). We demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity, a feature-length film (“Run Lola Run”), and legacy sports footage (from the 2002 World Series and 1998 Winter Olympics). Experiments suggest that our system 1) can count distinct individuals, 2) can identify and track them, 3) can recover when it loses track, for example, if individuals are occluded or briefly leave the view, 4) can identify body configuration accurately, and 5) is not dependent on particular models of human motion. Index Terms—People tracking, motion capture, surveillance.
Automatic annotation of everyday movements
, 2003
Abstract

Cited by 60 (5 self)
This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The motion capture data is manually annotated using a class structure that describes everyday motions and allows motion annotations to be composed (one may jump while running, for example). Descriptions computed from video of real motions show that the method is accurate.
Measure locally, reason globally: Occlusion-sensitive articulated pose estimation
In CVPR, 2006
Abstract

Cited by 59 (3 self)
Part-based tree-structured models have been widely used for 2D articulated human pose estimation. These approaches admit efficient inference algorithms while capturing the important kinematic constraints of the human body as a graphical model. These methods often fail, however, when multiple body parts fit the same image region, resulting in global pose estimates that poorly explain the overall image evidence. Attempts to solve this problem have focused on the use of strong prior models that are limited to learned activities such as walking. We argue that the problem actually lies with the image observations and not with the prior. In particular, image evidence for each body part is estimated independently of other parts, without regard to self-occlusion. To address this we introduce occlusion-sensitive local likelihoods that approximate the global image likelihood using per-pixel hidden binary variables that encode the occlusion relationships between parts. This occlusion reasoning introduces interactions between non-adjacent body parts, creating loops in the underlying graphical model. We deal with this using an extension of an approximate belief propagation algorithm (PAMPAS). The algorithm recovers the real-valued 2D pose of the body in the presence of occlusions, does not require strong priors over body pose, and does a quantitatively better job of explaining image evidence than previous methods.
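The key mechanism here, a hidden binary per-pixel variable saying whether a part is visible or occluded, can be illustrated by summing the binary variable out analytically at each pixel: the pixel likelihood becomes a mixture of the part's appearance model and a generic occluder term. A toy sketch of that marginalization, assuming per-pixel arrays; this is not the paper's actual likelihood, and all names are hypothetical:

```python
import numpy as np

def occlusion_marginal_loglik(lik_part, p_vis, lik_generic):
    """Sum out a binary per-pixel occlusion variable z analytically:
        p(pixel) = p(z=visible) * p(pixel | part visible)
                 + p(z=occluded) * p(pixel | generic occluder)
    All arguments are per-pixel arrays; returns the total log-likelihood."""
    mix = p_vis * lik_part + (1.0 - p_vis) * lik_generic
    return float(np.sum(np.log(mix)))
```

Because the visibility probabilities depend on where the *other* parts are, this term couples non-adjacent parts, which is exactly what introduces the loops in the graphical model mentioned above.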
Guiding Model Search Using Segmentation
, 2005
Abstract

Cited by 57 (0 self)
... paradigm can be used to improve the efficiency and accuracy of model search in an image. We operationalize this idea using an oversegmentation of an image into superpixels. The problem domain we explore is human body pose estimation from still images. The superpixels prove useful in two ways. First, we restrict the joint positions in our human body model to lie at centers of superpixels, which reduces the size of the model search space. In addition, accurate support masks for computing features on half-limbs of the body model are obtained by using agglomerations of superpixels as half-limb segments. We present results on a challenging dataset of people in sports news images.
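Restricting joint positions to superpixel centers shrinks the search space from all image pixels to a few hundred candidates. A minimal sketch of that restriction step, assuming an integer superpixel label map is already available (the oversegmentation itself, produced by some external segmenter, is outside the sketch):

```python
import numpy as np

def superpixel_centers(labels):
    """Centroid (x, y) of each superpixel in an integer label map.
    Candidate joint positions are restricted to these centers, so model
    search runs over ~hundreds of locations instead of every pixel."""
    centers = {}
    for lab in np.unique(labels):
        ys, xs = np.nonzero(labels == lab)
        centers[int(lab)] = (float(xs.mean()), float(ys.mean()))
    return centers
```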
Distributed occlusion reasoning for tracking with nonparametric belief propagation
In NIPS, 2004
Abstract

Cited by 55 (0 self)
We describe a three-dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self-occlusions created by the imaging process lead to complex interdependencies in color- and edge-based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences.
Attractive people: Assembling loose-limbed models using nonparametric belief propagation
In NIPS, 2004
Abstract

Cited by 50 (2 self)
The detection and pose estimation of people in images and video is made challenging by the variability of human appearance and the high dimensionality of articulated body models. To cope with these problems we exploit rich image likelihood models and represent the 3D human body using a graphical model in which the relationships between the body parts are represented by conditional probability distributions. We formulate the pose estimation problem as one of probabilistic inference over a graphical model where the random variables correspond to the individual limb parameters (position and orientation). Because the limbs are described by 6-dimensional vectors encoding pose in 3-space, discretization is impractical and the random variables in our model must be continuous-valued. To approximate belief propagation in such a graph we exploit a recently introduced generalization of the particle filter. This framework facilitates the automatic initialization of the body model from low-level cues and is robust to occlusion of body parts and scene clutter.
Visual hand tracking using nonparametric belief propagation
IEEE Workshop on Generative Model Based Vision, 2004
Abstract

Cited by 48 (1 self)
This paper develops probabilistic methods for visual tracking of a three-dimensional geometric hand model from monocular image sequences. We consider a redundant representation in which each model component is described by its position and orientation in the world coordinate frame. A prior model is then defined which enforces the kinematic constraints implied by the model’s joints. We show that this prior has a local structure, and is in fact a pairwise Markov random field. Furthermore, our redundant representation allows color- and edge-based likelihood measures, such as the Chamfer distance, to be similarly decomposed in cases where there is no self-occlusion. Given this graphical model of hand kinematics, we may track the hand’s motion using the recently proposed nonparametric belief propagation (NBP) algorithm. Like particle filters, NBP approximates the posterior distribution over hand configurations as a collection of samples. However, NBP uses the graphical structure to greatly reduce the dimensionality of these distributions, providing improved robustness. Several methods are used to improve NBP’s computational efficiency, including a novel KD-tree based method for fast Chamfer distance evaluation. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences.
Predicting 3D People from 2D Pictures
In IV Conference on Articulated Motion and Deformable Objects (AMDO), 2006
Abstract

Cited by 40 (2 self)
We propose a hierarchical process for inferring the 3D pose of a person from monocular images. First we infer a learned view-based 2D body model from a single image using nonparametric belief propagation. This approach integrates information from bottom-up body-part proposal processes and deals with self-occlusion to compute distributions over limb poses. Then, we exploit a learned Mixture of Experts model to infer a distribution of 3D poses conditioned on 2D poses. This approach is more general than recent work on inferring 3D pose directly from silhouettes, since the 2D body model provides a richer representation that includes the 2D joint angles and the poses of limbs that may be unobserved in the silhouette. We demonstrate the method in a laboratory setting where we evaluate the accuracy of the 3D poses against ground truth data. We also estimate 3D body pose in a monocular image sequence. The resulting 3D estimates are sufficiently accurate to serve as proposals for the Bayesian inference of 3D human motion over time.