Results 1-10 of 134
Histograms of Oriented Gradients for Human Detection
In CVPR, 2005
Cited by 1670 (6 self)
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
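The stages named in this abstract (gradient computation, orientation binning, contrast normalization) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cell_histogram` and `l2_normalize`, the 9 unsigned orientation bins, the centered-difference gradient, and the plain L2 normalization are all assumptions chosen for illustration.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one cell, given as a list of rows of gray values.

    Assumptions for this sketch: unsigned orientations in [0, 180) degrees,
    centered-difference gradients on interior pixels only, and votes
    weighted by gradient magnitude with no interpolation between bins.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against angle rounding up to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vec, eps=1e-6):
    """L2-normalize a descriptor block; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


# Usage: a 4x4 cell whose intensity increases left to right.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
block = l2_normalize(cell_histogram(ramp))
```

In a full descriptor, the histograms of several neighboring cells would be concatenated into one block before normalization, and blocks would overlap; this sketch normalizes a single cell only.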
Pictorial Structures for Object Recognition
IJCV, 2003
Cited by 524 (15 self)
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
Recovering human body configurations: Combining segmentation and recognition
In CVPR, 2004
Cited by 171 (8 self)
The goal of this work is to take an image such as the one in Figure 1(a), detect a human figure, and localize his joints and limbs (b) along with their associated pixel masks (c). In this work we attempt to tackle this problem in a general setting. The dataset we use is a collection of sports news photographs of baseball players, varying dramatically in pose and clothing. The approach that we take is to use segmentation to guide our recognition algorithm to salient bits of the image. We use this segmentation approach to build limb and torso detectors, the outputs of which are assembled into human figures. We present quantitative results on torso localization, in addition to shortlisted full body configurations.
Strike a pose: Tracking people by finding stylized poses
In CVPR, 2005
Cited by 116 (13 self)
We develop an algorithm for finding and kinematically tracking multiple people in long sequences. Our basic assumption is that people tend to take on certain canonical poses, even when performing unusual activities like throwing a baseball or figure skating. We build a person detector that quite accurately detects and localizes limbs of people in lateral walking poses. We use the estimated limbs from a detection to build a discriminative appearance model; we assume the features that discriminate a figure in one frame will discriminate the figure in other frames. We then use the models as limb detectors in a pictorial structure framework, detecting figures in unrestricted poses in both previous and successive frames. We have run our tracker on hundreds of thousands of frames, and present and apply a methodology for evaluating tracking on such a large scale. We test our tracker on real sequences including a feature-length film, an hour of footage from a public park, and various sports sequences. We find that we can quite accurately automatically find and track multiple people interacting with each other while performing fast and unusual motions.
Finding and Tracking People from the Bottom Up
2003
Cited by 116 (6 self)
We describe a tracker that can track moving people in long sequences without manual initialization. Moving people are modeled with the assumption that, while configuration can vary quite substantially from frame to frame, appearance does not. This leads to an algorithm that first builds a model of the appearance of the body of each individual by clustering candidate body segments, and then uses this model to find all individuals in each frame. Unusually, the tracker does not rely on a model of human dynamics to identify possible instances of people; such models are unreliable, because human motion is fast and large accelerations are common. We show our tracking algorithm can be interpreted as a loopy inference procedure on an underlying Bayes net. Experiments on video of real scenes demonstrate that this tracker can (a) count distinct individuals; (b) identify and track them; (c) recover when it loses track, for example, if individuals are occluded or briefly leave the view; (d) identify the configuration of the body largely correctly; and (e) do all of this without depending on particular models of human motion.
Fast pose estimation with parameter sensitive hashing
In ICCV, 2003
Cited by 102 (4 self)
Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends a recently developed method for locality-sensitive hashing, which finds approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions; we show how to find the set of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call Parameter-Sensitive Hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.
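The locality-sensitive hashing idea that this abstract builds on can be sketched in a few lines. The sketch below is plain LSH with random hyperplanes, a commonly used hash family swapped in for illustration; it is not Parameter-Sensitive Hashing, which learns its hash functions for the estimation task. The class name `HyperplaneLSH`, its parameters, and the example labels are hypothetical.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Single-table LSH index using random hyperplane sign bits.

    Illustrative assumption: each hash bit records on which side of a
    random hyperplane through the origin a vector falls, so vectors
    separated by a small angle tend to share a bucket.
    """

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.table = defaultdict(list)

    def _key(self, vec):
        # One boolean per hyperplane: the sign of the dot product.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, label, vec):
        self.table[self._key(vec)].append((label, vec))

    def query(self, vec):
        """Return labels of stored examples that share the query's bucket."""
        return [label for label, _ in self.table.get(self._key(vec), [])]


# Usage: index one example, then look up candidates for a query vector.
index = HyperplaneLSH(dim=3)
index.add("pose_a", [1.0, 2.0, 3.0])
candidates = index.query([2.0, 4.0, 6.0])
```

A lookup touches only one bucket rather than every stored example, which is where the sublinear query time comes from; a practical index would use several tables to reduce missed neighbors.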
Probabilistic Methods for Finding People
International Journal of Computer Vision, 2001
Cited by 102 (2 self)
Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties. Since a reasonable model of a person requires at least nine segments, it is not possible to inspect every group, due to the huge combinatorial complexity. We propose two
PAMPAS: Real-Valued Graphical Models for Computer Vision
2003
Cited by 91 (3 self)
Probabilistic models have been adopted for many computer vision applications; however, inference in high-dimensional spaces remains problematic. As the state-space of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many common computer vision problems naturally map onto the graphical model framework; the representation is a graph where each node contains a portion of the state-space and there is an edge between two nodes only if they are not independent conditional on the other nodes in the graph. When this graph is sparsely connected, belief propagation algorithms can turn an exponential inference computation into one which is linear in the size of the graph. However, belief propagation is only applicable when the variables in the nodes are discrete-valued or jointly represented by a single multivariate Gaussian distribution, and this rules out many computer vision applications.
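The claim that belief propagation makes inference linear in the size of a sparse graph can be seen in the simplest case, a chain of discrete variables. The sketch below covers only that discrete case, not the real-valued extension this abstract motivates. The function name `chain_marginals` and the use of one pairwise potential shared by every edge are assumptions made for brevity.

```python
def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain of discrete variables.

    unary:    list of n lists, unary[i][s] is the potential of node i in state s
    pairwise: k x k matrix, pairwise[a][b] is the potential of neighboring
              states (a, b); sharing it across edges is an assumption here
    Returns the normalized marginal distribution of every node.
    """
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # messages arriving from the left
    bwd = [[1.0] * k for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b]
                            for a in range(k))
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(pairwise[a][b] * unary[i + 1][b] * bwd[i + 1][b]
                            for b in range(k))
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        total = sum(belief)
        marginals.append([v / total for v in belief])
    return marginals


# Usage: two binary nodes; the second is observed to be in state 0.
smooth = [[0.9, 0.1], [0.1, 0.9]]
result = chain_marginals([[1.0, 1.0], [1.0, 0.0]], smooth)
```

Each of the two passes visits every edge once and does k*k work per edge, so the cost grows linearly with chain length, whereas enumerating all joint states would cost k to the power n.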
Representation and Detection of Deformable Shapes
PAMI, 2004
Cited by 78 (4 self)
We describe some techniques that can be used to represent and detect deformable shapes in images. The main difficulty with deformable template models is the very large or infinite number of possible non-rigid transformations of the templates. This makes the problem of finding an optimal match of a deformable template to an image incredibly hard. Using a new representation for deformable shapes we show how to efficiently find a global optimal solution to the non-rigid matching problem. The representation is based on the description of objects using triangulated polygons. Our matching algorithm can minimize a large class of energy functions, making it applicable to a wide range of problems. We present experimental results of detecting shapes in medical images and images of natural scenes. Our method does not depend on initialization and is very robust, yielding good matches even in images with high clutter.
The correlated correspondence algorithm for unsupervised registration of non-rigid surfaces
In TR-SAIL-2004-100, at http://robotics.stanford.edu/~drago/cc/tr100.pdf, 2004
Cited by 75 (4 self)
We present an unsupervised algorithm for registering 3D surface scans of an object undergoing significant deformations. Our algorithm does not need markers, nor does it assume prior knowledge about object shape, the dynamics of its deformation, or scan alignment. The algorithm registers two meshes by optimizing a joint probabilistic model over all point-to-point correspondences between them. This model enforces preservation of local mesh geometry, as well as more global constraints that capture the preservation of geodesic distance between corresponding point pairs. The algorithm applies even when one of the meshes is an incomplete range scan; thus, it can be used to automatically fill in the remaining surfaces for this partial scan, even if those surfaces were previously only seen in a different configuration. We evaluate the algorithm on several real-world datasets, where we demonstrate good results in the presence of significant movement of articulated parts and non-rigid surface deformation. Finally, we show that the output of the algorithm can be used for compelling computer graphics tasks such as interpolation between two scans of a non-rigid object and automatic recovery of articulated object models.