Results 1  10
of
167
Histograms of Oriented Gradients for Human Detection
 In CVPR
, 2005
"... We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly out ..."
Abstract

Cited by 1770 (6 self)
 Add to MetaCart
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that finescale gradients, fine orientation binning, relatively coarse spatial binning, and highquality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives nearperfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds. 1
Pictorial Structures for Object Recognition
 IJCV
, 2003
"... In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance ..."
Abstract

Cited by 539 (15 self)
 Add to MetaCart
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by springlike connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
Recovering human body configurations: Combining segmentation and recognition
 In CVPR
, 2004
"... localized joints and limbs. (c) Segmentation mask associated with human figure. The goal of this work is to take an image such as the one in Figure 1(a), detect a human figure, and localize his joints and limbs (b) along with their associated pixel masks (c). In this work we attempt to tackle this p ..."
Abstract

Cited by 176 (8 self)
 Add to MetaCart
localized joints and limbs. (c) Segmentation mask associated with human figure. The goal of this work is to take an image such as the one in Figure 1(a), detect a human figure, and localize his joints and limbs (b) along with their associated pixel masks (c). In this work we attempt to tackle this problem in a general setting. The dataset we use is a collection of sports news photographs of baseball players, varying dramatically in pose and clothing. The approach that we take is to use segmentation to guide our recognition algorithm to salient bits of the image. We use this segmentation approach to build limb and torso detectors, the outputs of which are assembled into human figures. We present quantitative results on torso localization, in addition to shortlisted full body configurations. 1.
Strike a pose: Tracking people by finding stylized poses
 In CVPR
, 2005
"... We develop an algorithm for finding and kinematically tracking multiple people in long sequences. Our basic assumption is that people tend to take on certain canonical poses, even when performing unusual activities like throwing a baseball or figure skating. We build a person detector that quite acc ..."
Abstract

Cited by 121 (13 self)
 Add to MetaCart
We develop an algorithm for finding and kinematically tracking multiple people in long sequences. Our basic assumption is that people tend to take on certain canonical poses, even when performing unusual activities like throwing a baseball or figure skating. We build a person detector that quite accurately detects and localizes limbs of people in lateral walking poses. We use the estimated limbs from a detection to build a discriminative appearance model; we assume the features that discriminate a figure in one frame will discriminate the figure in other frames. We then use the models as limb detectors in a pictorial structure framework, detecting figures in unrestricted poses in both previous and successive frames. We have run our tracker on hundreds of thousands of frames, and present and apply a methodology for evaluating tracking on such a large scale. We test our tracker on real sequences including a featurelength film, an hour of footage from a public park, and various sports sequences. We find that we can quite accurately automatically find and track multiple people interacting with each other while performing fast and unusual motions. 1.
Finding and Tracking People from the Bottom Up
, 2003
"... We describe a tracker that can track moving people in long sequences without manual initialization. Moving people are modeled with the assumption that, while configuration can vary quite substantially from frame to frame, appearance does not. This leads to an algorithm that firstly builds a model of ..."
Abstract

Cited by 120 (6 self)
 Add to MetaCart
We describe a tracker that can track moving people in long sequences without manual initialization. Moving people are modeled with the assumption that, while configuration can vary quite substantially from frame to frame, appearance does not. This leads to an algorithm that firstly builds a model of the appearance of the body of each individual by clustering candidate body segments, and then uses this model to find all individuals in each frame. Unusually, the tracker does not rely on a model of human dynamics to identify possible instances of people; such models are unreliable, because human motion is fast and large accelerations are common. We show our tracking algorithm can be interpreted as a loopy inference procedure on an underlying Bayes net. Experiments on video of real scenes demonstrate that this tracker can (a) count distinct individuals; (b)identify and track them; (c) recover when it loses track, for example, if individuals are occluded or briefly leave the view; (d) identify the configuration of the body largely correctly; and (e) is not dependent on particular models of human motion.
Probabilistic Methods for Finding People
 INTERNATIONAL JOURNAL OF COMPUTER VISION
, 2001
"... Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic prope ..."
Abstract

Cited by 104 (2 self)
 Add to MetaCart
Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties. Since a reasonable model of a person requires at least nine segments, it is not possible to inspect every group, due to the huge combinatorial complexity. We propose two
Fast pose estimation with parameter sensitive hashing
 In ICCV
, 2003
"... Examplebased methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and highdimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become pro ..."
Abstract

Cited by 104 (4 self)
 Add to MetaCart
Examplebased methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and highdimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends a recently developed method for localitysensitive hashing, which finds approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions; we show how to find the set of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call ParameterSensitive Hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images. 1.
PAMPAS: RealValued Graphical Models for Computer Vision
, 2003
"... Probabilistic models have been adopted for many computer vision applications, however inference in highdimensional spaces remains problematic. As the statespace of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many comm ..."
Abstract

Cited by 94 (3 self)
 Add to MetaCart
Probabilistic models have been adopted for many computer vision applications, however inference in highdimensional spaces remains problematic. As the statespace of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many common computer vision problems naturally map onto the graphical model framework; the representation is a graph where each node contains a portion of the statespace and there is an edge between two nodes only if they are not independent conditional on the other nodes in the graph. When this graph is sparsely connected, belief propagation algorithms can turn an exponential inference computation into one which is linear in the size of the graph. However belief propagation is only applicable when the variables in the nodes are discretevalued or jointly represented by a single multivariate Gaussian distribution, and this rules out many computer vision applications.
Representation and Detection of Deformable Shapes
 PAMI
, 2004
"... We describe some techniques that can be used to represent and detect deformable shapes in images. The main di#culty with deformable template models is the very large or infinite number of possible nonrigid transformations of the templates. This makes the problem of finding an optimal match of a ..."
Abstract

Cited by 80 (4 self)
 Add to MetaCart
We describe some techniques that can be used to represent and detect deformable shapes in images. The main di#culty with deformable template models is the very large or infinite number of possible nonrigid transformations of the templates. This makes the problem of finding an optimal match of a deformable template to an image incredibly hard. Using a new representation for deformable shapes we show how to e#ciently find a global optimal solution to the nonrigid matching problem. The representation is based on the description of objects using triangulated polygons. Our matching algorithm can minimize a large class of energy functions, making it applicable to a wide range of problems. We present experimental results of detecting shapes in medical images and images of natural scenes. Our method does not depend on initialization and is very robust, yielding good matches even in images with high clutter.
Recovering human body configurations using pairwise constraints between parts
 ICCV
"... The goal of this work is to recover human body configurations from static images. Without assuming a priori knowledge of scale, pose or appearance, this problem is extremely challenging and demands the use of all possible sources of information. We develop a framework which can incorporate arbitrary ..."
Abstract

Cited by 76 (6 self)
 Add to MetaCart
The goal of this work is to recover human body configurations from static images. Without assuming a priori knowledge of scale, pose or appearance, this problem is extremely challenging and demands the use of all possible sources of information. We develop a framework which can incorporate arbitrary pairwise constraints between body parts, such as scale compatibility, relative position, symmetry of clothing and smooth contour connections between parts. We detect candidate body parts from bottomup using parallelism, and use various pairwise configuration constraints to assemble them together into body configurations. To find the most probable configuration, we solve an Integer Quadratic Programming problem with a standard technique using linear approximations. Approximate IQP allows us to incorporate much more information than the traditional dynamic programming and remains computationally efficient. 15 handlabeled images are used to train the lowlevel part detector and learn the pairwise constraints. We show test results on a variety of images. 1.