## Pictorial Structures for Object Recognition (2003)

Venue: IJCV

Citations: 561 (16 self)

### BibTeX

@ARTICLE{Felzenszwalb03pictorialstructures,
    author = {Pedro F. Felzenszwalb and Daniel P. Huttenlocher},
    title = {Pictorial Structures for Object Recognition},
    journal = {IJCV},
    year = {2003},
    volume = {61},
    pages = {2005}
}

### Abstract

In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.

### Citations

8799 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1992
Citation Context ...h each edge $(v_i, v_j)$. By definition, the MST of this graph is the tree with minimum total weight, which is exactly the set of edges $E^*$ defined by equation (7). The MST problem is well known (see [11]) and can be solved efficiently. Kruskal's algorithm can be used to compute the MST in $O(n^2 \log n)$ time, since we have a complete graph with $n$ nodes. 4 Matching Algorithms In this section we describe e... |
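The MST step mentioned in this context can be sketched as follows. This is a minimal illustration and not the authors' code: `kruskal_mst` and the `weight` callback are hypothetical names, and the weights stand in for whatever per-edge cost equation (7) assigns. Sorting all $n(n-1)/2$ edges of the complete graph gives the $O(n^2 \log n)$ bound quoted above.

```python
def kruskal_mst(n, weight):
    """Return the MST edges of the complete graph on nodes 0..n-1.

    weight(i, j) is an assumed callback giving the cost of edge (i, j).
    """
    # Enumerate and sort every edge of the complete graph.
    edges = sorted((weight(i, j), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))  # union-find forest

    def find(x):
        # Find the component root, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((i, j, w))
            if len(tree) == n - 1:
                break
    return tree


# Toy usage: four "parts" on a line, edge cost = distance between them.
points = [0, 1, 3, 7]
mst = kruskal_mst(len(points), lambda i, j: abs(points[i] - points[j]))
```

In the toy usage the tree links each point to its nearest neighbour on the line, which is what one would expect when cost grows with distance.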

7319 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...is done using a generalization of the Viterbi and Forward-Backward algorithms (see [30]). Similar algorithms are known in the Bayesian Network community as belief propagation and belief revision (see [29]). Such polynomial time algorithms run in $O(h^2 n)$ time, where $n$ is the number of object parts, and $h$ is a discrete number of possible locations for each part. Unfortunately this is too slow for gener... |

3896 |
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ...(I|L, u), defined as $p'(I|L, u) \propto p(I|L, u)^{1/T} \propto \prod_{i=1}^{n} p(I|l_i, u_i)^{1/T}$, where $T$ controls the degree of smoothing. This is a standard technique, borrowed from the principle of annealing (see [18]). Note that $p'(I|L, u)$ is just the product of the smoothed likelihoods for each part. In all our experiments we used $T = 10$. 6.2 Geometry For the articulated objects, pairs of parts are connected b... |

2911 | Eigenfaces for recognition
- Turk, Pentland
- 1991
Citation Context ...ent class of recognition methods, developed in the 1990's, which operate directly on images rather than first extracting discrete features or parts. These include both appearance-based methods (e.g., [35] and [28]) and template-based methods such as Hausdorff matching [21]. Such approaches treat images as the artifacts to be recognized, rather than having more abstract models based on features or other... |

2610 |
Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography
- Fischler, Bolles
- 1981
Citation Context ...nces between features that have been extracted from an image, and features of a stored model. Examples of this paradigm include interpretation tree search [19, 3], the alignment method [22], RANSAC [14] and geometric hashing [26]. The problem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This is particularly clear with approaches su... |

1558 |
Fundamentals of speech recognition
- Rabiner, Juang
- 1993
Citation Context ...ops. In that case, it is possible to find the MAP estimate and sample from the distribution in polynomial time. This is done using a generalization of the Viterbi and Forward-Backward algorithms (see [30]). Similar algorithms are known in the Bayesian Network community as belief propagation and belief revision (see [29]). Such polynomial time algorithms run in $O(h^2 n)$ time, where $n$ is the number of o... |

1460 | Fast approximate energy minimization via graph cuts
- BOYKOV, VEKSLER, et al.
- 2001
Citation Context ...sible solutions differ substantially. This changes the computational nature of the problem. Solving this minimization for arbitrary graphs and arbitrary functions $m_i$, $d_{ij}$ is an NP-hard problem (see [7]). However, when the graph $G = (V, E)$ has a restricted form, the problem can be solved more efficiently. For instance, with first-order snakes the graph is simply a chain, which enables a dynamic progra... |

1314 |
Statistical decision theory and Bayesian analysis. 2d ed
- Berger
- 1985
Citation Context ...Note that both the $p(l_i, l_j | c_{ij})$ and $p(L|E, c)$ are improper priors (they integrate to infinity). This is a consequence of using an uninformative prior over absolute locations for each part (see [4]). Figure 3: Graphical representation of the dependencies between the location of object parts (black nodes) and the image in the restricted models (see text). So our models depend on parameters $\theta$ ... |

984 |
Visual learning and recognition of 3-D objects from appearance
- Murase, Nayar
- 1995
Citation Context ... of recognition methods, developed in the 1990's, which operate directly on images rather than first extracting discrete features or parts. These include both appearance-based methods (e.g., [35] and [28]) and template-based methods such as Hausdorff matching [21]. Such approaches treat images as the artifacts to be recognized, rather than having more abstract models based on features or other primitiv... |

881 | The design and use of steerable filters
- Freeman, Adelson
- 1991
Citation Context ...onic representation can be computed more efficiently. In fact, the iconic representation can be computed very fast by convolving each level of a Gaussian pyramid with small x-y separable filters (see [16]). 5.2 Spatial Distribution The spatial configuration of the parts is modeled by a collection of springs connecting pairs of parts. Each connection $(v_i, v_j)$ is characterized by the ideal relative ... |

805 |
Sampling-based approaches to calculating marginal densities
- Gelfand, Smith
- 1990
Citation Context ...rocedure lets us use somewhat inaccurate models for generating hypothesis and can be seen as a mechanism for visual selection (see [2]). It is also similar to the idea behind importance sampling (see [17]). 1.3 Efficient Algorithms Our goal is not only to construct a framework that is rich enough to capture the appearance of many generic objects, but also to be able to efficiently solve the object detec... |

670 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ... $E^*, c^* = \arg\max_{E,c} \prod_{k=1}^{m} p(L^k | E, c)$. (5) We need to pick a set of edges that form a tree and the properties for each edge. This can be done in a similar way to the Chow and Liu algorithm in [10], which estimates a tree distribution for discrete random variables. Equation (3) defines the prior probability of the object assuming configuration $L^k$ as, $p(L^k | E, c) = \prod_{(v_i, v_j) \in E} p(l_i^k, l$... |

579 | Probabilistic visual learning for object recognition
- Moghaddam, Pentland
- 1997
Citation Context ...s distribution as the sample mean and covariance. Note that we could use other methods to represent the appearance of image patches. In particular, we experimented with the eigenspace techniques from [27]. With a small number of training examples the eigenspace methods are no better than the iconic representation, and the iconic representation can be computed more efficiently. In fact, the iconic repr... |

504 | Comparing images using the Hausdorff distance
- Huttenlocher, Klanderman, et al.
- 1993
Citation Context ...which operate directly on images rather than first extracting discrete features. These include both appearance-based methods (e.g., [40] and [30]) and template-based methods such as Hausdorff matching [22]. Such approaches treat images as the entities to be recognized, rather than having more abstract models based on features or other primitives. One or more training images of an object are used to for... |

458 |
Face Recognition by Elastic Bunch Graph Matching
- Wiskott, Fellous, et al.
- 1997
Citation Context ...esent models that represent objects by the appearance of local image patches and spatial relationships between those patches. This type of model has been popular in the context of face detection (see [15, 9, 37]). We first describe how we model the appearance of a part, and later describe how we model spatial relationships between parts. Learning an iconic model Figure 4: Gaussian derivative basis functio... |

382 | Tracking people with twists and exponential maps
- Bregler, Malik
- 1998
Citation Context ...es have recently been used for tracking people by matching models at each frame [34]. In contrast, most work on tracking highly articulated objects such as people relies heavily on motion information [8, 26] and only performs incremental updates in the object configuration. In such approaches, some other method is used to find an initial match of the model to the image, and then tracking commences from t... |

330 |
Distance transformations in digital images
- Borgefors
- 1986
Citation Context ...$O(k|B|)$ time, where $k$ is the number of locations in the grid. On the other hand, efficient algorithms exist to compute the distance transform in $O(k)$ time, independent of the number of points in $B$ (see [5, 25]). These algorithms have small constants and are very fast in practice. In order to compute the distance transform, it is commonly expressed as $D_B(x) = \min_{y \in G} (d(x, y) + 1_B(y))$, where $1_B(y)$ is ... |
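A linear-time distance transform of the kind this context refers to can be illustrated in one dimension. The sketch below is a simplification under stated assumptions: it uses the $L_1$ distance on a 1-D grid (not the Mahalanobis case discussed elsewhere in the paper), it assumes the indicator $1_B(y)$ is 0 for $y \in B$ and infinite otherwise, and the function names are hypothetical. Replacing the indicator with an arbitrary cost array gives the generalized transform $\min_y (|x - y| + f(y))$.

```python
import math


def distance_transform_1d(f):
    """Return D[x] = min over y of (|x - y| + f[y]) using two linear passes."""
    D = list(f)
    # Forward pass: propagate the best value coming from the left.
    for x in range(1, len(D)):
        D[x] = min(D[x], D[x - 1] + 1)
    # Backward pass: propagate the best value coming from the right.
    for x in range(len(D) - 2, -1, -1):
        D[x] = min(D[x], D[x + 1] + 1)
    return D


def indicator(B, k):
    """Assumed form of 1_B on a grid of k cells: 0 inside B, infinity outside."""
    return [0 if x in B else math.inf for x in range(k)]


# Classical transform: distance from each cell to the nearest point of B.
classic = distance_transform_1d(indicator({2, 7}, 10))
# Generalized transform of an arbitrary cost function f.
general = distance_transform_1d([3, 0, 5, 1])
```

Both passes touch each cell once, so the running time is linear in the number of grid cells and does not depend on the size of $B$.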

299 |
The representation and matching of pictorial structures
- Fischler, Elschlager
- 1973
Citation Context ...ing objects using generic part-based models and that of learning such models from example images. Our work is motivated by the pictorial structure representation introduced by Fischler and Elschlager [15] thirty Figure 1: Detection results for a face (a); and a human body (b). Each image shows the globally best location for the corresponding object, as computed by our algorithms. The object ... |

296 |
Using dynamic programming for solving variational problems in vision
- Amini, Weymouth, et al.
- 1990
Citation Context ...orm, the problem can be solved more efficiently. For instance, with first-order snakes the graph is simply a chain, which enables a dynamic programming solution that takes $O(h^2 n)$ time as described in [1]. Moreover, with snakes the minimization is done over a small number of locations for each vertex (e.g., the current location plus the 8 neighbors on the image grid). This minimization is then iterate... |
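The $O(h^2 n)$ dynamic program for a chain mentioned in this context can be sketched as below. This is an illustrative min-sum recursion under assumptions, not the algorithm from [1] or from the paper: `chain_min_energy`, the cost table `m`, and the pairwise cost callback `d` are hypothetical names, with `m[i][a]` playing the role of $m_i$ and `d(i, a, b)` the role of $d_{ij}$ between consecutive parts.

```python
def chain_min_energy(m, d):
    """Minimize sum_i m[i][l_i] + sum_i d(i, l_i, l_{i+1}) over a chain.

    m: n lists of h match costs, one list per part.
    d: assumed callback d(i, a, b), the cost of part i at a and part i+1 at b.
    Returns (minimum energy, list of chosen locations).
    """
    n, h = len(m), len(m[0])
    cost = list(m[0])   # best energy of the prefix ending at each location
    back = []           # back-pointers, one list per transition
    for i in range(1, n):
        new_cost, arg = [], []
        for b in range(h):
            # Best predecessor location for part i placed at b: O(h) work.
            best_a = min(range(h), key=lambda a: cost[a] + d(i - 1, a, b))
            new_cost.append(cost[best_a] + d(i - 1, best_a, b) + m[i][b])
            arg.append(best_a)
        cost = new_cost
        back.append(arg)
    # Trace the optimal configuration back from the best final location.
    last = min(range(h), key=lambda b: cost[b])
    path = [last]
    for arg in reversed(back):
        path.append(arg[path[-1]])
    path.reverse()
    return cost[last], path


# Toy usage: three parts, three candidate locations, spring cost |a - b|.
match_costs = [[0, 5, 5], [5, 5, 0], [5, 0, 5]]
energy, locations = chain_min_energy(match_costs, lambda i, a, b: abs(a - b))
```

Each of the $n - 1$ transitions examines all $h \times h$ location pairs, which is where the $O(h^2 n)$ bound comes from.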

259 |
Recognizing solid objects by alignment with an image
- Huttenlocher, Ullman
- 1990
Citation Context ...r correspondences between features that have been extracted from an image, and features of a stored model. Examples of this paradigm include interpretation tree search [19, 3], the alignment method [22], RANSAC [14] and geometric hashing [26]. The problem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This is particularly clear with ... |

258 |
Hierarchical chamfer matching: A parametric edge matching algorithm
- Borgefors
- 1988
Citation Context ... point in the set, $D_B(x) = \min_{y \in B} d(x, y)$. In particular, $D_B$ is zero at any point in $B$, and is small at nearby locations. The distance transform is commonly used for matching edge based models (see [6, 21]). The trivial way to compute this function takes $O(k|B|)$ time, where $k$ is the number of locations in the grid. On the other hand, efficient algorithms exist to compute the distance transform in O(k) ti... |

204 | Cardboard people: A parameterized model of articulated image motion
- Ju, Black, et al.
- 1996
Citation Context ...es have recently been used for tracking people by matching models at each frame [34]. In contrast, most work on tracking highly articulated objects such as people relies heavily on motion information [8, 26] and only performs incremental updates in the object configuration. In such approaches, some other method is used to find an initial match of the model to the image, and then tracking commences from t... |

169 | Efficient matching of pictorial structures
- Felzenszwalb, Huttenlocher
- 2000
Citation Context ...bility given an observed image. In some sense, this is our best guess for the object location. On the other hand, sampling from the posterior distribution is useful to produce multiple hypotheses. In [13] we presented a version of the MAP estimation algorithm that uses a different restriction on the form of connections between parts. That form did not allow for efficient sampling from the posterior distr... |

169 |
Localizing overlapping parts by searching the interpretation tree
- Grimson, Lozano-Pérez
- 1987
Citation Context ...blem of efficiently searching for correspondences between features that have been extracted from an image, and features of a stored model. Examples of this paradigm include interpretation tree search [19, 3], the alignment method [22], RANSAC [14] and geometric hashing [26]. The problem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This ... |

149 | A probabilistic approach to object recognition using local photometry and global geometry
- Burl, Weber, et al.
- 1998
Citation Context ...at need a starting solution near the correct answer have been used. In this paper we introduce algorithms that can be used to match a large class of pictorial structure models to images efficiently. In [9] a formulation similar to the one presented here was used to model pictorial structures consisting of a constellation of features. In their work instead of having connections between pairs of parts, a... |

131 | An active vision architecture based on iconic representations
- Rao, Ballard
- 1995
Citation Context ...is specified by its (x, y) position in the image, so we have a two-dimensional pose space for each part. To model the appearance of each individual part we use the iconic representation introduced in [31]. The iconic representation is based on the response of Gaussian derivative filters of different orders, orientations and scales. An image patch centered at some position is represented by a high-dimen... |

128 | HYPER: A new approach for the recognition and positioning of two-dimensional objects
- Ayache, Faugeras
- 1986
Citation Context ...blem of efficiently searching for correspondences between features that have been extracted from an image, and features of a stored model. Examples of this paradigm include interpretation tree search [19, 3], the alignment method [22], RANSAC [14] and geometric hashing [26]. The problem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This ... |

123 | Finding and tracking people from the bottom up
- Ramanan, Forsyth
- 2003
Citation Context ...e parts can cross one another yielding vastly different shapes. Finally we note that models similar to pictorial structures have recently been used for tracking people by matching models at each frame [34]. In contrast, most work on tracking highly articulated objects such as people relies heavily on motion information [8, 26] and only performs incremental updates in the object configuration. In such a... |

120 |
Affine invariant model-based object recognition
- Lamdan, Schwartz, et al.
- 1990
Citation Context ...have been extracted from an image, and features of a stored model. Examples of this paradigm include interpretation tree search [19, 3], the alignment method [22], RANSAC [14] and geometric hashing [26]. The problem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This is particularly clear with approaches such as part-based recognitio... |

118 | Flexible syntactic matching of curves and its application to automatic hierarchical classification of silhouettes
- Gdalyahu, Weinshall
- 1999
Citation Context ... finding people in images we employ simple part models based on binary images obtained by background subtraction. This suggests comparisons with silhouette-based deformable matching techniques (e.g., [18, 39]). These approaches are quite different, however. First of all, silhouette-based methods generally operate using boundary contours, requiring good segmentation of the object from the background. In con... |

108 | Probabilistic Methods for Finding People
- Ioffe, Forsyth
- 2001
Citation Context ...istic algorithms that don't necessarily find the optimal match of the model to an image. Techniques to find people in images using models similar to the ones we present in Section 6 were described in [23]. However, their methods are more in line with the classical part-based recognition systems that first detect a number of hypothesis for the individual parts, and later find groups of parts which ma... |

101 | Recognition of shapes by editing shock graphs
- Sebastian, Klein, et al.
- 2001
Citation Context ... finding people in images we employ simple part models based on binary images obtained by background subtraction. This suggests comparisons with silhouette-based deformable matching techniques (e.g., [18, 39]). These approaches are quite different, however. First of all, silhouette-based methods generally operate using boundary contours, requiring good segmentation of the object from the background. In con... |

96 | A computational model for visual selection
- Amit, Geman
- 1999
Citation Context ...t one or more of those as correct using an independent method. This procedure lets us use somewhat inaccurate models for generating hypothesis and can be seen as a mechanism for visual selection (see [2]). It is also similar to the idea behind importance sampling (see [17]). 1.3 Efficient Algorithms Our goal is not only to construct a framework that is rich enough to capture the appearance of many ge... |

96 | Segmentation by Grouping Junctions
- Ishikawa, Geiger
- 1998
Citation Context ...c in this value. Another source of efficient algorithms has been in restricting $d_{ij}$ to a particular form. This approach has been particularly fruitful in some recent work on MRFs for low-level vision ([8, 7, 24]). In our algorithm, we use constraints on both the structure of the graph and the form of $d_{ij}$. By restricting the graphs to trees, a similar kind of dynamic programming can be applied as is done fo... |

95 |
Recognition by Parts
- Pentland
- 1987
Citation Context ...n separately modeling the appearance of individual parts and the geometric relations between them. However most of these part-based methods make binary decisions about potential part locations (e.g., [32], [13], [36], [9]). Moreover, most part-based methods use some kind of search heuristics, such as first matching a particular "distinctive" part and then searching for other parts given that initial m... |

77 | Recognition of planar object classes
- Burl, Perona
- 1996
Citation Context ...ing the appearance of individual parts and the geometric relations between them. However most of these part-based methods make binary decisions about potential part locations (e.g., [32], [13], [36], [9]). Moreover, most part-based methods use some kind of search heuristics, such as first matching a particular "distinctive" part and then searching for other parts given that initial match, in order to... |

59 | Recognition by functional parts
- Rivlin, Dickinson, et al.
- 1995
Citation Context ...xtracting features or parts of objects from images can itself be seen as another form of recognition problem. This is particularly clear with approaches such as part-based recognition (e.g., [12] and [32]), where the primitives are sub-parts of objects -- for example the shade on a lamp. Not only is the feature or part extraction itself a recognition task, in some ways it is actually more diffi... |

58 |
Efficient Visual Recognition Using the Hausdorff Distance, vol. 1173 of Lecture Notes in Computer Science
- Rucklidge
- 1996
Citation Context ...lized distance transform under different distances, by replacing the indicator function $1_B(x)$ with an arbitrary function $f(x)$. In particular we use the method of Karzanov (originally in [27], but see [38] for a better description) to compute the transform of a function under a Mahalanobis distance with diagonal covariance matrix. This algorithm can also compute $B'_j(l_i)$, the best location for $v_j$ as a... |

53 |
Comparing images using the Hausdorff distance
- Huttenlocher, Klanderman, et al.
- 1993
Citation Context ...rate directly on images rather than first extracting discrete features or parts. These include both appearance-based methods (e.g., [35] and [28]) and template-based methods such as Hausdorff matching [21]. Such approaches treat images as the artifacts to be recognized, rather than having more abstract models based on features or other primitives. A single example or multiple training images are genera... |

35 |
Efficient synthesis of gaussian filters by cascaded uniform filters
- Wells
- 1986
Citation Context ...ssian filter $G$ is separable since the covariance matrix $\Sigma_{ij}$ is diagonal. We can compute a good approximation for the convolution in time linear in the set of grid locations using the techniques from [36]. 5 Iconic Models The framework presented so far is general in the sense that it doesn't fully specify how objects are represented. A particular modeling scheme must define the pose space for the obje... |

31 |
Machine perception of 3-D solids
- Roberts
- 1965
Citation Context ...ves, or "features" are extracted from an image. In the second stage, stored models are matched against the features that were extracted from the image. For instance, in the pioneering work of Roberts [33] children's blocks were recognized by first extracting edges and corners from images and then matching these features to polyhedral models of the blocks. The model-based recognition paradigm of the 19... |

11 |
The Circular Normal Distribution: Theory and Tables
- GUMBEL, GREENWOOD, et al.
- 1953
Citation Context ...$\begin{pmatrix} x'_i \\ y'_i \end{pmatrix} = \begin{pmatrix} x_i \\ y_i \end{pmatrix} + s_i R_{\theta_i} \begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix}$, and $\begin{pmatrix} x'_j \\ y'_j \end{pmatrix} = \begin{pmatrix} x_j \\ y_j \end{pmatrix} + s_j R_{\theta_j} \begin{pmatrix} x_{ji} \\ y_{ji} \end{pmatrix}$. The distribution over angles, $M$, is the von Mises distribution (see [20]), $M(\theta, \mu, k) \propto e^{k \cos(\theta - \mu)}$. The first two terms in the joint distribution measure the horizontal and vertical distances between the observed joint positions in the image. The third term measures the ... |
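The von Mises density referred to in this context is proportional to $e^{k \cos(\theta - \mu)}$; its standard normalizing constant is $2\pi I_0(k)$, where $I_0$ is the modified Bessel function of the first kind of order zero. The sketch below evaluates it with a truncated power series for $I_0$, which is an illustrative choice suited to moderate $k$; the function names and the `terms` parameter are hypothetical and not taken from the paper.

```python
import math


def bessel_i0(k, terms=50):
    """Approximate I_0(k) by its power series: sum over m of (k/2)^(2m) / (m!)^2."""
    return sum((k / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))


def von_mises_pdf(theta, mu, k):
    """Density of the von Mises distribution with mean angle mu and concentration k."""
    return math.exp(k * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(k))


# Numerical check: the density should integrate to about 1 over one period.
N = 1000
step = 2 * math.pi / N
total = sum(von_mises_pdf(i * step, mu=0.5, k=2.0) for i in range(N)) * step
```

Larger values of `k` concentrate the mass around `mu`, so in a joint model a large `k` would correspond to a stiff joint angle and a small `k` to a loose one.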

10 |
The Use of Geons for Generic 3-D Object Recognition
- Dickinson, Biederman, et al.
- 1993
Citation Context ...blem of extracting features or parts of objects from images can itself be seen as another form of recognition problem. This is particularly clear with approaches such as part-based recognition (e.g., [12] and [32]), where the primitives are sub-parts of objects -- for example the shade on a lamp. Not only is the feature or part extraction itself a recognition task, in some ways it is actually more diffi... |

8 |
Efficient Visual Recognition Using the Hausdorff Distance
- Rucklidge
- 1996
Citation Context ...ized distance transform under different distances, by replacing the indicator function $1_B(x)$ with an arbitrary function $f(x)$. In particular we use the method of Karzanov (originally in [25], but see [34] for a better description) to compute the transform of a function under a Mahalanobis distance with diagonal covariance matrix. This algorithm can also compute $B'_j(l_i)$, the best location of $v_j$ a... |

7 |
Markov random fields with efficient approximations
- Boykov, Veksler, et al.
- 1998
Citation Context ...c in this value. Another source of efficient algorithms has been in restricting $d_{ij}$ to a particular form. This approach has been particularly fruitful in some recent work on MRFs for low-level vision ([8, 7, 24]). In our algorithm, we use constraints on both the structure of the graph and the form of $d_{ij}$. By restricting the graphs to trees, a similar kind of dynamic programming can be applied as is done fo... |

5 | Quick algorithm for determining the distances from the points of the given subset of an integer lattice to the points of its complement - Karzanov - 1992 |

3 |
Quick algorithm for determining the distances from the points of the given subset of an integer lattice to the points of its complement
- Karzanov
- 1992
Citation Context ...$O(k|B|)$ time, where $k$ is the number of locations in the grid. On the other hand, efficient algorithms exist to compute the distance transform in $O(k)$ time, independent of the number of points in $B$ (see [5, 25]). These algorithms have small constants and are very fast in practice. In order to compute the distance transform, it is commonly expressed as $D_B(x) = \min_{y \in G} (d(x, y) + 1_B(y))$, where $1_B(y)$ is ... |

2 |
Energy minimization with discontinuities. Under Review
- Boykov, Veksler, et al.
- 1998
Citation Context ...ssible configurations differ substantially. This changes the computational nature of the problem. Solving equation (8) for arbitrary graphs and arbitrary functions $m_i$, $d_{ij}$ is an NP-hard problem (see [7]). However, when the graph $G = (V, E)$ has a restricted form, the problem can be solved more efficiently. For instance, with first-order snakes the graph is simply a chain, which enables a dynamic progra... |

1 | Hierarchical chamfer matching: A parametric edge matching algorithm - Borgefors - 1988 |