## 3D Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints (2006)


Venue: International Journal of Computer Vision

Citations: 75 (11 self)

### BibTeX

@ARTICLE{Rothganger063dobject,

author = {Fred Rothganger and Svetlana Lazebnik and Cordelia Schmid and Jean Ponce},

title = {3D Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints},

journal = {International Journal of Computer Vision},

year = {2006},

volume = {66},

pages = {231--259}

}

### Abstract

This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.

### Citations

5109 | Distinctive image features from scale-invariant keypoints - Lowe |

3158 | Multiple View Geometry in Computer Vision
- Hartley, Zisserman
- 2004
Citation Context ...s. As noted in Section 2.2, a match between m ≥ 2 affine regions is equivalent to a match between m triples of points, thus the machinery developed in the structure from motion (Faugeras et al., 2001; Hartley and Zisserman, 2000; Tomasi and Kanade, 1992) and pose estimation (Huttenlocher and Ullman, 1987; Lowe, 1987) literature can in principle be used to extend our approach to the perspective case. This is particularly rele... |

2791 | Eigenfaces for recognition
- Turk, Pentland
- 1991
Citation Context ...olor pattern, and thus typically lack an effective mechanism for selecting promising matches. Appearance-based methods—as originally proposed in the context of face recognition (Turk and Pentland, 1991; Pentland et al., 1994; Belhumeur et al., 1997) and 3D object recognition (Murase and Nayar, 1995; Selinger and Nelson, 1999)—take the opposite view, and prefer to explicit geometric reasoning a classical pattern recognition framework (Duda et al... |

2452 | Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography
- Fischler, Bolles
- 1981
Citation Context ...987; Huttenlocher and Ullman, 1987; Lowe, 1987), and geometric hashing (Lamdan and Wolfson, 1988; Lamdan and Wolfson, 1991). An alternative is offered by robust estimation algorithms, such as RANSAC (Fischler and Bolles, 1981), and its variants (Torr and Zisserman, 2000), and median least-squares, that consider candidate correspondences consistent with a small set of seed matches as inliers to be retained in a fitting process... |

1503 | Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection
- Belhumeur, Hespanha, et al.
- 1997
Citation Context ...ically lack an effective mechanism for selecting promising matches. Appearance-based methods—as originally proposed in the context of face recognition (Turk and Pentland, 1991; Pentland et al., 1994; Belhumeur et al., 1997) and 3D object recognition (Murase and Nayar, 1995; Selinger and Nelson, 1999)—take the opposite view, and prefer to explicit geometric reasoning a classical pattern recognition framework (Duda et al... |

1154 | Performance evaluation of local descriptors
- Mikolajczyk, Schmid
- 2005
Citation Context ...r coordinate. Other feature spaces may of course be used as well. In particular, the SIFT descriptor introduced by Lowe (2004) has been shown to provide superior performance in image retrieval tasks (Mikolajczyk and Schmid, 2003). Briefly, the SIFT description of an image region is a three-dimensional histogram over the spatial image dimensions and the gradient orientations, with the original rectangular area broken into 16 ... |

977 | An affine invariant interest point detector - Mikolajczyk, Schmid - 2002 |

958 | Visual Learning and Recognition of 3-D Objects from Appearance
- Murase, Nayar
- 1995
Citation Context ...omising matches. Appearance-based methods—as originally proposed in the context of face recognition (Turk and Pentland, 1991; Pentland et al., 1994; Belhumeur et al., 1997) and 3D object recognition (Murase and Nayar, 1995; Selinger and Nelson, 1999)—take the opposite view, and prefer to explicit geometric reasoning a classical pattern recognition framework (Duda et al., 2001) that exploits the discriminatory power of ... |

870 | Object class recognition by unsupervised scale-invariant learning
- Fergus, Perona, et al.
- 2003
Citation Context ...wide-baseline stereo matching (Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004), image retrieval (Schmid and Mohr, 1997; Pope and Lowe, 2000), and object recognition tasks (Weber et al., 2000; Fergus et al., 2003; Mahamud and Hebert, 2003; Lowe, 2004). These methods normally either require storing a large number of views for each object (Schmid and Mohr, 1997; Pope and Lowe, 2000; Mahamud and Hebert, 2003; Lo... |

717 | Computer Vision: A Modern Approach
- Forsyth, Ponce
- 2000
Citation Context ..., M being chosen in this case as $E(M) + 2S(M)$, where $E(M) = w^{-N}$ is the expected value of the number of draws required to get one good sample and $S(M) = \sqrt{1 - w^{N}}/w^{N}$ is its standard deviation. See (Forsyth and Ponce, 2002, p. 347) for details. by 10K. We retain correspondences whose score is at least two standard deviations above average. In a typical case (matching the first two bear images), the mean score is 1.2, w... |
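
The sample-count rule quoted in this context, M = E(M) + 2S(M), can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ransac_sample_count` and its parameters are hypothetical, and it assumes that w is the fraction of inlier matches and N the number of matches drawn per sample, so that one draw is all-inlier with probability w^N and the number of draws until the first good sample is geometrically distributed.

```python
import math


def ransac_sample_count(w: float, n: int) -> int:
    """Number of random draws M = E(M) + 2*S(M), rounded up.

    Assumptions (not taken from the source text):
      w -- fraction of inlier matches, 0 < w <= 1
      n -- number of matches drawn per sample, n >= 1
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must lie in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")

    p_good = w ** n                              # probability a draw is all inliers
    expected = 1.0 / p_good                      # E(M) = w^(-N)
    std_dev = math.sqrt(1.0 - p_good) / p_good   # S(M) = sqrt(1 - w^N) / w^N
    return math.ceil(expected + 2.0 * std_dev)
```

For example, with half of the candidate matches being inliers (w = 0.5) and two matches per sample (N = 2), the rule gives E(M) = 4 and S(M) ≈ 3.46, so the function returns 11 draws.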

638 | View-based and modular eigenspaces for face recognition
- Pentland, Moghaddam, et al.
- 1994
Citation Context ... typically lack an effective mechanism for selecting promising matches. Appearance-based methods—as originally proposed in the context of face recognition (Turk and Pentland, 1991; Pentland et al., 1994; Belhumeur et al., 1997) and 3D object recognition (Murase and Nayar, 1995; Selinger and Nelson, 1999)—take the opposite view, and prefer to explicit geometric reasoning a classical pattern recogniti... |

628 | Robust wide baseline stereo from maximally stable extremal regions
- Matas, Chum, et al.
- 2002
Citation Context ...es. This setup (with some variation in the number of input images) is typical of our modeling experiments. matching pairs of overlapping images—a process akin to wide-baseline stereo (Baumberg, 2000; Matas et al., 2002; Mikolajczyk and Schmid, 2002; Pritchett and Zisserman, 1998; Schaffalitzky and Zisserman, 2002; Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004) and (robust) structure from motion (Tomasi and... |

486 | Feature detection with automatic scale selection
- Lindeberg
- 1998
Citation Context ...geometrically inconsistent candidate matches. Concretely, we propose using local image descriptors that are invariant under affine transformations of the spatial domain (Gårding and Lindeberg, 1996; Lindeberg, 1998; Baumberg, 2000; Schaffalitzky and Zisserman, 2002; Mikolajczyk and Schmid, 2002) and of the brightness/color signal (Lowe, 2004) to capture the appearance of salient surface patches, and a set of mu... |

460 | Local Grayvalue Invariants for Image Retrieval
- Schmid, Mohr
- 1997
Citation Context ...interest points” (Harris and Stephens, 1988)—with local and/or global geometric constraints in wide-baseline stereo matching (Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004), image retrieval (Schmid and Mohr, 1997; Pope and Lowe, 2000), and object recognition tasks (Weber et al., 2000; Fergus et al., 2003; Mahamud and Hebert, 2003; Lowe, 2004). These methods normally either require storing a large number of vi... |

385 | Bundle Adjustment - A modern synthesis
- Triggs, McLauchlan, et al.
Citation Context ...ed as an eigenvalue problem. In our experiments, however, we have found this linear approach to be numerically ill behaved (this is related to the inherent affine gauge ambiguity of our problem, see (Triggs et al., 1999) for a discussion of this issue). Thus, in practice, we pick an arbitrary block as root, and iteratively register all others with this one using linear least squares, before using a non-linear method... |

382 | A statistical method for 3D object detection applied to faces and cars
- Schneiderman, Kanade
- 2000
Citation Context ...ally either require storing a large number of views for each object (Schmid and Mohr, 1997; Pope and Lowe, 2000; Mahamud and Hebert, 2003; Lowe, 2004), or limiting the range of admissible viewpoints (Schneiderman and Kanade, 2000; Weber et al., 2000; Fergus et al., 2003). In contrast, our approach supports the automatic acquisition of explicit 3D affine and Euclidean object models from multiple unregistered images, and their ... |

326 | Geometric hashing: A general and efficient model-based recognition scheme - Wolfson, Lamdan - 1998 |

319 | Indexing based on scale invariant interest points
- Mikolajczyk, Schmid
- 2001
Citation Context ...); and (3) the Harris (1988) operator is used to refine the position of the ellipse’s center (localization, see Mikolajczyk and Schmid, 2002). The scale-invariant interest point detector proposed in (Mikolajczyk and Schmid, 2001) provides an initial guess for this procedure, and the elliptical region obtained at convergence can be shown to be covariant under affine transformations (see Gårding and Lindeberg, 1996; Lindeberg... |

316 | The Geometry of Multiple Images
- Faugeras, Luong, et al.
- 2001
Citation Context ...rse projection matrix $N_j$. A rectified patch can be thought of as a fictitious view of the original surface patch (Fig. 6), and the mapping $S_{ij}$ can thus be decomposed into an inverse projection $N_j$ (Faugeras et al., 2001) that maps the rectified patch onto the corresponding surface patch, followed by a projection $M_i$ that maps that patch onto its projection in image number $i$. In particular, we can write $S_{ij} = M_i N_j$ for... |

305 | Limits on super-resolution and how to break them - Baker, Kanade - 2002 |

292 | Affine structure from motion - Koenderink, Doorn - 1991 |

283 | Unsupervised learning of models for recognition
- Weber, Welling, et al.
- 2000
Citation Context ...tric constraints in wide-baseline stereo matching (Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004), image retrieval (Schmid and Mohr, 1997; Pope and Lowe, 2000), and object recognition tasks (Weber et al., 2000; Fergus et al., 2003; Mahamud and Hebert, 2003; Lowe, 2004). These methods normally either require storing a large number of views for each object (Schmid and Mohr, 1997; Pope and Lowe, 2000; Mahamud... |

247 | Reliable feature matching across widely separated views - Baumberg - 2000 |

247 | A paraperspective factorization method for shape and motion recovery
- Poelman, Kanade
- 1997
Citation Context ...nd Zisserman, 1998; Schaffalitzky and Zisserman, 2002; Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004) and (robust) structure from motion (Tomasi and Kanade, 1992; Weinshall and Tomasi, 1995; Poelman and Kanade, 1997)—before stitching the corresponding partial models into a complete one. While it is possible to select these pairs automatically (Schaffalitzky and Zisserman, 2002), we have chosen to specify them ma... |

240 | MLESAC: a new robust estimator with application to estimating image geometry - Torr, Zisserman - 2000 |

229 | The representation, recognition, and locating of 3-d objects
- Faugeras, Hebert
- 1986
Citation Context ...tographs. Traditional feature-based geometric approaches to this problem—such as alignment (Ayache and Faugeras, 1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987) [footnote: A preliminary version of this article has appeared in Rothganger et al. (2003).] or geometric hashing (Thompson and Mundy, 1987; Lamdan and Wolfson, 1988, 1991)— enumerate various subsets of geometric ima... |

229 | Geometric Invariance in Computer Vision
- Mundy, Zisserman
- 1992
Citation Context ...et al., 1993)—admit invariants, general 3D shapes do not (Burns et al., 1993), which is the main reason why invariants have fallen out of favor after an intense flurry of activity in the early 1990s (Mundy and Zisserman, 1992; Mundy et al., 1994). We propose in this article to revisit invariants as a local description of truly three-dimensional objects: Indeed, although smooth surfaces are almost never planar in the large,... |

185 | Shape and motion from image streams under orthography: a factorization method
- Tomasi, Kanade
- 1992
Citation Context ...ghtness/color signal (Lowe, 2004) to capture the appearance of salient surface patches, and a set of multi-view geometric constraints related to those studied in the structure from motion literature (Tomasi and Kanade, 1992) to capture their spatial relationship. Our approach is directly related to a number of recent techniques that combine local models of image appearance in the neighborhood of salient features—or “int... |

174 | Multi-view Matching for Unordered Image Sets, or “How Do I Organize My Holiday Snaps?” - Schaffalitzky, Zisserman |

170 | Object recognition using alignment
- Huttenlocher, Ullman
- 1987
Citation Context ... this problem—such as alignment (Ayache and Faugeras, 1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987) or geometric hashing (Thompson and Mundy, 1987; Lamdan and Wolfson, 1988, 1991)— enumerate various subsets of geometric image features before using pose consistency constraints to confir... |

163 | Localizing overlapping parts by searching the interpretation tree
- Grimson, Lozano-Pérez
- 1987
Citation Context ...re-based geometric approaches to this problem—such as alignment (Ayache and Faugeras, 1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987) or geometric hashing (Thompson and Mundy, 1987; Lamdan and Wolfson, 1988, 1991)— enumerate various subsets of geometric image features before using pose co... |

147 | Matching widely separated views based on affine invariant regions - Tuytelaars, Van Gool |

123 | Hyper: A new approach for the recognition and positioning of two-dimensional objects
- Ayache, Faugeras
- 1986
Citation Context ...ty, clutter, and occlusion. Various approaches to finding a reasonable set of geometrically-consistent matches have been proposed in the past, including interpretation tree (or alignment) techniques (Ayache and Faugeras, 1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987), and geometric hashing (Lamdan and Wolfson, 1988; Lamdan and Wolfson, 1991). An alternative is o... |

116 | Simultaneous object recognition and segmentation by image exploration - Ferrari, Tuytelaars, et al. |

98 | Three-dimensional model matching from an unconstrained viewpoint
- Thompson, Mundy
- 1987
Citation Context ...1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987) or geometric hashing (Thompson and Mundy, 1987; Lamdan and Wolfson, 1988, 1991)— enumerate various subsets of geometric image features before using pose consistency constraints to confirm or discard competing match hypotheses, but they largely ig... |

97 | Super-resolved surface reconstruction from multiple images, NASA
- Cheeseman, Kanefsky, et al.
- 1994
Citation Context ...s patch can be constructed in a number of ways. One possibility is to combine the texture information from each measured image patch into a single high-quality copy using super-resolution techniques (Cheeseman et al., 1994; Capel and Zisserman, ... [Figure 15: The bear model, along with the recovered affine camera configurations. These cameras are shown at an arbitrary constant distance from the origin.] ... |

91 | A representation for shape based on peaks and ridges in the difference of low pass transform
- Crowley, Parker
Citation Context ...czyk and Schmid, 2002) used in our implementation. 2.1.1. Detection. Several approaches to finding perceptually-salient blob-like image primitives in natural images were proposed in the mid-eighties (Crowley and Parker, 1984; Voorhees and Poggio, 87). Blostein and Ahuja (1989) took a first step toward building some invariance in this process with a multi-scale region detector based on maxima of the Laplacian. Lindeberg (... |

84 | Bundle adjustment - a modern synthesis. Vision algorithms: theory and practice
- Triggs, McLauchlan, et al.
- 2000
Citation Context ...ed as an eigenvalue problem. In our experiments, however, we have found this linear approach to be numerically ill behaved (this is related to the inherent affine gauge ambiguity of our problem, see (Triggs et al., 1999) for a discussion of this issue). Thus, in practice, we pick an arbitrary block as root, and iteratively register all others with this one using linear least squares, before using a non-linear method ... |

76 | 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints - Rothganger, Lazebnik, et al. - 2003 |

67 | saliency and image description - Kadir, Brady, et al. |

65 | The viewpoint consistency constraint
- Lowe
- 1987
Citation Context ... (Ayache and Faugeras, 1986; Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Huttenlocher and Ullman, 1987; Lowe, 1987) or geometric hashing (Thompson and Mundy, 1987; Lamdan and Wolfson, 1988, 1991)— enumerate various subsets of geometric image features before using pose consistency constraints to confirm or discard... |

55 | Super-resolution from multiple views using learnt image models - Capel, Zisserman - 2001 |

54 | Direct computation of shape cues using scale-adapted spatial derivative operators - Gårding, Lindeberg |

54 | Wide baseline point matching using affine invariants computed from intensity profiles
- Tell, Carlsson
- 2000
Citation Context ...al models of image appearance in the neighborhood of salient features—or “interest points” (Harris and Stephens, 1988)—with local and/or global geometric constraints in wide-baseline stereo matching (Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004), image retrieval (Schmid and Mohr, 1997; Pope and Lowe, 2000), and object recognition tasks (Weber et al., 2000; Fergus et al., 2003; Mahamud and Hebert, 2003; Lowe, 2... |

53 | The combinatorics of object recognition in cluttered environments using constrained search
- Grimson
- 1988
Citation Context ...n to be a low-order polynomial in the size n of the model when there is little or no clutter, and exponential in n in the presence of clutter when no limit on the depth of the tree search is imposed (Grimson, 1990). The worst-case computational complexity of our bounded tree search is $O(n^N)$, but determining its expected cost is beyond the scope of this paper. As will be shown in Section 4.5, the “greedy” ver... |

51 | Invariant properties of straight homogeneous generalized cylinders and their contours
- Ponce, Chelberg, et al.
- 1989
Citation Context ...indexing mechanism for object recognition tasks. Unfortunately, although planar objects and certain simple shapes—such as bilateral symmetries (Nalwa, 1988) or various types of generalized cylinders (Ponce et al., 1989; Liu et al., 1993)—admit invariants, general 3D shapes do not (Burns et al., 1993), which is the main reason why invariants have fallen out of favor after an intense flurry of activity in the early 1... |

47 | A Perceptual Grouping Hierarchy for Appearance-Based 3D Object Recognition. Computer Vision and Image Understanding 76(1), 83–92
- Selinger, Nelson
- 1999
Citation Context ...nce-based methods—as originally proposed in the context of face recognition (Turk and Pentland, 1991; Pentland et al., 1994; Belhumeur et al., 1997) and 3D object recognition (Murase and Nayar, 1995; Selinger and Nelson, 1999)—take the opposite view, and prefer to explicit geometric reasoning a classical pattern recognition framework (Duda et al., 2001) that exploits the discriminatory power of (relatively) low-dimensiona... |

45 | View variation of point-set and line-segment features
- Burns, Weiss, et al.
- 1993
Citation Context ...jects and certain simple shapes—such as bilateral symmetries (Nalwa, 1988) or various types of generalized cylinders (Ponce et al., 1989; Liu et al., 1993)—admit invariants, general 3D shapes do not (Burns et al., 1993), which is the main reason why invariants have fallen out of favor after an intense flurry of activity in the early 1990s (Mundy and Zisserman, 1992; Mundy et al., 1994). We propose in this article t... |

39 | Probabilistic models of appearance for 3D object recognition
- Pope, Lowe
- 2000
Citation Context ...s and Stephens, 1988)—with local and/or global geometric constraints in wide-baseline stereo matching (Tell and Carlsson, 2000; Tuytelaars and Van Gool, 2004), image retrieval (Schmid and Mohr, 1997; Pope and Lowe, 2000), and object recognition tasks (Weber et al., 2000; Fergus et al., 2003; Mahamud and Hebert, 2003; Lowe, 2004). These methods normally either require storing a large number of views for each object (... |

39 | Segmenting, modeling, and matching video clips containing multiple moving objects
- Rothganger, Lazebnik, et al.
- 2006
Citation Context ... equations in the neighborhood of each patch, and used it to extend the approach proposed in this article to the problems of motion segmentation, scene modeling, and scene recognition in video clips (Rothganger et al., 2004). Admittedly, our current implementation is slow, especially compared to the systems proposed by Lowe (2004), and Mahamud and Hebert (2003), that achieve frame-rate object detection in cluttered scen... |