Results 1 - 10
of
50
Scalable Recognition with a Vocabulary Tree
- IN CVPR
, 2006
"... A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD's. The scheme ..."
Abstract
-
Cited by 374 (0 self)
- Add to MetaCart
A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD's. The scheme
A comparison of affine region detectors
- International Journal of Computer Vision
, 2005
"... The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris [24, 34] and Hessian points [24 ..."
Abstract
-
Cited by 149 (7 self)
- Add to MetaCart
The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris [24, 34] and Hessian points [24], as proposed by Mikolajczyk and Schmid and by Schaffalitzky and Zisserman; a detector of ‘maximally stable extremal regions’, proposed by Matas et al. [21]; an edge-based region detector [45] and a detector based on intensity extrema [47], proposed by Tuytelaars and Van Gool; and a detector of ‘salient regions’, proposed by Kadir, Zisserman and Brady [12]. The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework. 1
Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects
, 2005
"... This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multi–view constraints associated with groups of affine–covariant scene patches and a normalized description of their appearance ar ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multi–view constraints associated with groups of affine–covariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid components, construct three–dimensional models of these components, and match instances of models recovered from different image sequences. The proposed approach has been implemented, and it is applied to the detection and matching of moving objects in video sequences and to shot matching, i.e., the identification of shots that depict the same scene in a video clip.
Vision-based slam: Stereo and monocular approaches
- Int. J. Compt. Vision
, 2006
"... Building a spatially consistent model is a key functionality to endow a mobile robot with autonomy. Without an initial map or an absolute localization means, it requires to concurrently solve the localization and mapping problems. For this purpose, vision is a powerful sensor, because it provides da ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
Building a spatially consistent model is a key functionality to endow a mobile robot with autonomy. Without an initial map or an absolute localization means, it requires to concurrently solve the localization and mapping problems. For this purpose, vision is a powerful sensor, because it provides data from which stable features can be extracted and matched as the robot moves. But it does not directly provide 3D information, which is a difficulty for estimating the geometry of the environment. This article presents two approaches to the SLAM problem using vision: one with stereovision, and one with monocular images. Both approaches rely on a robust interest point matching algorithm that works in very diverse environments. The stereovision based approach is a classic SLAM implementation, whereas the monocular approach introduces a new way to initialize landmarks. Both approaches are analyzed and compared with extensive experimental results, with a rover and a blimp. 1
Viewpoint-independent object class detection using 3d feature maps
- in In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2008
"... This paper presents a 3D approach to multi-view object class detection. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. We propose instead to build 3D representations of object classes which allow to handle viewpoint chang ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
This paper presents a 3D approach to multi-view object class detection. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. We propose instead to build 3D representations of object classes which allow to handle viewpoint changes and intra-class variability. Our approach extracts a set of pose and class discriminant features from synthetic 3D object models using a filtering procedure, evaluates their suitability for matching to real image data and represents them by their appearance and 3D position. We term these representations 3D Feature Maps. For recognizing an object class in an image we match the synthetic descriptors to the real ones in a 3D voting scheme. Geometric coherence is reinforced by means of a robust pose estimation which yields a 3D bounding box in addition to the 2D localization. The precision of the 3D pose estimation is evaluated on a set of images of a calibrated scene. The 2D localization is evaluated on the PASCAL 2006 dataset for motorbikes and cars, showing that its performance can compete with state-of-the-art 2D object detectors. 1.
A surface-growing approach t multi-view Ster recOnstruction
- In CVPR, 2007. [4] . Okutomi
"... We present a new approach to reconstruct the shape of a 3D object or scene from a set of calibrated images. The central idea of our method is to combine the topological flexibility of a point-based geometry representation with the robust reconstruction properties of scene-aligned planar primitives. ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
We present a new approach to reconstruct the shape of a 3D object or scene from a set of calibrated images. The central idea of our method is to combine the topological flexibility of a point-based geometry representation with the robust reconstruction properties of scene-aligned planar primitives. This can be achieved by approximating the shape with a set of surface elements (surfels) in the form of planar disks which are independently fitted such that their footprint in the input images matches. Instead of using an artificial energy functional to promote the smoothness of the recovered surface during fitting, we use the smoothness assumption only to initialize planar primitives and to check the feasibility of the fitting result. After an initial disk has been found, the recovered region is iteratively expanded by growing further disks in tangent direction. The expansion stops when a disk rotates by more than a given threshold during the fitting step. A global sampling strategy guarantees that eventually the whole surface is covered. Our technique does not depend on a shape prior or silhouette information for the initialization and it can automatically and simultaneously recover the geometry, topology, and visibility information which makes it superior to other state-of-theart techniques. We demonstrate with several high-quality reconstruction examples that our algorithm performs highly robustly and is tolerant to a wide range of image capture modalities. 1.
Compact object descriptors from local colour invariant histograms
- In British Machine Vision Conference
, 2006
"... Much emphasis has recently been placed on the detection and recognition of locally (weak) affine invariant region descriptors for object recognition. In this paper, we take recognition one step further by developing features for non-planar objects. We consider the description of objects with locally ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Much emphasis has recently been placed on the detection and recognition of locally (weak) affine invariant region descriptors for object recognition. In this paper, we take recognition one step further by developing features for non-planar objects. We consider the description of objects with locally smoothly varying surface. For this class of objects, colour invariant histogram matching has proven to be very encouraging. However, matching many local colour cubes is computationally demanding. We propose a compact colour descriptor, which we call Wiccest, requiring only 12 numbers to locally capture colour and texture information. The Wiccest features are shown to be fairly insensitive to photometric effects like shadow, shading, and illumination colour. Moreover, we demonstrate the features to be applicable to highly compressed images while retaining discriminative power. 1
Multi-modal Semantic Place Classification
, 2010
"... The ability to represent knowledge about space and its position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. In this paper we present a multi-modal place classification system that ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
The ability to represent knowledge about space and its position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. In this paper we present a multi-modal place classification system that allows a mobile robot to identify places and recognize semantic categories in an indoor environment. The system effectively utilizes information from different robotic sensors by fusing multiple visual cues and laser range data. This is achieved using a high-level cue integration scheme based on a Support Vector Machine (SVM) that learns how to optimally combine and weight each cue. Our multi-modal place classification approach can be used to obtain a real-time semantic space labeling system which integrates information over time and space. We perform an extensive experimental evaluation of the method for two different platforms and environments, on a realistic off-line database and in a live experiment on an autonomous robot. The results clearly demonstrate the effec-
Modeling 3d objects from stereo views and recognizing them in photographs
- In ECCV
, 2006
"... Local appearance models in the neighborhood of salient image features, together with local and/or global geometric constraints, serve as the basis for several recent and effective approaches to 3D object recognition from photographs. However, these techniques typically either fail to explicitly acco ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
Local appearance models in the neighborhood of salient image features, together with local and/or global geometric constraints, serve as the basis for several recent and effective approaches to 3D object recognition from photographs. However, these techniques typically either fail to explicitly account for the strong geometric constraints associated with multiple images of the same 3D object, or require a large set of training images with much overlap to construct relatively sparse object models. This paper proposes a simple new method for automatically constructing 3D object models consisting of dense assemblies of small surface patches and affine-invariant descriptions of the corresponding texture patterns from a few (7 to 12) stereo pairs. Similar constraints are used to effectively identify instances of these models in highly cluttered photographs taken from arbitrary and unknown viewpoints. Experiments with a dataset consisting of 80 test images of 9 objects, including comparisons with a number of baseline algorithms, demonstrate the promise of the proposed approach. 1

