Scalable Recognition with a Vocabulary Tree
In CVPR, 2006
Cited by 374 (0 self)
A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD's. The scheme
Object retrieval with large vocabularies and fast spatial matching
In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2007
Cited by 139 (14 self)
In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site, Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state-of-the-art on an extensive
Small codes and large image databases for recognition
In Proceedings of the IEEE Conf on Computer Vision and Pattern Recognition, 2008
Cited by 46 (4 self)
The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, and online image search tools. In this paper, our goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices. Our approach uses recently developed machine learning techniques to convert the Gist descriptor (a real valued vector that describes orientation energies at different scales and orientations within an image) to a compact binary code, with a few hundred bits per image. Using our scheme, it is possible to perform real-time searches with millions of images from the Internet using a single large PC and obtain recognition results comparable to the full descriptor. Using our codes on high quality labeled images from the LabelMe database gives surprisingly powerful recognition results using simple nearest neighbor techniques. Recent interest in object recognition has yielded a wide range of approaches to describing the contents of an image. One important application for this technology is the visual search of large collections of images, such as those on the Internet or on people’s home computers. Accordingly, a number of recognition papers have explored this area. Nister and Stewenius demonstrate real-time specific object recognition using a database of 40,000 images [19]; Obdrzalek and Matas show sub-linear indexing time on the COIL dataset [20]. A common theme is the representation of the image as a collection of feature vectors and the use of efficient data structures to handle the large num-
Kernelized locality-sensitive hashing for scalable image search
IEEE International Conference on Computer Vision (ICCV), 2009
Cited by 20 (1 self)
Fast retrieval methods are critical for large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply for high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm’s sub-linear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several large-scale datasets, and show that it enables accurate and fast performance for example-based object classification, feature matching, and content-based retrieval.
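For readers unfamiliar with the idea of hashing into Hamming space, the following is a minimal sketch of ordinary random-hyperplane locality-sensitive hashing for explicit vectors. It is not the kernelized method of the paper, which handles kernels whose embedding is unknown; the function names `make_hyperplanes`, `hash_vector`, and `hamming`, and all parameter values, are illustrative assumptions.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, hyperplanes):
    """Map a vector to an integer code: one bit per hyperplane, set by the sign of the dot product."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    planes = make_hyperplanes(n_bits=16, dim=4, seed=0)
    query = [1.0, 0.5, -0.2, 0.1]
    near = [1.0, 0.6, -0.2, 0.1]      # small perturbation of the query
    far = [-x for x in query]         # points in the opposite direction
    q_code = hash_vector(query, planes)
    print("distance to near:", hamming(q_code, hash_vector(near, planes)))
    print("distance to far: ", hamming(q_code, hash_vector(far, planes)))
```

Vectors separated by a small angle tend to agree on most bits, so candidates can be ranked by Hamming distance between codes instead of by the original similarity function.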
Efficient Representation of Local Geometry for Large Scale Object Retrieval
2009
Cited by 13 (2 self)
State of the art methods for image and object retrieval exploit both appearance (via visual words) and local geometry (spatial extent, relative pose). In large scale problems, memory becomes a limiting factor – local geometry is stored for each feature detected in each image and requires storage larger than the inverted file and term frequency and inverted document frequency weights together. We propose a novel method for learning discretized local geometry representation based on minimization of average reprojection error in the space of ellipses. The representation requires only 24 bits per feature without drop in performance. Additionally, we show that if the gravity vector assumption is used consistently from the feature description to spatial verification, it improves retrieval performance and decreases the memory footprint. The proposed method outperforms state of the art retrieval algorithms in a standard image retrieval benchmark.
Efficient Sequential Correspondence Selection by Cosegmentation
2009
Cited by 10 (4 self)
In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision process leads to a correspondence verification procedure that (i) has high precision (is highly discriminative) (ii) has good recall and (iii) is fast. The sequential decision on the correctness of a correspondence is based on simple statistics of a modified dense stereo matching algorithm. The statistics are projected on a prominent discriminative direction by SVM. Wald’s sequential probability ratio test is performed on the SVM projection computed on progressively larger cosegmented regions. We show experimentally that the proposed Sequential Correspondence Verification (SCV) algorithm significantly outperforms the standard correspondence selection method based on SIFT distance ratios on challenging matching problems.
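As background for the sequential decision step, here is a minimal sketch of Wald's sequential probability ratio test on a stream of binary observations. The paper applies the test to an SVM projection of stereo-matching statistics; the Bernoulli model, the function name `sprt`, and the parameters `p0`, `p1`, `alpha`, and `beta` below are simplifying assumptions made only for illustration.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for Bernoulli observations.

    H0: success probability is p0; H1: success probability is p1.
    alpha and beta are the target false-accept and false-reject rates.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0
    llr = 0.0                              # running log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "undecided", len(observations)


if __name__ == "__main__":
    # Hypothetical evidence stream: True means the observation supports a correct match.
    print(sprt([True, True, True, True], p0=0.2, p1=0.8))
    print(sprt([False, False, False, False], p0=0.2, p1=0.8))
```

The appeal of the test is that it stops as soon as the accumulated evidence is decisive, so easy cases are resolved after only a few observations.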
Informed visual search: Combining attention and object recognition
In Proceedings of ICRA, 2008
Cited by 9 (3 self)
This paper studies the sequential object recognition problem faced by a mobile robot searching for specific objects within a cluttered environment. In contrast to current state-of-the-art object recognition solutions which are evaluated on databases of static images, the system described in this paper employs an active strategy based on identifying potential objects using an attention mechanism and planning to obtain images of these objects from numerous viewpoints. We demonstrate the use of a bag-of-features technique for ranking potential objects, and show that this measure outperforms geometric matching for invariance across viewpoints. Our system implements informed visual search by prioritising map locations and re-examining promising locations first. Experimental results demonstrate that our system is a highly competent object recognition system that is capable of locating numerous challenging objects amongst distractors.
A linear time histogram metric for improved sift matching
In ECCV
Cited by 8 (4 self)
We present a new metric between histograms such as SIFT descriptors and a linear time algorithm for its computation. It is common practice to use the L2 metric for comparing SIFT descriptors. This practice assumes that SIFT bins are aligned, an assumption which is often not correct due to quantization, distortion, occlusion etc. In this paper we present a new Earth Mover’s Distance (EMD) variant. We show that it is a metric (unlike the original EMD [1] which is a metric only for normalized histograms). Moreover, it is a natural extension of the L1 metric. Second, we propose a linear time algorithm for the computation of the EMD variant, with a robust ground distance for oriented gradients. Finally, extensive experimental results on the Mikolajczyk and Schmid dataset [2] show that our method outperforms state of the art distances.
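To illustrate why bin alignment matters, the sketch below compares the bin-wise L1 distance with the classic one-dimensional Earth Mover's Distance for equal-mass histograms, which can be computed in linear time from running cumulative differences. This is the textbook 1D case with a linear ground distance, not the EMD variant or the robust ground distance proposed in the paper; the function names are illustrative.

```python
def l1_distance(p, q):
    """Bin-wise L1 distance; ignores how far apart mismatched bins are."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_1d(p, q):
    """Classic 1D EMD between equal-mass histograms with ground distance |i - j|.

    The mass that must be carried across each bin boundary is the running
    difference of the two histograms; summing its magnitude gives the cost.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("this sketch assumes equal total mass")
    total = 0.0
    carry = 0.0
    for a, b in zip(p, q):
        carry += a - b
        total += abs(carry)
    return total


if __name__ == "__main__":
    p = [1, 0, 0]
    shifted_by_one = [0, 1, 0]
    shifted_by_two = [0, 0, 1]
    # L1 cannot tell the two shifts apart; EMD grows with the size of the shift.
    print(l1_distance(p, shifted_by_one), l1_distance(p, shifted_by_two))
    print(emd_1d(p, shifted_by_one), emd_1d(p, shifted_by_two))
```

A bin-wise distance treats a one-bin shift and a two-bin shift as equally wrong, whereas a transport-based distance penalizes them in proportion to the displacement.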
Efficient Visual Search of Videos Cast as Text Retrieval
Cited by 6 (0 self)
We describe an approach to object retrieval that searches for and localizes all of the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination, and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogy of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google [6]. We report results for object retrieval on the full-length feature films “Groundhog Day,” “Casablanca,” and “Run Lola Run,” including searches from within the movie and specified by external images downloaded from the Internet. We investigate retrieval performance with respect to different quantizations of region descriptors and compare the performance of several ranking measures. Performance is also compared to a baseline method implementing standard frame to frame matching. Index Terms: Object recognition, viewpoint and scale invariance, text retrieval.
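The text-retrieval analogy can be sketched as an inverted file over visual word ids with tf-idf weighting. The example below is a simplified illustration rather than the authors' system: it assumes descriptors have already been vector quantized into integer word ids, scores with an unnormalized dot product, and omits tracking and spatial re-ranking. The names `build_index`, `search`, and the shot ids are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted file from docs: {doc_id: non-empty list of visual word ids}.

    Returns (inverted, idf) where inverted[word][doc_id] is the term frequency
    and idf[word] is the log inverse document frequency.
    """
    inverted = defaultdict(dict)
    for doc_id, words in docs.items():
        for word, count in Counter(words).items():
            inverted[word][doc_id] = count / len(words)
    n_docs = len(docs)
    idf = {word: math.log(n_docs / len(postings)) for word, postings in inverted.items()}
    return inverted, idf


def search(query_words, inverted, idf):
    """Rank documents by the dot product of tf-idf vectors, visiting only matching postings."""
    scores = defaultdict(float)
    for word, count in Counter(query_words).items():
        if word not in inverted:
            continue  # word never seen at indexing time
        q_weight = (count / len(query_words)) * idf[word]
        for doc_id, tf in inverted[word].items():
            scores[doc_id] += q_weight * tf * idf[word]
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical shots, each described by the visual words found in it.
    shots = {"shot_a": [1, 2, 2, 3], "shot_b": [3, 4, 4, 5], "shot_c": [6, 7, 3]}
    inverted, idf = build_index(shots)
    for shot_id, score in search([2, 2, 3], inverted, idf):
        print(shot_id, round(score, 3))
```

Because only the postings lists of the query's words are visited, query cost depends on how many shots share words with the query rather than on the size of the whole collection. Words that occur in every shot receive zero idf weight and contribute nothing to the ranking.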
Attentional Landmarks and Active Gaze Control for Visual SLAM
Cited by 4 (1 self)
This paper is centered around landmark detection, tracking and matching for visual SLAM (Simultaneous Localization And Mapping) using a monocular vision system with active gaze control. We present a system specialized in creating and maintaining a sparse set of landmarks based on a biologically motivated feature selection strategy. A visual attention system detects salient features which are highly discriminative, ideal candidates for visual landmarks which are easy to redetect. Features are tracked over several frames to determine stable landmarks and to estimate their 3D position in the environment. Matching of current landmarks to database entries enables loop closing. Active gaze control allows us to overcome some of the limitations of using a monocular vision system with a relatively small field of view. It supports (i) the tracking of landmarks which enable a better pose estimation, (ii) the exploration of regions without landmarks to obtain a better distribution of landmarks in the environment, and (iii) the active redetection of landmarks to enable loop closing in situations in which a fixed camera fails to close the loop. Several real-world experiments show that accurate pose estimation is obtained with the presented system and that active camera control outperforms the passive approach.

