Results 1 - 10
of
23
Descriptor Learning for Efficient Retrieval
"... Abstract. Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either h ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Abstract. Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either harming performance or requiring correction using additional storage or processing. This paper aims to reduce these quantization errors at source, by learning a projection from descriptor space to a new Euclidean space in which standard clustering techniques are more likely to assign matching descriptors to the same cluster, and non-matching descriptors to different clusters. To achieve this, we learn a non-linear transformation model by minimizing a novel margin-based cost function, which aims to separate matching descriptors from two classes of non-matching descriptors. Training data is generated automatically by leveraging geometric consistency. Scalable, stochastic gradient methods are used for the optimization. For the case of particular object retrieval, we demonstrate impressive gains in performance on a ground truth dataset: our learnt 32-D descriptor without spatial re-ranking outperforms a baseline method using 128-D SIFT descriptors with spatial re-ranking. 1
Efficient Sequential Correspondence Selection by Cosegmentation
, 2009
"... In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision process leads to a correspondence verification procedure that (i) has high precision (is highly discriminative) (ii) has good recall and (iii) is fast. The sequential decision on the correctness of a correspondence is based on simple statistics of a modified dense stereo matching algorithm. The statistics are projected on a prominent discriminative direction by SVM. Wald’s sequential probability ratio test is performed on the SVM projection computed on progressively larger cosegmented regions. We show experimentally that the proposed Sequential Correspondence Verification (SCV) algorithm significantly outperforms the standard correspondence selection method based on SIFT distance ratios on challenging matching problems.
Visual-Inertial Navigation, Mapping and Localization: A Scalable Real-Time Causal Approach
, 2010
"... We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable, and its parameters are identifiable. These include the unknown gravity vector, and the unknown transformation between the c ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable, and its parameters are identifiable. These include the unknown gravity vector, and the unknown transformation between the camera coordinate frame and the inertial unit. We show that it is possible to estimate both state and parameters as part of an on-line procedure, but only provided that the motion sequence is “rich enough,” a condition that we characterize explicitly. We then describe an efficient implementation of a filter to estimate the state and parameters of this model, including gravity and camera-to-inertial calibration. It runs in real-time on an embedded platform, and its performance has been tested extensively. We report experiments of continuous operation, without failures, re-initialization, or re-calibration, on paths of length up to 30Km. We also describe an integrated approach to “loop-closure,” that is the recognition of previously-seen locations and the topological re-adjustment of the traveled path. It represents visual features relative to the global orientation reference provided by the gravity vector estimated by the filter, and relative to the scale provided by their known position within the map; these features are organized into “locations ” defined by visibility constraints, represented in a topological graph, where loop closure can be performed without the need to re-compute past trajectories or perform bundle adjustment. The software infrastructure as well as the embedded platform is described in detail in a technical report (Jones and Soatto (2009).)
Avoiding confusing features in place recognition
"... We seek to recognize the place depicted in a query image using a database of “street side” images annotated with geolocation information. This is a challenging task due to changes in scale, viewpoint and lighting between the query and the images in the database. One of the key problems in place re ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
We seek to recognize the place depicted in a query image using a database of “street side” images annotated with geolocation information. This is a challenging task due to changes in scale, viewpoint and lighting between the query and the images in the database. One of the key problems in place recognition is the presence of objects such as trees or road markings, which frequently occur in the database and hence cause significant confusion between different places. As the main contribution, we show how to avoid features leading to confusion of particular places by using geotags attached to database images as a form of supervision. We develop a method for automatic detection of image-specific and spatially-localized groups of confusing features, and demonstrate that suppressing them significantly improves place recognition performance while reducing the database size. We show the method combines well with the state of the art bag-of-features model including query expansion, and demonstrate place recognition that generalizes over wide range of viewpoints and lighting conditions. Results are shown on a geotagged database of over 17K images of Paris downloaded from Google Street View.
Mobile Visual Search
- IEEE SIGNAL PROCESSING MAGAZINE, SPECIAL ISSUE ON MOBILE MEDIA SEARCH
"... MOBILE phones have evolved into powerful image and video processing devices, equipped with highresolution cameras, color displays, and hardware-accelerated graphics. They are increasingly also equipped with GPS, and connected to broadband wireless networks. All this enables a new class of applicatio ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
MOBILE phones have evolved into powerful image and video processing devices, equipped with highresolution cameras, color displays, and hardware-accelerated graphics. They are increasingly also equipped with GPS, and connected to broadband wireless networks. All this enables a new class of applications which use the camera phone to initiate search queries about objects in visual proximity to the user (Fig 1). Such applications can be used, e.g., for identifying products, comparison shopping, finding information about movies, CDs, real estate, print media or artworks. First deployments of such systems include Google Goggles [1], Nokia Point and Find [2], Kooaba [3], Ricoh iCandy [4], [5], [6] and Amazon Snaptell [7]. Mobile image retrieval applications pose a unique set of challenges. What part of the processing should be performed
Image Webs: Computing and Exploiting Connectivity in Image Collections
"... The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called I ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images
"... We present a method to segment a collection of unlabeled images while exploiting automatically discovered appearance patterns shared between them. Given an unlabeled pool of multi-object images, we first detect any visual clusters present among their sub-regions, where inter-region similarity is mea ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We present a method to segment a collection of unlabeled images while exploiting automatically discovered appearance patterns shared between them. Given an unlabeled pool of multi-object images, we first detect any visual clusters present among their sub-regions, where inter-region similarity is measured according to both appearance and contextual layout. Then, using each initial segment as a seed, we solve a graph cuts problem to refine its boundary— enforcing preferences to include nearby regions that agree with an ensemble of representative regions discovered for that cluster, and exclude those regions that resemble familiar objects. Through extensive experiments, we show that the segmentations computed jointly on the collection agree more closely with true object boundaries, when compared to either a bottom-up baseline or a graph cuts foreground segmentation that can only access cues from a single image. 1.
Detecting and Sketching the Common
"... Given very few images containing a common object of interest under severe variations in appearance, we detect the common object and provide a compact visual representation of that object, depicted by a binary sketch. Our algorithm is composed of two stages: (i) Detect a mutually common (yet non-triv ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Given very few images containing a common object of interest under severe variations in appearance, we detect the common object and provide a compact visual representation of that object, depicted by a binary sketch. Our algorithm is composed of two stages: (i) Detect a mutually common (yet non-trivial) ensemble of ‘self-similarity descriptors ’ shared by all the input images. (ii) Having found such a mutually common ensemble, ‘invert ’ it to generate a compact sketch which best represents this ensemble. This provides a simple and compact visual representation of the common object, while eliminating the background clutter of the query images. It can be obtained from very few query images. Such clean sketches may be useful for detection, retrieval, recognition, co-segmentation, and for artistic graphical purposes. 1.
BAG OF WORDS FOR LARGE SCALE OBJECT RECOGNITION Properties and Benchmark
"... image search, image retrieval, bag of words, inverted file, min hash, benchmark, object recognition. Object Recognition in a large scale collection of images has become an important application of widespread use. In this setting, the goal is to find the matching image in the collection given a probe ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
image search, image retrieval, bag of words, inverted file, min hash, benchmark, object recognition. Object Recognition in a large scale collection of images has become an important application of widespread use. In this setting, the goal is to find the matching image in the collection given a probe image containing the same object. In this work we explore the different possible parameters of the bag of words (BoW) approach in terms of their recognition performance and computational cost. We make the following contributions: 1) we provide a comprehensive benchmark of the two leading methods for BoW: inverted file and min-hash; and 2) we explore the effect of the different parameters on their recognition performance and run time, using four diverse real world datasets. 1
Binary Coherent Edge Descriptors
"... Abstract. Patch descriptors are used for a variety of tasks ranging from finding corresponding points across images, to describing object category parts. In this paper, we propose an image patch descriptor based on edge position, orientation and local linear length. Unlike previous works using histo ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract. Patch descriptors are used for a variety of tasks ranging from finding corresponding points across images, to describing object category parts. In this paper, we propose an image patch descriptor based on edge position, orientation and local linear length. Unlike previous works using histograms of gradients, our descriptor does not encode relative gradient magnitudes. Our approach locally normalizes the patch gradients to remove relative gradient information, followed by orientation dependent binning. Finally, the edge histogram is binarized to encode edge locations, orientations and lengths. Two additional extensions are proposed for fast PCA dimensionality reduction, and a min-hash approach for fast patch retrieval. Our algorithm produces state-of-the-art results on previously published object instance patch data sets, as well as a new patch data set modeling intra-category appearance variations. 1

