Learning a Fine Vocabulary
Cited by 6 (0 self)
Abstract. We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. Experimentally we show that the novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 105k dataset/protocol. At the same time, retrieval with the proposed similarity function is faster than the reference method.
Exploiting descriptor distances for precise image search, Research report
, 2011
Searching with quantization: approximate nearest neighbor search using short codes and distance estimators
, 2009
Rapid image retrieval for mobile location recognition
- in Proc. IEEE Conf. Acoustics, Speech and Signal Processing
, 2011
Cited by 2 (1 self)
Recognizing the location and orientation of a mobile device from captured images is a promising application of image retrieval algorithms. Matching the query images to an existing georeferenced database like Google Street View enables mobile search for location related media, products, and services. Due to the rapidly changing field of view of the mobile device caused by constantly changing user attention, very low retrieval times are essential. These can be significantly reduced by performing the feature quantization on the handheld and transferring compressed Bag-of-Feature vectors to the server. To cope with the limited processing capabilities of handhelds, the quantization of high dimensional feature descriptors has to be performed at very low complexity. To this end, we introduce in this paper the novel Multiple Hypothesis Vocabulary Tree (MHVT) as a step towards real-time mobile location recognition. The MHVT increases the probability of assigning matching feature descriptors to the same visual word by introducing an overlapping buffer around the separating hyperplanes to allow for a soft quantization and an adaptive clustering approach. Further, a novel framework is introduced that allows us to integrate the probability of correct quantization in the distance calculation using an inverted file scheme. Our experiments demonstrate that our approach achieves query times reduced by up to a factor of 10 when compared to the state-of-the-art.
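The overlapping buffer around a separating hyperplane can be illustrated with a minimal sketch. It assumes a binary tree node that splits descriptors by a single hyperplane; the function name `soft_split` and the parameters `normal`, `offset`, and `buffer_width` are hypothetical and not taken from the paper, and the adaptive clustering and probability weighting described in the abstract are not modeled.

```python
import numpy as np

def soft_split(descriptor, normal, offset, buffer_width):
    """Return the child branches a descriptor follows at one tree node.

    The node separates its two children with the hyperplane
    normal . x = offset. A descriptor closer than buffer_width to the
    hyperplane is ambiguous, so it follows both branches (soft quantization).
    """
    normal = np.asarray(normal, dtype=float)
    descriptor = np.asarray(descriptor, dtype=float)
    # Signed Euclidean distance from the descriptor to the hyperplane.
    signed_dist = (np.dot(normal, descriptor) - offset) / np.linalg.norm(normal)
    if abs(signed_dist) < buffer_width:
        return [0, 1]          # inside the buffer: keep both hypotheses
    return [0] if signed_dist < 0 else [1]
```

A wider buffer raises the chance that two matching descriptors share at least one visual word, at the cost of more branches to follow per descriptor.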
BRIEF: Computing a local binary descriptor very fast
Cited by 2 (2 self)
Binary descriptors are becoming increasingly popular as a means to compare feature points very fast while requiring comparatively small amounts of memory. The typical approach to creating them is to first compute floating-point ones, using an algorithm such as SIFT, and then to binarize them. In this paper, we show that we can directly compute a binary descriptor we call BRIEF on the basis of simple intensity difference tests. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and SIFT on standard benchmarks and show that it yields comparable recognition accuracy, while running in an almost vanishing fraction of the time required by either.
Index Terms: Image processing and computer vision, feature matching, augmented reality, real-time matching
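The idea of building a binary descriptor from intensity difference tests can be sketched as follows. This is a simplified illustration, not the authors' implementation: the uniform random sampling of test locations, the absence of patch smoothing, and the names `make_test_pairs`, `brief_descriptor`, and `hamming_distance` are all assumptions made for the example.

```python
import numpy as np

def make_test_pairs(n_bits, patch_size, seed=0):
    """Draw n_bits random pixel pairs; each row is (y1, x1, y2, x2)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, patch_size, size=(n_bits, 4))

def brief_descriptor(patch, pairs):
    """One bit per test: 1 if the first pixel is darker than the second."""
    patch = np.asarray(patch)
    first = patch[pairs[:, 0], pairs[:, 1]]
    second = patch[pairs[:, 2], pairs[:, 3]]
    return (first < second).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(a != b))
```

Because each bit depends only on the order of two intensities, adding a constant brightness offset to the patch leaves the descriptor unchanged, and matching reduces to counting differing bits.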
Discriminative codeword selection for image representation
- in: Proceedings of the 18th ACM International Conference on Multimedia, 2010
Cited by 1 (1 self)
Bag of features (BoF) representation has attracted an increasing amount of attention in large scale image processing systems. BoF representation treats images as loose collections of local invariant descriptors extracted from them. The visual codebook is generally constructed by using an unsupervised algorithm such as K-means to quantize the local descriptors into clusters. Images are then represented by the frequency histograms of the codewords contained in them. To build a compact and discriminative codebook, codeword selection has become an indispensable tool. However, most of the existing codeword selection algorithms are supervised and the human labeling may be very expensive. In this paper, we consider the problem of unsupervised codeword selection, and propose a novel algorithm called Discriminative Codeword Selection (DCS). Motivated by recent studies on discriminative clustering, the central idea of our proposed algorithm is to select those codewords so that the cluster structure of the image database can be best respected. Specifically, a multi-output linear function is fitted to model the relationship between the data matrix after codeword selection and the indicator matrix. The most discriminative codewords are thus defined as those leading to minimal fitting error. Experiments on image retrieval and clustering have demonstrated the effectiveness of the proposed method.
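The baseline BoF representation described above (before any codeword selection) can be sketched in a few lines. The sketch assumes the codebook has already been learned, for example by K-means, and that the image has at least one descriptor; the function name `bof_histogram` is hypothetical, and the DCS selection step itself is not shown.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Normalized codeword frequency histogram for one image.

    descriptors: (n_desc, dim) local descriptors of the image
    codebook:    (n_words, dim) cluster centers, e.g. from K-means
    """
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                  # hard assignment
    hist = np.bincount(words, minlength=len(c)).astype(float)
    return hist / hist.sum()                      # frequencies sum to 1
```

Codeword selection would then keep only a subset of the histogram's dimensions.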
Affinity Learning on a Tensor Product Graph with Applications to Shape and Image Retrieval
Cited by 1 (0 self)
As observed in several recent publications, improved retrieval performance is achieved when pairwise similarities between the query and the database objects are replaced with more global affinities that also consider the relation among the database objects. This is commonly achieved by propagating the similarity information in a weighted graph representing the database and query objects. Instead of propagating the similarity information on the original graph, we propose to utilize the tensor product graph (TPG) obtained by the tensor product of the original graph with itself. By virtue of this construction, not only local but also long range similarities among graph nodes are explicitly represented as higher order relations, making it possible to better reveal the intrinsic structure of the data manifold. In addition, we improve the local neighborhood structure of the original graph in a preprocessing stage. We illustrate the benefits of the proposed approach on shape and image ranking and retrieval tasks. We are able to achieve the bull's eye retrieval score of 99.99% on the MPEG-7 shape dataset, which is much higher than the state-of-the-art algorithms.
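The tensor product of a graph with itself can be illustrated with the Kronecker product of its edge-weight matrix. The sketch below shows one propagation step on the TPG computed two ways: explicitly on the n² × n² matrix, and compactly on n × n matrices using the standard identity (P ⊗ P) vec(Q) = vec(P Q Pᵀ) for row-major vectorization. The function names are hypothetical, and the abstract does not specify the propagation scheme, so treating a single matrix-vector product as "one step" is an assumption.

```python
import numpy as np

def propagate_on_tpg(P, Q):
    """One propagation step using the explicit tensor product graph."""
    n = P.shape[0]
    tpg = np.kron(P, P)                  # (n*n, n*n) edge weights of the TPG
    return (tpg @ Q.reshape(-1)).reshape(n, n)

def propagate_compact(P, Q):
    """The same step without ever building the n*n by n*n matrix."""
    return P @ Q @ P.T
```

The compact form matters in practice because the explicit TPG grows with the fourth power of the number of nodes.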
Low-Cost Asset Tracking using Location-Aware Camera Phones
Maintaining an accurate and up-to-date inventory of one's assets is a labor-intensive, tedious, and costly operation. To ease this difficult but important task, we design and implement a mobile asset tracking system for automatically generating an inventory by snapping photos of the assets with a smartphone. Since smartphones are becoming ubiquitous, construction and deployment of our inventory management solution is simple and cost-effective. Automatic asset recognition is achieved by first segmenting individual assets out of the query photo and then performing bag-of-visual-features (BoVF) image matching on the segmented regions. The smartphone's sensor readings, such as digital compass and accelerometer measurements, can be used to determine the location of each asset, and this location information is stored in the inventory for each recognized asset. As a special case study, we demonstrate a mobile book tracking system, where users snap photos of books stacked on bookshelves to generate a location-aware book inventory. It is shown that segmenting the book spines is very important for accurate feature-based image matching into a database of book spines. Segmentation also provides the exact orientation of each book spine, so more discriminative upright local features can be employed for improved recognition. This system's mobile client has been implemented for smartphones running the Symbian or Android operating systems. The client enables a user to snap a picture of a bookshelf and to
Feature Tracking for Wide-Baseline Image Retrieval
Abstract. We address the problem of large scale image retrieval in a wide-baseline setting, where for any query image all the matching database images will come from very different viewpoints. In such settings traditional bag-of-visual-words approaches are not equipped to handle the significant feature descriptor transformations that occur under large camera motions. In this paper we present a novel approach that includes an offline step of feature matching which allows us to observe how local descriptors transform under large camera motions. These observations are encoded in a graph in the quantized feature space. This graph can be used directly within a soft-assignment feature quantization scheme for image retrieval.
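One way such a graph over visual words could drive soft assignment is sketched below. The graph is assumed to be a dictionary mapping each visual word to its neighbors and edge weights (for example, how often tracked features moved between the two words); this data layout, the `self_weight` parameter, and the normalization are illustrative assumptions, not details given in the abstract.

```python
def soft_assign(word, word_graph, self_weight=1.0):
    """Spread a hard-assigned visual word over its graph neighbors.

    word_graph: dict mapping word id -> {neighbor id: edge weight}
    Returns a dict of word id -> weight, normalized to sum to 1.
    """
    weights = {word: self_weight}
    for neighbor, edge_weight in word_graph.get(word, {}).items():
        weights[neighbor] = weights.get(neighbor, 0.0) + edge_weight
    total = sum(weights.values())
    return {w: value / total for w, value in weights.items()}
```

A word with no recorded neighbors falls back to ordinary hard assignment.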
Low-latency and robust visual localization
- Mohammad Abu-Alqumsan, Anas Al-Nuaimi, and Eckehard Steinbach
Information about the location, orientation, and context of a mobile device is of central importance for future multimedia applications and location-based services (LBSs). With the widespread adoption of modern camera phones, including powerful processors, inertial measurement units, compass, and assisted global positioning system (GPS) receivers, the variety of location- and context-based services has significantly increased over the last years. These include, for instance, the search for points of interest in the vicinity, geotagging and retrieval of user generated media, targeted advertising, navigation systems, social applications such as Foursquare [1], and many more.
Digital Object Identifier 10.1109/MSP.2011.940882 Date of publication: 15 June 2011
While satellite navigation systems can provide sufficient positioning accuracy, a clear view of at least four satellites is required, limiting their applicability to outdoor scenarios with few obstacles. Unfortunately, most interesting LBSs could be provided in densely populated environments, which include urban canyons and indoor scenarios. Figure 1 shows the GPS recordings (black line) of an iPhone 4 while driving a car through downtown San Francisco. Although a state-of-the-art assisted GPS Broadcom chip is used, the phone mounting ensures the best signal reception, and a motion model is applied to filter out large deviations, the localization error is in the range of 50–100 m. This is caused by multipath effects, which are even more severe if the user is traveling on the sidewalks and not in the middle of the street. Here, an initial positioning

