Product quantization for nearest neighbor search
, 2010
Cited by 20 (7 self)
This paper introduces a product quantization based approach for approximate nearest neighbor search. The idea is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. A vector is represented by a short code composed of its subspace quantization indices. The Euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code. Experimental results show that our approach searches for nearest neighbors efficiently, in particular in combination with an inverted file system. Results for SIFT and GIST image descriptors show excellent search accuracy, outperforming three state-of-the-art approaches. The scalability of our approach is validated on a dataset of two billion vectors.
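The encoding and asymmetric distance described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_pq`, `encode`, `asymmetric_dist`), the plain k-means trainer, and the parameters `m` (number of subspaces) and `k` (centroids per subspace) are assumptions chosen for illustration, and the inverted file system mentioned in the abstract is left out.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids (illustrative, not tuned)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(v, m):
    """Cut a vector into m contiguous sub-vectors of equal length."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]

def train_pq(data, m, k):
    """Learn one codebook of k centroids for each of the m subspaces."""
    subs = [split(v, m) for v in data]
    return [kmeans([s[i] for s in subs], k, seed=i) for i in range(m)]

def encode(v, codebooks):
    """Short code: the index of the nearest centroid in each subspace."""
    parts = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(part, cb[c]))
            for part, cb in zip(parts, codebooks)]

def asymmetric_dist(query, code, codebooks):
    """Squared distance between an unquantized query and a stored code."""
    parts = split(query, len(codebooks))
    return sum(sq_dist(p, cb[c]) for p, cb, c in zip(parts, codebooks, code))
```

With `m = 4` and `k = 16`, each stored vector is reduced to four small integers, while the query stays unquantized, which is what makes the distance "asymmetric". In this sketch the per-subspace distances are recomputed for every code; precomputing them once per query into lookup tables would be the natural optimization.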
LDAHash: Improved matching with smaller descriptors
, 2010
Cited by 14 (5 self)
SIFT-like local feature descriptors are ubiquitously employed in such computer vision applications as content-based retrieval, video analysis, copy detection, object recognition, photo-tourism and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Secondly, descriptors are usually high-dimensional (e.g. SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
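As a rough illustration of comparing descriptors in Hamming space, the sketch below turns a real-valued descriptor into a short binary string by thresholding linear projections, then counts differing bits. The projection rows and thresholds are hypothetical inputs here; the abstract says they are learned from examples, and that learning step is not reproduced.

```python
def binarize(descriptor, projections, thresholds):
    """Pack sign(projection . descriptor + threshold) bits into one integer.

    `projections` (one row per output bit) and `thresholds` are assumed
    to be given; in the paper they would be learned from training data.
    """
    code = 0
    for row, t in zip(projections, thresholds):
        response = sum(w * x for w, x in zip(row, descriptor)) + t
        code = (code << 1) | (1 if response > 0 else 0)
    return code

def hamming(a, b):
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")
```

Because each code fits in a machine word or two, the comparison reduces to an XOR followed by a bit count, which is the main reason binary codes are attractive for large-scale matching.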
Mobile Visual Search
- IEEE SIGNAL PROCESSING MAGAZINE, SPECIAL ISSUE ON MOBILE MEDIA SEARCH
Cited by 6 (4 self)
Mobile phones have evolved into powerful image and video processing devices, equipped with high-resolution cameras, color displays, and hardware-accelerated graphics. They are increasingly also equipped with GPS, and connected to broadband wireless networks. All this enables a new class of applications which use the camera phone to initiate search queries about objects in visual proximity to the user (Fig 1). Such applications can be used, e.g., for identifying products, comparison shopping, finding information about movies, CDs, real estate, print media or artworks. First deployments of such systems include Google Goggles [1], Nokia Point and Find [2], Kooaba [3], Ricoh iCandy [4], [5], [6] and Amazon Snaptell [7]. Mobile image retrieval applications pose a unique set of challenges. What part of the processing should be performed
Inverted index compression for scalable image matching
- in [Proc. Data Compression Conference (DCC’10
, 2010
Cited by 4 (1 self)
To perform fast image matching against large databases, a Vocabulary Tree (VT) uses an inverted index that maps from each tree node to database images which have visited that node. The inverted index can require gigabytes of memory, which significantly slows down the database server. In this paper, we design, develop, and compare techniques for inverted index compression for image-based retrieval. We show that these techniques significantly reduce memory usage, by as much as 5×, without loss in recognition accuracy. Our work includes fast decoding methods, an offline database reordering scheme that exploits the similarity between images for additional memory savings, and a generalized coding scheme for soft-binned feature descriptor histograms. We also show that reduced index memory permits memory-intensive image matching techniques that boost recognition accuracy.
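The abstract does not say which codes the paper uses, so the following is only a generic sketch of how an inverted-index posting list can be shrunk: the sorted image IDs of one tree node are stored as gaps, and each gap is written with a variable-byte code. The function names and the choice of variable-byte coding are assumptions for illustration, not the paper's scheme.

```python
def varbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)  # high bit marks the final byte of a number
    return bytes(out)

def varbyte_decode(data):
    """Inverse of varbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers

def compress_postings(image_ids):
    """Store a node's image IDs as variable-byte coded gaps."""
    gaps, prev = [], 0
    for i in sorted(image_ids):
        gaps.append(i - prev)
        prev = i
    return varbyte_encode(gaps)

def decompress_postings(data):
    """Rebuild the sorted image IDs by summing the decoded gaps."""
    ids, total = [], 0
    for gap in varbyte_decode(data):
        total += gap
        ids.append(total)
    return ids
```

Gap coding pays off when consecutive IDs are close together, which gives one intuition for why reordering the database so that similar images receive nearby IDs can save additional memory.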
Empowering Visual Categorization With the GPU
, 2011
Cited by 4 (3 self)
Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to 1) keep categorization accuracy intact, 2) decompose the problem, and 3) give the same numerical results. In the experiments on large-scale datasets, it is shown that, by using a parallel implementation on the GeForce GTX260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%.
BAG OF WORDS FOR LARGE SCALE OBJECT RECOGNITION Properties and Benchmark
Cited by 2 (1 self)
Keywords: image search, image retrieval, bag of words, inverted file, min hash, benchmark, object recognition.
Object recognition in a large-scale collection of images has become an important application of widespread use. In this setting, the goal is to find the matching image in the collection given a probe image containing the same object. In this work we explore the different possible parameters of the bag of words (BoW) approach in terms of their recognition performance and computational cost. We make the following contributions: 1) we provide a comprehensive benchmark of the two leading methods for BoW: inverted file and min-hash; and 2) we explore the effect of the different parameters on their recognition performance and run time, using four diverse real world datasets.
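For readers unfamiliar with min-hash, one of the two methods benchmarked here, the sketch below shows the basic idea on sets of visual-word IDs: the fraction of hash functions whose minimum agrees for two images estimates the Jaccard overlap of their word sets. The hash family, the constant `PRIME`, and the function names are assumptions for illustration and are not taken from the paper.

```python
import random

PRIME = 2_147_483_647  # assumed to exceed every visual-word ID in use

def make_hashes(n, seed=0):
    """Draw n random linear hash functions h(w) = (a*w + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]

def signature(visual_words, hashes):
    """Min-hash signature: the smallest hash value under each function."""
    return [min((a * w + b) % PRIME for w in visual_words) for a, b in hashes]

def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of set overlap."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

More hash functions give a tighter estimate at the cost of longer signatures, which is the kind of accuracy versus run-time trade-off a parameter benchmark would explore.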
Searching with quantization: approximate nearest neighbor search using short codes and distance estimators
, 2009
Scalable Face Image Retrieval with Identity-Based Quantization and Multi-Reference Re-ranking
State-of-the-art image retrieval systems achieve scalability by using bag-of-words representation and textual retrieval methods, but their performance degrades quickly in the face image domain, mainly because they 1) produce visual words with low discriminative power for face images, and 2) ignore the special properties of the faces. The leading features for face recognition can achieve good retrieval performance, but these features are not suitable for inverted indexing as they are high-dimensional and global, thus not scalable in either computational or storage cost. In this paper we aim to build a scalable face image retrieval system. For this purpose, we develop a new scalable face representation using both local and global features. In the indexing stage, we exploit special properties of faces to design new component-based local features, which are subsequently quantized into visual words using a novel identity-based quantization scheme. We also use a very small Hamming signature (40 bytes) to encode the discriminative global feature for each face. In the retrieval stage, candidate images are first retrieved from the inverted index of visual words. We then use a new multi-reference distance to re-rank the candidate images using the Hamming signature. On a one-million face database, we show that our local features and global Hamming signatures are complementary: the inverted index based on local features provides candidate images with good recall, while the multi-reference re-ranking with global Hamming signature leads to good precision. As a result, our system is not only scalable but also outperforms the linear scan retrieval system using the state-of-the-art face recognition feature in terms of quality.
DYNAMIC SELECTION OF A FEATURE-RICH QUERY FRAME FOR MOBILE VIDEO RETRIEVAL
In this paper, we focus on a new application of mobile visual search: snapping a photo with a mobile device of a video playing on a TV screen to automatically retrieve and stream the remainder of the video to the mobile device. When the user takes a photo of the video, the captured query frame may contain too few useful features for good retrieval performance. We design and implement a new algorithm for mobile video retrieval to accurately select a feature-rich frame from a sequence of viewfinder frames in a very short temporal window determined by the user-initiated query event. Fast and accurate selection using efficiently computed Hessian scores is developed for real-time operation on mobile devices. Viewfinder frames captured before the query starts are pre-processed, while the number of viewfinder frames captured afterwards is minimized by a probabilistic optimization process. Evaluated on a large video database of 10 million frames, dynamic query frame selection provides a substantial increase in retrieval accuracy with very low search latency.

