Results 1–10 of 180
Image retrieval: ideas, influences, and trends of the new age
 ACM Computing Surveys
, 2008
Abstract

Cited by 270 (8 self)
We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid the foundation for such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.
Index-driven similarity search in metric spaces
 ACM Transactions on Database Systems
, 2003
Abstract

Cited by 133 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
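The “search hierarchy” idea can be illustrated with a minimal example: a tiny metric ball-tree searched best-first, pruning any subtree whose triangle-inequality lower bound cannot beat the best distance found so far. This is a generic sketch of the idea, not the authors' framework; all names are illustrative.

```python
import heapq
import numpy as np

def build_ball_tree(X, ids, leaf=8, rng=None):
    """Tiny metric ball-tree: every node stores a pivot and a covering radius."""
    if rng is None:
        rng = np.random.default_rng(0)
    pivot = X[rng.integers(len(X))]
    d = np.linalg.norm(X - pivot, axis=1)
    mask = d <= np.median(d)
    # stop splitting at small or degenerate nodes
    if len(X) <= leaf or mask.all() or not mask.any():
        return {"pivot": pivot, "r": d.max(), "pts": X, "ids": ids}
    return {"pivot": pivot, "r": d.max(),
            "kids": [build_ball_tree(X[mask], ids[mask], leaf, rng),
                     build_ball_tree(X[~mask], ids[~mask], leaf, rng)]}

def nn_search(root, q):
    """Best-first search over the hierarchy: always expand the node with the
    smallest lower bound max(0, d(q, pivot) - r); stop once that bound can
    no longer beat the best exact distance found so far."""
    best_d, best_id = np.inf, -1
    heap, tie = [(0.0, 0, root)], 1   # (lower bound, tiebreak, node)
    while heap:
        lb, _, node = heapq.heappop(heap)
        if lb >= best_d:
            break                     # no remaining subtree holds a closer site
        if "pts" in node:             # leaf: scan its points exactly
            d = np.linalg.norm(node["pts"] - q, axis=1)
            i = int(d.argmin())
            if d[i] < best_d:
                best_d, best_id = d[i], int(node["ids"][i])
        else:
            for kid in node["kids"]:
                bound = max(0.0, np.linalg.norm(q - kid["pivot"]) - kid["r"])
                heapq.heappush(heap, (bound, tie, kid))
                tie += 1
    return best_id, best_d
```

Because the lower bound uses only the triangle inequality, the same search loop works for any hierarchy of bounding regions, which is the point of a method-agnostic search framework.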
Automated Extraction and Parameterization of Motions in Large Data Sets
 ACM Transactions on Graphics
, 2004
Abstract

Cited by 111 (2 self)
Large motion data sets often contain many variants of the same kind of motion, but without appropriate tools it is difficult to fully exploit this fact. This paper provides automated methods for identifying logically similar motions in a data set and using them to build a continuous and intuitively parameterized space of motions. To find logically similar motions that are numerically dissimilar, our search method employs a novel distance metric to find “close” motions and then uses them as intermediaries to find more distant motions. Search queries are answered at interactive speeds through a precomputation that compactly represents all possibly similar motion segments. Once a set of related motions has been extracted, we automatically register them and apply blending techniques to create a continuous space of motions. Given a function that defines relevant motion parameters, we present a method for extracting motions from this space that accurately possess new parameters requested by the user. Our algorithm extends previous work by explicitly constraining blend weights to reasonable values and having a runtime cost that is nearly independent of the number of example motions. We present experimental results on a test data set of 37,000 frames, or about ten minutes of motion sampled at 60 Hz.
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Abstract

Cited by 87 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
KLEE: A Framework for Distributed Top-k Query Algorithms
 In VLDB
, 2005
Abstract

Cited by 73 (12 self)
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories, where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queries, designed for high performance and flexibility. KLEE makes a strong case for approximate top-k algorithms over widely distributed data sources: it shows how great gains in efficiency can be enjoyed at low result-quality penalties. Further, KLEE affords the query-initiating peer the flexibility to trade off result quality against expected performance, and to trade off the number of communication phases engaged during query execution against network bandwidth. We have implemented KLEE and related algorithms and conducted a comprehensive performance evaluation, employing large real-world and synthetic web-data collections and query benchmarks. Our experimental results show that KLEE achieves major gains in network bandwidth, query response times, and peer load, all with small errors in result precision and other result-quality measures.
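KLEE's actual machinery (histogram-based candidate filtering, Bloom filters, tunable communication phases) is more involved than fits here, but the distributed top-k setting it targets can be sketched with the classic threshold-style merge over score-sorted index lists. This is a generic centralized baseline under sum aggregation, not KLEE itself; all names are illustrative.

```python
import heapq

def threshold_topk(lists, k):
    """Fagin-style threshold algorithm over score-sorted index lists.
    `lists` maps attribute -> [(item, score), ...], sorted by descending score.
    Scans the lists round-robin; any item not yet seen can score at most the
    sum of the scores at the current cursors (the threshold), so the scan can
    stop early once the k-th best exact score reaches that threshold."""
    exact = {}                     # item -> full aggregate score
    pos = {a: 0 for a in lists}    # cursor per list
    while True:
        threshold, progressed = 0.0, False
        for a, lst in lists.items():
            if pos[a] < len(lst):
                item, score = lst[pos[a]]
                pos[a] += 1
                progressed = True
                threshold += score
                if item not in exact:
                    # "random access": look the item up in every list once
                    exact[item] = sum(dict(l).get(item, 0.0)
                                      for l in lists.values())
        top = heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
        if not progressed or (len(top) >= k and top[-1][1] >= threshold):
            return top

# Example: two score-sorted lists; the top-2 items by summed score are
# "y" (0.8 + 0.7) and then "x" (0.9 + 0.2).
lists = {"a": [("x", 0.9), ("y", 0.8), ("z", 0.1)],
         "b": [("y", 0.7), ("z", 0.6), ("x", 0.2)]}
```

Distributed variants like KLEE work to replace the per-item random accesses, which are the expensive network round trips, with compact per-peer statistics.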
Product quantization for nearest neighbor search
, 2010
Abstract

Cited by 71 (10 self)
This paper introduces a product-quantization-based approach for approximate nearest neighbor search. The idea is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. A vector is represented by a short code composed of its subspace quantization indices. The Euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code. Experimental results show that our approach searches for nearest neighbors efficiently, in particular in combination with an inverted file system. Results for SIFT and GIST image descriptors show excellent search accuracy, outperforming three state-of-the-art approaches. The scalability of our approach is validated on a data set of two billion vectors.
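The pipeline the abstract describes, splitting each vector into m subvectors, quantizing each subspace with its own small codebook, and estimating distances from per-subspace lookup tables, can be sketched as follows. This is an illustrative NumPy re-implementation of the general idea (with a toy k-means trainer), not the authors' code.

```python
import numpy as np

def pq_train(X, m=4, k=16, iters=10, seed=0):
    """Train a product quantizer: split the d dims into m subspaces and run a
    small k-means (Lloyd's algorithm) in each, yielding m codebooks of size k."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    assert d % m == 0, "dimension must split evenly into m subspaces"
    ds = d // m
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        C = sub[rng.choice(len(sub), k, replace=False)]  # init from data
        for _ in range(iters):
            a = np.argmin(((sub[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            for c in range(k):
                if (a == c).any():
                    C[c] = sub[a == c].mean(axis=0)
        codebooks.append(C)
    return codebooks

def pq_encode(X, codebooks):
    """Represent each vector as m centroid indices (the short code)."""
    m, ds = len(codebooks), X.shape[1] // len(codebooks)
    codes = np.empty((len(X), m), dtype=np.int64)
    for j, C in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        codes[:, j] = np.argmin(((sub[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    return codes

def pq_asymmetric_dist(q, codes, codebooks):
    """Asymmetric distance: exact query vs. quantized database vectors.
    Precompute one lookup table per subspace, then sum m table entries
    per database vector instead of a full d-dimensional distance."""
    m, ds = len(codebooks), len(q) // len(codebooks)
    tables = [((codebooks[j] - q[j * ds:(j + 1) * ds]) ** 2).sum(-1)
              for j in range(m)]
    return sum(tables[j][codes[:, j]] for j in range(m))
```

The m table lookups per database vector are where the speedup comes from; the short codes are where the memory savings come from.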
Feature-based similarity search in 3D object databases
 ACM Computing Surveys
, 2005
Abstract

Cited by 66 (10 self)
The development of effective content-based multimedia search systems is an important research issue due to the growing amount of digital audiovisual information. In the case of images and video, the growth of digital data has been observed since the introduction of 2D capture devices. A similar development is expected for 3D data as ...
Indexing Hierarchical Structures Using Graph Spectra
, 2005
Abstract

Cited by 44 (10 self)
Hierarchical image structures are abundant in computer vision and have been used to encode part structure, scale spaces, and a variety of multiresolution features. In this paper, we describe a framework for indexing such representations that embeds the topological structure of a directed acyclic graph (DAG) into a low-dimensional vector space. Based on a novel spectral characterization of a DAG, this topological signature allows us to efficiently retrieve a promising set of candidates from a database of models using a simple nearest-neighbor search. We establish the insensitivity of the signature to minor perturbation of graph structure due to noise, occlusion, or node split/merge. To accommodate large-scale occlusion, the DAG rooted at each non-leaf node of the query “votes” for model objects that share that “part”, effectively accumulating local evidence in a model DAG's topological subspaces. We demonstrate the approach with a series of indexing experiments in the domain of view-based 3D object recognition using shock graphs.
iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search
Abstract

Cited by 38 (1 self)
In this paper, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single-dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference point adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
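The one-dimensional mapping at the heart of this scheme (key = partition id × c + distance to the partition's reference point) can be sketched as below, with a sorted array standing in for the B+-tree and a simple expanding-radius KNN loop. The class name, the constant c, and the search step are illustrative, not the paper's implementation.

```python
import bisect
import numpy as np

class IDistanceSketch:
    """iDistance-style index: assign each point to its nearest reference
    point, then map it to the 1-D key  part * c + dist(point, ref),
    kept sorted (a stand-in for the B+-tree). c must exceed the largest
    distance-to-reference so partitions' key ranges cannot overlap."""
    def __init__(self, X, refs, c=1000.0):
        self.X, self.refs, self.c = X, refs, c
        d = np.linalg.norm(X[:, None, :] - refs[None, :, :], axis=2)
        part = d.argmin(axis=1)                    # owning partition
        dist = d[np.arange(len(X)), part]          # distance to its reference
        order = np.argsort(part * c + dist)
        self.keys = (part * c + dist)[order]
        self.ids = order

    def knn(self, q, k, step=0.1):
        """Grow a search radius r until the k nearest points are certified to
        lie inside it; each round is one 1-D range scan per partition."""
        dq = np.linalg.norm(self.refs - q, axis=1)
        r = step
        while True:
            cand = []
            for p, dqp in enumerate(dq):
                # by the triangle inequality, any point within r of q that
                # lives in partition p has dist-to-ref in [dqp - r, dqp + r]
                lo = bisect.bisect_left(self.keys, p * self.c + max(dqp - r, 0.0))
                hi = bisect.bisect_right(self.keys, p * self.c + dqp + r)
                cand.extend(self.ids[lo:hi])
            if len(cand) >= k:
                cand = np.array(cand)
                d = np.linalg.norm(self.X[cand] - q, axis=1)
                order = np.argsort(d)
                if d[order[k - 1]] <= r:   # k-th answer inside the sphere: done
                    return cand[order[:k]]
            r += step
```

The search is exact despite the lossy 1-D mapping: the range scan over-fetches candidates, and the loop only stops once the k-th candidate's true distance fits inside the certified radius.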
The concentration of fractional distances
 IEEE Trans. on Knowledge and Data Engineering
, 2007
Abstract

Cited by 33 (1 self)
Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the Euclidean distance. When data are high-dimensional, however, Euclidean distances seem to concentrate: all distances between pairs of data elements seem to be very similar. The relevance of the Euclidean distance has therefore been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact of a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the Euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given. Index Terms—Nearest neighbor search, high-dimensional data, distance concentration, fractional distances.
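The concentration effect the paper analyzes is easy to reproduce numerically: as the dimension grows, the relative contrast between the farthest and nearest of a sample of i.i.d. points collapses. The snippet below is only an illustrative demo of that basic effect (here for p-norm distances to the origin), not the paper's estimator.

```python
import numpy as np

def relative_contrast(dim, p, n=2000, seed=0):
    """(max - min) / min over the p-norm distances from the origin to n
    points drawn uniformly from the unit cube in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, dim))            # nonnegative, so no abs() needed
    dist = (X ** p).sum(axis=1) ** (1.0 / p)
    return (dist.max() - dist.min()) / dist.min()

# Contrast is large in low dimension and collapses as the dimension grows,
# e.g. relative_contrast(2, 2) >> relative_contrast(500, 2).
```

Running it with a fractional exponent (say p = 0.5) lets one compare concentration across norms for a given distribution, which is exactly the comparison the paper shows does not always favor fractional norms.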