Results 1–10 of 23
Index-driven similarity search in metric spaces
 ACM Transactions on Database Systems
, 2003
Abstract

Cited by 139 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied to each of the methods presented, provided a suitable search hierarchy is defined.
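The distance-based idea surveyed above can be illustrated with a minimal pivot-table filter: precompute distances from every object to a few pivots, then prune candidates with the triangle inequality. This is a hedged, LAESA-style sketch, not any particular index from the article; all names are illustrative.

```python
import random

def pivot_table(S, d, num_pivots=2):
    """Precompute distances from every object to a few random pivots.
    Illustrative only: real distance-based indexes (vp-trees, GNAT,
    M-trees) organize such distances into tree hierarchies."""
    pivots = random.sample(S, num_pivots)
    table = {o: [d(o, p) for p in pivots] for o in S}
    return pivots, table

def range_query(q, r, S, d, pivots, table):
    """Return all o in S with d(q, o) <= r. The triangle inequality
    gives |d(q,p) - d(o,p)| <= d(q,o), so if that lower bound exceeds
    r for any pivot p, the expensive distance call can be skipped."""
    dqp = [d(q, p) for p in pivots]
    result = []
    for o in S:
        if any(abs(dq - do) > r for dq, do in zip(dqp, table[o])):
            continue  # a pivot proves d(q, o) > r
        if d(q, o) <= r:
            result.append(o)
    return result
```

The pruning is safe regardless of which pivots are chosen; pivot choice only affects how many distance calls are saved.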
Properties of embedding methods for similarity searching in metric spaces
 PAMI
, 2003
Abstract

Cited by 80 (4 self)
Complex data types—such as images, documents, DNA sequences, etc.—are becoming increasingly important in modern database applications. A typical query in many of these applications seeks to find objects that are similar to some target object, where (dis)similarity is defined by some distance function. Often, the cost of evaluating the distance between two objects is very high. Thus, the number of distance evaluations should be kept at a minimum, while (ideally) maintaining the quality of the result. One way to approach this goal is to embed the data objects in a vector space so that the distances of the embedded objects approximate the actual distances. Thus, queries can be performed (for the most part) on the embedded objects. In this paper, we are especially interested in examining the issue of whether or not the embedding methods will ensure that no relevant objects are left out (i.e., there are no false dismissals and, hence, the correct result is reported). Particular attention is paid to the SparseMap, FastMap, and MetricMap embedding methods. SparseMap is a variant of Lipschitz embeddings, while FastMap and MetricMap are inspired by dimension reduction methods for Euclidean spaces (using KLT or the related PCA and SVD). We show that, in general, none of these embedding methods guarantees that queries on the embedded objects have no false dismissals, while also demonstrating the limited cases in which the guarantee does hold. Moreover, we describe a variant of SparseMap that allows queries with no false dismissals. In addition, we show that with FastMap and MetricMap, the distances of the embedded objects can be much greater than the actual distances. This makes it impossible (or at least impractical) to modify FastMap and MetricMap to guarantee no false dismissals.
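The no-false-dismissal property discussed above holds exactly when the embedding is contractive: embedded distances never exceed actual distances. A minimal filter-and-refine sketch, using an assumed projection embedding (illustrative only, not SparseMap, FastMap, or MetricMap):

```python
import math

def filter_and_refine(q, r, objects, d, embed, d_embed):
    """Range query via an embedding filter. If the embedding is
    contractive, d_embed(embed(x), embed(y)) <= d(x, y), then
    d_embed > r implies d > r: the filter causes no false dismissals."""
    eq = embed(q)
    candidates = [o for o in objects if d_embed(eq, embed(o)) <= r]
    # Refine: remove false positives with the real (expensive) distance.
    return [o for o in candidates if d(q, o) <= r]

# Example embedding: project Euclidean points onto the x-axis.
# Projection never increases Euclidean distance, so it is contractive.
euclid = lambda a, b: math.dist(a, b)
project = lambda p: p[0]
line = lambda x, y: abs(x - y)
```

With a non-contractive embedding the filter step could discard true answers, which is precisely the failure mode the paper analyzes.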
A compact space decomposition for effective metric indexing
 Pattern Recognition Letters
, 2005
Abstract

Cited by 28 (6 self)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable both for main and secondary memory implementations, and most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this aspect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time. The problem of proximity searching has received much attention in recent times, due to an increasing interest in manipulating and retrieving the increasingly common multimedia data. Multimedia data have to be classified, forecasted, filtered, organized, and so on. Their manipulation poses new challenges to classifiers and function approximators. The well-known k-nearest neighbor (knn) classifier is a favorite candidate for this task for being simple enough and well understood. One of the main obstacles, however, to using this classifier for massive data classification is its linear complexity to find a set of k neighbors for a given query.
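A rough sketch of the list-of-clusters idea follows: fixed-size clusters, arbitrary center choice, and range search that stops early once the query ball is contained in a cluster ball. The paper studies better center-selection heuristics, so treat every detail here as an illustrative assumption.

```python
def build_lc(S, d, m=3):
    """List of clusters sketch: repeatedly pick a center, grab its
    m nearest remaining elements as the bucket, record the covering
    radius, and recurse on the rest."""
    S = list(S)
    clusters = []
    while S:
        c = S.pop(0)                       # naive center choice
        S.sort(key=lambda o: d(c, o))
        bucket, S = S[:m], S[m:]
        cr = d(c, bucket[-1]) if bucket else 0.0
        clusters.append((c, cr, bucket))
    return clusters

def lc_range_query(q, r, clusters, d):
    """Scan clusters in construction order. Key property: elements of
    later clusters lie outside earlier cluster balls, so if the query
    ball fits strictly inside this cluster ball, we can stop."""
    result = []
    for c, cr, bucket in clusters:
        dqc = d(q, c)
        if dqc <= r:
            result.append(c)
        if dqc <= cr + r:                  # balls intersect: scan bucket
            result.extend(o for o in bucket if d(q, o) <= r)
        if dqc + r < cr:                   # query ball inside cluster ball
            break                          # later clusters hold no answers
    return result
```

The early-exit test is what makes the structure compact and dimension-resistant in spirit: a query deep inside one cluster never touches the rest of the list.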
Probabilistic Proximity Searching Algorithms Based on Compact Partitions
 Journal of Discrete Algorithms
, 2002
Abstract

Cited by 16 (7 self)
The main bottleneck of research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever algorithm is used. A recent trend to break this bottleneck resorts to probabilistic algorithms, where it has been shown that one can find 99% of the relevant objects at a fraction of the cost of the exact algorithm. These algorithms are welcome in most applications because resorting to metric space searching already involves a fuzziness in the retrieval requirements.
Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces
Abstract

Cited by 8 (3 self)
Let U be a set of elements and d a distance function defined among them. Let NNk(u) be the k elements in U − {u} which have the smallest distance to u. The k-nearest neighbor graph (knng) is a weighted directed graph G(U, E) such that E = {(u, v), v ∈ NNk(u)}. We focus on the metric space context, so d is a metric. Several knng construction algorithms are known, but they are not suitable for general metric spaces. We present a general methodology to construct knngs that exploits several features of metric spaces, requiring empirically around O(n^1.27) distance computations for low- and medium-dimensional spaces, and O(n^1.90) for high-dimensional ones. Keywords: Graph Algorithms, Metric Spaces, Nearest Neighbors.
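For reference, the naive baseline these algorithms improve on is the O(n^2) brute-force knng. A minimal sketch with index-based adjacency (illustrative only; the paper's methods avoid most of these distance computations):

```python
import heapq

def brute_knng(U, d, k):
    """Brute-force k-nearest-neighbor graph: for each element,
    compute all n-1 distances and keep the k smallest.
    Returns an adjacency dict mapping index i -> indices of the
    k elements nearest to U[i]."""
    graph = {}
    for i, u in enumerate(U):
        dists = [(d(u, v), j) for j, v in enumerate(U) if j != i]
        graph[i] = [j for _, j in heapq.nsmallest(k, dists)]
    return graph
```

This costs n(n−1) distance evaluations, which is exactly what the empirical O(n^1.27) to O(n^1.90) figures above are measured against.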
Using the k-nearest neighbor graph for proximity searching in metric spaces
 In Proc. SPIRE’05, LNCS 3772
, 2005
Abstract

Cited by 8 (3 self)
Proximity searching consists in retrieving from a database the objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists in building an offline index to quickly satisfy online queries. The ultimate goal is to use as few distance computations as possible to satisfy queries, since the distance is considered expensive to compute. Proximity searching is central to several applications, ranging from multimedia indexing and querying to data compression and clustering. In this paper we present a new approach to solve the proximity searching problem. Our solution is based on indexing the database with the k-nearest neighbor graph (knng), which is a directed graph connecting each element to its k closest neighbors. We present two search algorithms, for both range and nearest neighbor queries, which use navigational and metrical features of the knng. We show that our approach is competitive against current ones. For instance, in the document metric space our nearest neighbor search algorithms perform 30% more distance evaluations than AESA using only 0.25% of its space requirement. In the same space, the pivot-based technique is completely useless.
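One common way to exploit the knng's navigational structure is greedy descent toward the query. This is a hedged sketch of the general idea only, not the paper's algorithms, which additionally use metric pruning to obtain exact answers:

```python
def greedy_nn(q, U, d, graph, start=0):
    """Greedy descent on a knng (adjacency dict index -> neighbor
    indices): hop to whichever neighbor is closer to q; stops at a
    local minimum, so the answer is approximate in general (restarts
    or metric pruning are needed for exactness)."""
    current, dist = start, d(q, U[start])
    while True:
        best, best_dist = current, dist
        for v in graph[current]:
            dv = d(q, U[v])
            if dv < best_dist:
                best, best_dist = v, dv
        if best == current:          # no neighbor is closer: local minimum
            return current, dist
        current, dist = best, best_dist
```

Each hop costs only k distance evaluations, which is where the large space/time savings over AESA-style full matrices come from.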
Euler Vector for Search and Retrieval of Gray-Tone Images
 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS, VOL. 35, NO. 4, AUGUST 2005
, 2005
Abstract

Cited by 4 (0 self)
A new combinatorial characterization of a gray-tone image called the Euler vector is proposed. The Euler number of a binary image is a well-known topological feature, which remains invariant under translation, rotation, scaling, and rubber-sheet transformation of the image. The Euler vector comprises a 4-tuple, where each element is an integer representing the Euler number of the partial binary image formed by the gray-code representation of the four most significant bit planes of the gray-tone image. Computation of the Euler vector requires only integer and Boolean operations. The Euler vector is experimentally observed to be robust against noise and compression. For efficient image indexing, storage, and retrieval from an image database using this vector, a bucket searching technique based on a simple modification of the Kd-tree is employed successfully. The Euler vector can also be used to perform an efficient four-dimensional range query. The set of retrieved images is finally ranked on the basis of the Mahalanobis distance measure. Experiments are performed on the COIL database and results are reported. The retrieval success can be improved significantly by augmenting the Euler vector with a few additional simple shape features. Since the Euler vector can be computed very fast, the proposed technique is likely to find many applications in content-based image retrieval.
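A hypothetical reading of the construction described above: the classical bit-quad Euler number (4-connected foreground) applied to the gray-coded four most significant bit planes of an 8-bit image. Every detail here is an assumption inferred from the abstract, not the authors' code.

```python
def euler_number(img):
    """Euler number of a binary image (4-connected foreground) via
    Gray's bit-quad counts: E = (Q1 - Q3 + 2*Qd) / 4, where Q1/Q3
    count 2x2 windows with one/three foreground pixels and Qd counts
    diagonal windows."""
    h, w = len(img), len(img[0])
    # Zero-pad so every foreground pixel is seen by four full 2x2 quads.
    p = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    q1 = q3 = qd = 0
    for i in range(h + 1):
        for j in range(w + 1):
            quad = (p[i][j], p[i][j + 1], p[i + 1][j], p[i + 1][j + 1])
            s = sum(quad)
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif quad in ((1, 0, 0, 1), (0, 1, 1, 0)):
                qd += 1
    return (q1 - q3 + 2 * qd) // 4

def euler_vector(gray_img):
    """4-tuple of Euler numbers of the gray-coded planes for bits 7..4
    of an 8-bit image (gray code of v is v ^ (v >> 1)); assumed layout."""
    vec = []
    for bit in (7, 6, 5, 4):
        plane = [[((px >> 1) ^ px) >> bit & 1 for px in row] for row in gray_img]
        vec.append(euler_number(plane))
    return tuple(vec)
```

As the abstract notes, only integer and Boolean operations are involved, so the vector is cheap to compute per image.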
Dynamic Spatial Approximation Trees for Massive Data
Abstract

Cited by 3 (1 self)
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic, that is, few of them tolerate insertion of elements at reasonable cost over an existing index, and only a few work efficiently in secondary memory. In this paper we introduce a secondary-memory variant of the Dynamic Spatial Approximation Tree, which has been shown to be competitive in main memory. The resulting index handles the secondary-memory scenario well and is competitive with the state of the art, becoming a useful alternative in a wide range of database applications. Moreover, our ideas are applicable to other secondary-memory trees where there is little control over the tree shape.
Practical construction of k-nearest neighbor graphs in metric spaces
, 2005
Abstract

Cited by 2 (1 self)
Let U be a set of elements and d a distance function defined among them. Let NNk(u)d be the k elements in U − {u} which have the smallest distance to u. The k-nearest neighbors graph (knng) is a directed graph G(U, E) such that E = {(u, v, d(u, v)), v ∈ NNk(u)d}. We focus on the metric space context, so d is a metric. Several knng construction algorithms are known, but they are not suitable for general metric spaces. We present two practical algorithms to construct knngs that exploit several features of metric spaces, obtaining time costs of the form O(n^1.63..2.24 k^0.02..0.59), and using O(n^0.91..1.96 k^0.04..0.66) distance computations.