Searching in Metric Spaces by Spatial Approximation
, 1999
Cited by 62 (20 self)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree ("spatial approximation tree"), is based on approaching spatially the searched objects, that is, getting closer and closer to them, rather than the classical divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that do not need to tune parameters, which makes it appealing for use by non-experts.
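The spatial-approximation idea in this abstract can be illustrated with a small sketch. It is a simplified reading, not the authors' implementation: each node keeps a set of "neighbor" children that are closer to the node than to any earlier neighbor, and a range query only descends into children whose distance to the query is within 2r of the closest candidate. The names `SATNode`, `build_sat`, `range_search` and the Manhattan metric on integer points are assumptions made for illustration, and refinements described in the literature are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
Metric = Callable[[Point, Point], int]


def manhattan(a: Point, b: Point) -> int:
    """L1 distance on integer points (a metric; exact arithmetic)."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class SATNode:
    center: Point
    radius: int = 0  # covering radius: max distance from center to its subtree
    children: List["SATNode"] = field(default_factory=list)


def build_sat(center: Point, items: Sequence[Point], dist: Metric) -> SATNode:
    """Build a simplified spatial approximation tree rooted at `center`."""
    node = SATNode(center)
    neighbors: List[Point] = []
    rest: List[Point] = []
    # Scan items by increasing distance to the center.
    for v in sorted(items, key=lambda x: dist(x, center)):
        d_center = dist(v, center)
        node.radius = max(node.radius, d_center)
        # v becomes a neighbor if it is closer to the center
        # than to every neighbor chosen so far.
        if all(d_center < dist(v, b) for b in neighbors):
            neighbors.append(v)
        else:
            rest.append(v)
    # Every other item goes to the bag of its closest neighbor.
    bags: List[List[Point]] = [[] for _ in neighbors]
    for v in rest:
        closest = min(range(len(neighbors)), key=lambda i: dist(v, neighbors[i]))
        bags[closest].append(v)
    node.children = [build_sat(b, bag, dist) for b, bag in zip(neighbors, bags)]
    return node


def range_search(node: SATNode, query: Point, r: int, dist: Metric,
                 out: List[Point]) -> None:
    """Append to `out` every stored point within distance r of `query`."""
    d_center = dist(node.center, query)
    if d_center > node.radius + r:
        return  # the query ball cannot reach anything in this subtree
    if d_center <= r:
        out.append(node.center)
    d_children = [dist(c.center, query) for c in node.children]
    d_min = min([d_center] + d_children)
    for child, d_child in zip(node.children, d_children):
        # Triangle inequality: a relevant item stored under `child` forces
        # dist(child, query) <= d_min + 2r, so other children are skipped.
        if d_child <= d_min + 2 * r:
            range_search(child, query, r, dist, out)
```

A call such as `range_search(build_sat(pts[0], pts[1:], manhattan), q, 5, manhattan, found)` fills `found` with the points within distance 5 of `q`. For brevity the sketch recomputes the distance to a child's center on descent; a tuned version would pass it down to save distance evaluations.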
A compact space decomposition for effective metric indexing
- Pattern Recognition Letters
, 2005
Cited by 23 (6 self)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable both for main and secondary memory implementations, and most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this aspect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time.
1 Introduction
The problem of proximity searching has received much attention in recent times, due to an increasing interest in manipulating and retrieving the more and more common multimedia data. Multimedia data have to be classified, forecasted, filtered, organized, and so on. Their manipulation poses new challenges to classifiers and function approximators. The well-known k-nearest neighbor (knn) classifier is a favorite candidate for this task for being simple enough and well understood. One of the main obstacles, however, of using this classifier for massive data classification is its linear complexity to find a set of k neighbors for a given query.
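As a rough illustration of the list of clusters (LC) mentioned in the abstract above, the following sketch builds a list of fixed-size clusters and answers range queries over it. The abstract only states that the LC is a compact partitioning, so the details here are assumptions: centers are taken in input order, each cluster holds the `bucket_size` elements closest to its center, and the function names and the Manhattan metric are hypothetical choices for the example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
Metric = Callable[[Point, Point], int]
Cluster = Tuple[Point, int, List[Point]]  # (center, covering radius, bucket)


def manhattan(a: Point, b: Point) -> int:
    """L1 distance on integer points (a metric; exact arithmetic)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_lc(items: Sequence[Point], bucket_size: int, dist: Metric) -> List[Cluster]:
    """Partition `items` into an ordered list of (center, radius, bucket)."""
    clusters: List[Cluster] = []
    remaining = list(items)
    while remaining:
        center = remaining.pop(0)  # assumption: next unassigned element is the center
        remaining.sort(key=lambda x: dist(x, center))
        bucket, remaining = remaining[:bucket_size], remaining[bucket_size:]
        cov = dist(bucket[-1], center) if bucket else 0
        clusters.append((center, cov, bucket))
    return clusters


def search_lc(clusters: List[Cluster], query: Point, r: int, dist: Metric) -> List[Point]:
    """Return all stored points within distance r of `query`."""
    out: List[Point] = []
    for center, cov, bucket in clusters:
        d = dist(center, query)
        if d <= r:
            out.append(center)
        if d <= cov + r:
            # The query ball intersects the cluster ball: scan the bucket.
            out.extend(x for x in bucket if dist(x, query) <= r)
        if d + r < cov:
            # The query ball lies strictly inside this cluster. Every later
            # element is at distance >= cov from this center, so none can match.
            break
    return out
```

The early exit is what makes the structure asymmetric: elements in later clusters are known to lie outside all earlier cluster balls, so a query fully contained in one ball never needs to look further down the list.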
An Effective Clustering Algorithm to Index High Dimensional Metric Spaces
Cited by 18 (8 self)
A metric space consists of a collection of objects and a distance function defined among them, which satisfies the triangular inequality. The goal is to preprocess the set so that, given a set of objects and a query, retrieve those objects close enough to the query. The number of distances computed to achieve this goal is the complexity measure. The problem is very difficult in the so-called high-dimensional metric spaces, where the histogram of distances has a large mean and a small variance. A recent survey on methods to index metric spaces has shown that the so-called clustering algorithms are better suited than their competitors, pivot-based algorithms, to cope with high-dimensional metric spaces. In this paper we present a new clustering method that achieves much better performance than all the existing data structures. We present analytical and experimental results that support our claims and that give the users the tuning parameters to make optimal use of this data structure.
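The remark about the histogram of distances can be checked empirically. The sketch below is not taken from the paper: it assumes uniformly random vectors under the Euclidean distance and a hypothetical helper `distance_stats`, and it simply reports the mean and variance of all pairwise distances so that low and high dimensions can be compared.

```python
import math
import random
import statistics
from typing import Tuple


def distance_stats(dim: int, n: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Mean and variance of pairwise Euclidean distances between
    n uniformly random points in the unit cube of dimension `dim`."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return statistics.fmean(dists), statistics.pvariance(dists)


if __name__ == "__main__":
    for dim in (2, 8, 32):
        mean, var = distance_stats(dim)
        # A larger mean**2 / var indicates a histogram concentrated around its mean.
        print(dim, round(mean, 3), round(var, 3), round(mean * mean / var, 1))
```

Under these assumptions the mean grows with the dimension while the variance stays comparatively small, so most distances look alike and a range query can exclude few objects without computing their distances.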
Towards Measuring the Searching Complexity of Metric Spaces
- In Proc. of the Mexican Computing Meeting
, 2001
Cited by 4 (1 self)
In this paper we introduce a new measure of the intrinsic searching complexity of a general metric space. This measure reflects the expected behavior of the search algorithms on the metric space, yet it is easy to estimate and independent of the search algorithm. We prove average case lower bounds, in terms of this complexity measure, for a large class of proximity search algorithms. This gives some new insight on the intrinsic difficulty of the search problem in metric spaces.
A unified model for similarity searching
- Actas del Encuentro Nacional de Computación Mexicano
, 1999
Cited by 1 (1 self)
The indexing algorithms and data structures for similarity searching in metric spaces seem to emerge from a great diversity, and different approaches have been proposed and analyzed separately, often under different assumptions. Currently, the only realistic way to compare two different algorithms is to apply them to the same data set. We present a unified model for studying similarity searching algorithms, defining common complexity measures allowing comparison between different approaches.
Analysis of Search Algorithms and Tree Structures for Proximity Search in Metric Spaces by
Proximity search in metric spaces involves searching the elements of a set that are close to a specified query point when the data elements form a metric space. The triangle inequality is a fundamental property of metric spaces and can be utilized in various ways to prune the metric search space. There are various frameworks under which metric spaces have been organized and the algorithms used to perform proximity queries are dependent on how the metric space tree has been structured. We present a classification of the search strategies based on triangle inequalities and the metric tree indexing algorithms. Algorithms are presented for various combinations of these strategies which result in different trade-offs of the time and space required for the search. Experimental analysis of these algorithms is performed in the context of the biological database management system called MoBIoS (Molecular Biological Information System) that we are developing.
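One common way the triangle inequality prunes a search, given here as a generic sketch rather than as any algorithm from this paper, is pivot filtering: distances from every object to a few pivots are precomputed, and at query time the bound |d(q, p) - d(x, p)| <= d(q, x) discards objects without computing d(q, x). The function names, the choice of pivots, and the Manhattan metric are assumptions for the example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
Metric = Callable[[Point, Point], int]


def manhattan(a: Point, b: Point) -> int:
    """L1 distance on integer points (a metric; exact arithmetic)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_pivot_table(items: Sequence[Point], pivots: Sequence[Point],
                      dist: Metric) -> List[List[int]]:
    """Precompute the distance from every item to every pivot."""
    return [[dist(x, p) for p in pivots] for x in items]


def pivot_range_query(items: Sequence[Point], pivots: Sequence[Point],
                      table: List[List[int]], query: Point, r: int,
                      dist: Metric) -> Tuple[List[Point], int]:
    """Return (matches, number of items whose distance had to be computed)."""
    d_query = [dist(query, p) for p in pivots]
    matches: List[Point] = []
    verified = 0
    for x, row in zip(items, table):
        # |d(q, p) - d(x, p)| is a lower bound on d(q, x). If it exceeds r
        # for any pivot, x cannot be in the result and is skipped.
        if any(abs(dq - dx) > r for dq, dx in zip(d_query, row)):
            continue
        verified += 1
        if dist(x, query) <= r:
            matches.append(x)
    return matches, verified
```

The trade-off mentioned in the abstract shows up directly: more pivots mean a larger table and more query-to-pivot distances, but usually fewer objects left to verify.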
A Unified Model for Similarity Searching
The indexing algorithms and data structures for similarity searching in metric spaces seem to emerge from a great diversity, and different approaches have been proposed and analyzed separately, often under different assumptions. Currently, the only realistic way to compare two different algorithms is to apply them to the same data set. We present a unified model for studying similarity searching algorithms, defining common complexity measures allowing comparison between different approaches.
Similarity Search in Metric Spaces
, 2004
This report is a reprint of a thesis that embodies the results of research done in partial fulfillment of the requirements for the degree of Master of Mathematics in Computer ... Similarity search refers to any searching problem which retrieves objects from a set that are close to a given query object as reflected by some similarity criterion. It has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrary metric spaces rather than restricting ourselves to vector spaces. The contributions in this paper include the following: First, after defining pivot sharing and pivot localization, we prove probabilistically that pivot sharing level should be increased for scattered data while pivot localization level should be increased for clustered data. This conclusion is supported by extensive experiments. Moreover, we proposed two new algorithms, RLAESA and NGH-tree. RLAESA, using high pivot sharing level and low pivot localization level, outperforms the fastest algorithm in the same category, MVP-tree. NGH-tree is used as a framework to show the effect of increasing pivot sharing level on search efficiency. It provides a way to improve the search efficiency in almost all algorithms. The experiments with RLAESA and NGH-tree not only show their performance, but also support the first conclusion we mentioned above. Second, we analyzed the issue of disk I/O on similarity search and proposed a new algorithm SLAESA to improve the search efficiency by switching random I/O access to sequential I/O access.

