Results 1–10 of 27
Fully Dynamic Spatial Approximation Trees
In Proceedings of the 9th International Symposium on String Processing and Information Retrieval (SPIRE 2002), LNCS 2476, 2002
Cited by 22 (12 self)
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown to compare favorably against alternative data structures in high-dimensional spaces or for queries with low selectivity. Its main drawbacks are costly construction time, poor performance in low-dimensional spaces or for queries with high selectivity, and the fact that it is a static data structure: once built, elements cannot be added or deleted.
Effective Proximity Retrieval by Ordering Permutations
, 2007
Cited by 21 (4 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (KNN) searching in both coordinate and metric spaces. Although solutions for these problems exist, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the KNN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest, and the similarity between these orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, on both synthetic and real, metric and non-metric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of the alternatives, in some cases by a wide margin.
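The ordering idea can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact algorithm: the anchor set, the Spearman footrule as the permutation similarity, and the candidate fraction below are all assumed parameters.

```python
import math

def euclid(a, b):
    """Illustrative metric: Euclidean distance between 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def permutation(x, anchors, dist):
    """Indices of the anchors sorted from closest to farthest from x."""
    return sorted(range(len(anchors)), key=lambda i: dist(x, anchors[i]))

def footrule(pa, pb):
    """Spearman footrule: L1 difference of each anchor's position in the
    two permutations; small values predict spatial closeness."""
    pos_b = {a: j for j, a in enumerate(pb)}
    return sum(abs(i - pos_b[a]) for i, a in enumerate(pa))

def knn_candidates(query, db, anchors, dist, frac=0.3):
    """Rank the database by permutation similarity to the query and keep a
    fraction of it as candidates to be verified with real distances."""
    pq = permutation(query, anchors, dist)
    ranked = sorted(db, key=lambda x: footrule(permutation(x, anchors, dist), pq))
    return ranked[:max(1, int(frac * len(db)))]
```

For instance, with anchors at the corners of a square, `knn_candidates((1.2, 1.1), [(1, 2), (9, 1), (2, 9), (6, 5)], [(0, 0), (10, 0), (0, 10), (10, 10)], euclid)` ranks the true nearest neighbor `(1, 2)` first, without ever computing a direct query-to-element distance.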
A Metric Index for Approximate String Matching
In LATIN, 2002
Cited by 20 (1 self)
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us to find the R occurrences of a pattern of length m in a text of length n in average time O(m log^2 n + m^2 + R), using O(n log n) space and O(n log n) index construction time. This complexity improves by far over all previous methods. We also show a simpler scheme needing O(n) space.
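The crux is that the edit distance is a metric, so the triangle inequality lets an index discard candidates without evaluating them against the query. A minimal sketch of that pruning idea, using a single hypothetical pivot over a plain word list rather than the paper's suffix-tree construction:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def range_query(query, sites, pivot, r):
    """Report all sites within edit distance r of the query. Since the edit
    distance is a metric, |d(q,p) - d(p,s)| > r implies d(q,s) > r, so s
    can be discarded without computing d(q,s)."""
    dqp = edit_distance(query, pivot)
    pivot_dist = {s: edit_distance(pivot, s) for s in sites}  # build-time data
    out = []
    for s in sites:
        if abs(dqp - pivot_dist[s]) > r:
            continue  # pruned by the triangle inequality
        if edit_distance(query, s) <= r:
            out.append(s)
    return out
```

Here `range_query("cat", ["cat", "bat", "dog", "cart"], "hat", 1)` discards "dog" via the pivot alone and verifies only the remaining three sites.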
Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces
Cited by 8 (3 self)
Let U be a set of elements and d a distance function defined among them. Let NNk(u) be the k elements of U − {u} that have the smallest distance to u. The k-nearest neighbor graph (knng) is a weighted directed graph G(U, E) such that E = {(u, v), v ∈ NNk(u)}. We focus on the metric space context, so d is a metric. Several knng construction algorithms are known, but they are not suitable for general metric spaces. We present a general methodology to construct knngs that exploits several features of metric spaces, requiring empirically around O(n^1.27) distance computations for low- and medium-dimensional spaces, and O(n^1.90) for high-dimensional ones. Keywords: Graph Algorithms, Metric Spaces, Nearest Neighbors.
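For reference, the brute-force construction that the paper's methodology improves upon is straightforward; a sketch with an illustrative 1-D metric:

```python
def knn_graph(U, d, k):
    """Brute-force knng: connect each u to its k nearest neighbors,
    at a cost of O(n^2) distance computations; the paper's methodology
    reduces this to roughly O(n^1.27)-O(n^1.90) empirically."""
    return {u: sorted((v for v in U if v != u), key=lambda v: d(u, v))[:k]
            for u in U}
```

For example, `knn_graph([0, 1, 3, 7], lambda a, b: abs(a - b), 1)` yields `{0: [1], 1: [0], 3: [1], 7: [3]}`; note the graph is directed, since 3's nearest neighbor is 1 but not vice versa.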
Engineering efficient metric indexes
Pattern Recognition Letters (PRL)
Cited by 5 (1 self)
We give efficient algorithms and index data structures for range search in general metric spaces. We give a simple method to make almost any existing algorithm memory-adaptive, improving the search the more memory is available. For vector spaces and metric spaces of strings we show how several distances can be computed bit-parallelly on a sequential computer, and use the result to improve the search performance. This method works especially well with approximate range queries. The experimental results show that we can obtain order-of-magnitude improvements over existing methods. Key words: algorithms, data structures, information retrieval, metric space indexing, proximity searching, bit-parallel distance evaluations, memory adaptiveness
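As one concrete instance of bit-parallel distance evaluation (an illustrative sketch, not the paper's exact scheme): packing binary vectors into machine words lets a single XOR plus a popcount compare up to a word's worth of coordinates at once.

```python
def pack(bits):
    """Pack a 0/1 vector into an integer, one bit per coordinate."""
    w = 0
    for i, b in enumerate(bits):
        w |= b << i
    return w

def hamming(a, b):
    """Hamming distance of two packed vectors: XOR marks the differing
    coordinates and a popcount tallies them, so a 64-coordinate
    comparison costs a couple of word operations instead of a loop."""
    return bin(a ^ b).count("1")
```

E.g. `hamming(pack([1, 0, 1, 1]), pack([1, 1, 1, 0]))` is 2, since the vectors differ in two coordinates.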
Simple Space-Time Trade-Offs for AESA
Cited by 2 (1 self)
We consider indexing and range searching in metric spaces. The best method known is AESA, which in practice requires the fewest distance evaluations to answer range queries. The problem with AESA is its space complexity: it requires storage for Θ(n^2) distance values to index n objects. We give several methods to reduce this cost. The main observation is that exact distance values are not needed; lower and upper bounds suffice. The simplest of our methods needs only Θ(n^2) bits (as opposed to words) of storage, but the price to pay is more distance evaluations as compared to AESA, the exact cost depending on the dimension. To reduce this efficiency gap we extend our method to use b distance bounds, requiring Θ(n^2 log_2(b)) bits of storage. The scheme also uses Θ(b) or Θ(bn) words of auxiliary space. We experimentally show that using b ∈ {1, ..., 16} (depending on the problem instance) gives good results. Our preprocessing and side computation costs are the same as for AESA. We propose several improvements, achieving e.g. O(n^(1+α)) construction cost for some 0 < α < 1, and a variant using even less space.
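The quantized-bounds observation can be sketched as follows. The uniform bucketing of [0, dmax] into b cells, the arbitrary pivot order, and the 1-D example metric are illustrative assumptions; AESA proper picks the next pivot by smallest lower bound.

```python
def build_index(objs, d, b, dmax):
    """For every pair, store only the bucket index of d(x, y) in [0, dmax]:
    conceptually ceil(log2(b)) bits per pair instead of a full word."""
    width = dmax / b
    buckets = {(x, y): min(b - 1, int(d(x, y) / width))
               for x in objs for y in objs}
    return buckets, width

def range_query(q, objs, d, buckets, width, r):
    """AESA-style elimination on quantized bounds: each evaluated object p
    acts as a pivot; since lb <= d(p,x) <= ub, the triangle inequality gives
    d(q,x) >= max(d(q,p) - ub, lb - d(q,p)), and a candidate x survives
    only while that lower bound stays <= r."""
    cand, out = set(objs), []
    while cand:
        p = cand.pop()       # next pivot (itself still a candidate)
        dqp = d(q, p)
        if dqp <= r:
            out.append(p)    # exact check: p is a true answer
        survivors = set()
        for x in cand:
            lb = buckets[(p, x)] * width
            ub = lb + width
            if max(dqp - ub, lb - dqp) <= r:
                survivors.add(x)
        cand = survivors
    return out
```

The pruning is conservative (the bounds always bracket the true distance), so the answer set is exact; coarser bucketing only costs extra distance evaluations, never correctness.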
Recursive Lists of Clusters: A Dynamic Data Structure for Range Queries in Metric Spaces
Cited by 2 (0 self)
We introduce a novel data structure for solving the range query problem in generic metric spaces. It can be seen as a dynamic version of the List of Clusters data structure of Chávez and Navarro. Experimental results show that, with respect to range queries, it outperforms the original data structure when the database dimension is below 12. Moreover, the building process is much more efficient, for any size and any dimension of the database.
Speeding Up Permutation Based Indexing with Indexing
Cited by 2 (0 self)
A recent probabilistic approach for searching in high-dimensional metric spaces is based on predicting the distances between database elements according to how they order their distances towards some set of distinguished elements, called permutants. In the preprocessing phase a set of permutants is chosen, and they are sorted (permuted) by their distances to every database element. These permutations form the index. When a query is given, its corresponding permutation is computed, and, as similar elements will (probably) have similar permutations, the database is compared in the order induced by the similarity between permutations. This works well but has a relatively high CPU cost due to computing the distances between permutations and (partially) sorting the database by similarity. We improve this by identifying and solving this as another metric space problem, which avoids many distance computations between the permutations. The experimental results show that this works extremely well in practice. Keywords: metric space indexing; probabilistic algorithms; indexing permutations
Analyzing Metric Space Indexes: What For?
Cited by 2 (0 self)
It has been a long way since the beginnings of metric space searching, when people coming from algorithmics tried to apply their background to this new paradigm, obtaining variable, but especially difficult-to-explain, success or lack thereof. Since then, something has been learned about the specifics of the problem, in particular regarding key aspects such as the intrinsic dimensionality, which were not well understood in the beginning. The inclusion of those aspects in the picture has led to the most important developments in the area. Similarly, researchers have tried to apply asymptotic analysis concepts to understand and predict the performance of the data structures. Again, it was soon clear that this was insufficient, and that the characteristics of the metric space itself could not be neglected. Although some progress has been made on understanding concepts such as the curse of dimensionality, modern researchers seem to have given up on using asymptotic analysis. They rely on experiments, or at best on detailed cost models that are good predictors but do not explain why the data structures perform the way they do. In this paper I will argue that this is a big loss. Even if the predictive capability of asymptotic analysis is poor, it constitutes a great tool to understand the algorithmic concepts behind the different data structures, and gives powerful hints in the design of new ones. I will exemplify my view by recollecting what is known about the asymptotic analysis of metric indexes, and will add some new results.
Hybrid Index for Metric Space Databases
Cited by 1 (1 self)
We present an index data structure for metric-space databases. The proposed method has the advantage of allowing an efficient use of secondary memory. When the index is entirely loaded in main memory, our strategy achieves competitive performance. Our experimental study shows that the proposed index outperforms other strategies known to be efficient in practice. A valuable feature of the proposal is that the index can be dynamically updated once constructed.