Results 1 - 9 of 9
Effective Proximity Retrieval by Ordering Permutations
2007
Abstract

Cited by 28 (5 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (KNN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the KNN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and nonmetric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
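The anchor-ordering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the similarity between orders is measured, so the Spearman footrule used here is an assumption, and the function names (`permutation`, `footrule`, `candidates_by_permutation`) are hypothetical.

```python
import math

def permutation(obj, anchors, dist=math.dist):
    """Return the anchor indices sorted from closest to farthest from obj."""
    return sorted(range(len(anchors)), key=lambda i: dist(obj, anchors[i]))

def footrule(perm_a, perm_b):
    """Spearman footrule (an assumed measure): sum of rank displacements."""
    pos_b = {anchor: rank for rank, anchor in enumerate(perm_b)}
    return sum(abs(rank - pos_b[anchor]) for rank, anchor in enumerate(perm_a))

def candidates_by_permutation(query, database, anchors, dist=math.dist):
    """Order database indices so that the most promising candidates come first."""
    q_perm = permutation(query, anchors, dist)
    # In a real index these permutations would be computed once, at build time.
    perms = [permutation(x, anchors, dist) for x in database]
    return sorted(range(len(database)), key=lambda i: footrule(q_perm, perms[i]))
```

A probabilistic search would then compare the query against only a prefix of this candidate order, trading some accuracy for fewer distance computations.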
The Filter-Kruskal minimum spanning tree algorithm
2009
Abstract

Cited by 9 (0 self)
We present Filter-Kruskal, a simple modification of Kruskal’s algorithm that avoids sorting edges that are “obviously” not in the MST. For arbitrary graphs with random edge weights, Filter-Kruskal runs in time O(m + n log n log(m/n)), i.e. in linear time for not too sparse graphs. Experiments indicate that the algorithm has very good practical performance over the entire range of edge densities. An equally simple parallelization seems to be the currently best practical algorithm on multicore machines.
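The filtering idea can be sketched in Python as follows. This is a sketch under assumptions rather than the paper's code: the random pivot choice, the `threshold` parameter, and all names are hypothetical. Edges are `(weight, u, v)` tuples over vertices `0..n-1`.

```python
import random

class UnionFind:
    """Disjoint sets over vertices 0..n-1, with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def kruskal(edges, uf, tree):
    """Plain Kruskal: sort the edges and keep those joining two components."""
    for w, u, v in sorted(edges):
        if uf.union(u, v):
            tree.append((w, u, v))

def filter_kruskal(edges, uf, tree, threshold=16):
    """Quicksort-like recursion that filters heavy edges before sorting them."""
    if len(edges) <= threshold:
        kruskal(edges, uf, tree)
        return
    pivot = random.choice(edges)[0]          # assumed pivot rule: random edge weight
    light = [e for e in edges if e[0] <= pivot]
    heavy = [e for e in edges if e[0] > pivot]
    if not heavy:                            # no progress possible, just sort
        kruskal(edges, uf, tree)
        return
    filter_kruskal(light, uf, tree, threshold)
    # Drop heavy edges whose endpoints are already connected by lighter edges.
    heavy = [e for e in heavy if uf.find(e[1]) != uf.find(e[2])]
    filter_kruskal(heavy, uf, tree, threshold)

def mst(n, edges, threshold=16):
    """Return the list of MST (or spanning forest) edges of an n-vertex graph."""
    uf, tree = UnionFind(n), []
    filter_kruskal(list(edges), uf, tree, threshold)
    return tree
```

Because the light half is fully processed first, many heavy edges already connect vertices in the same component and are discarded without ever being sorted.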
Pivot selection strategies for permutation-based similarity search
Similarity Search and Applications, volume 8199 of Lecture Notes in Computer Science, 2013
Abstract

Cited by 1 (1 self)
Recently, permutation-based indexes have attracted interest in the area of similarity search. The basic idea of permutation-based indexes is that data objects are represented as appropriately generated permutations of a set of pivots (or reference objects). Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query. This, of course, assumes that similar objects are represented by similar permutations of the pivots. In the context of permutation-based indexing, most authors propose to select pivots randomly from the data set, given that traditional pivot selection strategies do not reveal better performance. However, to the best of our knowledge, no rigorous comparison has been performed yet. In this paper we compare five pivot selection strategies on three permutation-based similarity access methods. Among those, we propose a novel strategy specifically designed for permutations. Two significant observations emerge from our tests. First, random selection is always outperformed by at least one of the tested strategies. Second, no strategy is universally the best for all permutation-based access methods; rather, different strategies are optimal for different methods.
On sorting, heaps, and minimum spanning trees
Algorithmica
Abstract

Cited by 1 (1 self)
Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m + k log k) time. We present Incremental Quicksort (IQS), an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal expected time for any k. Based on IQS, we present the Quickheap (QH), a simple and efficient priority queue for main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, yet are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the expected amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B) log(m/M)) I/O cost with main memory size M and block size B, in a cache-oblivious fashion. As a direct application, we use our techniques to implement classical Minimum Spanning Tree (MST) algorithms. We use IQS to implement Kruskal’s MST algorithm and QHs to implement Prim’s. Experimental results show that IQS, QHs, external QHs, and our Kruskal’s and Prim’s MST variants are competitive, and in many cases better in practice than current state-of-the-art alternative (and much more sophisticated) implementations.
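A minimal Python sketch of the incremental idea behind IQS follows, assuming a stack of pivot positions that are already in their final sorted place. The generator form and the names `incremental_quicksort` and `first_k` are illustrative choices, not taken from the paper.

```python
import random
from itertools import islice

def _partition(a, lo, hi, pidx):
    """Partition a[lo..hi] around a[pidx]; return the pivot's final index."""
    a[pidx], a[hi] = a[hi], a[pidx]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store

def incremental_quicksort(items):
    """Yield the elements of items in ascending order, one at a time."""
    a = list(items)
    stack = [len(a)]              # sentinel: fictitious pivot past the end
    for idx in range(len(a)):
        # Partition the leftmost unsorted segment until a pivot lands on idx.
        while stack[-1] != idx:
            hi = stack[-1] - 1
            pidx = random.randint(idx, hi)
            stack.append(_partition(a, idx, hi, pidx))
        stack.pop()               # a[idx] is now the next smallest element
        yield a[idx]

def first_k(items, k):
    """Return the k smallest elements in ascending order."""
    return list(islice(incremental_quicksort(items), k))
```

The pivots left on the stack by earlier calls are reused by later ones, so asking for one more element only partitions the segment up to the nearest stored pivot.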
Speeding up Spatial Approximation Search in Metric Spaces
Abstract
Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, the baseline for performance for about twenty years has been AESA. This index uses an iterative procedure, where at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards database elements that can be proved not relevant to the query using the pivot. The next pivot in AESA is chosen as the one minimizing the sum of lower bounds to the distance to the query proved by previous pivots. In this paper we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. The difference from AESA lies in how the next pivot is selected. In iAESA, each candidate sorts previous pivots by closeness to it, and the next pivot is the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a
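The AESA iteration described above (pick a pivot, evaluate its distance to the query, discard elements the triangle inequality rules out) can be sketched in Python for range queries. This sketch implements the classical AESA pivot rule, not iAESA's permutation-based rule, and it assumes a full precomputed distance matrix `matrix[i][j]`; the function name and signature are hypothetical.

```python
def aesa_range_search(query, database, dist, matrix, radius):
    """Return (sorted indices within radius of query, number of distance evaluations)."""
    alive = set(range(len(database)))
    lower_sum = [0.0] * len(database)   # sum of lower bounds proved so far
    result = []
    evaluations = 0
    while alive:
        # AESA rule: next pivot minimizes the accumulated lower bounds.
        p = min(alive, key=lambda i: lower_sum[i])
        alive.remove(p)
        d_qp = dist(query, database[p])
        evaluations += 1
        if d_qp <= radius:
            result.append(p)
        for u in list(alive):
            # Triangle inequality: |d(q,p) - d(p,u)| <= d(q,u).
            bound = abs(d_qp - matrix[p][u])
            if bound > radius:
                alive.remove(u)         # u cannot be within the radius
            else:
                lower_sum[u] += bound
    return sorted(result), evaluations
```

An iAESA-style variant would replace only the `min(...)` line, choosing the candidate whose ordering of the previous pivots is most similar to the query's ordering.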
Algorithms, Performance
Abstract
This paper proposes a strategy to reduce the amount of hardware involved in the solution of search engine queries. It proposes using a secondary compact cache that keeps minimal information stored in the query receptionist machine to register the processors that must get involved in the solution of queries which are evicted from the standard result cache or are not admitted in it. This cache strategy produces exact answers by using very few processors.
Quickheaps: Simple, Efficient, and Cache-Oblivious
2008
Abstract
We present the Quickheap, a simple and efficient data structure for implementing priority queues in main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, but are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the average amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B) log(m/M)) I/O cost with block size B, in a cache-oblivious fashion. Our experimental results show that Quickheaps are very competitive with the best alternative external memory heaps.
A Comparison of Pivot Selection Techniques for Permutation-Based Indexing
Abstract
Recently, permutation-based indexes have attracted interest in the area of similarity search. The basic idea of permutation-based indexes is that data objects are represented as appropriately generated permutations of a set of pivots (or reference objects). Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query, following the assumption that similar objects are represented by similar permutations of the pivots. In the context of permutation-based indexing, most authors propose to select pivots randomly from the data set, given that traditional pivot selection techniques do not reveal better performance. However, to the best of our knowledge, no rigorous comparison has been performed yet. In this paper we compare five pivot selection techniques on three permutation-based similarity access methods. Among those, we propose a novel technique specifically designed for permutations. Two significant observations emerge from our tests. First, random selection is always outperformed by at least one of the tested techniques. Second, no technique is universally the best for all permutation-based access methods; rather, different techniques are optimal for different methods. This indicates that the pivot selection technique should be considered an integral and relevant part of any permutation-based access method.