Results 1 - 6 of 6
Effective Proximity Retrieval by Ordering Permutations, 2007
Cited by 13 (4 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (K-NN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the K-NN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and non-metric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
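A minimal sketch of the ordering idea described above, not the authors' implementation. Each object is represented by the permutation of anchor indices sorted from closest to farthest, and candidates are ranked by how similar their permutation is to the query's. The abstract does not say which permutation similarity is used, so the Spearman footrule here is an assumption, and the names `permutation`, `footrule`, and `candidate_order` are hypothetical.

```python
import math


def permutation(obj, anchors, dist):
    """Return the anchor indices sorted from closest to farthest from obj."""
    return sorted(range(len(anchors)), key=lambda i: dist(obj, anchors[i]))


def footrule(perm_a, perm_b):
    """Spearman footrule (assumed measure): sum over anchors of the
    absolute difference between their positions in the two permutations."""
    pos_b = {anchor: rank for rank, anchor in enumerate(perm_b)}
    return sum(abs(rank - pos_b[anchor]) for rank, anchor in enumerate(perm_a))


def candidate_order(query, database, anchors, dist):
    """Return database indices, most promising first.

    A real index would precompute the database permutations once;
    they are recomputed here only to keep the sketch short.
    """
    q_perm = permutation(query, anchors, dist)
    perms = [permutation(x, anchors, dist) for x in database]
    return sorted(range(len(database)), key=lambda i: footrule(q_perm, perms[i]))


if __name__ == "__main__":
    anchors = [(0, 0), (10, 0), (0, 10)]
    database = [(1.0, 0.5), (9, 1), (1, 9), (6, 4)]
    print(candidate_order((2, 1), database, anchors, math.dist))
```

A probabilistic search would then compare the query against only a prefix of this order, trading some recall for fewer distance computations.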
The Filter-Kruskal minimum spanning tree algorithm, 2009
Cited by 4 (0 self)
We present Filter-Kruskal, a simple modification of Kruskal’s algorithm that avoids sorting edges that are “obviously” not in the MST. For arbitrary graphs with random edge weights, Filter-Kruskal runs in time O(m + n log n log(m/n)), i.e. in linear time for not too sparse graphs. Experiments indicate that the algorithm has very good practical performance over the entire range of edge densities. An equally simple parallelization seems to be the currently best practical algorithm on multicore machines.
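The filtering idea can be sketched as follows, under assumptions the abstract does not spell out: edges are partitioned around a random pivot weight as in quicksort, the light part is processed first, and the heavy part is filtered against the current components before it is processed. The three-way split on equal weights, the `threshold` of 8, and the names `UnionFind`, `filter_kruskal`, and `mst` are illustrative choices, not taken from the paper.

```python
import random


class UnionFind:
    """Disjoint sets with path halving (no union by rank, for brevity)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(edges, uf, tree):
    """Plain Kruskal on (weight, u, v) edges, extending `tree` in place."""
    for w, u, v in sorted(edges):
        if uf.union(u, v):
            tree.append((w, u, v))


def filter_kruskal(edges, uf, tree, threshold=8):
    if len(edges) <= threshold:  # small input: sort it directly
        kruskal(edges, uf, tree)
        return
    pivot = random.choice(edges)[0]
    light = [e for e in edges if e[0] < pivot]
    equal = [e for e in edges if e[0] == pivot]
    heavy = [e for e in edges if e[0] > pivot]
    filter_kruskal(light, uf, tree, threshold)
    kruskal(equal, uf, tree)
    # Filter step: drop heavy edges whose endpoints are already connected,
    # so they are never sorted.
    heavy = [e for e in heavy if uf.find(e[1]) != uf.find(e[2])]
    filter_kruskal(heavy, uf, tree, threshold)


def mst(n, edges):
    """Return the MST (or spanning forest) edges of an n-vertex graph."""
    uf, tree = UnionFind(n), []
    filter_kruskal(list(edges), uf, tree)
    return tree
```

Handling the edges equal to the pivot separately guarantees that both recursive calls receive strictly fewer edges, so the recursion terminates even when many weights coincide.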
On sorting, heaps, and minimum spanning trees
- Algorithmica
Cited by 1 (1 self)
Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m + k log k) time. We present Incremental Quicksort (IQS), an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal expected time for any k. Based on IQS, we present the Quickheap (QH), a simple and efficient priority queue for main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, yet are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the expected amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B)log(m/M)) I/O cost with main memory size M and block size B, in a cache-oblivious fashion. As a direct application, we use our techniques to implement classical Minimum Spanning Tree (MST) algorithms. We use IQS to implement Kruskal’s MST algorithm and QHs to implement Prim’s. Experimental results show that IQS, QHs, external QHs, and our Kruskal’s and Prim’s MST variants are competitive, and in many cases better in practice than current state-of-the-art alternative (and much more sophisticated) implementations.
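As an illustration of the incremental idea, here is a minimal sketch assuming the usual formulation: a stack remembers the positions of pivots that are already in their final place, and each request partitions only the segment in front of the nearest such pivot. The generator interface, the Lomuto partition, and the names `partition` and `incremental_quicksort` are choices made for this sketch, not details given in the abstract.

```python
import random


def partition(a, lo, hi, pivot_index):
    """Lomuto partition of a[lo..hi] (inclusive); returns the pivot's final position."""
    a[pivot_index], a[hi] = a[hi], a[pivot_index]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def incremental_quicksort(items):
    """Yield the elements of `items` in ascending order, one at a time.

    Work is done lazily: stopping after k elements avoids sorting the rest.
    """
    a = list(items)
    stack = [len(a)]  # sentinel: a fictitious pivot just past the end
    for idx in range(len(a)):
        # Partition a[idx .. top-1] until a pivot lands exactly on idx.
        while stack[-1] != idx:
            pivot_index = random.randint(idx, stack[-1] - 1)
            stack.append(partition(a, idx, stack[-1] - 1, pivot_index))
        stack.pop()
        yield a[idx]
```

For example, `itertools.islice(incremental_quicksort(data), k)` retrieves the k smallest elements in order; Kruskal's algorithm can consume edges this way and stop as soon as the tree is complete.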
Quickheaps: Simple, Efficient, and Cache-Oblivious
We present the Quickheap, a simple and efficient data structure for implementing priority queues in main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, but are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the average amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B) log(m/M)) I/O cost with block size B, in a cache-oblivious fashion. Our experimental results show that Quickheaps are very competitive with the best alternative external memory heaps.
Speeding up Spatial Approximation Search in Metric Spaces
Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, the performance baseline for about twenty years has been AESA. This index uses an iterative procedure: at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then discards the database elements that the pivot proves irrelevant to the query. The next pivot in AESA is chosen as the one minimizing the sum of the lower bounds on its distance to the query proved by previous pivots. In this paper we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. iAESA differs from AESA in how it selects the next pivot. In iAESA, each candidate sorts the previous pivots by closeness to it, and the next pivot is the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a
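The AESA loop described above can be sketched as a range search, assuming a fully precomputed distance matrix and a query radius. This is an illustration of the baseline only: the iAESA pivot selection by permutation similarity is not implemented here, and the function name and parameters are hypothetical.

```python
def aesa_range_search(query, database, dist, matrix, radius):
    """Return (sorted indices within `radius` of query, number of distance evaluations).

    matrix[i][j] must hold the precomputed distance between database[i] and database[j].
    """
    alive = set(range(len(database)))
    lower_sum = [0.0] * len(database)  # sum of lower bounds proved so far
    results, evaluations = [], 0
    while alive:
        # Next pivot: the surviving element with the smallest sum of lower bounds.
        pivot = min(alive, key=lambda i: lower_sum[i])
        alive.remove(pivot)
        d_qp = dist(query, database[pivot])
        evaluations += 1
        if d_qp <= radius:
            results.append(pivot)
        for u in list(alive):
            # Triangle inequality: dist(query, u) >= |d(q, p) - d(p, u)|.
            bound = abs(d_qp - matrix[pivot][u])
            if bound > radius:
                alive.remove(u)  # proved irrelevant without computing dist(query, u)
            else:
                lower_sum[u] += bound
    return sorted(results), evaluations
```

Only the calls to `dist` involving the query count as distance evaluations, which is the cost the abstract compares between AESA and iAESA.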

