Results 11  20
of
41
Engineering a cacheoblivious sorting algorithm
 In Proc. 6th Workshop on Algorithm Engineering and Experiments
, 2004
"... The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory mod ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory model. Since the introduction of the cacheoblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model has been proposed and analyzed. However, less attention has been given to whether the nice theoretical proporities of cacheoblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cacheoblivious sorting. We investigate a number of implementation issues and parameters choices for the cacheoblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison based sorting, as well as with recent cacheaware proposals. The main result is a carefully implemented cacheoblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk. 1
A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation
, 2002
"... An experimental comparison of cache aware and cache oblivious static search tree algorithms is presented. Both cache aware and cache oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache aware algorithms with implicit p ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
An experimental comparison of cache aware and cache oblivious static search tree algorithms is presented. Both cache aware and cache oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache aware algorithms with implicit pointers perform best overall, but cache oblivious algorithms do almost as well and do not have to be tuned to the memory block size as cache aware algorithms require. Program instrumentation techniques are used to compare the cache misses and instruction counts for implementations of these algorithms.
Cacheoblivious data structures for orthogonal range searching
 IN PROC. ACM SYMPOSIUM ON COMPUTATIONAL GEOMETRY
, 2003
"... We develop cacheoblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in Rd lying in a query hyperrectangle. Cacheoblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linearsize ..."
Abstract

Cited by 22 (6 self)
 Add to MetaCart
We develop cacheoblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in Rd lying in a query hyperrectangle. Cacheoblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linearsize data structure that answers ddimensional queries in O((N/B)11/d + T/B) memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in O(log2B N) memory transfers. We also develop a static structure for the twodimensional case that answers queries in O(logB N + T /B) memory transfers using O(N log22 N) space. The analysis of the latter structure requires that B = 22 c for some nonnegative integer constant c.
Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses
"... A minimal perfect hash function maps a set S of n keys into the set { 0, 1,..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of ..."
Abstract

Cited by 20 (8 self)
 Add to MetaCart
A minimal perfect hash function maps a set S of n keys into the set { 0, 1,..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicographical ordering of the keys. A monotone minimal perfect hash function can be seen as a very weak form of index that provides ranking just on the set S (and answers randomly outside of S). Our goal is to minimise the description size of the hash function: we show that, for a set S of n elements out of a universe of 2 w elements, O(n log log w) bits are sufficient to hash monotonically with evaluation time O(log w). Alternatively, we can get space O(n log w) bits with O(1) query time. Both of these data structures improve a straightforward construction with O(n log w) space and O(log w) query time. As a consequence, it is possible to search a sorted table with O(1) accesses to the table (using additional O(n log log w) bits). Our results are based on a structure (of independent interest) that represents a trie in a very compact way, but admits errors. As a further application of the same structure, we show how to compute the predecessor (in the sorted order of S) of an arbitrary element, using O(1) accesses in expectation and an index of O(n log w) bits, improving the trivial result of O(nw) bits. This implies an efficient index for searching a blocked memory.
CacheOblivious Streaming Btrees
, 2007
"... A streaming Btree is a dictionary that efficiently implements insertions and range queries. We present two cacheoblivious streaming Btrees, the shuttle tree, and the cacheoblivious lookahead array (COLA). For blocktransfer size B and on N elements, the shuttle tree implements searches in optima ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
A streaming Btree is a dictionary that efficiently implements insertions and range queries. We present two cacheoblivious streaming Btrees, the shuttle tree, and the cacheoblivious lookahead array (COLA). For blocktransfer size B and on N elements, the shuttle tree implements searches in optimal O ` logB+1 N ´ transfers, range queries of L successive elements in optimal O ` logB+1 N + L/B ´ transfers, and insertions in O “ (logB+1 N)/BΘ(1/(loglogB)2 ”) +(log2 N)/B transfers, which is an asymptotic speedup over traditional Btrees if B ≥ (logN) 1+c/logloglog2 N for any constant c> 1. A COLA implements searches in O(logN) transfers, range queries in O(logN + L/B) transfers, and insertions in amortized O((logN)/B) transfers, matching the bounds for a (cacheaware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worstcase insertion cost to O(logN) if memory size M = Ω(logN). We also present a cacheaware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg’s (cacheaware) Bεtree. We compare our COLA implementation to a traditional Btree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.
The cost of cacheoblivious searching
 IN PROC. 44TH ANN. SYMP. ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS
, 2003
"... This paper gives tight bounds on the cost of cacheoblivious searching. The paper shows that no cacheoblivious search structure can guarantee a search performance of fewer than lgelog B N memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the bloc ..."
Abstract

Cited by 17 (8 self)
 Add to MetaCart
This paper gives tight bounds on the cost of cacheoblivious searching. The paper shows that no cacheoblivious search structure can guarantee a search performance of fewer than lgelog B N memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to [lge+O(lglgB/lgB)]log B N +O(1). This factor approaches lge ≈ 1.443 as B increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the diskaccess machine (DAM) model can be performed in log B N+O(1) block transfers, thisresultestablishes aseparation between the (2level) DAM model and cacheoblivious model. The DAM model naturally extends to k levels. The paper also shows that as k grows, the search costs of the optimal klevel DAM search structure and the optimal cacheoblivious search structure rapidly converge. This result demonstrates that for a multilevel memory hierarchy, a simple cacheoblivious structure almost replicates the performance of an optimal parameterized klevel DAM structure.
CacheOblivious Planar Orthogonal Range Searching and Counting
 In Proc. ACM Symposium on Computational Geometry
, 2005
"... We present the first cacheoblivious data structure for planar orthogonal range counting, and improve on previous results for cacheoblivious planar orthogonal range searching. Our range counting structure uses O(N log2 N) space and answers queries using O(logB N) memory transfers, where B is the bl ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
We present the first cacheoblivious data structure for planar orthogonal range counting, and improve on previous results for cacheoblivious planar orthogonal range searching. Our range counting structure uses O(N log2 N) space and answers queries using O(logB N) memory transfers, where B is the block size of any memory level in a multilevel memory hierarchy. Using bit manipulation techniques, the space can be further reduced to O(N). The structure can also be modified to support more general semigroup range sum queries in O(logB N) memory transfers, using O(N log2 N) space for threesided queries and O(N log 2 2 N / log2 log2 N)
Cache oblivious algorithms
 Algorithms for Memory Hierarchies, LNCS 2625
, 2003
"... Abstract. The cache oblivious model is a simple and elegant model to design algorithms that perform well in hierarchical memory models ubiquitous on current systems. This model was first formulated in [22] and has since been a topic of intense research. Analyzing and designing algorithms and data st ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
Abstract. The cache oblivious model is a simple and elegant model to design algorithms that perform well in hierarchical memory models ubiquitous on current systems. This model was first formulated in [22] and has since been a topic of intense research. Analyzing and designing algorithms and data structures in this model involves not only an asymptotic analysis of the number of steps executed in terms of the input size, but also the movement of data optimally among the different levels of the memory hierarchy. This chapter is aimed as an introduction to the “idealcache ” model of [22] and techniques used to design cache oblivious algorithms. The chapter also presents some experimental insights and results. Part of this work was done while the author was visiting MPISaarbrücken. The
CacheOblivious RTrees
, 2005
"... We develop a cacheoblivious data structure for storing a set S of N axisaligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axisaligned boundingbox hierarchy and as such it is the first cacheoblivio ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
We develop a cacheoblivious data structure for storing a set S of N axisaligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axisaligned boundingbox hierarchy and as such it is the first cacheoblivious Rtree with provable performance guarantees. If no point in the plane is contained in B or more rectangles in S, the structure answers a rectangle query using O(\sqrt{N/B} + T/B) memory transfers and a point query using O((N/B)^ε) memory transfers for any ε>0, where B is the block size of memory transfers between any two levels of a multilevel memory hierarchy. We also develop a variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles. The rectangle query bound matches the bound of the best known linearspace cacheaware structure.
I/OEfficient Construction of Voronoi Diagrams
, 2002
"... We consider the problems of computing 2 and 3d Voronoi diagrams for large data sets efficiently. We describe a cacheoblivious distribution data structure (bu#er tree) that is the basis for the cache oblivious implementation of a random incremental construction for geometric problems. We then a ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
We consider the problems of computing 2 and 3d Voronoi diagrams for large data sets efficiently. We describe a cacheoblivious distribution data structure (bu#er tree) that is the basis for the cache oblivious implementation of a random incremental construction for geometric problems. We then apply this to the construction of 2 and 3d Voronoi diagrams. We also describe a very simple variant of the standard random incremental construction based on history dag, which has optimal running time and is likely to be I/Oefficient because the pattern of insertions is also local (but we don't have theoretical bounds). Finally, we describe a practical variant that has been implemeted and present some experimental results.