Results 1  10
of
18
Cacheoblivious Btrees
, 2000
"... Abstract. This paper presents two dynamic search trees attaining nearoptimal performance on any hierarchical memory. The data structures are independent of the parameters of the memory hierarchy, e.g., the number of memory levels, the blocktransfer size at each level, and the relative speeds of me ..."
Abstract

Cited by 133 (22 self)
 Add to MetaCart
Abstract. This paper presents two dynamic search trees attaining nearoptimal performance on any hierarchical memory. The data structures are independent of the parameters of the memory hierarchy, e.g., the number of memory levels, the blocktransfer size at each level, and the relative speeds of memory levels. The performance is analyzed in terms of the number of memory transfers between two memory levels with an arbitrary blocktransfer size of B; this analysis can then be applied to every adjacent pair of levels in a multilevel memory hierarchy. Both search trees match the optimal search bound of Θ(1+logB+1 N) memory transfers. This bound is also achieved by the classic Btree data structure on a twolevel memory hierarchy with a known blocktransfer size B. The first search tree supports insertions and deletions in Θ(1 + logB+1 N) amortized memory transfers, which matches the Btree’s worstcase bounds. The second search tree supports scanning S consecutive elements optimally in Θ(1 + S/B) memory transfers and supports insertions and deletions in Θ(1 + logB+1 N + log2 N) amortized memory transfers, matching the performance of the Btree for B = B Ω(log N log log N).
Cacheoblivious priority queue and graph algorithm applications
 In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
"... In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hi ..."
Abstract

Cited by 64 (10 self)
 Add to MetaCart
In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cacheoblivious data structure, M and B are not used in the description of the structure. The bounds match the bounds of several previously developed externalmemory (cacheaware) priority queue data structures, which all rely crucially on knowledge about M and B. Priority queues are a critical component in many of the best known externalmemory graph algorithms, and using our cacheoblivious priority queue we develop several cacheoblivious graph algorithms.
Cache Oblivious Search Trees via Binary Trees of Small Height
 In Proc. ACMSIAM Symp. on Discrete Algorithms
, 2002
"... We propose a version of cache oblivious search trees which is simpler than the previous proposal of Bender, Demaine and FarachColton and has the same complexity bounds. In particular, our data structure avoids the use of weight balanced Btrees, and can be implemented as just a single array of ..."
Abstract

Cited by 63 (9 self)
 Add to MetaCart
We propose a version of cache oblivious search trees which is simpler than the previous proposal of Bender, Demaine and FarachColton and has the same complexity bounds. In particular, our data structure avoids the use of weight balanced Btrees, and can be implemented as just a single array of data elements, without the use of pointers. The structure also improves space utilization.
On the limits of cacheobliviousness
 IN PROC. 35TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 2003
"... In this paper, we present lower bounds for permuting and sorting in the cacheoblivious model. We prove that (1) I/O optimal cacheoblivious comparison based sorting is not possible without a tall cache assumption, and (2) there does not exist an I/O optimalcacheoblivious algorithm for permuting, ..."
Abstract

Cited by 39 (7 self)
 Add to MetaCart
In this paper, we present lower bounds for permuting and sorting in the cacheoblivious model. We prove that (1) I/O optimal cacheoblivious comparison based sorting is not possible without a tall cache assumption, and (2) there does not exist an I/O optimalcacheoblivious algorithm for permuting, not even in the presence of a tall cache assumption.Our results for sorting show the existence of an inherent tradeoff in the cacheoblivious model between the strength of the tall cache assumption and the overhead for the case M >> B, and show that Funnelsort and recursive binary mergesort are optimal algorithms in the sense that they attain this tradeoff.
Efficient tree layout in a multilevel memory hierarchy, arXiv:cs.DS/0211010
, 2003
"... We consider the problem of laying out a tree with fixed parent/child structure in hierarchical memory. The goal is to minimize the expected number of block transfers performed during a search along a roottoleaf path, subject to a given probability distribution on the leaves. This problem was previ ..."
Abstract

Cited by 30 (7 self)
 Add to MetaCart
We consider the problem of laying out a tree with fixed parent/child structure in hierarchical memory. The goal is to minimize the expected number of block transfers performed during a search along a roottoleaf path, subject to a given probability distribution on the leaves. This problem was previously considered by Gil and Itai, who developed optimal but slow algorithms when the blocktransfer size B is known. We present faster but approximate algorithms for the same problem; the fastest such algorithm runs in linear time and produces a solution that is within an additive constant of optimal. In addition, we show how to extend any approximately optimal algorithm to the cacheoblivious setting in which the blocktransfer size is unknown to the algorithm. The query performance of the cacheoblivious layout is within a constant factor of the query performance of the optimal knownblocksize layout. Computing the cacheoblivious layout requires only logarithmically many calls to the layout algorithm for known block size; in particular, the cacheoblivious layout can be computed in O(N lg N) time, where N is the number of nodes. Finally, we analyze two greedy strategies, and show that they have a performance ratio between Ω(lg B / lg lg B) and O(lg B) when compared to the optimal layout.
Engineering a cacheoblivious sorting algorithm
 In Proc. 6th Workshop on Algorithm Engineering and Experiments
, 2004
"... The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory mod ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory model. Since the introduction of the cacheoblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model has been proposed and analyzed. However, less attention has been given to whether the nice theoretical proporities of cacheoblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cacheoblivious sorting. We investigate a number of implementation issues and parameters choices for the cacheoblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison based sorting, as well as with recent cacheaware proposals. The main result is a carefully implemented cacheoblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk. 1
Cacheoblivious data structures for orthogonal range searching
 IN PROC. ACM SYMPOSIUM ON COMPUTATIONAL GEOMETRY
, 2003
"... We develop cacheoblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in Rd lying in a query hyperrectangle. Cacheoblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linearsize ..."
Abstract

Cited by 22 (6 self)
 Add to MetaCart
We develop cacheoblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in Rd lying in a query hyperrectangle. Cacheoblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linearsize data structure that answers ddimensional queries in O((N/B)11/d + T/B) memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in O(log2B N) memory transfers. We also develop a static structure for the twodimensional case that answers queries in O(logB N + T /B) memory transfers using O(N log22 N) space. The analysis of the latter structure requires that B = 22 c for some nonnegative integer constant c.
Exponential structures for efficient cacheoblivious algorithms
 In Proceedings of the 29th International Colloquium on Automata, Languages and Programming
, 2002
"... Abstract. We present cacheoblivious data structures based upon exponential structures. These data structures perform well on a hierarchical memory but do not depend on any parameters of the hierarchy, including the block sizes and number of blocks at each level. The problems we consider are searchi ..."
Abstract

Cited by 19 (3 self)
 Add to MetaCart
Abstract. We present cacheoblivious data structures based upon exponential structures. These data structures perform well on a hierarchical memory but do not depend on any parameters of the hierarchy, including the block sizes and number of blocks at each level. The problems we consider are searching, partial persistence and planar point location. On a hierarchical memory where data is transferred in blocks of size B, some of the results we achieve are: – We give a linearspace data structure for dynamic searching that supports searches and updates in optimal O(log B N) worstcase I/Os, eliminating amortization from the result of Bender, Demaine, and FarachColton (FOCS ’00). We also consider finger searches and updates and batched searches. – We support partiallypersistent operations on an ordered set, namely, we allow searches in any previous version of the set and updates to the latest version of the set (an update creates a new version of the set). All operations take an optimal O(log B (m + N)) amortized I/Os, where N is the size of the version being searched/updated, and m is the number of versions. – We solve the planar point location problem in linear space, taking optimal O(log B N) I/Os for point location queries, where N is the number of line segments specifying the partition of the plane. The preprocessing requires O((N/B) log M/B N) I/Os, where M is the size of the ‘inner ’ memory. 1
The cost of cacheoblivious searching
 IN PROC. 44TH ANN. SYMP. ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS
, 2003
"... This paper gives tight bounds on the cost of cacheoblivious searching. The paper shows that no cacheoblivious search structure can guarantee a search performance of fewer than lgelog B N memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the bloc ..."
Abstract

Cited by 17 (8 self)
 Add to MetaCart
This paper gives tight bounds on the cost of cacheoblivious searching. The paper shows that no cacheoblivious search structure can guarantee a search performance of fewer than lgelog B N memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to [lge+O(lglgB/lgB)]log B N +O(1). This factor approaches lge ≈ 1.443 as B increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the diskaccess machine (DAM) model can be performed in log B N+O(1) block transfers, thisresultestablishes aseparation between the (2level) DAM model and cacheoblivious model. The DAM model naturally extends to k levels. The paper also shows that as k grows, the search costs of the optimal klevel DAM search structure and the optimal cacheoblivious search structure rapidly converge. This result demonstrates that for a multilevel memory hierarchy, a simple cacheoblivious structure almost replicates the performance of an optimal parameterized klevel DAM structure.