Cache-oblivious B-trees
, 2000
Cited by 139 (22 self)
Abstract. This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. The data structures are independent of the parameters of the memory hierarchy, e.g., the number of memory levels, the block-transfer size at each level, and the relative speeds of memory levels. The performance is analyzed in terms of the number of memory transfers between two memory levels with an arbitrary block-transfer size of B; this analysis can then be applied to every adjacent pair of levels in a multilevel memory hierarchy. Both search trees match the optimal search bound of Θ(1 + log_{B+1} N) memory transfers. This bound is also achieved by the classic B-tree data structure on a two-level memory hierarchy with a known block-transfer size B. The first search tree supports insertions and deletions in Θ(1 + log_{B+1} N) amortized memory transfers, which matches the B-tree's worst-case bounds. The second search tree supports scanning S consecutive elements optimally in Θ(1 + S/B) memory transfers and supports insertions and deletions in Θ(1 + log_{B+1} N + (log² N)/B) amortized memory transfers, matching the performance of the B-tree for B = Ω(log N log log N).
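To get a feel for how the bounds in this abstract compare, the following minimal Python sketch evaluates them with all hidden constants set to 1. The function names are illustrative assumptions, not anything defined in the paper, and the numbers are order-of-magnitude guides rather than measured transfer counts.

```python
import math


def search_transfers(n: int, b: int) -> float:
    """Search bound 1 + log_{B+1} N, with constants dropped."""
    return 1 + math.log(n, b + 1)


def second_tree_update_transfers(n: int, b: int) -> float:
    """Amortized update bound of the second tree:
    1 + log_{B+1} N + (log^2 N) / B, with constants dropped."""
    return search_transfers(n, b) + math.log2(n) ** 2 / b


def scan_transfers(s: int, b: int) -> float:
    """Scan bound 1 + S/B for S consecutive elements."""
    return 1 + s / b


if __name__ == "__main__":
    n = 2 ** 20
    for b in (4, 64, 1024):
        print(b, search_transfers(n, b), second_tree_update_transfers(n, b))
```

For small B the (log² N)/B term dominates the update cost; once B is large relative to log N log log N, the term becomes negligible next to the search bound, which is the regime in which the abstract says the second tree matches the B-tree.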
Cache-oblivious algorithms and data structures
In Lecture Notes from the EEF Summer School on Massive Data Sets
, 2002
Cited by 35 (3 self)
A recent direction in the design of cache-efficient and disk-efficient algorithms and data structures is the notion of cache obliviousness, introduced by Frigo, Leiserson, Prokop, and Ramachandran in 1999. Cache-oblivious algorithms perform well on a multilevel memory hierarchy without knowing any parameters of the hierarchy, only knowing the existence of a hierarchy. Equivalently, a single cache-oblivious algorithm is efficient on all memory hierarchies simultaneously. While such results might seem impossible, a recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures which require knowledge of the cache/memory size and block transfer size. Here we describe several of these results with the intent of elucidating the techniques behind their design. Perhaps the most exciting of these results are the data structures, which form general building blocks immediately
An empirical study of cache-oblivious priority queues and their application to the shortest path problem.
, 2008
Cited by 2 (0 self)
Abstract. In recent years the Cache-Oblivious model of external memory computation has provided an attractive theoretical basis for the analysis of algorithms on massive datasets. Much progress has been made in discovering algorithms that are asymptotically optimal or near optimal. However, to date there are still relatively few successful experimental studies. In this paper we compare two different Cache-Oblivious priority queues based on the Funnel and Bucket Heap and apply them to the single-source shortest path problem on graphs with positive edge weights. Our results show that when RAM is limited and data is swapping to external storage, the Cache-Oblivious priority queues achieve orders of magnitude speedups over standard internal memory techniques. However, for the single-source shortest path problem both on simulated and real-world graph data, these speedups are markedly lower due to the time required to access the graph adjacency list itself.
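As a rough illustration of how a priority queue is used in the single-source shortest path problem, here is a minimal Python sketch of Dijkstra's algorithm. It uses the standard-library `heapq` binary heap as a stand-in; it is not a Funnel Heap or Bucket Heap, and the `dijkstra` function and the graph layout (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) are assumptions made for this example only.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source on a graph with positive edge weights.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled with a shorter distance
        # Adjacency-list access: the step the abstract identifies as limiting
        # the overall speedup, independent of the priority queue used.
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
    print(dijkstra(graph, "a"))
```

Swapping in a different priority queue changes only the push and pop calls; the adjacency-list reads in the inner loop remain, which is consistent with the abstract's observation that graph access time reduces the end-to-end gain.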
Barcelona Aarhus Barcelona
, 2002
This is the second annual progress report for the ALCOM-FT project, supported by the European
2 External-Memory Sorting
, 2003
This lecture will cover sorting in the cache-oblivious world. Sorting seems like an unusual topic for a data structures course, but as we will see, the results of our discussion of cache-oblivious sorting will lead to our development of cache-oblivious priority queues. We first review external-memory sorting before moving on to cache-oblivious sorting, priority queues, and Funnel Heaps. 1.1 Notation. This lecture uses capital letters in our analysis. We choose this notation because some papers use the notation n = N/B and m = M/B, where N is the number of elements, M is the size of the cache, and B is the block size. This notation is confusing, so we will avoid it.
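The relationship between the capital-letter parameters and the lowercase shorthand (n = N/B, m = M/B) can be made concrete with a short Python sketch. The sorting bound used here, (N/B) log_{M/B} (N/B) memory transfers, is the standard external-memory sorting bound and is not stated in the excerpt above; the function names are assumptions for illustration, and constants are dropped.

```python
import math


def lowercase_params(N: int, M: int, B: int) -> tuple[float, float]:
    """Return (n, m) where n = N/B is the input size in blocks
    and m = M/B is the cache size in blocks."""
    return N / B, M / B


def sort_transfers(N: int, M: int, B: int) -> float:
    """Standard external-memory sorting bound (N/B) * log_{M/B}(N/B),
    with constants dropped. Assumes M/B > 1 and N >= B."""
    n, m = lowercase_params(N, M, B)
    return n * math.log(n, m)


if __name__ == "__main__":
    # Example: 2^30 elements, cache of 2^20 elements, blocks of 2^10 elements.
    print(lowercase_params(2 ** 30, 2 ** 20, 2 ** 10))
    print(sort_transfers(2 ** 30, 2 ** 20, 2 ** 10))
```

Writing the bound in capital letters keeps the three machine and input parameters visible, which is presumably why the lecture prefers that form.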
Flattening
, 2005
This thesis studies three problems in the field of parallel computing. The first result provides a deterministic parallel sorting algorithm that empirically shows an improvement over two sample sort algorithms. When using a comparison sort, this algorithm is 1-optimal in both computation and communication. The second study develops some extensions to the Star-P system [7, 6] that allow it to solve more real problems. The timings provided indicate the scalability of the implementations on some systems. The third problem concerns automatic parallelization. By representing a computation as a binary tree, which we assume is given, it can be shown that the height corresponds to the parallel execution time, given enough processors. The main result of the chapter is an algorithm that uses tree rotations to reduce the height of an arbitrary binary tree to become logarithmic in the number of its inputs. This method can solve more general problems as the definition of tree rotation is slightly altered; examples are given that derive the parallel prefix algorithm, and give a speedup in