## Cache-oblivious algorithms and data structures (2002)

### Download Links

- [www.brics.dk]
- [www.cs.uwaterloo.ca]
- [theory.lcs.mit.edu]
- [db.uwaterloo.ca]
- [www.win.tue.nl]
- [erikdemaine.org]
- DBLP

### Other Repositories/Bibliography

Venue: Lecture Notes from the EEF Summer School on Massive Data Sets

Citations: 36 (3 self)

### BibTeX

@INPROCEEDINGS{Demaine02cache-obliviousalgorithms,
    author = {Erik D. Demaine},
    title = {Cache-oblivious algorithms and data structures},
    booktitle = {Lecture Notes from the EEF Summer School on Massive Data Sets},
    year = {2002}
}

### Abstract

A recent direction in the design of cache-efficient and disk-efficient algorithms and data structures is the notion of cache obliviousness, introduced by Frigo, Leiserson, Prokop, and Ramachandran in 1999. Cache-oblivious algorithms perform well on a multilevel memory hierarchy without knowing any parameters of the hierarchy, only knowing the existence of a hierarchy. Equivalently, a single cache-oblivious algorithm is efficient on all memory hierarchies simultaneously. While such results might seem impossible, a recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures, which require knowledge of the cache/memory size and block transfer size. Here we describe several of these results with the intent of elucidating the techniques behind their design. Perhaps the most exciting of these results are the data structures, which form general building blocks immediately

### Citations

4138 | Computer Architecture: A Quantitative Approach - Hennessy, Patterson - 1990 |

1735 | An introduction to Kolmogorov complexity and its applications - Li, Vitányi - 1993
Citation Context: ...N + 1) + O(1) = lg N + O(1) bits of information, because it can be any of the N elements or in any of the N + 1 positions between the elements. (The additive O(1) comes from Kolmogorov complexity; see [LV97].) Each comparison reveals at most 1 bit of information, proving the lg N + O(1) lower bound on the number of comparisons. Each block read reveals where the query element fits among those B elemen... |

734 | Amortized efficiency of list update and paging rules - Sleator, Tarjan - 1985
Citation Context: ...al replacement up to a constant factor of memory transfers and up to a constant factor wastage of the cache. This competitiveness property of LRU and FIFO goes back to a 1985 paper of Sleator and Tarjan [ST85a]. In the algorithmic setting, as long as the number of memory transfers depends polynomially on the cache size M, then halving M will only affect the running time by a constant factor. More generally,... |

551 | The Input/Output complexity of sorting and related problems - Aggarwal, Vitter - 1988
Citation Context: ...external-memory model, the I/O model, the disk access model, or the cache-aware model (to contrast with cache obliviousness). The standard reference for this model is Aggarwal and Vitter’s 1988 paper [AV88], which also analyzes the memory-transfer cost of sorting in this model. Special versions of the model were considered earlier, e.g., by Floyd in 1972 [Flo72] in his analysis of the memory-transfer cos... |

326 | External memory algorithms and data structures: Dealing with massive data - Vitter
Citation Context: ... Before fetching a block from disk when the cache is already full, the algorithm must decide which block to evict from cache. Many algorithms have been developed in this model; see Vitter’s survey [Vit01]. One of its attractive features, in contrast to a variety of other models, is that the algorithm needs to worry about only two levels of memory, and only two parameters. Naturally, though, the existi... |

201 | Design and implementation of an efficient priority queue, Math. Systems Theory 10 - van Emde Boas, Kaas, et al. - 1977 |

173 | I/O complexity: The red-blue pebble game - Hong, Kung - 1981
Citation Context: ...oblem fits in cache. In fact, it is possible to do better, and achieve a running time of $\Theta(N^2/B + N^3/(B\sqrt{M}))$. In the external-memory context, this bound was first achieved by Hong and Kung in 1981 [HK81], who also proved a matching lower bound for any matrix-multiplication algorithm that executes these additions and multiplications (as opposed to Strassen-like algorithms). The cache-oblivious solutio... |

164 | Programming Pearls - Bentley - 2000 |

157 | The buffer tree: a new technique for optimal I/O-algorithms - Arge - 1995
Citation Context: ... by inserting into a cache-oblivious B-tree and then extracting the elements, it would cost $O(N \log_B N)$, which is far greater than the sorting bound. In contrast, Arge’s external-memory buffer trees [Arg95] support insertions and deletions in $O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ amortized memory transfers, which leads to the desired sorting bound. Buffer trees have the property that queries may be answered later than th... |

147 | Surpassing the information theoretic bound with fusion trees - Fredman, Willard - 1993
Citation Context: ... problems in external memory. For example, what can be said along the lines of self-adjusting data structures such as splay trees [ST85b], van Emde Boas priority queues [vE77,vEKZ77], or fusion trees [FW93]? Acknowledgments. Many thanks go to Michael Bender; through many joint discussions, our understanding of cache obliviousness and its associated issues has grown significantly. Also, the realization th... |

140 | Cache-oblivious B-trees - Bender, Demaine, et al. - 2000
Citation Context: ...To avoid this problem, we can extend the matrix to the next power of two. So if we have an $N \times N$ matrix A, we extend it to an $\lceil\lceil N \rceil\rceil \times \lceil\lceil N \rceil\rceil$ matrix, where $\lceil\lceil N \rceil\rceil = 2^{\lceil \lg N \rceil}$ is the hyperceiling operator [BDFC00]. The matrix size $N^2$ is increased by less than a factor of 4, so the running time increases by only a constant factor. Thus we obtain the following theorem: Theorem 5. For square matrices, the... |

131 | The Design of Dynamic Data Structures - Overmars - 1983
Citation Context: ... rebalance that node. Again every descendent, in particular the original leaf, will then fall within threshold. To support N changing drastically, we can apply the standard global rebuilding trick [Ove83]: whenever N grows or shrinks by a constant factor, rebuild the entire structure. Analysis. The key property for the amortized analysis is that, when a node is rebalanced, its descendents are not just... |

113 | Preserving order in a forest in less than logarithmic time and linear - van Emde Boas - 1977 |

105 | An analysis of dag-consistent distributed shared-memory algorithms - Blumofe, Frigo, et al. - 1996 |

97 | Locality of reference in LU decomposition with partial pivoting - Toledo - 1997
Citation Context: ...ansfer bound of $O(N^2/B + N^{\lg 7}/(B\sqrt{M}))$. Other matrix problems can be solved via block recursion. These problems include LU factorization without pivoting [BFJ+96], LU factorization with pivoting [Tol97], and matrix transpose and fast Fourier transform [FLPR99,Pro99]. 3.3 Sorting. The sorting problem is one of the most-studied problems in computer science. In external-memory algorithms, it plays a par... |

81 | Cache-Oblivious Algorithms - Prokop - 1999 |

74 | A locality-preserving cache-oblivious dynamic dictionary - Bender, Duan, et al. - 2002 |

68 | Cache-oblivious priority queue and graph algorithm applications - Arge, Bender, et al. - 2002 |

65 | Cache oblivious search trees via binary trees of small height - Brodal, Fagerberg, et al. - 2002
Citation Context: ...because all nodes have at least Θ(log N) elements below them; this fact is why we have leaves of Θ(log N) elements. ... Duan, Iacono, and Wu [BDIW02] and simultaneously by Brodal, Fagerberg, and Jacob [BFJ02]. Here we describe the simplification of [BDIW02], because it combines in a fairly simple way two structures we have already described: the static search tree from Section 4.1 and the packed-memory st... |

49 | A sparse table implementation of priority queues - Itai, Konheim, et al. - 1981
Citation Context: ...ts between which the new element belongs; and a delete operation specifies an existing element to remove. Solutions to this ordered-file maintenance problem were pioneered by Itai, Konheim, and Rodeh [IKR81] and Willard [Wil92], and then adapted to the cache-oblivious context in the packed-memory structure of [BDFC00]. First attempts. First let us consider two extremes of trivial (inefficient) solutio... |

49 | Self adjusting heaps - Sleator, Tarjan - 1986
Citation Context: ...ithms actually give new insight for entering new arenas or solving old problems in external memory. For example, what can be said along the lines of self-adjusting data structures such as splay trees [ST85b], van Emde Boas priority queues [vE77,vEKZ77], or fusion trees [FW93]? Acknowledgments. Many thanks go to Michael Bender; through many joint discussions, our understanding of cache obliviousness and it... |

38 | Permuting information in idealized two-level storage - Floyd - 1972
Citation Context: ...s model is Aggarwal and Vitter’s 1988 paper [AV88], which also analyzes the memory-transfer cost of sorting in this model. Special versions of the model were considered earlier, e.g., by Floyd in 1972 [Flo72] in his analysis of the memory-transfer cost of matrix transposition. The model defines a computer as having two levels (see Figure 1): 1. the cache which is near the CPU, cheap to access, but limit... |

38 | Cache-oblivious algorithms - Frigo, Leiserson, et al. - 1999
Citation Context: ...to moderately complex cache-oblivious algorithms: reversal, matrix operations, and sorting. The bulk of this work culminates with the paper that first defined the notion of cache-oblivious algorithms [FLPR99]. Next in Sections 4 and 5 we examine static and dynamic cache-oblivious data structures that have been developed, mostly in the past few years (2000–2003). Finally, Section 6 summarizes where we are ... |

33 | Scanning and traversing: Maintaining data for traversals in a memory hierarchy - Bender, Cole, et al.
Citation Context: ... transfers. The packed-memory structure has been further refined to satisfy the property that every update (in addition to every traversal) consists of O(1) physical scans sequentially through memory [BCDFC02]. This property is useful in practice when caches use prefetching to speed up sequential accesses over random blocked accesses. The basic idea is to always grow the rebalancing window to the right, ag... |

32 | A density control algorithm for doing insertions and deletions in a sequentially ordered file in good worst-case time - Willard - 1992
Citation Context: ... new element belongs; and a delete operation specifies an existing element to remove. Solutions to this ordered-file maintenance problem were pioneered by Itai, Konheim, and Rodeh [IKR81] and Willard [Wil92], and then adapted to the cache-oblivious context in the packed-memory structure of [BDFC00]. First attempts. First let us consider two extremes of trivial (inefficient) solutions, which give some ... |

23 | A comparison of cache aware and cache oblivious static search trees using program instrumentation, in: Experimental Algorithmics: From Algorithm Design to Robust and Efficient Software - Ladner, Fortna, et al. |

19 | Exponential structures for efficient cache-oblivious algorithms - Bender, Cole, et al.
Citation Context: ...structure supports searches in $O(\log_B N)$ memory transfers, and insertions and deletions in $O((\lg N)/B)$ amortized memory transfers plus the cost of finding the node to update. Bender, Cole, and Raman [BCR02] have strengthened this result in various directions. First, they obtain worst-case bounds of $O(\log_B N)$ memory transfers for both updates and queries. Second, they build a partially persistent data s... |

11 | I/O-efficient construction of Voronoi diagrams - Kumar, Ramos - 2003 |

7 | Cache oblivious distribution sweeping - Brodal, Fagerberg - 2002
Citation Context: ...g: a new funnelsort and an adaptation of the existing distribution sort. We will describe a simplification to the first algorithm, called lazy funnelsort, which was introduced by Brodal and Fagerberg [BF02a]. Funnelsort, in turn, is a sort of lazy mergesort. This algorithm will be our first application of the tall-cache assumption (see Section 2.4). For simplicity, we assume that $M = \Omega(B^2)$. The same re... |

7 | Funnel heap - a cache oblivious priority queue - Brodal, Fagerberg
Citation Context: ...d online delete-min’s in $O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ amortized memory transfers. The difference with buffer trees is that the cache-oblivious structure does not support delayed searches. Brodal and Fagerberg [BF02b] developed a simpler cache-oblivious priority queue using the funnels and funnelsort algorithm that we saw in Sections 4.2 and 3.3.2. Their data structure (at least as described) does not support de... |

7 | Cache-oblivious algorithms. Master’s thesis, Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology - Prokop - 1999 |

6 | Time Bounds for Selection - Blum, Floyd, et al. - 1973 |

4 | Cache oblivious data structures - Ohashi - 2000 |

3 | On the limits of cache-obliviousness - Brodal, Fagerberg - 2003
Citation Context: ...sults can be obtained when $M = \Omega(B^{1+\gamma})$ by increasing the constant 3; refer to [BF02a] for details. Interestingly, optimal cache-oblivious sorting is not achievable without the tall-cache assumption [BF03]. The heart of the funnelsort algorithm is a static data structure which we call a funnel. We delay the description of funnels to Section 4.2 when we have built up some necessary tools in the context ... |

1 | Programming Pearls. Addison-Wesley, Inc., 2nd edition - Bentley - 2000 |

1 | 2003. To appear - Blumofe, Frigo, et al. |

1 | Experimental Algorithmics: From Algorithm Design to Robust and Efficient - In - 2002 |

1 | An Introduction to Kolmogorov Complexity and its Applications - Li, Vitányi - 1997 |
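
The citation context for [BDFC00] above quotes the hyperceiling operator, which rounds N up to the next power of two, and the claim that padding an N × N matrix this way grows its size by less than a factor of 4. The following is a minimal Python sketch of that operator only; the function name `hyperceiling` is a hypothetical choice for illustration and does not come from the paper.

```python
def hyperceiling(n: int) -> int:
    """Return 2**ceil(lg n), the smallest power of two that is >= n.

    Hypothetical helper illustrating the hyperceiling operator quoted
    in the [BDFC00] citation context; not code from the paper.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # For every integer n >= 1, (n - 1).bit_length() equals ceil(lg n).
    return 1 << (n - 1).bit_length()


def padded_area_ratio(n: int) -> float:
    """Ratio of the padded matrix size to the original size N^2."""
    side = hyperceiling(n)
    return (side * side) / (n * n)
```

Because the padded side length is always less than 2N, `padded_area_ratio(n)` stays below 4, which matches the constant-factor argument in the quoted passage.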