## Cache-oblivious algorithms and data structures (2004)

Venue: | IN SWAT |

Citations: | 8 - 1 self |

### BibTeX

@INPROCEEDINGS{Brodal04cache-obliviousalgorithms,
    author = {Gerth Stølting Brodal},
    title = {Cache-oblivious algorithms and data structures},
    booktitle = {IN SWAT},
    year = {2004},
    publisher = {}
}

### Abstract

Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The result is algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
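
As a hedged illustration of the idea in the abstract (not code from the paper), the sketch below transposes a matrix by recursively halving its larger dimension. The function names `transpose_into` and `transposed` and the `base` cutoff are assumptions made for this example. The code never mentions a memory size M or a block size B, which is the defining property of a cache-oblivious algorithm: the recursion eventually reaches subproblems small enough to fit whatever cache level exists. Python lists of lists do not model a contiguous memory layout, so this shows the access pattern only, not measured I/O behavior.

```python
def transpose_into(a, b, r0, r1, c0, c1, base=16):
    """Write the transpose of the block a[r0:r1][c0:c1] into b.

    The block is split along its larger dimension until it holds at most
    `base` entries. `base` is an illustrative cutoff to limit recursion
    overhead; it is not tuned to any cache parameter.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= 0 or cols <= 0:
        return
    if rows * cols <= max(base, 1):
        # Small block: copy it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the row range in half.
        mid = r0 + rows // 2
        transpose_into(a, b, r0, mid, c0, c1, base)
        transpose_into(a, b, mid, r1, c0, c1, base)
    else:
        # Split the column range in half.
        mid = c0 + cols // 2
        transpose_into(a, b, r0, r1, c0, mid, base)
        transpose_into(a, b, r0, r1, mid, c1, base)


def transposed(a, base=16):
    """Return a new matrix that is the transpose of the rectangular matrix a."""
    if not a or not a[0]:
        return []
    n_rows, n_cols = len(a), len(a[0])
    b = [[None] * n_rows for _ in range(n_cols)]
    transpose_into(a, b, 0, n_rows, 0, n_cols, base)
    return b
```

Calling `transposed([[1, 2, 3], [4, 5, 6]])` returns the 3 × 2 transpose. Changing `base` alters only the recursion depth, never the result.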

### Citations

8542 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context ... the other processes being scheduled. 1.2 Ideal-cache model Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model and coined the terminology of cache-oblivious algorithms =-=[44]-=-. The ideal-cache model can be viewed as a formal framework for analyzing the locality of reference of an algorithm that is oblivious about the presence of the memory hierarchy. The basic idea is to de... |

3992 |
Computer Architecture: A Quantitative Approach
- Hennessy, Patterson
- 2007
Citation Context ...rigo et al. 1 Introduction Modern computers are characterized by having a memory system consisting of a hierarchy of several levels of memory, where each level is acting as a cache for the next level =-=[46]-=-. The typical memory levels of current machines are registers, level 1 cache, level 2 cache, level 3 cache, main memory, and disk. While the sizes of the levels increase with the distance from the CPU... |

1124 |
Multidimensional binary search trees used for associative searching
- Bentley
- 1975
Citation Context ...B N) I/Os. Cache-oblivious algorithms for orthogonal range searching were presented in [2], both a kd-tree and range-tree solution were presented. A cache-oblivious kd-tree is simply a normal kd-tree =-=[24]-=- laid out in memory using the van Emde Boas layout. This structure uses linear space and answers queries in O(√(N/B) + K/B) I/Os; this is optimal among linear space structures [49]. Insertions are fac... |

727 |
Amortized efficiency of list update and paging rules
- Sleator, Tarjan
- 1985
Citation Context ...an optimal off-line cache replacement strategy can be replaced by the on-line least-recently used (LRU) cache replacement strategy, by appealing to Sleator and Tarjan’s classic competitiveness result =-=[64]-=- for LRU paging. Since LRU is adaptive to dynamically changing memory sizes, cache oblivious algorithms are also adaptive to changes in the available memory. A naive cache-oblivious algorithm is the sc... |

374 |
Gaussian elimination is not optimal
- Strassen
- 1969
Citation Context ...wer bound by Hong and Kung [47] for algorithms computing the matrix product only using additions and multiplications. In [44] it was furthermore proved that Strassen’s matrix multiplication algorithm =-=[65]-=- is cache-oblivious and requires O(n + n^2/B + n^(log_2 7)/(B√M)) I/Os. Optimal comparison based sorting algorithms performing O(Sort(N)) I/Os were presented, under the so called tall cache assumpti... |

371 | Time bounds for selection
- Blum, Floyd, et al.
- 1973
Citation Context ...ptive to changes in the available memory. A naive cache-oblivious algorithm is the scanning of an N element array that requires optimal Θ(N/B) I/Os. The linear time selection algorithm of Blum et al. =-=[27]-=- primarily is based on scanning and it can be proved that their selection algorithm is an optimal cache-oblivious algorithm performing Θ(N/B) I/Os. Frigo et al. in their seminal paper [44] considered ... |

320 | External Memory Algorithms and Data Structures: Dealing with Massive Data
- Vitter
- 2001
Citation Context ...etween two levels of the memory hierarchy dominates the running time. For an overview of the comprehensive work done related to the I/O model we refer the reader to the surveys by Arge [9] and Vitter =-=[69]-=-, and the book [57]. More sophisticated multi level models have been studied in the literature [3–5, 7,16,47,62,63,70,71], but none of these have gained the same level of attention as the I/O model of... |

254 |
Organization and maintenance of large ordered indexes
- Bayer, McCreight
- 1972
Citation Context ...parison based sorting requires Θ(Sort_{M,B}(N)) = Θ((N/B) log_{M/B}(N/B)) I/Os, which is achieved by Θ(M/B)-ary multi-way mergesort, and searching requires Θ(log_B N) I/Os, which is achieved by B-trees =-=[17]-=-. The success of the I/O model is likely due to its simplicity making the design and analysis of external memory algorithms feasible, while adequately modeling the case where the I/Os between two leve... |

236 | Algorithms for parallel memory I: Two-level memories - Vitter, Shriver - 1994 |

200 |
Design and implementation of an efficient priority queue
- van Emde Boas, Kaas, et al.
- 1977
Citation Context ...e-oblivious search trees with search cost O(log_B N) I/Os, matching the search cost of standard (cache-aware) B-trees [17]. The search trees of Prokop are related to a data structure of van Emde Boas =-=[67, 68]-=-, since the recursive layout of a search tree generated by Prokop’s scheme resembles the layout of the search trees of van Emde Boas. The constant in the O(log_B N) search cost was studied in [21], wh... |

173 | External-memory graph algorithms
- Chiang
- 1995
Citation Context ...omputing the Euler tour of a tree, breadth first search (BFS) of a tree, and depth first search (DFS) of a tree, all requiring O(Sort(E)) I/Os, matching the known bounds for the I/O model achieved in =-=[36]-=-. For directed BFS and DFS on general graphs a cache-oblivious algorithm was presented performing O((V + E/B) log V + Sort(E)) I/Os, matching the known best bounds for the I/O model [34]. For undirected ... |

166 |
I/O complexity: The red-blue pebble game
- Hong, Kung
- 1981
Citation Context ...ptimal O(mn/B) I/Os. The multiplication of an m × n-matrix and an n × p-matrix was solved using O((mn + np + mp)/B + mnp/(B√M)) I/Os. For square matrices this matches a lower bound by Hong and Kung =-=[47]-=- for algorithms computing the matrix product only using additions and multiplications. In [44] it was furthermore proved that Strassen’s matrix multiplication algorithm [65] is cache-oblivious and req... |

128 | A model for hierarchical memory - Aggarwal, Alpern, et al. - 1987 |

121 | External-memory computational geometry
- Goodrich, Tsay, et al.
- 1993
Citation Context ...single source shortest path (SSSP) can be solved cache-obliviously in O(V + (E/B) log(E/B)) I/Os [32,37], matching the known bounds for the I/O model [51]. 7 Computational geometry Goodrich et al. =-=[45]-=- introduced the distribution sweeping approach to solve a sequence of problems within computational geometry in the I/O model. A cache-oblivious version of the distribution sweeping approach is develop... |

113 | The influence of caches on the performance of sorting
- LaMarca
- 1997
Citation Context ...ts optimized for the memory hierarchy—see e.g. the paper by Chatterjee et al. [35] and the references it contains. Ladner et al. considered the effect of caches in connection with heaps [54], sorting =-=[55]-=-, and sequential and random traversals [52]. Using registers to improve the running time of sorting was considered in [14]. Minimizing translation look-aside buffer (TLB) misses, and the case of low c... |

112 | The uniform memory hierarchy model of computation. Algorithmica - Alpern, Carter, et al. - 1994 |

112 |
Preserving order in a forest in less than logarithmic time and linear space
- van Emde Boas
- 1978
Citation Context ...e-oblivious search trees with search cost O(log_B N) I/Os, matching the search cost of standard (cache-aware) B-trees [17]. The search trees of Prokop are related to a data structure of van Emde Boas =-=[67, 68]-=-, since the recursive layout of a search tree generated by Prokop’s scheme resembles the layout of the search trees of van Emde Boas. The constant in the O(log_B N) search cost was studied in [21], wh... |

110 | Hierarchical memory with block transfer - Aggarwal, Chandra, et al. - 1987 |

95 | Locality of Reference in LU Decomposition with partial pivoting
- Toledo
- 1997
Citation Context ...ng algorithms were presented. Finally an algorithm for fast Fourier transform (FFT) was presented requiring O(Sort(N)) I/Os. A cache-oblivious algorithm for LU decomposition with pivoting appeared in =-=[66]-=-. The remaining of this paper gives an overview of the results on cache-oblivious algorithms and data structures achieved during the five years since the seminal paper by Frigo et al. Recent surveys on... |

88 | A functional approach to external graph algorithms
- Abello, Buchsbaum, et al.
- 1998
Citation Context ...for the I/O model in [58]. Finally an O(Sort(E) log log V) I/O minimum spanning tree algorithm was presented, nearly matching the O(Sort(E) log log ) I/O bound in [12] for the I/O model. Abello et al. =-=[1]-=- presented for the I/O model a functional approach to solve a sequence of graph problems based on recursion and repeated use of sorting and scanning. Their randomized minimum spanning tree algorithm i... |

80 |
Decomposable searching problems
- Bentley
- 1979
Citation Context ...ructure uses linear space and answers queries in O(√(N/B) + K/B) I/Os; this is optimal among linear space structures [49]. Insertions are facilitated using the so-called logarithmic method of Bentley =-=[25]-=-, and require O((log N / B) · log_{M/B} N) I/Os. The cache-oblivious range-tree presented in [2] supports range queries in O(log_B N + K/B) I/Os and requires space O(N log^2 N). 8 Lower bounds A general reduct... |

79 | External memory data structures
- Arge
- 2002
Citation Context ...here the I/Os between two levels of the memory hierarchy dominates the running time. For an overview of the comprehensive work done related to the I/O model we refer the reader to the surveys by Arge =-=[9]-=- and Vitter [69], and the book [57]. More sophisticated multi level models have been studied in the literature [3–5, 7,16,47,62,63,70,71], but none of these have gained the same level of attention as ... |

78 | Cache-oblivious algorithms
- Prokop
- 1999
Citation Context ... property that every update (in addition to every traversal) consists of O(1) physical scans sequentially through memory. Updates still require amortized O((log^2 N)/B) I/Os. 4 Search trees Prokop in =-=[60]-=- proposed static cache-oblivious search trees with search cost O(log_B N) I/Os, matching the search cost of standard (cache-aware) B-trees [17]. The search trees of Prokop are related to a data struct... |

75 | Improved algorithms and data structures for solving graph problems in external memory
- Kumar, Schwabe
- 1996
Citation Context ... +√V E/B · √ V B/E ε ) I/Os respectively. Undirected single source shortest path (SSSP) can be solved cache-obliviously in O(V + (E/B) log(E/B)) I/Os [32,37], matching the known bounds for the I/O model =-=[51]-=-. 7 Computational geometry Goodrich et al. [45] introduced the distribution sweeping approach to solve a sequence of problems within computational geometry in the I/O model. A cache-oblivious ver... |

73 | Nonlinear array layouts for hierarchical memory systems
- Chatterjee, Jain, et al.
- 1999
Citation Context ...een studied before in different contexts. In connection with matrices, significant speedups can be achieved by using layouts optimized for the memory hierarchy—see e.g. the paper by Chatterjee et al. =-=[35]-=- and the references it contains. Ladner et al. considered the effect of caches in connection with heaps [54], sorting [55], and sequential and random traversals [52]. Using registers to improve the ru... |

72 | A locality-preserving cache-oblivious dynamic dictionary
- Bender, Duan, et al.
- 2002
Citation Context ...31,53,59]. Dynamic B-trees were first presented by Bender et al. [22] achieving searches in O(log_B N) I/Os and updates requiring amortized O(log_B N) I/Os. Simplified constructions were presented in =-=[23]-=- and [31], where [31] is based on combining the recursive static layout of Prokop [60] and the dynamic search trees of low height by Andersson and Lai [8], and [23] is based on combining the static la... |

69 | I/O complexity of graph algorithms
- Munagala
- 1999
Citation Context ...+ E/B) log V + Sort(E)) I/Os, matching the known best bounds for the I/O model [34]. For undirected DFS, an algorithm performing O(V + Sort(E)) I/Os was achieved, matching the bound for the I/O model in =-=[58]-=-. Finally an O(Sort(E) log log V) I/O minimum spanning tree algorithm was presented, nearly matching the O(Sort(E) log log ) I/O bound in [12] for the I/O model. Abello et al. [1] presented for the I/O... |

68 | The influence of caches on the performance of heaps
- LaMarca, Ladner
- 1996
Citation Context ...by using layouts optimized for the memory hierarchy—see e.g. the paper by Chatterjee et al. [35] and the references it contains. Ladner et al. considered the effect of caches in connection with heaps =-=[54]-=-, sorting [55], and sequential and random traversals [52]. Using registers to improve the running time of sorting was considered in [14]. Minimizing translation look-aside buffer (TLB) misses, and the... |

66 | Algorithms for parallel memory II: Hierarchical multilevel memories - Vitter, Shriver - 1994 |

64 | Cache-oblivious priority queue and graph algorithm applications
- Arge, Bender, et al.
- 2002
Citation Context ... of a given element. The layout of arbitrary static trees was considered in [20]. Finally, optimal cache-oblivious implicit dictionaries were developed in [42] and [43]. 5 Priority queues Arge et al. =-=[11]-=- presented the first cache-oblivious priority queue, supporting inserts and delete-min operations in O((1/B) log_{M/B}(N/B)) I/Os. This matches the performance achieved in the I/O model by e.g. the buffer trees... |

63 | Cache oblivious search trees via binary trees of small height (extended abstract)
- Brodal, Fagerberg, et al.
- 2002
Citation Context ...m can achieve a performance better than log_2 e · log_B N I/Os, i.e. a factor ≈ 1.44 slower than a cache-aware algorithm. Cache oblivious search trees avoiding the usage of pointers were presented in =-=[31,53,59]-=-. Dynamic B-trees were first presented by Bender et al. [22] achieving searches in O(log_B N) I/Os and updates requiring amortized O(log_B N) I/Os. Simplified constructions were presented in [23] and ... |

58 | On external memory graph traversal
- Buchsbaum, Goldwasser, et al.
Citation Context ...odel achieved in [36]. For directed BFS and DFS on general graphs a cache-oblivious algorithm was presented performing O((V + E/B) log V + Sort(E)) I/Os, matching the known best bounds for the I/O model =-=[34]-=-. For undirected DFS, an algorithm performing O(V + Sort(E)) I/Os was achieved, matching the bound for the I/O model in [58]. Finally an O(Sort(E) log log V) I/O minimum spanning tree algorithm was pr... |

55 |
The buffer tree: A technique for designing batched external data structures
- Arge
Citation Context ...d the first cache-oblivious priority queue, supporting inserts and delete-min operations in O((1/B) log_{M/B}(N/B)) I/Os. This matches the performance achieved in the I/O model by e.g. the buffer trees of Arge =-=[10]-=-. The construction in [11] is a general reduction to sorting. An alternative cache-oblivious priority queue achieving the same I/O complexity as [11] was presented in [29]. This solution is a more direct so... |

48 | Towards a theory of cache-efficient algorithms - Sen, Chatterjee - 2000 |

47 |
A sparse table implementation of priority queues
- Itai, Konheim, et al.
- 1981
Citation Context ... in [41], i.e. an algorithm that works with a single array of size N only storing the N input elements plus O(1) machine words. Sorting multisets has been studied in [40]. 3 List labeling Itai et al. =-=[48]-=- studied the problem of maintaining N elements in sorted order in an array of length O(N), an important problem in dynamic dictionaries when an efficient range query operation is required to be suppor... |

40 | Cache oblivious distribution sweeping
- Brodal, Fagerberg
Citation Context ... based on the merging paradigm, Funnelsort, and one based on the distribution paradigm. Both algorithms require the tall cache assumption M ≥ B^2. A simplified version of Funnelsort was presented in =-=[28]-=-, denoted Lazy Funnelsort, requiring the tall cache assumption M ≥ B^(1+ε). An empirical study of the developed cache-oblivious sorting algorithms is presented in [33]. That I/O optimal cache-oblivious... |

39 | On the limits of cache-obliviousness
- Brodal, Fagerberg
- 2003
Citation Context ... study of the developed cache-oblivious sorting algorithms is presented in [33]. That I/O optimal cache-oblivious comparison based sorting is not possible without a tall cache assumption is proved in =-=[30]-=-. The paper shows an inherent trade-off for cache-oblivious algorithms between the strength of the tall cache assumption and the overhead for the case M ≫ B. The result implies that both Funnelsort an... |

38 | Extending the Hong-Kung model to memory hierarchies - Savage - 1995 |

34 | Funnel heap - a cache oblivious priority queue
- Brodal, Fagerberg
Citation Context ...l by e.g. the buffer trees of Arge [10]. The construction in [11] is a general reduction to sorting. An alternative cache-oblivious priority queue achieving the same I/O complexity as [11] was presented in =-=[29]-=-. This solution is a more direct solution based on k-mergers introduced in the Funnelsort algorithm [44,28]. 6 Graph algorithms The existence of a cache-oblivious priority queue enabled a sequence of ... |

32 |
A density control algorithm for doing insertions and deletions in a sequentially ordered file in good worst-case time
- Willard
- 1992
Citation Context ...s amortized O(log^2 N) work per update. A matching Ω(log^2 N) lower bound for algorithms using even redistribution as the primitive was given in [39]. A worst-case variant was developed by Willard in =-=[72]-=-. Bender et al. [22] adapted the algorithms to the cache oblivious setting, supporting insertions and deletions in the array in amortized O((log^2 N)/B) I/Os, and guaranteeing that there are only O(1)... |

31 | A general lower bound on the I/O-complexity of comparison-based algorithms
- Arge, Knudsen, et al.
- 1993
Citation Context ...eries in O(log_B N + K/B) I/Os and requires space O(N log^2 N). 8 Lower bounds A general reduction technique for proving lower bounds for comparison based algorithms for the I/O model was presented in =-=[15]-=-, allowing the reduction to standard comparison trees. Lower bounds achieved for the I/O model immediately apply to cache-oblivious algorithms also. Bilardi and Peserico [26] have investigated the por... |

31 | Scanning and traversing: maintaining data for traversals in a memory hierarchy
- Bender, Cole, et al.
Citation Context ..., supporting insertions and deletions in the array in amortized O((log^2 N)/B) I/Os, and guaranteeing that there are only O(1) empty slots between two consecutive elements in the array. Bender et al. =-=[18]-=- refined the last labeling solution to satisfy the property that every update (in addition to every traversal) consists of O(1) physical scans sequentially through memory. Updates still require amorti... |

31 |
Optimized predecessor data structures for internal memory
- Rahman, Cole, et al.
- 2001
Citation Context ...isters to improve the running time of sorting was considered in [14]. Minimizing translation look-aside buffer (TLB) misses, and the case of low cache associativity was studied in [73]. Rahman et al. =-=[61]-=- made an empirical study of the performance of various search tree implementations, with focus on showing the significance of minimizing TLB misses. Brodal et al. [31] studied different memory layouts... |

30 | Efficient tree layout in a multilevel memory hierarchy
- Bender, Demaine, et al.
Citation Context ...ous versions of the search tree, and how to support efficient cache-oblivious finger searches, i.e. searches in the vicinity of a given element. The layout of arbitrary static trees was considered in =-=[20]-=-. Finally, optimal cache-oblivious implicit dictionaries were developed in [42] and [43]. 5 Priority queues Arge et al. [11] presented the first cache-oblivious priority queue, supporting inserts and delete... |

26 | Optimal dynamic range searching in non-replicating index structures
- Kanth, Singh
- 1999
Citation Context ...ly a normal kd-tree [24] laid out in memory using the van Emde Boas layout. This structure uses linear space and answers queries in O(√(N/B) + K/B) I/Os; this is optimal among linear space structures =-=[49]-=-. Insertions are facilitated using the so-called logarithmic method of Bentley [25], and require O((log N / B) · log_{M/B} N) I/Os. The cache-oblivious range-tree presented in [2] supports range queries in O(... |

25 | Cache-oblivious data structures and algorithms for undirected breadth-first search and shortest paths
- Brodal, Fagerberg, et al.
Citation Context ...ort(E)) I/Os. √ In [56] it was shown how to solve undirected BFS in O(ST(E) + Sort(E) + V E/B) I/Os for the I/O model, where ST(E) denotes the I/O bound for computing a spanning tree of the graph. In =-=[32]-=- two cache-oblivious versions of the algorithm in [56] were developed requiring O(ST(E) + Sort(E) + E B log V + √ E 1 V E/B) and O(ST(E)+Sort(E)+ B · ε ·log log V +√V E/B · √ V B/E ε ) I/Os respective... |

25 | Improving memory performance of sorting algorithms
- Xiao, Zhang, et al.
- 2000
Citation Context ...sals [52]. Using registers to improve the running time of sorting was considered in [14]. Minimizing translation look-aside buffer (TLB) misses, and the case of low cache associativity was studied in =-=[73]-=-. Rahman et al. [61] made an empirical study of the performance of various search tree implementations, with focus on showing the significance of minimizing TLB misses. Brodal et al. [31] studied diff... |

24 | On external-memory MST, SSSP and multi-way planar graph separation
- Arge, Brodal, et al.
- 2000
Citation Context ...I/Os was achieved, matching the bound for the I/O model in [58]. Finally an O(Sort(E) log log V) I/O minimum spanning tree algorithm was presented, nearly matching the O(Sort(E) log log ) I/O bound in =-=[12]-=- for the I/O model. Abello et al. [1] presented for the I/O model a functional approach to solve a sequence of graph problems based on recursion and repeated use of sorting and scanning. Their randomi... |

24 |
Sorting and searching in multisets
- Munro, Spira
- 1976
Citation Context ...us sorting algorithm was presented in [41], i.e. an algorithm that works with a single array of size N only storing the N input elements plus O(1) machine words. Sorting multisets has been studied in =-=[40]-=-. 3 List labeling Itai et al. [48] studied the problem of maintaining N elements in sorted order in an array of length O(N), an important problem in dynamic dictionaries when an efficient range query ... |

24 | Cache Performance Analysis of Traversals and Random Accesses
- Ladner, Fix, et al.
- 1999
Citation Context ....g. the paper by Chatterjee et al. [35] and the references it contains. Ladner et al. considered the effect of caches in connection with heaps [54], sorting [55], and sequential and random traversals =-=[52]-=-. Using registers to improve the running time of sorting was considered in [14]. Minimizing translation look-aside buffer (TLB) misses, and the case of low cache associativity was studied in [73]. Rah... |