This thesis presents "cache-oblivious" algorithms that use asymptotically optimal amounts of work and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of cache. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. The cache complexity of computing n ...
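To make the definition of "cache oblivious" concrete, here is a minimal sketch of a divide-and-conquer matrix transpose that has no tunable parameter tied to cache size or cache-line length. The abstract does not spell out the algorithm, so the function names `co_transpose` and `transpose` and the choice to recurse down to single elements are assumptions made for illustration. Python lists of lists are not stored contiguously, so the sketch shows the recursion structure only and says nothing about real cache behavior.

```python
def co_transpose(a, b, r0, r1, c0, c1):
    """Write the transpose of the block a[r0:r1][c0:c1] into b.

    Hypothetical helper: halves the larger dimension until a single
    element remains. No cache size or line length appears anywhere.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= 0 or cols <= 0:
        return
    if rows == 1 and cols == 1:
        b[c0][r0] = a[r0][c0]
        return
    if rows >= cols:
        # Split the row range in half and recurse on each half.
        mid = r0 + rows // 2
        co_transpose(a, b, r0, mid, c0, c1)
        co_transpose(a, b, mid, r1, c0, c1)
    else:
        # Split the column range in half and recurse on each half.
        mid = c0 + cols // 2
        co_transpose(a, b, r0, r1, c0, mid)
        co_transpose(a, b, r0, r1, mid, c1)


def transpose(a):
    """Return the n x m transpose of an m x n list-of-lists matrix."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    co_transpose(a, b, 0, m, 0, n)
    return b


if __name__ == "__main__":
    print(transpose([[1, 2, 3], [4, 5, 6]]))
```

The intuition behind this shape of recursion is that, at some depth, the sub-blocks become small enough to fit in whatever cache is present, without the program ever naming that cache's parameters.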