Results 1 
3 of
3
Cacheoblivious algorithms (Extended Abstract)
 In Proc. 40th Annual Symposium on Foundations of Computer Science
, 1999
"... This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cach ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
(Show Context)
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cacheline length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cacheline length L where Z � Ω � L 2 � the number of cache misses for an m � n matrix transpose is Θ � 1 � mn � L �. The number of cache misses for either an npoint FFT or the sorting of n numbers is Θ � 1 �� � n � L � � 1 � log Z n �� �. We also give an Θ � mnp �work algorithm to multiply an m � n matrix by an n � p matrix that incurs Θ � 1 �� � mn � np � mp � � L � mnp � L � Z � cache faults. We introduce an “idealcache ” model to analyze our algorithms. We prove that an optimal cacheoblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the idealcache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cacheoblivious algorithms in practice.
Towards a Theory of CacheEfficient Algorithms ∗
, 2000
"... We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships between the ..."
Abstract
 Add to MetaCart
We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cacheoptimal algorithms for some fundamental problems like sorting, FFT, and an important subclass of permutations in the singlelevel cache model. We also show that ignoring associativity concerns could lead to inferior performance, by analyzing the averagecase cache behavior of mergesort. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and dealing with the hitherto unresolved problem of limited associativity. 1