Results 1-6 of 6
Cache-Oblivious Algorithms
, 1999
Cited by 84 (1 self)
This thesis presents "cache-oblivious" algorithms that use asymptotically optimal amounts of work and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. The cache complexity of computing n ...
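The idea of transposing without any cache-dependent tuning can be illustrated with a divide-and-conquer recursion that always halves the longer side of the current block. The following is a minimal sketch, not the thesis's own code: the function names and the `base` cutoff are assumptions introduced here, and `base` is only a recursion cutoff, not a parameter tuned to $Z$ or $L$. Python lists of lists are not laid out contiguously, so this shows the recursion structure rather than real cache behavior.

```python
def _transpose_block(a, b, r0, r1, c0, c1, base):
    """Write the transpose of a[r0:r1][c0:c1] into b, halving the longer side."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy element by element.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2  # split the row range
        _transpose_block(a, b, r0, mid, c0, c1, base)
        _transpose_block(a, b, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2  # split the column range
        _transpose_block(a, b, r0, r1, c0, mid, base)
        _transpose_block(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a, base=4):
    """Return the transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    _transpose_block(a, b, 0, m, 0, n, base)
    return b
```

Calling `cache_oblivious_transpose(a)` on a rectangular list of rows returns a new list of rows holding the transpose; the input is left unchanged.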
Portable High-Performance Programs
, 1999
Cited by 17 (0 self)
right notice and this permission notice are preserved on all copies.
Cache-oblivious algorithms (Extended Abstract)
 In Proc. 40th Annual Symposium on Foundations of Computer Science
, 1999
Cited by 13 (1 self)
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/L + mnp/(L\sqrt{Z}))$ cache faults. We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
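A $\Theta(mnp)$-work rectangular multiply with no cache-dependent parameters can be sketched as a recursion that halves whichever of the three dimensions $m$, $n$, $p$ is currently largest. The abstract does not spell out the algorithm, so the code below is an illustrative assumption in that spirit rather than the paper's implementation; the names `_matmul_block`, `cache_oblivious_matmul`, and the `base` recursion cutoff are hypothetical.

```python
def _matmul_block(a, b, c, i0, i1, k0, k1, j0, j1, base):
    """Accumulate a[i0:i1][k0:k1] times b[k0:k1][j0:j1] into c[i0:i1][j0:j1]."""
    m, n, p = i1 - i0, k1 - k0, j1 - j0
    if max(m, n, p) <= base:
        # Small block: ordinary triple loop.
        for i in range(i0, i1):
            for k in range(k0, k1):
                aik = a[i][k]
                for j in range(j0, j1):
                    c[i][j] += aik * b[k][j]
    elif m >= n and m >= p:
        mid = i0 + m // 2  # split the rows of a and c
        _matmul_block(a, b, c, i0, mid, k0, k1, j0, j1, base)
        _matmul_block(a, b, c, mid, i1, k0, k1, j0, j1, base)
    elif n >= p:
        mid = k0 + n // 2  # split the shared dimension; both halves add into c
        _matmul_block(a, b, c, i0, i1, k0, mid, j0, j1, base)
        _matmul_block(a, b, c, i0, i1, mid, k1, j0, j1, base)
    else:
        mid = j0 + p // 2  # split the columns of b and c
        _matmul_block(a, b, c, i0, i1, k0, k1, j0, mid, base)
        _matmul_block(a, b, c, i0, i1, k0, k1, mid, j1, base)


def cache_oblivious_matmul(a, b, base=2):
    """Multiply an m x n matrix by an n x p matrix (lists of rows)."""
    m, n = len(a), len(b)
    p = len(b[0]) if n else 0
    c = [[0] * p for _ in range(m)]
    _matmul_block(a, b, c, 0, m, 0, n, 0, p, base)
    return c
```

Every scalar product `a[i][k] * b[k][j]` is computed exactly once, so the work matches the ordinary triple loop; only the order of the operations changes.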
Faster Algorithms for Integer Lattice Basis Reduction
, 1996
Cited by 11 (0 self)
The well-known L³-reduction algorithm of Lovász transforms a given integer lattice basis $b_1, b_2, \ldots, b_n \in \mathbb{Z}^n$ into a reduced basis. The cost of L³-reduction is $O(n^4 \log B_o)$ arithmetic operations with integers bounded in length by $O(n \log B_o)$ bits. Here, $B_o$ bounds the Euclidean length of the input vectors, that is, $B_o \ge |b_1|^2, |b_2|^2, \ldots, |b_n|^2$. We present a simple modification of the L³-reduction algorithm that requires only $O(n^3 \log B_o)$ arithmetic operations with integers of the same length. We gain a further speedup by combining our new approach with Schönhage's modification of the L³-reduction algorithm and incorporating fast matrix multiplication techniques. The result is an algorithm for semi-reduction that requires $O(n^{2.381} \log B_o)$ arithmetic operations with integers of the same length.
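For orientation, the classical L³-reduction that this abstract takes as its starting point can be sketched as follows. This is the textbook procedure with exact rational arithmetic, not the faster modification the paper proposes; it recomputes the Gram-Schmidt data from scratch at every step, so it does not attain any of the operation counts quoted above. The function names and the parameter `delta = 3/4` are assumptions made here, and the input basis is assumed to be linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalized vectors b*_i and the coefficients mu[i][j]."""
    n = len(basis)
    ortho, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = Fraction(dot(b, ortho[j])) / dot(ortho[j], ortho[j])
            v = [x - mu[i][j] * y for x, y in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll_reduce(basis, delta=Fraction(3, 4)):
    """Textbook L3-reduction of a linearly independent integer basis."""
    basis = [list(b) for b in basis]
    n, k = len(basis), 1
    while k < n:
        ortho, mu = gram_schmidt(basis)
        # Size-reduce b_k against b_{k-1}, ..., b_0.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                basis[k] = [x - q * y for x, y in zip(basis[k], basis[j])]
                ortho, mu = gram_schmidt(basis)
        # Lovasz condition: advance if it holds, otherwise swap and step back.
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis
```

On return, every Gram-Schmidt coefficient satisfies $|\mu_{i,j}| \le 1/2$ and consecutive vectors satisfy the Lovász condition for the chosen `delta`.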
Computer Algorithms. Addison-Wesley, 1974.
numbers from composite numbers. Annals of Mathematics, 117:173–206, 1983. [4] Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, 1988.