Cache-Oblivious Algorithms (Extended Abstract) (1999)

by Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran
Venue: Proc. 40th Annual Symposium on Foundations of Computer Science
Citations: 10 - 1 self

BibTeX

@INPROCEEDINGS{Frigo99cache-obliviousalgorithms,
    author = {Matteo Frigo and Charles E. Leiserson and Harald Prokop and Sridhar Ramachandran},
    title = {Cache-Oblivious Algorithms (Extended Abstract)},
    booktitle = {Proc. 40th Annual Symposium on Foundations of Computer Science},
    year = {1999},
    pages = {285--297},
    publisher = {IEEE Computer Society Press}
}


Abstract

This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $Z$ and cache-line length $L$ where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$...
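
To make the "no tuned hardware parameters" idea concrete, here is a minimal Python sketch of a divide-and-conquer matrix transpose, the style of recursion usually associated with cache-oblivious transposition. It is an illustration under stated assumptions, not the paper's own code: the function names `transpose_block` and `cache_oblivious_transpose` are hypothetical, and Python lists of lists do not model a real memory layout, so the sketch shows only the recursion structure, not the cache behavior.

```python
def transpose_block(a, b, r0, r1, c0, c1):
    """Write the transpose of a[r0:r1][c0:c1] into b.

    The longer side of the block is halved at each step, so the
    recursion never refers to a cache size or cache-line length.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= 0 or cols <= 0:
        return
    if rows == 1 and cols == 1:
        # Base case: a single element moves to its mirrored position.
        b[c0][r0] = a[r0][c0]
    elif rows >= cols:
        # Split the row range in half.
        mid = r0 + rows // 2
        transpose_block(a, b, r0, mid, c0, c1)
        transpose_block(a, b, mid, r1, c0, c1)
    else:
        # Split the column range in half.
        mid = c0 + cols // 2
        transpose_block(a, b, r0, r1, c0, mid)
        transpose_block(a, b, r0, r1, mid, c1)


def cache_oblivious_transpose(a):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if a else 0
    b = [[None] * m for _ in range(n)]
    transpose_block(a, b, 0, m, 0, n)
    return b
```

For example, `cache_oblivious_transpose([[1, 2, 3], [4, 5, 6]])` yields a 3 x 2 matrix. A practical implementation in a compiled language would likely stop the recursion at a small constant-size block rather than a single element; that cutoff is an engineering choice assumed here, not something stated in the abstract.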
