Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code (1997)

by Jeremy Frens, David S. Wise
Venue: In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Citations: 76 - 6 self

Documents Related by Co-Citation

8557 Introduction to Algorithms – T Cormen, C Leiserson, R Rivest - 1990
512 The Cache Performance and Optimizations of Blocked Algorithms – Monica S. Lam, Edward E. Rothberg, Michael E. Wolf - 1991
48 Recursive Array Layouts and Fast Parallel Matrix Multiplication – Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, Mithuna Thottethodi - 1999
272 Space-Filling Curves – Hans Sagan - 1994
381 Gaussian Elimination is Not Optimal – V Strassen - 1969
72 Nonlinear Array Layouts for Hierarchical Memory Systems – Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Lebeck, Shyam Mundhra, Mithuna Thottethodi - 1999
534 Cilk: An Efficient Multithreaded Runtime System – Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou - 1995
372 Automatically tuned linear algebra software – R. Clint Whaley, Jack J. Dongarra - 1998
120 Recursion leads to automatic variable blocking for dense linear algebra algorithms – F G Gustavson - 1997
752 A set of level 3 basic linear algebra subprograms – J J Dongarra, J Du Croz, S Hammarling, I Duff - 1990
47 Towards a theory of cache-efficient algorithms – Sandeep Sen, Siddhartha Chatterjee, Neeraj Dumir - 2000
32 High Performance Fortran for Highly Irregular Problems – Yu Charlie Hu, S. Lennart Johnsson, Shang-Hua Teng - 1996
56 Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves – John R. Pilkington, Scott B. Baden - 1995
320 External Memory Algorithms and Data Structures – Jeffrey Scott Vitter - 1998
232 Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology – Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel - 1996
139 Data-centric Multi-level Blocking – Induprakas Kodukula, Nawaaz Ahmed, Keshav Pingali - 1997
859 Accuracy and Stability of Numerical Algorithms – N J Higham - 1996
26 Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free – David S. Wise - 2000
3962 Computer architecture: a quantitative approach (3rd edition) – J L Hennessy, D A Patterson - 2002