Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code (1997)

by Jeremy Frens, David S. Wise
Venue: In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Citations: 76 - 6 self

Documents Related by Co-Citation

8533 Introduction to Algorithms – T H Cormen, C E Leiserson, R L Rivest - 1989
509 The Cache Performance and Optimizations of Blocked Algorithms – Monica S. Lam, Edward E. Rothberg, Michael E. Wolf - 1991
48 Recursive Array Layouts and Fast Parallel Matrix Multiplication – Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, Mithuna Thottethodi - 1999
73 Nonlinear Array Layouts for Hierarchical Memory Systems – Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Lebeck, Shyam Mundhra, Mithuna Thottethodi - 1999
272 Space-Filling Curves – H Sagan - 1994
373 Gaussian elimination is not optimal – V Strassen - 1969
531 Cilk: An Efficient Multithreaded Runtime System – Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou - 1995
372 Automatically tuned linear algebra software – R. Clint Whaley, Jack J. Dongarra - 1998
119 Recursion leads to automatic variable blocking for dense linear-algebra algorithms – F Gustavson - 1997
742 A set of level 3 basic linear algebra subprograms – J Dongarra, J Du Croz, I S Duff, S Hammarling - 1990
48 Towards a theory of cache-efficient algorithms – Sandeep Sen, Siddhartha Chatterjee, Neeraj Dumir - 2000
32 High Performance Fortran for Highly Irregular Problems – Yu Charlie Hu, S. Lennart Johnsson, Shang-Hua Teng - 1996
56 Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves – John R. Pilkington, Scott B. Baden - 1995
320 External Memory Algorithms and Data Structures – Jeffrey Scott Vitter - 1998
227 Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology – Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel - 1996
140 Data-centric Multi-level Blocking – Induprakas Kodukula, Nawaaz Ahmed, Keshav Pingali - 1997
847 Accuracy and Stability of Numerical Algorithms – N J Higham - 2002
26 Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free – David S. Wise - 2000
3973 Computer Architecture: A Quantitative Approach, 3rd ed – J L Hennessy, D A Patterson, D Goldberg - 2002