Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code (1997)

by Jeremy Frens, David S. Wise
Venue: In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Citations: 82 - 6 self

Documents Related by Co-Citation

531 The Cache Performance and Optimizations of Blocked Algorithms – Monica S. Lam, Edward E. Rothberg, Michael E. Wolf - 1991
9061 Introduction to Algorithms – Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein - 2009
49 Recursive Array Layouts and Fast Parallel Matrix Multiplication – Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, Mithuna Thottethodi - 1999
392 Gaussian elimination is not optimal – V. Strassen - 1969
300 Space-filling curves – H. Sagan - 1994
780 A set of Level 3 Basic Linear Algebra Subprograms – J. J. Dongarra, J. J. Du Croz, I. S. Duff, S. Hammarling - 1990
582 Cilk: An Efficient Multithreaded Runtime System – Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou - 1995
399 Automatically tuned linear algebra software – R. Clint Whaley, Jack J. Dongarra - 1998
950 Accuracy and Stability of Numerical Algorithms – N. J. Higham - 2002