## Exploiting Fast Matrix Multiplication within the Level 3 BLAS (1990)

Venue: | ACM Trans. Math. Softw. |

Citations: | 52 - 9 self |

### BibTeX

@ARTICLE{Higham90exploitingfast,

author = {Nicholas J. Higham},

title = {Exploiting Fast Matrix Multiplication within the Level 3 BLAS},

journal = {ACM Trans. Math. Softw.},

year = {1990},

volume = {16},

pages = {352--368}

}

### Abstract

The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen’s method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and illustrated with the aid of numerical experiments. Our conclusion is that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
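As a rough illustration of the fast multiplication at the heart of these algorithms, here is one level of Strassen recursion in NumPy. This is a minimal sketch, not the paper's FORTRAN 77 implementation: a production fast BLAS3 would recurse down to a crossover dimension (the abstract cites roughly 100) and handle odd sizes.

```python
import numpy as np

def strassen_once(A, B):
    """One level of Strassen recursion for even-order square matrices.

    Illustrative sketch only: forms Strassen's 7 half-size products
    instead of the 8 used by conventional block multiplication.
    """
    h = A.shape[0] // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Strassen's seven products
    P1 = (A11 + A22) @ (B11 + B22)
    P2 = (A21 + A22) @ B11
    P3 = A11 @ (B12 - B22)
    P4 = A22 @ (B21 - B11)
    P5 = (A11 + A12) @ B22
    P6 = (A21 - A11) @ (B11 + B12)
    P7 = (A12 - A22) @ (B21 + B22)

    # Reassemble the four blocks of C = A B
    C = np.empty_like(A, dtype=float)
    C[:h, :h] = P1 + P4 - P5 + P7
    C[:h, h:] = P3 + P5
    C[h:, :h] = P2 + P4
    C[h:, h:] = P1 - P2 + P3 + P6
    return C
```

Replacing eight half-size products with seven at every level of recursion is what yields the n^(log₂ 7) operation count discussed below.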

### Citations

2564 |
The Design and Analysis of Computer Algorithms
- Aho, Hopcroft, Ullman
- 1974
Citation Context ...lues, but they are sufficient to give insight into the error behavior. The matrices used are defined as follows: urandi and nrandi are random matrices with elements from the uniform [0, 1] and normal (0, 1) distributions, respectively. 2; is a random matrix with 2-norm condition number 10^4. P is the Pascal matrix, made up from the numbers in Pascal’s triangle; its (i, j) element is (i + j − 2)!/[(i − 1)... |

2196 | Numerical Recipes in C: The Art of Scientific Computing - Press, Teukolsky, et al. - 1994 |

840 |
Matrix multiplication via arithmetic progressions
- Coppersmith, Winograd
- 1987
Citation Context ...real and complex matrix multiplication using a variant of Strassen’s method due to Winograd. The exponent for matrix multiplication has been reduced several times to the current record value of 2.376 [7], but as far as we know none of these asymptotically faster algorithms is quicker than Strassen’s method for values of n for which dense matrix multiplication is currently performed in practice (n ≤ 1... |

780 | A set of Level 3 Basic Linear Algebra Subprograms - Dongarra, Croz, et al. - 1990 |

392 |
Gaussian elimination is not optimal
- Strassen
- 1969
Citation Context ...erms: Algorithms Additional Key Words and Phrases: Error analysis, level 3 BLAS, matrix multiplication, numerical stability, Strassen’s algorithm, triangular systems 1. INTRODUCTION In 1969, Strassen [24] showed how to multiply two n x n matrices with less than 4.7 n^(log₂ 7) arithmetic operations. Since log₂ 7 ≈ 2.807 < 3, his method improves asymptotically on the standard algorithm for matrix multiplicati... |
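The operation count quoted in this context can be checked numerically from Strassen's recurrence. This is a sketch under the assumption of full recursion on power-of-two sizes; the 4.7 n^(log₂ 7) figure in the paper uses a different stopping rule, so only the growth rate, not the constant, should be expected to match.

```python
def strassen_ops(n):
    """Arithmetic operations for full Strassen recursion on n x n matrices
    (n a power of two): 7 half-size products plus 18 half-size block
    additions per level, i.e. M(n) = 7 M(n/2) + 18 (n/2)^2, M(1) = 1."""
    if n == 1:
        return 1
    return 7 * strassen_ops(n // 2) + 18 * (n // 2) ** 2

# Conventional multiplication needs 2n^3 - n^2 operations, so the ratio
# strassen_ops(n) / (2n^3) shrinks as n grows, reflecting log2(7) < 3.
```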

117 | Algorithmics: Theory and Practice - Brassard, Bratley - 1988 |

103 | Algorithm 679: A set of level 3 Basic Linear Algebra Subprograms: Model implementation and test programs - Dongarra, Croz, et al. - 1990 |

69 | The impact of hierarchical memory systems on linear algebra algorithm design - Gallivan, Jalby, et al. - 1987 |

35 | Prospectus for the Development of a Linear Algebra Library for High-Performance Computers
- Demmel, Dongarra, et al.
- 1987
Citation Context ...he conventional BLAS3. For applications in which the BLAS3 are employed as building blocks, an important consideration is whether it is crucial that component-wise small residuals be achieved. LAPACK [9] makes use of the BLAS3 in its block factorization algorithms, and it is desirable to know whether these algorithms remain backwards stable when the fast BLAS3 are used. Specifically, are the computed... |

33 | Fast polar decomposition of an arbitrary matrix - Higham, Schreiber - 1990 |

32 | Extra high speed matrix multiplication on the Cray-2
- Bailey
- 1988
Citation Context ...lgol-W on an IBM 360/67 and concluded that in this environment Strassen’s method (with just one level of recursion) runs faster than the conventional method for n ≥ 110. Furthermore, recently, Bailey [2] compared his FORTRAN implementation of Strassen’s algorithm for the Cray-2 with the Cray library routine for matrix multiplication and observed speed-up factors... |

30 | The use of BLAS3 in linear algebra on a parallel processor with hierarchical memory - Gallivan, Jalby, et al. - 1986 |

25 | Algorithms for matrix multiplication
- Brent
- 1970
Citation Context ...rix multiplication, which requires O(n³) operations. Some have regarded Strassen’s algorithm as being of theoretical interest only (see, for example, [21, p. 76; 23, p. 533]). However, in 1970 Brent [5] implemented Strassen’s algorithm in Algol-W on an IBM 360/67 and concluded that in this environment Strassen’s method (with just one level of recursion) runs faster than the conventional method for n... |

13 |
Error analysis of algorithms for matrix multiplication and triangular decomposition using Winograd’s identity
- Brent
- 1970
Citation Context ... We define the following quantities: ρ_N(Ĉ) = ‖Ĉ − C‖ / (n²u ‖A‖ ‖B‖) (norm-wise relative residual), ρ_C(Ĉ) = max_{i,j} |ĉ_ij − c_ij| / (nu (|A| |B|)_ij) (component-wise relative residual), e_N(Ĉ) = ‖Ĉ − C‖ / (u ‖C‖), e_C(Ĉ) = max_{i,j} |ĉ_ij − c_ij| / (u |c_ij|) (norm-wise ... |
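The four accuracy measures mentioned in this context (norm-wise and component-wise residuals and errors) can be sketched in NumPy. The scaling constants here are reconstructed from a garbled scan and may differ from the paper's exact definitions; the reference product `A @ B` stands in for the exact C.

```python
import numpy as np

def error_measures(A, B, C_hat, u=np.finfo(float).eps):
    """Residual and error measures for a computed product C_hat ~= A B.

    Assumed forms (reconstructed, not verbatim from the paper):
      rho_N: norm-wise relative residual, scaled by n^2 u ||A|| ||B||
      rho_C: component-wise relative residual, scaled by n u (|A||B|)_ij
      e_N:   norm-wise relative error, scaled by u ||C||
      e_C:   component-wise relative error, scaled by u |c_ij|
    """
    C = A @ B                      # reference product
    n = A.shape[0]
    diff = C_hat - C
    rho_N = np.linalg.norm(diff) / (n**2 * u * np.linalg.norm(A) * np.linalg.norm(B))
    rho_C = np.max(np.abs(diff) / (n * u * (np.abs(A) @ np.abs(B))))
    e_N = np.linalg.norm(diff) / (u * np.linalg.norm(C))
    e_C = np.max(np.abs(diff) / (u * np.abs(C)))
    return rho_N, rho_C, e_N, e_C
```

The paper's point is that a fast BLAS3 can keep the norm-wise measures small while the component-wise ones grow, which is why both families are tracked.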

11 | Further comparisons of direct methods for computing stationary distributions of Markov chains - Heyman - 1987 |

9 | The accuracy of solutions to triangular systems
- Higham
- 1989
Citation Context ...as for (4.1)): TX̂ = B + E, ‖E‖ ≤ c_n u ‖T‖ ‖X̂‖ + O(u²). For comparison, consider the computed solution X̂ obtained using back substitutions. The ith column x̂_i of X̂ satisfies (see, for example, [16]) (T + E_i)x̂_i = b_i, |E_i| ≤ (n + 1)u |T|. It follows that TX̂ = B + F, |F| ≤ (n + 1)u |T| |X̂|, (4.6) and the latter bound implies ‖F‖ ≤ n(n + 1)u ‖T‖ ‖X̂‖. Thus, the same comments apply as for St... |
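The conventional solver being compared against in this context is column-wise back substitution. A minimal Python sketch (not the paper's FORTRAN code) of one such column solve:

```python
import numpy as np

def back_substitute(T, b):
    """Solve the upper-triangular system T x = b by back substitution.

    This is the per-column solver whose component-wise residual bound
    |F| <= (n+1) u |T| |X| is quoted in the context above.
    """
    n = T.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-computed tail, then divide by the pivot
        x[i] = (b[i] - T[i, i + 1:] @ x[i + 1:]) / T[i, i]
    return x
```

Solving T X = B for a multiple right-hand side B amounts to applying this routine column by column, which is why the bound on each column x̂_i lifts directly to the matrix bound (4.6).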

9 | Computational complexity and numerical stability
- MILLER
- 1975
Citation Context ...d. Partly, this is because the early error analysis of the method in [5] was not published (Brent’s paper [6] contains some material from [5], but not the error analysis of Strassen’s method). Miller [19] states a stability result for Strassen’s method in general terms. His result is presented in a more specific form by Bailey in [2], though, unfortunately, an error in the statement makes the result t... |

8 |
Stability of fast algorithms for matrix multiplication
- Bini, Lotti
- 1980
Citation Context ... for Strassen’s method in general terms. His result is presented in a more specific form by Bailey in [2], though, unfortunately, an error in the statement makes the result too strong. Bini and Lotti [3] give an error analysis of a class of fast matrix multiplication techniques that includes Strassen’s; when specialized to Strassen’s method, their quite general error bound is similar to the result gi... |

7 | Use of level 3 - Daydé, Duff - 1989 |

5 | Block algorithms for parallel machines. In Numerical Algorithms for Modern Parallel Computer Architectures - Schreiber - 1988 |