### Table 3: List of BLAS routines used for blocked factorization algorithms

1992

"... In PAGE 7: ... Table 2 shows the abbreviations necessary to explain the algorithmic codings presented in this paper. Table 3 shows the BLAS subprograms used in the different implementations of the LU factorization algorithms. Looking at the computational effort of the BLAS routines, it is clear that the ratio between floating-point operations and memory accesses for the Level 1 and Level 2 BLAS is not as good as for the Level 3 BLAS, which perform more computations per memory access.... ..."

Cited by 5
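The excerpt's point about flops per memory access can be made concrete with the usual operation counts for one representative kernel per BLAS level: `axpy` (Level 1), `gemv` (Level 2) and `gemm` (Level 3), all of size n. This is a minimal sketch under a simplifying assumption of our own: every operand element is counted as one memory reference, with no reuse from cache or registers. The function name `blas_ratios` is hypothetical and not taken from the paper.

```python
def blas_ratios(n):
    """Approximate flop / memory-reference ratios for size-n BLAS kernels.

    Assumption (illustrative only): each operand element is read or
    written once, with no cache or register reuse.
    """
    # Level 1, y <- a*x + y: 2n flops; read x, read y, write y
    axpy_flops, axpy_mem = 2 * n, 3 * n
    # Level 2, y <- A x + y: 2n^2 flops; read A, x, y; write y
    gemv_flops, gemv_mem = 2 * n * n, n * n + 3 * n
    # Level 3, C <- A B + C: 2n^3 flops; read A, B, C; write C
    gemm_flops, gemm_mem = 2 * n ** 3, 4 * n * n
    return {
        "level1": axpy_flops / axpy_mem,
        "level2": gemv_flops / gemv_mem,
        "level3": gemm_flops / gemm_mem,
    }


if __name__ == "__main__":
    for level, ratio in blas_ratios(100).items():
        print(level, round(ratio, 2))
```

For n = 100 this gives ratios of about 0.67, 1.94 and 50. The Level 1 and Level 2 ratios stay bounded by small constants as n grows, while the Level 3 ratio grows like n/2. That growth is why blocked algorithms try to push most of their work into Level 3 calls.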

### Table 5.5: The effect of block size on factorization time. Results from DEC 3000-400.

Columns: Level 2 BLAS; Level 3 BLAS; Level 3 BLAS; Level 3 BLAS

### Table 9. Results with Level 1 BLAS divided by those with Level 3 BLAS and block size 32.

in The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations

### Table 10. Results with Level 2 BLAS divided by those with Level 3 BLAS and block size 32.

in The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations

### Table 11. Results with Level 3 BLAS with block size 16 divided by those with block size 32.

"... In PAGE 30: ... Table 11 shows that with block size 16 we get slightly worse performance on the Cray and on the IBM and unchanged performance on the SUN. Table 12 shows that with block size 64 we get slightly worse performance on the SUN and IBM and unchanged performance on the Cray.... ..."

### Table 11. Results with Level 3 BLAS with block size 16 divided by those with block size 32.

in The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations

"... In PAGE 27: ... Table 11 shows that with block size 16 we get slightly worse performance on the CRAY and on the IBM and unchanged performance on the SUN. Table 12 shows that with block size 64 we get slightly worse performance on the SUN and IBM and unchanged performance on the CRAY.... ..."
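To show where a block size such as 16, 32 or 64 enters a blocked factorization, here is a minimal right-looking blocked LU sketch in NumPy. It is not the MA48 code: it does no pivoting (so it is only safe for matrices such as strongly diagonally dominant ones), and the names `unblocked_lu`, `blocked_lu` and `nb` are hypothetical. The panel is factorized with rank-1 updates (Level 2 style work); the row block is updated with a triangular solve and the trailing matrix with a matrix product (Level 3 style work). `np.linalg.solve` stands in for a true triangular solve (TRSM).

```python
import numpy as np


def unblocked_lu(a):
    """In-place LU without pivoting of an m x k panel (m >= k), using rank-1 updates."""
    m, k = a.shape
    for j in range(k):
        a[j + 1:, j] /= a[j, j]                                   # column of L
        a[j + 1:, j + 1:] -= np.outer(a[j + 1:, j], a[j, j + 1:])  # rank-1 update
    return a


def blocked_lu(a, nb=32):
    """Right-looking blocked LU without pivoting; returns L and U packed in one array."""
    a = np.array(a, dtype=float)  # work on a copy
    n = a.shape[0]
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # 1. Factor the current panel (Level 2 style work); slicing gives a view.
        unblocked_lu(a[j:, j:j + jb])
        if j + jb < n:
            # 2. U12 = L11^{-1} A12 (the TRSM step; a general solve is used here).
            l11 = np.tril(a[j:j + jb, j:j + jb], -1) + np.eye(jb)
            a[j:j + jb, j + jb:] = np.linalg.solve(l11, a[j:j + jb, j + jb:])
            # 3. A22 <- A22 - L21 U12 (the GEMM step, Level 3 style work).
            a[j + jb:, j + jb:] -= a[j + jb:, j:j + jb] @ a[j:j + jb, j + jb:]
    return a
```

A larger `nb` moves more of the flops into step 3, but it also makes the Level 2 panel work in step 1 larger, which is one reason the best block size is machine dependent, as the excerpts report for the Cray, IBM and SUN runs.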

### Table 7. Performance study of strategy 3 using relaxation of the frontal matrix structure to enable more Level 3 BLAS operations (NEMIN=8).

"... In PAGE 19: ... This is illustrated on the right-hand side of Figure 11, where we show (for a block size of four) how the block structure is modified during the assembly process to facilitate the use of the block algorithm. We show, in Table 7, the influence of this relaxation of the sparsity structure of the frontal matrices on the performance of the QR algorithm. We see in Table 7 that, because of the relaxation of the nonzero structure, the size of the Q array increases. The increase in the number of operations with relaxation comes from both the increase in the length of the Householder vectors and the use of the block algorithm.... We finally observe, in Table 7, that, with relaxation of the nonzero structure, we obtain, on the Alliant FX/80, a very significant decrease in the time to perform the factorization step in the multiprocessor case. This performance improvement not only comes from the relative increase in the uniprocessor Megaflop rate (see column "Mflop/s" in Table 7) but also from the increase in the parallelism of the method. One can compute from Table 7 the speedups obtained with Level 2 BLAS and with relaxed Level 3 BLAS and notice that, for example with large2, the speedup increases from 3.... ..."
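The excerpt attributes part of the extra work to "the use of the block algorithm" for Householder QR. One standard way to block Householder transformations is the compact WY form of Schreiber and Van Loan, where k reflectors are accumulated as Q = I - V T V^T so that they can be applied to the rest of the matrix with matrix-matrix products. The sketch below illustrates that general technique only; we do not know which block variant the paper used, and the function names are hypothetical. Building T and applying it costs extra flops compared with applying the reflectors one at a time, which is consistent with the operation-count increase the excerpt mentions.

```python
import numpy as np


def householder(x):
    """Return (v, tau) with v[0] = 1 so that (I - tau v v^T) x is a multiple of e1."""
    alpha = np.linalg.norm(x)
    if alpha == 0.0:
        v = np.zeros(x.shape[0])
        v[0] = 1.0
        return v, 0.0  # identity reflector
    sign = 1.0 if x[0] >= 0 else -1.0
    v = x / (x[0] + sign * alpha)  # scaled so that v[0] = 1
    v[0] = 1.0
    return v, 2.0 / (v @ v)


def panel_qr_compact_wy(a):
    """Householder QR of an m x k panel (m >= k).

    Returns V (m x k, unit lower trapezoidal), T (k x k, upper triangular)
    and R (k x k) with Q = I - V T V^T and A = Q[:, :k] R.
    """
    a = np.array(a, dtype=float)
    m, k = a.shape
    V = np.zeros((m, k))
    T = np.zeros((k, k))
    for j in range(k):
        v, tau = householder(a[j:, j])
        # Apply H_j to the remaining panel columns (Level 2 style work).
        a[j:, j:] -= tau * np.outer(v, v @ a[j:, j:])
        V[j:, j] = v
        # Extend T so that H_1 ... H_j = I - V[:, :j+1] T[:j+1, :j+1] V[:, :j+1]^T.
        T[:j, j] = -tau * T[:j, :j] @ (V[:, :j].T @ V[:, j])
        T[j, j] = tau
    return V, T, np.triu(a[:k, :])


def apply_qt_compact_wy(V, T, C):
    """Compute Q^T C = C - V T^T V^T C using only matrix products (Level 3 style)."""
    return C - V @ (T.T @ (V.T @ C))
```

In a blocked QR, `apply_qt_compact_wy` would update the columns to the right of each panel, so most of the arithmetic happens in matrix-matrix products rather than in one rank-1 update per Householder vector.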