### Table 2: Simple performance comparison of conventional memory systems (best and worst cases) and Impulse for sparse matrix-vector multiply. The starred miss requires a gather at the memory controller.

"... In PAGE 7: ... It is dominated by the cost of performing three loads (to DATA[i], COLUMN[i], and x[COLUMN[i]]). Assuming a 32-byte cache line that can hold four doubles, the advantage of Impulse is outlined in Table 2, which lists the memory references over four iterations of the loop. The initial read of COLUMN[i] only incurs half a miss, because the elements of COLUMN are single-word integers.... In PAGE 7: ... Because x is not accessed directly in Impulse, its best and worst cases are identical. As Table 2 shows, Impulse eliminates four memory accesses, each of which is a hit in the L2 cache, from the... In PAGE 8: ... Page coloring allows us to eliminate conflicts between the data structures in the L2 cache. Such a scenario corresponds to the Best column in Table 2. That column says that the inner loop incurs 1.... ..."
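The snippet above refers to the three loads per nonzero in the inner loop of a sparse matrix-vector product (DATA[i], COLUMN[i], and the indirect x[COLUMN[i]]). A minimal sketch of that loop, assuming a standard compressed-sparse-row (CSR) layout with a `rowptr` array (the function and argument names here are illustrative, not taken from the paper):

```python
def spmv_csr(rowptr, column, data, x):
    """Sparse matrix-vector product y = A*x, with A in CSR form."""
    n = len(rowptr) - 1
    y = [0.0] * n
    for row in range(n):
        acc = 0.0
        for i in range(rowptr[row], rowptr[row + 1]):
            # The three loads discussed in the snippet:
            # data[i], column[i], and the indirect x[column[i]].
            acc += data[i] * x[column[i]]
        y[row] = acc
    return y
```

The indirect load `x[column[i]]` is the one with poor locality, which is what the Impulse controller's gather support targets.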

### Table 3.2: Level 2 Sparse BLAS: sparse matrix-vector operations. USMM: sparse matrix-matrix multiply, C ← AB + C

### Table 4: Performance of sparse matrix-vector multiply in Java and C. 266 MHz Pentium II using Microsoft Java SDK 2.0 and Watcom C 10.6 (Windows 95). Results in Mflops.

1998

"... In PAGE 7: ... The test cases, taken from the Harwell-Boeing collection [5, 8], represent fairly small sparse matrices, but may provide an indication of the relative performance of these languages on kernels which contain indirect index computations. The results are presented in Table 4. Note that the higher levels of performance for WEST0156 are due to the fact that the matrix is small enough to completely fit in cache.... ..."

Cited by 19

### Table 7: Matrix-Vector Operations

2002

"... In PAGE 12: ... Matrix-Vector Operations This section lists matrix-vector operations in Table 7. The matrix arguments A, B and T are dense, banded, or sparse.... ..."

Cited by 23

### Table 1: Performance of sparse matrix-vector product

1997

"... In PAGE 2: ... The main algorithm we will consider in this paper is matrix-vector product, which is the core computation in iterative solvers for linear systems. Consider the performance (in Mflops) of sparse matrix-vector product on a single processor of an IBM SP-2 for a variety of matrices and storage formats, shown in Table 1 (descriptions of the matrices and the formats can be found in Appendix A). Boxed numbers indicate the highest performance for a given matrix.... In PAGE 2: ... This demonstrates the difficulty of developing a "sparse BLAS" for sparse matrix computations. Even if we limit ourselves to the formats in Table 1, one still has to provide at least 6² = 36 versions of sparse matrix-matrix product... In PAGE 19: ... Appendix A Matrix formats The matrices shown in Table 1 are obtained from the suite of test matrices supplied with the PETSc library [4] (small,medium,cfd.1.... ..."
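The combinatorial problem described above (six formats implying 6² = 36 format-pair kernels) arises because the traversal loop is different for every storage format. As an illustration of how the kernel changes shape with the format, here is a sketch of matrix-vector product for diagonal (DIA) storage, one plausible member of such a format set (an assumption for illustration; the paper's own six formats are listed in its Appendix A):

```python
def spmv_dia(offsets, diagonals, x):
    """y = A*x with A stored by diagonals (DIA format).

    diagonals[k][i] holds A[i][i + offsets[k]]; positions that fall
    outside the matrix are skipped. Note the loop structure shares
    nothing with a CSR kernel, which is why each format needs its
    own implementation.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diagonals):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y
```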

Cited by 9

### Table 22: Speed of general sparse matrix-vector multiplication subroutines.

1993

"... In PAGE 39: ... In both cases, there is a simple way of accomplishing the required data movement, which is to use 'personalized all-to-all communication' (or 'total exchange' as it is sometimes called) to obtain the whole vector x or y from each processor. Table 22 shows the speeds of the subroutine using this communication scheme. We know that the communication rate for this operation is about 0.... ..."
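The scheme in the snippet has every processor collect the entire vector x before multiplying its own block of rows. A serial stand-in for that pattern, assuming a simple row-wise partition (the exchange here is simulated with list concatenation rather than real message passing, and all names are illustrative):

```python
def total_exchange(local_pieces):
    """Simulate 'total exchange': every processor ends up with the
    full vector x, concatenated from all processors' local pieces."""
    full_x = [v for piece in local_pieces for v in piece]
    return [list(full_x) for _ in local_pieces]  # one copy per processor

def partitioned_spmv(row_blocks, local_pieces):
    """Each 'processor' owns a block of rows (dense here for brevity)
    and a slice of x; after the exchange it can compute its slice of y."""
    xs = total_exchange(local_pieces)
    y = []
    for block, x in zip(row_blocks, xs):
        for row in block:
            y.append(sum(a * b for a, b in zip(row, x)))
    return y
```

The cost of the exchange step is what the quoted communication-rate figure measures.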

Cited by 2

### Table 5: Performance of Sparse Matrix by Vector

"... In PAGE 9: ... sparse matrix applications with few non-zero elements per row, as shown in Table 5 (Section 4.9).... In PAGE 14: ... unrolling two iterations of the loop body, thus achieving an II = 1.5 instead of 2. This optimized version needs an additional dispatch to deal with the last element of rows with an odd NNZ. This overhead can have a significant impact for small values of NNZ, as shown in Table 5. The results of Table 5 have been obtained with matrices generated pseudo-randomly.... In PAGE 14: ... The results of Table 5 have been obtained with matrices generated pseudo-randomly. The number of non-zero elements shown in Table 5 is actually an average value. 5.... ..."
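The unroll-by-two transformation described in the snippet, together with the extra dispatch for rows whose nonzero count is odd, can be sketched as follows for a CSR-style layout (a sketch under that assumption; the paper targets a hardware pipeline, where the same peeled-iteration structure applies):

```python
def spmv_csr_unroll2(rowptr, column, data, x):
    """CSR y = A*x with the inner loop unrolled by two. Rows with an
    odd NNZ need one peeled iteration at the end -- the 'additional
    dispatch' mentioned in the snippet."""
    n = len(rowptr) - 1
    y = [0.0] * n
    for row in range(n):
        lo, hi = rowptr[row], rowptr[row + 1]
        acc0 = acc1 = 0.0
        i = lo
        # Main body: two nonzeros per iteration.
        while i + 1 < hi:
            acc0 += data[i] * x[column[i]]
            acc1 += data[i + 1] * x[column[i + 1]]
            i += 2
        # Peeled iteration for rows with an odd number of nonzeros.
        if i < hi:
            acc0 += data[i] * x[column[i]]
        y[row] = acc0 + acc1
    return y
```

The overhead of the peeled iteration is fixed per row, which is why its relative cost grows as the average NNZ per row shrinks.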

### Table 2: Matrix-vector multiplies required per order vs. length of wire

"... In PAGE 17: ... The same calculation is performed for wires of varying lengths, keeping the other two dimensions fixed. Our numerical results, summarized in Table 2, show that the number of iterations, or matrix-vector... ..."