### Table 22: Speed of general sparse matrix-vector multiplication subroutines.

1993

"... In PAGE 39: ... In both cases, there is a simple way of accomplishing the required data movement, which is to use 'personalized all-to-all communication' (or 'total exchange', as it is sometimes called) to obtain the whole vector x or y from each processor. Table 22 shows the speeds of the subroutine using this communication scheme. We know that the communication rate for this operation is about 0.... ..."

Cited by 2
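The snippet above describes a total exchange that leaves every processor holding the whole vector x. A minimal serial sketch of that data movement, assuming a list-of-slices representation (the function name is hypothetical; a real code would use something like MPI_Allgatherv):

```python
# Sketch of "personalized all-to-all" (total exchange): each processor owns a
# contiguous slice of the vector x and must end up with the whole vector.
# This is a serial simulation of the communication pattern, not MPI code.

def total_exchange(local_slices):
    """Each entry of local_slices is one processor's piece of x.
    After the exchange, every processor holds the full concatenated vector."""
    full_x = [v for piece in local_slices for v in piece]  # concatenate all pieces
    return [list(full_x) for _ in local_slices]            # every processor gets a copy

# Example: 4 "processors", each owning 2 entries of an 8-element vector.
slices = [[0, 1], [2, 3], [4, 5], [6, 7]]
gathered = total_exchange(slices)
```

Each processor contributes its slice and receives everyone else's, which is why the cost of this scheme is dominated by the communication rate the snippet goes on to quote.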

### Table 1 An iterative framework for using sparse matrix-vector multiplication. For i = 1, ...

"... In PAGE 5: ... BBA is independent of the data structure used to represent the blocks of the matrix A, as illustrated in Figure 3. This algorithm is well suited for iterative methods (see Table 1) in which the output vector y_i is the input vector x_{i+1} for the next... In PAGE 9: ... Using the fold operation illustrated in Table 2 and Figure 4, we add vectors z from processors within rows to get y, which is a subvector of the vector y. If the next matrix-vector multiplication on processor P requires y (see Table 1), the transpose operation can be used to copy... ..."
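The framework in the snippet feeds each product's output back in as the next input (y_i becomes x_{i+1}). A minimal sketch of that loop, with a dense stand-in for the sparse matvec (all names here are illustrative, not the paper's):

```python
# Sketch of the iterative framework: the output vector y_i of one sparse
# matrix-vector product becomes the input x_{i+1} of the next.

def matvec(A, x):
    """Dense stand-in for the sparse matrix-vector product y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterate(A, x0, num_iters):
    x = x0
    for _ in range(num_iters):
        y = matvec(A, x)  # y_i = A x_i
        x = y             # x_{i+1} = y_i  (feeds the next multiplication)
    return x

A = [[0.5, 0.0], [0.0, 0.5]]
result = iterate(A, [8.0, 4.0], 3)  # applies A three times
```

Because x_{i+1} must have the same distribution as x_i expected, the fold/transpose operations the snippet mentions are what restore the output vector to the layout the next multiplication requires.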

### Table 1: Performance of sparse matrix-vector product

1997

"... In PAGE 2: ... The main algorithm we will consider in this paper is the matrix-vector product, which is the core computation in iterative solvers for linear systems. Consider the performance (in Mflops) of sparse matrix-vector product on a single processor of an IBM SP-2 for a variety of matrices and storage formats, shown in Table 1 (descriptions of the matrices and the formats can be found in Appendix A). Boxed numbers indicate the highest performance for a given matrix.... In PAGE 2: ... This demonstrates the difficulty of developing a "sparse BLAS" for sparse matrix computations. Even if we limit ourselves to the formats in Table 1, one still has to provide at least 6² = 36 versions of sparse matrix-matrix product... In PAGE 19: ...995. ftp://hyena.cs.umd.edu/pub/papers/ieee toc.ps.Z. Appendix A Matrix formats The matrices shown in Table 1 are obtained from the suite of test matrices supplied with the PETSc library [4] (small, medium, cfd.1.... ..."

Cited by 9
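The 6² = 36 count arises because every storage format needs its own kernel. As one illustration (not the paper's code), here is the matvec for one common format, compressed sparse row (CSR):

```python
# Sketch of the CSR storage format and its matvec, illustrating why each
# sparse format requires a separate kernel: the loop structure is tied to
# the layout of the index arrays.

def csr_matvec(row_ptr, col_idx, vals, x):
    """y = A x for a matrix stored in compressed sparse row form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
y = csr_matvec(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
```

A diagonal, blocked, or coordinate format would store the same nonzeros with entirely different index arrays, forcing a different kernel and yielding the per-format performance spread shown in the table.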


### Table 1. Performance measurements for the matrix-vector multiplication. t = running time in seconds, s = speedup compared to sequential code. Columns: matrix size; sequential (t); 4 processors (t, s); 16 processors (t, s); 64 processors (t, s). First matrix size listed: 512².

1997

"... In PAGE 6: ... We compared the code against sequential code with no overheads for parallelism. Table 1 shows good speedups, because the local matrix-vector multiplications take most of the time. Another test case is the conjugate gradient algorithm.... ..."

Cited by 1
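The conjugate gradient algorithm mentioned as the other test case is itself dominated by one matrix-vector product per iteration, which is why good matvec speedups carry over to it. A minimal sketch with a dense stand-in matvec (names and tolerances are illustrative):

```python
# Sketch of the conjugate gradient method: each iteration performs exactly
# one matrix-vector product (Ap), plus a few vector updates and dot products.

def cg(A, b, tol=1e-12, max_iters=100):
    """Solve A x = b for symmetric positive definite A (dense list-of-lists)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual r = b - A x  (x = 0 initially)
    p = list(r)                      # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iters):
        Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]  # the matvec
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
x = cg(A, [1.0, 2.0])
```

In a parallel code the vector updates are local, so the communication cost is concentrated in the matvec and the two global dot products per iteration.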

### Table 4.4 Timings on a CM-5 with 512 processors for sparse matrix-vector multiplication, inner-product-wise triangular solve, and vector-update-wise triangular solve

1999

Cited by 6
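The two triangular-solve variants in the caption differ only in loop order: the inner-product form computes each unknown from a dot product over earlier unknowns, while the vector-update form scatters each finished unknown into the remaining right-hand side. A dense sketch of both for a lower-triangular L (illustrative, not the paper's code):

```python
# Two orderings of forward substitution for L x = b, L lower triangular.

def solve_inner_product(L, b):
    """Row-oriented: x[i] comes from an inner product with earlier x entries."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))  # inner product
        x[i] = (b[i] - s) / L[i][i]
    return x

def solve_vector_update(L, b):
    """Column-oriented: once x[j] is known, update the remaining rhs (axpy-like)."""
    n = len(b)
    b = list(b)
    for j in range(n):
        b[j] /= L[j][j]
        for i in range(j + 1, n):
            b[i] -= L[i][j] * b[j]  # vector update
    return b

L = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 3.0, 4.0]]
b = [2.0, 2.0, 7.0]
```

Both produce the same solution; on a distributed machine they generate different communication patterns (gathers of x versus scattered updates of b), which is what the timing comparison measures.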

### Table 6 Execution times in seconds, number of matrix-vector multiplications, and number of processors for the Schur complement technique

"... In PAGE 14: ... This represents the main weakness of Schur complement techniques. Table 6 gives the timing results and the number of matrix-vector multiplications for solving the systems for the interface data, using a relative tolerance of 10⁻⁵, a Krylov subspace dimension of m = 50, a level of fill of 25, and a number of inner iterations of 10. All of the computations were done according to the description of Section 2.... ..."
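The Schur complement technique eliminates the interior unknowns and solves a reduced system for the interface data. A dense NumPy illustration of the underlying algebra only (the paper solves the interface system iteratively with a Krylov method and incomplete fill, not with dense solves):

```python
# Sketch of a block solve via the Schur complement S = D - C A^{-1} B for
# the 2x2 block system [[A, B], [C, D]] [x; y] = [f; g], where y holds the
# interface unknowns and x the interior ones.
import numpy as np

def schur_solve(A, B, C, D, f, g):
    Ainv_B = np.linalg.solve(A, B)
    Ainv_f = np.linalg.solve(A, f)
    S = D - C @ Ainv_B                        # Schur complement (interface operator)
    y = np.linalg.solve(S, g - C @ Ainv_f)    # reduced solve for the interface data
    x = Ainv_f - Ainv_B @ y                   # back-substitute for the interior
    return x, y

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.0, 2.0]])
D = np.array([[5.0]])
f = np.array([1.0, 2.0])
g = np.array([3.0])
x, y = schur_solve(A, B, C, D, f, g)
```

Each application of S inside a Krylov iteration requires solves with A, which is the repeated cost the snippet identifies as the main weakness of the approach.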

### Table 5.1 Memory requirements for matrix-vector multiplication in CSI-MSVD using 20 processors.

1996

Cited by 1

### Table 1 The performance comparison of the matrix-vector multiplication task for each software development phase

"... In PAGE 16: ... As an example, for the p4-based implementation of the matrix-vector multiplication algorithm, we can determine from Figure 11 that eight nodes provide the best performance among the test cases. Table 1 compares the times required to develop, compile, execute, and visualize the Matrix-Vector Multiplication task using p4 and the ADViCE prototype for a 1024 × 1024 problem size with four nodes. In the design and implementation phase, it takes around 862 minutes for a parallel programming expert to develop a p4-based multiplication program from scratch if we assume that programming speed is two minutes per line.... ..."

### Table 7: Matrix-Vector Operations

2002

"... In PAGE 12: ... Matrix-Vector Operations This section lists matrix-vector operations in Table 7. The matrix arguments A, B and T are dense, banded, or sparse.... ..."

Cited by 23
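A typical entry in such a matrix-vector operations table is the BLAS-style update y ← αAx + βy. A dense list-of-lists sketch (illustrative; the listed routines would also accept banded or sparse arguments for A):

```python
# Sketch of the gemv-style operation y <- alpha*A*x + beta*y.

def gemv(alpha, A, x, beta, y):
    """Return alpha*A*x + beta*y for a dense matrix A."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [alpha * axi + beta * yi for axi, yi in zip(Ax, y)]

A = [[1.0, 2.0], [3.0, 4.0]]
y = gemv(2.0, A, [1.0, 1.0], 1.0, [1.0, 1.0])
```

Providing this one operation for every combination of argument formats is exactly the combinatorial burden noted for the sparse BLAS earlier in this list.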