### Table 1. Hierarchical Sparse Matrix Storage Format compared to JD and CRS for Section size 64

2003

"... In PAGE 3: ... The s²-blocks at all levels are represented as an array (called an s²-blockarray) whose entries are non-zero values (for level-0) or pointers to non-empty lower-level s²-blockarrays (for all higher levels), along with their corresponding positional information within the block. As can be observed in Table 1, the HiSM format offers a storage reduction of about 40% versus the JD and CRS formats and is equivalent to the reduction offered by the BBCS format. Furthermore, from the locality measure which we describe in the next Section we can conclude that each s²-blockarray contains on average 2.18 × s non-zero elements... ..."

Cited by 3
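The two-level s²-block idea in the excerpt above can be sketched as follows. This is an illustrative assumption about the layout, not the paper's exact HiSM representation: non-zeros are grouped into s × s blocks, each non-empty block becomes a level-0 blockarray of entries with local positions, and a level-1 blockarray keeps the block positions plus pointers to the level-0 arrays. The helper name `build_hism_like` is ours.

```python
# Hypothetical sketch of a two-level hierarchical block storage in the
# spirit of HiSM (layout is an assumption, not the paper's exact format).

def build_hism_like(coo_entries, s):
    """coo_entries: iterable of (row, col, value) for the non-zeros."""
    blocks = {}
    for r, c, v in coo_entries:
        key = (r // s, c // s)                                # which s x s block
        blocks.setdefault(key, []).append((r % s, c % s, v))  # local position
    # Level-1 blockarray: positional info + "pointer" per non-empty block.
    return [(br, bc, level0) for (br, bc), level0 in sorted(blocks.items())]

# 4 x 4 matrix, s = 2: the non-zeros cluster into two of the four blocks.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (2, 2, 4.0), (3, 3, 5.0)]
level1 = build_hism_like(entries, s=2)
print(len(level1))    # 2 non-empty blocks
print(level1[0][:2])  # position of the first non-empty block: (0, 0)
```

Only non-empty blocks consume storage, which is where the format's storage reduction over row-oriented schemes comes from when non-zeros cluster.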

### Table 3. Hierarchical Sparse Matrix Storage Format compared to JD and CRS for Section size 64

2003

"... In PAGE 8: ...18 × (block dimension), which makes the search complexity 2.18 × s on average, or O(s), rather than O(s²). In Table 3, we give the number of steps (comparisons + displacements) needed to insert an element in the matrix for a number of matrices. The results are compared to the BBCS, CRS and JD storage methods.... ..."

Cited by 3

### Table 14: Number of TREC Data Elements in each Sparse Matrix Storage Format

"... In PAGE 7: ... Table 14 shows that CSC storage will have a definite advantage over COO, but lose to CSR, because less compression is achieved in the column vector of CSC than in the row vector of CSR. BSR consists of blocks that include non-zero elements, along with zero elements.... ..."
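The compression argument in the excerpt above can be made concrete with a rough storage-element count (our illustrative model, counting index/pointer entries plus values and ignoring per-entry byte sizes): COO stores three entries per non-zero, CSR stores two per non-zero plus a row-pointer vector of length nrows + 1, and CSC stores two per non-zero plus a column-pointer vector of length ncols + 1. For a wide, TREC-like term matrix with far more columns than rows, CSC therefore beats COO but loses to CSR:

```python
# Rough storage-element counts for COO, CSR and CSC (illustrative model):
#   COO: row, col and value per non-zero           -> 3 * nnz
#   CSR: col + value per non-zero, row pointers    -> 2 * nnz + (nrows + 1)
#   CSC: row + value per non-zero, column pointers -> 2 * nnz + (ncols + 1)

def storage_elements(nrows, ncols, nnz):
    return {
        "COO": 3 * nnz,
        "CSR": 2 * nnz + (nrows + 1),
        "CSC": 2 * nnz + (ncols + 1),
    }

# Hypothetical wide matrix: many more columns than rows.
sizes = storage_elements(nrows=1000, ncols=500000, nnz=2000000)
print(sizes["COO"])  # 6000000
print(sizes["CSR"])  # 4001001
print(sizes["CSC"])  # 4500001  (beats COO, loses to CSR here)
```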

### Table 34: Sparse formats

1996

"... In PAGE 71: ... The sparsity structure can be stored in a couple of different ways. To better describe these formats we consider a 3 by 3 matrix as shown in Table 34. In the same table one can see how the data is stored in a single vector and how the sparse data is stored for different formats.... ..."
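In the same spirit as the table's side-by-side comparison, a 3 by 3 example can be converted to compressed-row (CSR/CRS) vectors as below; the helper name is ours, not the report's:

```python
# Illustrative conversion of a small dense matrix to CSR vectors:
# non-zero values, their column indices, and row pointers marking where
# each row's entries begin and end.

def dense_to_csr(A):
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's entries
    return values, col_idx, row_ptr

A = [[1, 0, 2],
     [0, 3, 0],
     [4, 0, 5]]
values, col_idx, row_ptr = dense_to_csr(A)
print(values)   # [1, 2, 3, 4, 5]
print(col_idx)  # [0, 2, 1, 0, 2]
print(row_ptr)  # [0, 2, 3, 5]
```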

### Table 2: Matrix benchmark suite. Matrices are categorized roughly as follows: 1 is a dense matrix stored in sparse format; 2–17 arise in finite element applications; 18–39 come from assorted applications; 40–44 are linear programming examples.

2002

"... In PAGE 2: ... The two bounds differ only in their assumption about whether conflict misses occur: in the upper bound any value that has been used before is modeled as a cache hit (no conflict misses), whereas the lower bound assumes that all data must be reloaded. We then use detailed hardware counter data collected on 4 different computing platforms (Table 1) over a test set of 44 sparse matrices (Table 2) to show that our upper bound is in fact a quite accurate approximation of reality, i.e.... In PAGE 4: ... Matrices We evaluate the SpM×V implementations on the matrix benchmark suite used by Im [16]. Table 2 summarizes the size and source of each matrix. Most of the matrices are available from either of the collections at NIST (MatrixMarket [5]) and the University of Florida [9].... In PAGE 4: ... by Im [16]. Table 2 summarizes the size and source of each matrix. Most of the matrices are available from either of the collections at NIST (MatrixMarket [5]) and the University of Florida [9]. The matrices in Table 2 are arranged in roughly four groups. Matrix 1 is a dense matrix stored in sparse format; matrices 2–17 arise in finite element method (FEM) applications; 18–39 come from assorted applications (including chemical process engineering, oil reservoir modeling, circuits, and finance); 40–44 are linear programming examples.... In PAGE 13: ... 4 Evaluating register blocking performance We now evaluate the register blocking optimization with respect to the upper and lower bounds on performance derived above. Figures 7–10 summarize our evaluation on the four hardware platforms in Table 1 and the matrix benchmark suite in Table 2, with respect to the upper and lower performance bounds. We compare the following implementations: Reference: The unblocked (1×1) implementation is represented by asterisks.... ..."

Cited by 31
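Register blocking, evaluated in the excerpt above, stores the matrix as small dense r × c blocks and fills in explicit zeros wherever a block is only partially occupied; the resulting fill ratio (stored entries divided by true non-zeros) is the key quantity traded against the better register reuse of blocked kernels. A minimal sketch of that fill-ratio computation, with an illustrative helper of our own naming:

```python
# Sketch: fill ratio of r x c register blocking. Each aligned block that
# contains at least one non-zero is stored densely (r*c entries), so
#   fill_ratio = stored_entries / true_nonzeros >= 1.

def fill_ratio(coo_entries, r, c):
    nnz = len(coo_entries)
    blocks = {(i // r, j // c) for i, j, _ in coo_entries}  # occupied blocks
    return len(blocks) * r * c / nnz

# Diagonal 4 x 4 matrix: 2 x 2 blocking touches 2 blocks -> 8 stored entries.
diag = [(i, i, 1.0) for i in range(4)]
print(fill_ratio(diag, 1, 1))  # 1.0  (no fill for the unblocked case)
print(fill_ratio(diag, 2, 2))  # 2.0  (4 true non-zeros, 8 stored entries)
```

A fill ratio of 2.0 means the blocked code moves twice the matrix data, so blocking only pays off when the kernel speedup outweighs that overhead.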

### Table 1: Performance of sparse matrix-vector product

1997

"... In PAGE 2: ... The main algorithm we will consider in this paper is matrix-vector product, which is the core computation in iterative solvers for linear systems. Consider the performance (in Mflops) of sparse matrix-vector product on a single processor of an IBM SP-2 for a variety of matrices and storage formats, shown in Table 1 (descriptions of the matrices and the formats can be found in Appendix A). Boxed numbers indicate the highest performance for a given matrix.... In PAGE 2: ... This demonstrates the difficulty of developing a "sparse BLAS" for sparse matrix computations. Even if we limit ourselves to the formats in Table 1, one still has to provide at least 6² = 36 versions of sparse matrix-matrix product... In PAGE 19: ...oc.ps.Z. Appendix A Matrix formats The matrices shown in Table 1 are obtained from the suite of test matrices supplied with the PETSc library [4] (small, medium, cfd.1.... ..."

Cited by 9
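The core computation the excerpt refers to, y = Ax with A in compressed-row storage, can be sketched in a few lines (a plain-Python illustration, not the paper's tuned kernels):

```python
# Minimal CSR sparse matrix-vector product y = A @ x: for each row,
# accumulate values[k] * x[col_idx[k]] over that row's entry range.

def csr_matvec(values, col_idx, row_ptr, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form:
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The indirect access `x[col_idx[k]]` is exactly what makes per-format tuning matter: each storage format yields a different inner loop, hence the combinatorial explosion of kernel versions the excerpt describes.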


### Table 1: Solvers and preconditioners

"... In PAGE 14: ... Table 8: Level Information Structure struct LevInfo { ExternSubr *solver; /* Pointer to how to call solver */ ExternSubr *precond; /* Pointer to how to call preconditioner */ ExternSubr *matrix_vec; /* Pointer to how to call matrix*vector */ ExternSubr *change_lev; /* Pointer to how to call level changer */ int SolverIters; /* Number of iterations in solver() */ double SolverRNorm; /* How much to reduce residual norm */ int MGIters; /* Number of iterations of MGC */ int NIIters; /* Number of iterations of NIC */ void *X_j; /* Pointer to x_j */ void *B_j; /* Pointer to b_j */ int NX_j; /* Length of x_j */ int NB_j; /* Length of b_j */ int NZA_j; /* Number of nonzeroes in A_j */ Matrix *A_j; /* Pointer to A_j representation */ Matrix *R_j; /* Pointer to R_j representation */ Matrix *P_j; /* Pointer to P_j representation */ Matrix *NIP_j; /* Pointer to NIP_j representation */ Matrix *FASR_j; /* Pointer to R(FAS)_j representation */ }; typedef struct LevInfo LevInfo; /* Simplify LevInfo declarations */ can be given to the user (see Table 1). Consider Table 4.... ..."

### Table 2. Descriptions of input matrices.

2002

"... In PAGE 5: ... These are also linear elasticity problems. Table 2 shows statistics on the matrices. In all our experiments, the sparse matrix is stored in a compressed sparse row (CSR) format.... ..."

Cited by 11

### Table 8: Timing (ms) Results for Finite Difference Data in ELL format using Sparse util gather routine on CM-200 with 32K processors

1993

"... In PAGE 21: ... Table 2: Mflop performance of SUM and SCAN ADD on a 32K-processor CM-200 and a 512-processor CM-5 without vector units Indirect Addressing. Communication costs due to indirect addressing can be quite high for sparse matrix computations on massively parallel computers; see the illustration in Table 8. Since it is unavoidable to use indirect addressing in the sparse matrix context, we examine three different ways of handling the indirect addressing on the Connection Machines.... In PAGE 23: ... Also the performance is about the same with variant bandwidths in the band matrix data set. In Table 8, we show the time spent in multiplications, the gather operation, and the summation of each row when the matrix is stored in ELL format. In Table 9, we show some timing results for communication compiler routines when the matrix is stored in CSR format and the communication compiler get and send add routines are used.... ..."

Cited by 2
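The ELL (ELLPACK) format in the excerpt above pads every row to the length of the longest row, producing rectangular value and column-index arrays; SpMV then proceeds one padded "column" at a time as a gather, multiply, and add, which is what maps well onto SIMD machines like the CM-200. A minimal sketch under those assumptions (helper names are ours):

```python
# Sketch of ELL storage and the gather-multiply-sum SpMV it enables.
# Padding entries use value 0.0 with a harmless column index, so they
# contribute nothing to the result.

def dense_to_ell(A, pad_col=0):
    width = max(sum(1 for a in row if a != 0) for row in A)
    values = [[0.0] * width for _ in A]
    col_idx = [[pad_col] * width for _ in A]
    for i, row in enumerate(A):
        k = 0
        for j, a in enumerate(row):
            if a != 0:
                values[i][k], col_idx[i][k] = a, j
                k += 1
    return values, col_idx

def ell_matvec(values, col_idx, x):
    n, width = len(values), len(values[0])
    y = [0.0] * n
    for k in range(width):       # one padded "column" at a time
        for i in range(n):       # gather from x, multiply, accumulate
            y[i] += values[i][k] * x[col_idx[i][k]]
    return y

A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
values, col_idx = dense_to_ell(A)
print(ell_matvec(values, col_idx, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The inner loop over `i` is the data-parallel step: on a machine like the CM-200 it becomes one gather plus one elementwise multiply-add across all rows at once, which is the cost the table's timings break down.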