Results 1 - 10 of 33,845

Table 22: Speed of general sparse matrix-vector multiplication subroutines.

in Basic Sparse Matrix Computations on Massively Parallel Computers
by W. Ferng, K. Wu, S. Petiton, Y. Saad 1993
"... In PAGE 39: ... In both cases, there is a simple way of accomplishing the required data movement, which is to use `personalized all to all communication apos; (or `total exchange apos; as it is sometimes called) to obtain the whole vector x or y from each processor. Table22 show the speeds of the subroutine using this communication scheme. We know that the communication rate for this operation is about 0.... ..."
Cited by 2
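
A minimal sketch of the `total exchange' scheme this excerpt describes, assuming a block-row distribution of A and CSR storage for the local rows (the function and parameter names are illustrative, not from the paper): each processor gathers the full vector x and then multiplies its own rows locally.

```c
/* Sketch: distributed y_local = A_local * x, assembling the full x
 * on every processor via total exchange (MPI_Allgatherv).
 * Block-row distribution and CSR local storage are assumptions. */
#include <mpi.h>
#include <stdlib.h>

void spmv_total_exchange(int nlocal, const int *rowptr,
                         const int *colind, const double *val,
                         const double *xlocal, double *ylocal,
                         const int *counts,   /* slice size per rank   */
                         const int *displs,   /* slice offset in x     */
                         int nglobal, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    double *xfull = malloc(nglobal * sizeof *xfull);

    /* Total exchange: every rank receives the whole vector x. */
    MPI_Allgatherv(xlocal, counts[rank], MPI_DOUBLE,
                   xfull, counts, displs, MPI_DOUBLE, comm);

    /* Purely local sparse matrix-vector product on the owned rows. */
    for (int i = 0; i < nlocal; ++i) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            s += val[k] * xfull[colind[k]];
        ylocal[i] = s;
    }
    free(xfull);
}
```

Gathering all of x keeps the communication pattern simple and its volume fixed, at the cost of moving entries the local rows may never touch.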

Table 1 An iterative framework for using sparse matrix-vector multiplication. For i = 1, . . .

in Blocked Data Distribution for the Conjugate Gradient Algorithm on the CRAY T3D
by Michael W. Berry, Charles Grassl, Vijay K. Krishna
"... In PAGE 5: ... BBA is independent of the data structure used to represent the blocks of the matrix A as illustrated in Figure 3. This algorithm is well suited for iterative methods (see Table1 ) in which the output vector yi is the input vector xi+1 for the next... In PAGE 9: ... Using the fold operation illustrated in Table 2 and Figure 4, we add vectors z from processors within rows to get y which is a subvector of the vector y . If the next matrix-vector multiplication on processor P requires y (see Table1 ), the transpose operation can be used to copy... ..."

Table 1: Performance of sparse matrix-vector product

in Compiling Parallel Code for Sparse Matrix Applications
by Vladimir Kotlyar, Keshav Pingali, Paul Stodghill 1997
"... In PAGE 2: ... The main algorithm we will consider in this paper is matrix-vector product which is the core computation in iterative solvers for linear systems. Consider the performance (in M ops) of sparse matrix-vector product on a single processor of an IBM SP-2 for a variety of matrices and storage formats, shown in Table1 (descriptions of the matrices and the formats can be found in Appendix A). Boxed numbers indicate the highest performance for a given matrix.... In PAGE 2: ... This demonstrates the di culty of developing a \sparse BLAS quot; for sparse matrix computations. Even if we limit ourselves to the formats in Table1 , one still has to provide at least 62 = 36 versions of sparse matrix-matrix product... In PAGE 19: ...995. ftp://hyena.cs.umd.edu/pub/papers/ieee toc.ps.Z. Appendix A Matrix formats The matrices shown in Table1 are obtained from the suite of test matrices supplied with the PETSc library [4] (small,medium,cfd.1.... ..."
Cited by 9
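
Each storage format needs its own kernel for the same mathematical operation y = Ax, which is the combinatorial problem the excerpt points to (six formats already imply 6^2 = 36 matrix-matrix variants). Two common kernels, sketched for illustration; these are generic formats, not necessarily the ones in the paper's Appendix A:

```c
/* y = A*x in two common sparse formats; same operation, two kernels. */

/* Coordinate (COO): nnz unordered (row, col, value) triples. */
void spmv_coo(int n, int nnz, const int *row, const int *col,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) y[i] = 0.0;
    for (int k = 0; k < nnz; ++k)
        y[row[k]] += val[k] * x[col[k]];
}

/* Compressed sparse row (CSR): rowptr delimits each row's nonzeros. */
void spmv_csr(int n, const int *rowptr, const int *colind,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            s += val[k] * x[colind[k]];
        y[i] = s;
    }
}
```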

Table 1. Performance measurements for the matrix-vector multiplication (t = running time in seconds, s = speedup compared to sequential code). Column layout: matrix size | sequential (t) | 4 processors (t, s) | 16 processors (t, s) | 64 processors (t, s); the first row reports matrix size 512^2.

in Distributed Arrays in the Functional Language Concurrent Clean
by Pascal Serrarens 1997
"... In PAGE 6: ... We compared the code against sequential code with no overheads for parallelism. Table1 shows good speedups, because the local matrix-vector multiplications take most of the time. Another test case the conjugate gradient algorithm.... ..."
Cited by 1

Table 4.4 Timings on a CM-5 with 512 processors for sparse matrix-vector multiplication, inner-product-wise triangular solve, and vector-update-wise triangular solve

in Scalable Parallel Preconditioning With The Sparse Approximate Inverse Of Triangular Matrices
by Arno C. N. Van Duin 1999
Cited by 6
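
The caption's two triangular-solve variants correspond to the two classical loop orders of forward substitution for L y = b: the inner-product-wise form computes each unknown as a dot product against already-computed entries, while the vector-update-wise form immediately subtracts each computed unknown from the remaining right-hand side. A dense sketch of the distinction (illustrative only; the paper's sparse data structures and CM-5 parallelization are not reproduced):

```c
/* Two organizations of the lower triangular solve L*y = b.
 * L is dense (row-major, n x n) purely for illustration. */

/* Inner-product-wise (row-oriented) forward substitution. */
void trisolve_dot(int n, const double *L, const double *b, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j)        /* dot with known entries */
            s -= L[i * n + j] * y[j];
        y[i] = s / L[i * n + i];
    }
}

/* Vector-update-wise (column-oriented) forward substitution. */
void trisolve_axpy(int n, const double *L, const double *b, double *y)
{
    for (int i = 0; i < n; ++i) y[i] = b[i];
    for (int j = 0; j < n; ++j) {
        y[j] /= L[j * n + j];
        for (int i = j + 1; i < n; ++i)    /* update remaining rhs */
            y[i] -= L[i * n + j] * y[j];
    }
}
```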

Table 6 Execution times in seconds, number of matrix-vector multiplications, and number of processors for the Schur complement technique

in Parallel Solution of General Sparse Linear Systems
by Sergey Kuznetsov, Gen-ching Lo, Yousef Saad
"... In PAGE 14: ... This represents the main weakness of Schur complement techniques. Table6 gives the timing results, the number of matrix-vector multiplications for solving the systems for the interface data using a relative tolerance of quot; = 10?5, a Krylov subspace dimension of m = 50, a level of ll of 25, and a number of inner iteration of 10. All of the computation were done according to the description of Section 2.... ..."

Table 5.1 Memory requirements for matrix-vector multiplication in CSI-MSVD using 20 processors.

in Estimating the Largest Singular Values/Vectors of Large Sparse Matrices via Modified Moments
by Sowmini Varadhan 1996
Cited by 1

Table 1 The performance comparison of the matrix-vector multiplication task for each software development phase

in Adaptive Distributed Virtual Computing Environment (ADViCE)
by Salim Hariri, Dongmin Kim, Yoonhee Kim, Ilkyeun Ra, Baoqing Ye, Xue Bing, Haluk Topcuoglu, Jon Valente
"... In PAGE 16: ... As an example, for the p4-based implementation of the matrix-vector multiplication algorithm, we can determine from Figure 11 that eight nodes provide the best performance among the test cases. Table1 compares the times required to develop, compile, execute, and visualize the Matrix-Vector Multiplication task using p4 and the ADViCE prototype for a 1024 1024 problem size with four nodes. In the design and implementation phase, it takes around 862 minutes for a parallel programming expert to develop a p4-based multiplication pro- gram from scratchifwe assume that programming speed is twominutes per line.... ..."

Table 7: Matrix-Vector Operations

in An Updated Set of Basic Linear Algebra Subprograms (BLAS)
by L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, R. C. Whaley 2002
"... In PAGE 12: ...Matrix-Vector Operations This section lists matrix-vector operations in Table7 . The matrix arguments A, B and T are dense or banded or sparse.... ..."
Cited by 23
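
The operations in Table 7 extend the Level 2 BLAS matrix-vector kernel GEMV, y <- alpha*A*x + beta*y. A minimal dense example through the standard CBLAS interface (the matrix and vectors are invented for illustration):

```c
/* y = alpha*A*x + beta*y via the CBLAS GEMV interface. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[2 * 3] = { 1, 2, 3,
                        4, 5, 6 };    /* 2x3, row-major */
    double x[3]     = { 1, 1, 1 };
    double y[2]     = { 0, 0 };

    /* y <- 1.0 * A * x + 0.0 * y */
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 3,
                1.0, A, 3, x, 1, 0.0, y, 1);

    printf("y = [%g, %g]\n", y[0], y[1]);   /* expect [6, 15] */
    return 0;
}
```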