Results 1 - 7 of 7
Scientific Computing on Bulk Synchronous Parallel Architectures
Abstract

Cited by 70 (13 self)
We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures.
A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication
 SIAM REVIEW
Abstract

Cited by 68 (9 self)
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work.
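The recursive bipartitioning idea in this abstract can be sketched in a few lines. The toy version below only balances nonzero counts by alternating row-wise and column-wise splits; the actual Mondriaan method also minimises the communication volume of each split, which this sketch omits. All names and the example matrix are illustrative, not taken from the paper.

```python
def bipartition(nonzeros, depth, max_depth):
    """nonzeros: list of (row, col) coordinates; returns a list of parts."""
    if depth == max_depth or len(nonzeros) <= 1:
        return [nonzeros]
    axis = depth % 2                      # alternate row/column splits
    ordered = sorted(nonzeros, key=lambda rc: rc[axis])
    mid = len(ordered) // 2               # nearly equal nonzero counts
    return (bipartition(ordered[:mid], depth + 1, max_depth) +
            bipartition(ordered[mid:], depth + 1, max_depth))

# Example: partition a small sparse pattern into 4 parts (2 split levels).
nnz = [(0, 0), (0, 3), (1, 1), (1, 2), (2, 2), (2, 3), (3, 0), (3, 3)]
parts = bipartition(nnz, 0, 2)
```

Each recursion level doubles the number of parts, so `max_depth` levels yield a partition over 2^max_depth processors with nonzero counts differing by at most one per split.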
An Efficient Parallel Algorithm for Matrix-Vector Multiplication
 International Journal of High Speed Computing
, 1995
Abstract

Cited by 37 (4 self)
The multiplication of a vector by a matrix is the kernel operation in many algorithms used in scientific computation. A fast and efficient parallel algorithm for this calculation is therefore desirable. This paper describes a parallel matrix-vector multiplication algorithm which is particularly well suited to dense matrices or matrices with an irregular sparsity pattern. Such matrices can arise from discretizing partial differential equations on irregular grids or from problems exhibiting nearly random connectivity between data structures. The communication cost of the algorithm is independent of the matrix sparsity pattern and is shown to scale as O(n/√p + log p) for an n × n matrix on p processors. The algorithm's performance is demonstrated by using it within the well-known NAS conjugate gradient benchmark. This resulted in the fastest run times achieved to date on both the 1024-node nCUBE 2 and the 128-node Intel iPSC/860. Additional improvements to the algorithm which ...
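The kernel operation this abstract refers to can be illustrated with a minimal serial compressed-sparse-row (CSR) matrix-vector product; the storage layout and names below are illustrative, not the paper's parallel implementation.

```python
def csr_matvec(row_ptr, col_idx, vals, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # vals[row_ptr[i]:row_ptr[i+1]] are the nonzeros of row i
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]], x = [1, 1, 1]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
y = csr_matvec(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])  # -> [3.0, 3.0, 9.0]
```

In a 2D parallel decomposition of the kind the abstract describes, each processor runs this loop over its local block of A, which is where the O(n/√p) communication term for gathering vector entries arises.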
A Parallel GMRES Version For General Sparse Matrices
 Electronic Transactions on Numerical Analysis
, 1995
Abstract

Cited by 14 (0 self)
This paper describes the implementation of a parallel variant of GMRES on the Paragon. This variant builds an orthonormal Krylov basis in two steps: it first computes a Newton basis, then orthogonalises it. The first step requires matrix-vector products with a general sparse unsymmetric matrix, and the second step is a QR factorisation of a rectangular matrix with few long vectors. The algorithm has been implemented for a distributed-memory parallel computer. The distributed sparse matrix-vector product avoids global communications thanks to the initial setup of the communication pattern. The QR factorisation is distributed by using Givens rotations which require only local communications. Results on an Intel Paragon show the efficiency and the scalability of our algorithm. Key words. GMRES, parallelism, sparse matrix, Newton basis. AMS subject classifications. 65F10, 65F25, 65F50. 1. Introduction. Many scientific applications make use of sparse linear algebra. Because they are quite time ...
Performance Analysis of the IQMR Method on Bulk Synchronous Parallel Architectures
, 1997
Abstract

Cited by 3 (2 self)
For the solution of unsymmetric linear systems of equations, we have proposed an improved version of the quasi-minimal residual (IQMR) method [21] by using the Lanczos process as a major component, combining elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent, and the communication time required for inner products can be overlapped efficiently with computation time. In this paper, we use the Bulk Synchronous Parallel (BSP) model to design a fully efficient, scalable and portable parallel IQMR algorithm and to provide accurate performance prediction of the algorithm for a wide range of architectures including the Cray T3D, the Parsytec GC/PowerPlus, and a cluster of workstations connected by an Ethernet. This performance model ...
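The BSP model used here charges each superstep a cost of w + g·h + l, where w is the maximum local computation, h the largest number of words any processor sends or receives, g the per-word communication cost, and l the synchronisation latency. A minimal sketch of such a performance prediction, with purely hypothetical numbers:

```python
def bsp_cost(supersteps, g, l):
    """supersteps: list of (w, h) pairs; returns the total predicted cost
    sum over supersteps of w + g*h + l (flop-time units)."""
    return sum(w + g * h + l for w, h in supersteps)

# Hypothetical iteration of two supersteps, e.g. a local sparse matvec
# (w=1000 flops, h=50 words) followed by an inner product (w=200, h=10),
# on a machine with g=4 and l=100.
total = bsp_cost([(1000, 50), (200, 10)], g=4.0, l=100.0)  # -> 1640.0
```

Because g and l are measured once per machine, the same (w, h) counts yield predictions across architectures, which is what lets the paper compare the Cray T3D, the Parsytec, and an Ethernet cluster with one model.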
Scientific Computing on Bulk Synchronous Parallel Architectures
, 1993
Abstract
In this paper we theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on BSP architectures. The computations considered include the iterative solution of sparse linear systems, molecular dynamics, linear programming, and the solution of partial differential equations on a mesh. We analyse these computations in a uniform manner by formulating their basic procedures as a sparse matrix-vector multiplication. In our analysis, we introduce the normalised BSP cost of an algorithm as an expression of the form ...
On Analysis of Partitioning Models and Metrics in Parallel Sparse Matrix-Vector Multiplication
Abstract
On analysis of partitioning models and metrics in parallel sparse matrix-vector multiplication