Results 1-10 of 58
Parallel Numerical Linear Algebra
, 1993
Cited by 773 (26 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
Adaptively Preconditioned GMRES Algorithms
 SIAM J. Sci. Comput
Cited by 68 (2 self)
The restarted GMRES algorithm proposed by Saad and Schultz [22] is one of the most popular iterative methods for the solution of large linear systems of equations Ax = b with a nonsymmetric and sparse matrix. This algorithm is particularly attractive when a good preconditioner is available. The present paper describes two new methods for determining preconditioners from spectral information gathered by the Arnoldi process during iterations by the restarted GMRES algorithm. These methods seek to determine an invariant subspace of the matrix A associated with eigenvalues close to the origin, and move these eigenvalues so that a higher rate of convergence of the iterative methods is achieved. Key words. iterative method, nonsymmetric linear system, Arnoldi process. AMS subject classifications. 65F10. 1. Introduction. Many problems in Applied Mathematics and Engineering give rise to very large linear systems of equations $Ax = b$, $A \in \mathbb{R}^{n \times n}$, $x, b \in \mathbb{R}^{n}$, (1.1) with a sparse nons...
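As a point of reference for the abstract above, the following is a minimal sketch of plain restarted GMRES(m) in NumPy: an Arnoldi process builds an orthonormal Krylov basis, and a small least-squares problem gives the update in each cycle. It does not implement the paper's adaptive preconditioners. The function name `gmres_restarted` and the parameters `m`, `tol`, and `max_restarts` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=50):
    """Plain restarted GMRES(m) sketch (no preconditioning).

    Returns the approximate solution and the final relative residual norm.
    """
    n = b.shape[0]
    x = np.zeros(n)
    bnorm = np.linalg.norm(b)
    if bnorm == 0.0:
        return x, 0.0
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta / bnorm <= tol:
            break
        V = np.zeros((n, m + 1))   # orthonormal Krylov basis vectors
        H = np.zeros((m + 1, m))   # upper Hessenberg matrix from Arnoldi
        V[:, 0] = r / beta
        k = m                      # number of Arnoldi steps actually taken
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:  # breakdown: Krylov subspace is invariant
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize || beta * e1 - H y || over y, then update the iterate.
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y
    return x, np.linalg.norm(b - A @ x) / bnorm
```

The matrix `H` produced by the Arnoldi process is the source of the spectral information the abstract refers to: its eigenvalues approximate eigenvalues of A, including those close to the origin.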
GMRES on (Nearly) Singular Systems
 SIAM J. Matrix Anal. Appl
, 1994
Cited by 50 (4 self)
We consider the behavior of the GMRES method for solving a linear system Ax = b when A is singular or nearly so, i.e., ill-conditioned. The (near) singularity of A may or may not affect the performance of GMRES, depending on the nature of the system and the initial approximate solution. For singular A, we give conditions under which the GMRES iterates converge safely to a least-squares solution or to the pseudoinverse solution. These results also apply to any residual minimizing Krylov subspace method that is mathematically equivalent to GMRES. A practical procedure is outlined for efficiently and reliably detecting singularity or ill-conditioning when it becomes a threat to the performance of GMRES. Key words. GMRES method, residual minimizing methods, Krylov subspace methods, iterative linear algebra methods, singular or ill-conditioned linear systems. AMS(MOS) subject classifications. 65F10. 1. Introduction. The generalized minimal residual (GMRES) method of Saad and Schultz [1...
Minimizing Communication in Sparse Matrix Solvers
Cited by 37 (11 self)
Data communication within the memory system of a single processor node and between multiple nodes in a system is the bottleneck in many iterative sparse matrix solvers like CG and GMRES. Here k iterations of a conventional implementation perform k sparse-matrix-vector multiplications and Ω(k) vector operations like dot products, resulting in communication that grows by a factor of Ω(k) in both the memory and network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k · log P) messages on a parallel machine, and reading the matrix A from DRAM to cache just once, instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to form a new variant of GMRES. Our shared-memory implementation on an 8-core Intel Clovertown gets speedups of up to 4.3× over standard GMRES, without sacrificing convergence rate or numerical stability.
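The reorganization described in this abstract can be illustrated, very loosely, by separating basis generation from orthogonalization. The sketch below first computes all k matrix-vector products, then orthogonalizes the whole block with one QR factorization instead of a dot product per step. It is a sequential NumPy sketch under stated assumptions, not the paper's kernel. The monomial basis [v, Av, ..., A^k v] is chosen only for simplicity and becomes ill-conditioned as k grows. The function names are hypothetical.

```python
import numpy as np

def krylov_basis_monomial(A, v, k):
    """Return the n x (k+1) matrix [v, A v, ..., A^k v] with v normalized.

    In a communication-avoiding setting these k products would be computed
    in one pass over A. Here they are computed one after another.
    """
    n = v.shape[0]
    K = np.empty((n, k + 1))
    K[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        K[:, j + 1] = A @ K[:, j]
    return K

def orthonormalize_block(K):
    """Orthogonalize all basis vectors at once with a single QR factorization.

    This replaces the per-iteration dot products of a conventional
    Arnoldi or Gram-Schmidt loop with one block operation.
    """
    Q, R = np.linalg.qr(K)
    return Q, R
```

Here `Q` spans the same Krylov subspace as the columns of `K`, so a GMRES-like method can work with `Q` and `R` after a single block orthogonalization per k steps.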
Avoiding Communication in Sparse Matrix Computations
 In Proceedings of IPDPS
, 2008
BiCGstab(l) and other hybrid BiCG methods
 Numerical Algorithms 7 (1994) 75-109
, 1994
Cited by 16 (1 self)
It is well known that BiCG can be adapted so that the operations with A^T can be avoided, and hybrid methods can be constructed in which it is attempted to further improve the convergence behaviour. Examples of this are CGS, BiCGSTAB, and the more general BiCGstab(l) method. In this paper it is shown that BiCGstab(l) can be implemented in different ways. Each of the suggested approaches has its own advantages and disadvantages. Our implementations allow for combinations of BiCG with arbitrary polynomial methods. The choice for a specific implementation can also be made for reasons of numerical stability. This aspect receives much attention. Various effects have been illustrated by numerical examples.
Solving Sparse Least Squares Problems on Massively Distributed Memory Computers
 In Proceedings of International Conference on Advances in Parallel and Distributed Computing (APDC97)
, 1997
Cited by 14 (14 self)
In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to organize the computation of conjugate gradient method with preconditioner applied to normal equations, and Incomplete Modified Gram-Schmidt (IMGS) preconditioner for solving sparse least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is always limited because of the global communication required for the inner products. We will describe the parallelization of PCGLS and IMGS preconditioner by two ways of improvement. One is to assemble the results of a number of inner products collectively and the other is to create situations where communication can be overlapped with computation. A theoretical model of computation and communication phases is presented which allows us to decide the number of processors that minimizes the runtime. Several numerical experiments on Parsytec GC/PowerPlus are presented. 1. In...
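The first improvement mentioned in this abstract, assembling several inner products collectively, can be sketched as follows. The code simulates a distributed machine inside one process: each simulated processor owns a chunk of every vector and computes a short vector of partial sums, and a single sum over processors stands in for one global reduction (for example, one all-reduce call). The function name `fused_inner_products` and the parameter `num_procs` are hypothetical. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def fused_inner_products(pairs, num_procs):
    """Compute several inner products with one simulated global reduction.

    pairs     -- list of (u, v) tuples of equal-length 1-D arrays
    num_procs -- number of simulated processors
    """
    n = pairs[0][0].shape[0]
    # Each simulated processor owns one contiguous chunk of the indices.
    chunks = np.array_split(np.arange(n), num_procs)
    # Local phase: every processor fills a vector of partial dot products.
    local = np.array([[u[idx] @ v[idx] for (u, v) in pairs] for idx in chunks])
    # Global phase: ONE reduction over processors yields all results at once,
    # instead of one reduction per inner product.
    return local.sum(axis=0)
```

With this layout the number of global synchronization points per call is one, regardless of how many inner products are requested, which is the effect the abstract attributes to collective assembly.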
Parallel Least Squares Problems On Massively Distributed Memory Computers
 Middle East Technical University
, 1996
Cited by 13 (13 self)
In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to organize the computation of conjugate gradient method with preconditioner applied to normal equations, and Incomplete Modified Gram-Schmidt preconditioner for solving least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is always limited because of the global communication required for the inner products. We will describe the parallelization of PCGLS and IMGS preconditioner by two ways of improvement. One is to assemble the results of a number of inner products collectively and the other is to create situations where communication can be overlapped with computation. A theoretical model of computation and communication phases is presented which allows us to decide the number of processors that minimizes the runtime. Several numerical experiments on Parsytec GC/PowerPlus are presented. 1 Introduction The ...
Avoiding communication in computing Krylov subspaces
, 2007