Results 1–10 of 12
Parallel Numerical Linear Algebra
, 1993
Abstract

Cited by 766 (23 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
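The abstract above uses matrix multiplication to illustrate parallel cost principles. Below is a minimal serial sketch of a blocked product, where each block of C stands in for the tile one processor would own; the block size and function name are illustrative choices of ours, not from the survey.

```python
import numpy as np

def blocked_matmul(A, B, bs=2):
    """Multiply A @ B by iterating over bs x bs blocks.

    On a parallel machine each (i, j) block of C could be owned by one
    processor, which accumulates row blocks of A times column blocks of B;
    here the loops run serially just to illustrate the data movement.
    """
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % bs == 0
    C = np.zeros((n, n))
    for i in range(0, n, bs):          # block row of C
        for j in range(0, n, bs):      # block column of C
            for k in range(0, n, bs):  # accumulate over inner block index
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

A = np.arange(16.0).reshape(4, 4)
B = np.eye(4)
assert np.allclose(blocked_matmul(A, B), A @ B)
```

The point of the blocking is that each bs x bs tile is reused bs times once fetched, which is what reduces communication (or cache traffic) relative to an element-wise triple loop.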
Developments and Trends in the Parallel Solution of Linear Systems
, 1999
Abstract

Cited by 6 (0 self)
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems, concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.
On the Portability and Efficiency of Parallel Algorithms and Software
 Delft University of Technology
, 1994
Abstract

Cited by 5 (3 self)
Parallel software development must face the fact that different architectures require different implementations. Flexibility in modifying parallel methods and software is necessary because the efficiency of an algorithm depends on the characteristics of the target computer. Furthermore, different parallel computers require different data layouts in data structures. The required flexibility is obtained by identifying abstraction levels and development steps in parallel algorithm and software development. The proposed approach ensures that all choices in the design are properly recognised and documented. As a result, it is simple to compare the characteristics of a new parallel computer with the characteristics that are used in the software. In this way the development itself becomes more portable and thus less architecture dependent.

1 Introduction

A main task of parallel software development is to obtain highly efficient and portable software for parallel computers...
Parallel iterative solution methods for linear systems arising from discretized PDE's
 Lecture Notes on Parallel Iterative Methods for discretized PDE's. AGARD Special Course on Parallel Computing in CFD, available from http://www.math.ruu.nl/people/vorst/#lec
, 1995
Abstract

Cited by 3 (0 self)
In these notes we will present an overview of a number of related iterative methods for the solution of linear systems of equations. These methods are so-called Krylov projection type methods, and they include popular methods such as Conjugate Gradients, Bi-Conjugate Gradients, CGS, Bi-CGSTAB, QMR, LSQR and GMRES. We will show how these methods can be derived from simple basic iteration formulas. We will not give convergence proofs, but we will refer for these, as far as available, to the literature. Iterative methods are often used in combination with so-called preconditioning operators (approximations to the inverse of the operator of the system to be solved). Since these preconditioners are not essential in the derivation of the iterative methods, we will not give much attention to them in these notes. However, in most of the actual iteration schemes we have included them, in order to facilitate the use of these schemes in actual computations. For the application of the iterative schemes one usually thinks of sparse linear systems, e.g., like those arising in the finite element or finite difference approximations of (systems of) partial differential equations. However, the structure of the operator plays no explicit role in any of these schemes, and these schemes might also successfully be used to solve certain large dense linear systems. Depending on the situation, that might be attractive in terms of the number of floating point operations. It will turn out that all of the iterative methods are parallelizable in a straightforward manner. However, especially for computers with a memory hierarchy (i.e., with cache or vector registers), and for distributed memory computers, the performance can often be improved significantly through rescheduling of the operations. We will discuss parallel implementations, and occasionally we will report on experimental findings.
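The Krylov methods listed in this abstract share a common skeleton; as one concrete instance, here is a minimal preconditioned Conjugate Gradients sketch in NumPy. The Jacobi (diagonal) preconditioner and the test problem are our own illustration, not taken from the notes.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxit=200):
    """Preconditioned Conjugate Gradients for a symmetric positive
    definite matrix A. M_inv applies an approximate inverse of A
    (the preconditioning operator)."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    z = M_inv(r)             # preconditioned residual
    p = z.copy()             # search direction
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test system with Jacobi (diagonal) preconditioning
n = 20
A = np.diag(np.arange(2.0, n + 2)) + 0.1 * np.ones((n, n))
b = np.ones(n)
x = pcg(A, b, lambda r: r / np.diag(A))
assert np.allclose(A @ x, b, atol=1e-8)
```

Per iteration the method needs one matrix-vector product, one preconditioner application, two inner products and a few vector updates, which is why the abstracts below focus on the communication cost of exactly those kernels.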
Execution Time Analysis for Least Squares Problems on Massively Parallel Distributed Memory Computers
 In Proceedings of International Conference on Computational Modeling and Computing (CMCP96
, 1996
Abstract

Cited by 2 (1 self)
In this paper we mainly focus on the parallelization of PCGLS, a basic iterative method whose main idea is to organize the computation of the conjugate gradient method, with a preconditioner, applied to the normal equations. Based on the data distribution model, we fully analyze the most suitable communication network topology for solving least squares problems on massively parallel distributed memory computers. A theoretical model of the communication phases is presented, which allows us to give a detailed execution time complexity analysis and to investigate its usefulness. It is shown that the implementation of PCGLS with a row-block decomposition of the coefficient matrix on a ring communication structure is the most efficient choice. Performance tests of the developed parallel PCGLS algorithm have been carried out on the massively parallel distributed memory system Parsytec GC/PowerPlus, and the experimental timing results are compared with the theoretical execution time complexity analysis.

1 Introductio...
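As a point of reference for PCGLS, the following is a minimal unpreconditioned CGLS sketch: conjugate gradients applied to the normal equations without ever forming A^T A. The serial NumPy code and the test problem are ours; the paper's row-block distribution is only noted in the comments.

```python
import numpy as np

def cgls(A, b, tol=1e-12, maxit=500):
    """CGLS: conjugate gradients applied implicitly to the normal
    equations A^T A x = A^T b, without forming A^T A.

    In the parallel setting studied in the paper, A would be distributed
    by row blocks, so each of the two matrix-vector products per
    iteration would need one communication phase.
    """
    m, n = A.shape
    x = np.zeros(n)
    r = b.copy()          # residual b - A x
    s = A.T @ r           # normal-equations residual A^T (b - A x)
    p = s.copy()
    gamma = s @ s
    for _ in range(maxit):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = rng.standard_normal(10)
b = A @ x_true            # consistent system, so the minimizer is x_true
assert np.allclose(cgls(A, b), x_true, atol=1e-6)
```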
Lecture Notes on Iterative Methods
, 1994
Abstract

Cited by 2 (0 self)
Introduction

In these notes we will present an overview of a number of related iterative methods for the solution of linear systems of equations. These methods are so-called Krylov projection type methods, and they include popular methods such as Conjugate Gradients, Bi-Conjugate Gradients, LSQR and GMRES. We will show how these methods can be derived from simple basic iteration formulas. We will not give convergence proofs, but we will refer for these, as far as available, to the literature. Iterative methods are often used in combination with so-called preconditioning operators (approximations to the inverse of the operator of the system to be solved). Since these preconditioners are not essential in the derivation of these iterative methods, we will not discuss them explicitly in these notes. However, in most of the actual iteration schemes we have included them, in order to facilitate the use of these schemes in actual computations. For the application of the iterative schemes
Data Distribution And Communication Schemes For Least Squares Problems On Massively Distributed Memory Computers
 In Proceedings of International Conference on Computational Modelling
, 1996
Abstract

Cited by 1 (1 self)
In this paper we study the parallelization of PCGLS, a basic iterative method whose main idea is to organize the computation of the conjugate gradient method, with a preconditioner, applied to the normal equations. Two important questions are discussed: what is the best possible data distribution, and which communication network topology is most suitable for solving least squares problems on massively parallel distributed memory computers? A theoretical model of the data distribution and communication phases is presented, which allows us to give a detailed execution time complexity analysis and to investigate its usefulness. It is shown that the implementation of PCGLS with a row-block decomposition of the coefficient matrix on a ring communication structure is the most efficient choice. Performance tests of the developed parallel PCGLS algorithm have been carried out on the massively parallel distributed memory system Parsytec GC/PowerPlus, and the experimental timing results are compared with the theoretical execut...
Communication cost reduction for Krylov methods on parallel computers
 Proc. of HighPerformance Computing and Networking Conference
Abstract

Cited by 1 (0 self)
Abstract. On large distributed memory parallel computers the global communication cost of inner products seriously limits the performance of Krylov subspace methods [3]. We consider improved algorithms to reduce this communication overhead, and we analyze the performance by experiments on a 400-processor parallel computer and with a simple performance model.
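To illustrate the kind of saving this abstract refers to: the toy sketch below packs two local partial inner products per "process" into one small vector, so that a single global reduction replaces two. The process count and data are invented for illustration; this is not the specific algorithm of [3].

```python
import numpy as np

# Two inner products (r.r and r.w) are needed in the same iteration.
# Instead of two separate global reductions, each "process" contributes
# one pair of local partial sums, and a single reduction combines them.
P = 4                               # illustrative process count
rng = np.random.default_rng(0)
r = rng.standard_normal(1000)
w = rng.standard_normal(1000)

chunks = np.array_split(np.arange(1000), P)   # block row distribution
# each process computes its local (r.r, r.w) pair
partial = np.array([[r[c] @ r[c], r[c] @ w[c]] for c in chunks])
rr, rw = partial.sum(axis=0)        # one combined reduction of length 2

assert np.isclose(rr, r @ r) and np.isclose(rw, r @ w)
```

On a real machine the `partial.sum` step would be a single allreduce over a length-2 buffer, halving the number of global synchronization points per iteration.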
Implementation Aspects
Abstract
The inner products, vector updates and matrix-vector product are easily parallelized and vectorized. The more successful preconditionings, e.g., those based upon incomplete LU decomposition, are not easily parallelizable. For that reason one is often satisfied with the use of only diagonal scaling as a preconditioner on highly parallel computers, such as the CM-2 [24]. On distributed memory computers we need large-grained parallelism in order to reduce synchronization overhead. This can be achieved by combining the work required for a number of successive iteration steps. The idea is first to construct, in parallel, a straightforward Krylov basis for the search subspace in which an update for the current solution will be determined. Once this basis has been computed, the vectors are orthogonalized, as is done in Krylov subspace methods. The construction as well as the orthogonalization can be done with large-grained parallelism, and has a sufficient degree of parallelism in it. This approach has be...
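The scheme sketched above — build a straightforward Krylov basis first, orthogonalize afterwards — can be outlined as follows. The function name, basis length, and the use of a QR factorization for the orthogonalization step are our illustrative choices.

```python
import numpy as np

def krylov_basis_then_orthogonalize(A, v, s):
    """Build the Krylov basis [v, Av, ..., A^(s-1) v] first, then
    orthogonalize it in one pass with a QR factorization.

    Generating the s basis vectors needs only matrix-vector products
    (no inner products), and the QR step can itself be blocked; this
    separation is what gives the approach its large-grained parallelism.
    In floating point the monomial basis loses independence quickly, so
    practical schemes keep s small or use better basis polynomials.
    """
    V = np.empty((len(v), s))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(1, s):
        V[:, j] = A @ V[:, j - 1]   # basis construction: matvecs only
    Q, _ = np.linalg.qr(V)          # orthogonalization in one block step
    return Q

A = np.diag(np.arange(1.0, 51.0))
Q = krylov_basis_then_orthogonalize(A, np.ones(50), 5)
assert np.allclose(Q.T @ Q, np.eye(5), atol=1e-10)
```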