Results 1-10 of 18
Parallel Numerical Linear Algebra
- Society for Industrial and Applied Mathematics
, 1997
Cited by 418 (24 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, the singular value decomposition, and generalizations of these to two matrices. We consider dense, band and sparse matrices.
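The survey illustrates its principles by showing how one would implement matrix multiplication. As a minimal sketch of the underlying idea (not the survey's own algorithm), the Python function below computes C = AB by square blocks; the function name `blocked_matmul` and the block-size parameter `bs` are hypothetical. Each output block depends only on one block row of A and one block column of B, which is what allows a parallel implementation to assign blocks to different processors.

```python
def blocked_matmul(A, B, bs):
    """Return C = A * B for dense list-of-lists matrices, computed block by block.

    Each bs-by-bs output block C[I:I+bs, J:J+bs] depends only on block row I
    of A and block column J of B, so the (I, J) block tasks are independent.
    On a 2D processor grid each task could be owned by a different processor;
    here they simply run one after another.
    """
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    C = [[0.0] * p for _ in range(n)]
    for I in range(0, n, bs):              # block row of C
        for J in range(0, p, bs):          # block column of C
            for K in range(0, m, bs):      # accumulate A[I, K] * B[K, J]
                for i in range(I, min(I + bs, n)):
                    for k in range(K, min(K + bs, m)):
                        a_ik = A[i][k]
                        for j in range(J, min(J + bs, p)):
                            C[i][j] += a_ik * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_matmul(A, B, bs=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

This sketch ignores communication entirely; on a distributed-memory machine the blocks of A and B must also be moved between processors, and that cost has to be accounted for.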
Krylov subspace methods on supercomputers
- SIAM J. Sci. Stat. Comput.
, 1989
Cited by 66 (4 self)
This paper presents a short survey of recent research on Krylov subspace methods, with emphasis on implementation on vector and parallel computers. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to deriving effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in incomplete factorization preconditioners is the solution of the triangular systems at each step. We describe in detail a few approaches to implementing efficient forward and backward triangular solutions. We then discuss polynomial preconditioning as an alternative to standard incomplete factorization techniques. Another efficient approach is to reorder the equations so as to improve the structure of the matrix and achieve better parallelism or vectorization. We give an overview of these and other ideas and attempt to comment on their effectiveness or potential for different types of architectures.
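One widely used way to expose parallelism in a sparse forward substitution is level scheduling: rows are grouped so that each row depends only on rows in earlier groups, and all rows in a group can then be solved simultaneously. The sketch below illustrates that idea; it is not necessarily one of the specific approaches the paper details. It assumes the strictly lower part of L is stored row-wise as Python dicts `{column: value}` with the diagonal in a separate list; `compute_levels` and `level_scheduled_forward_solve` are hypothetical names.

```python
def compute_levels(lower):
    """Group rows of a sparse lower-triangular matrix into dependency levels.

    lower[i] is a dict {j: L[i][j]} holding the strictly lower entries (j < i)
    of row i. Row i must wait for every row j it references, so its level is
    one more than the deepest such row.
    """
    level = [0] * len(lower)
    for i, row in enumerate(lower):
        for j in row:
            level[i] = max(level[i], level[j] + 1)
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


def level_scheduled_forward_solve(lower, diag, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for rows in compute_levels(lower):
        # Rows in the same level read only x-values from earlier levels,
        # so this inner loop could run in parallel (or as one vector operation).
        for i in rows:
            s = b[i] - sum(v * x[j] for j, v in lower[i].items())
            x[i] = s / diag[i]
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 0, 4]]: row 1 depends on row 0; row 2 on nothing.
lower = [{}, {0: 1.0}, {}]
print(compute_levels(lower))  # [[0, 2], [1]]
print(level_scheduled_forward_solve(lower, [2.0, 1.0, 4.0], [2.0, 3.0, 8.0]))  # [1.0, 2.0, 2.0]
```

The amount of parallelism equals the average level size, which is why reordering the unknowns (also mentioned above) can matter as much as the solve itself.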
Vaidya's Preconditioners: Implementation And Experimental Study
, 2001
Cited by 13 (5 self)
We describe the implementation and performance of a novel class of preconditioners. These preconditioners were proposed and theoretically analyzed by Pravin Vaidya in 1991, but no report on their implementation or performance in practice has ever been published. We show experimentally that these preconditioners have some remarkable properties. We show that within the class of diagonally dominant symmetric matrices, the cost and convergence of these preconditioners depend almost only on the nonzero structure of the matrix, not on its numerical values. In particular, this property leads to robust convergence behavior on difficult 3-dimensional problems that cause stagnation in incomplete-Cholesky preconditioners (more specifically, in drop-tolerance incomplete Cholesky without diagonal modification, with diagonal modification, and with relaxed diagonal modification). On such problems, we have observed cases in which a Vaidya-preconditioned solver is more than 6 times faster than an incomplete-Cholesky-preconditioned solver when we allow similar amounts of fill in the factors of both preconditioners. We also show that Vaidya's preconditioners perform and scale similarly to or better than drop-tolerance relaxed-modified incomplete-Cholesky preconditioners on a wide range of 2-dimensional problems. In particular, on anisotropic 2D problems, Vaidya's preconditioners deliver robust convergence independently of the direction of anisotropy and the ordering of the unknowns. However, on many 3D problems in which incomplete-Cholesky-preconditioned solvers converge without stagnating, Vaidya-preconditioned solvers are much slower. We also show how the insights gained from this study can be used to design faster and more robust solvers for some difficult problems.
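Vaidya's construction starts from a maximum-weight spanning tree of the matrix graph and, in its full form, augments the tree with extra edges; the abstract does not spell this out. The sketch below shows only the basic spanning-tree subgraph preconditioner, under the stated assumption that A is symmetric and diagonally dominant and stored as a dense list of lists. Off-diagonal entries outside the tree are dropped while each row's excess diagonal dominance is preserved. The function name is hypothetical, and the edge-augmentation step that gives Vaidya's preconditioners their convergence guarantees is deliberately omitted.

```python
def vaidya_tree_preconditioner(A):
    """Basic spanning-tree subgraph preconditioner for a symmetric,
    diagonally dominant matrix A (dense list of lists).

    Off-diagonal a_ij is treated as an edge of weight |a_ij|. M keeps only the
    edges of a maximum-weight spanning tree (Kruskal's algorithm), and each
    diagonal entry is rebuilt as the row's excess dominance plus the weights
    of the tree edges at that vertex, so M is again diagonally dominant.
    """
    n = len(A)
    excess = [A[i][i] - sum(abs(A[i][j]) for j in range(n) if j != i)
              for i in range(n)]
    if min(excess) < 0:
        raise ValueError("A is not diagonally dominant")
    edges = [(abs(A[i][j]), i, j)
             for i in range(n) for j in range(i + 1, n) if A[i][j] != 0]
    edges.sort(key=lambda e: -e[0])        # heaviest first (stable on ties)

    parent = list(range(n))                # union-find forest for Kruskal

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    M = [[0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = excess[i]
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge joins two components: keep it
            parent[ri] = rj
            M[i][j] = M[j][i] = A[i][j]
            M[i][i] += w
            M[j][j] += w
    return M


A = [[4, -2, -1], [-2, 4, -1], [-1, -1, 3]]
print(vaidya_tree_preconditioner(A))  # [[4, -2, -1], [-2, 3, 0], [-1, 0, 2]]
```

Because M has the sparsity of a tree, it can be factored with no fill by eliminating leaves first; adding edges back, as Vaidya's full construction does, gives up some of that cheapness to improve the convergence bound.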
A Parallel Preconditioned Conjugate Gradient Package for Solving Sparse Linear Systems on a Cray Y-MP
- Appl. Numer. Math.
, 1991
Cited by 13 (0 self)
In this paper we discuss current activities at Cray Research to develop general-purpose, production-quality software for the efficient solution of sparse linear systems.
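The abstract gives no algorithmic detail, so as a reference point the sketch below is a textbook preconditioned conjugate gradient (PCG) iteration with a Jacobi (diagonal) preconditioner. It is not Cray's package or its interface: `pcg`, `poisson1d`, and `jacobi` are hypothetical names, and the matrix is accessed only through a `matvec` callback so that any sparse storage format can be plugged in.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    matvec(v) returns A v and apply_prec(r) returns M^{-1} r; the solver
    never needs A or M explicitly. Returns (x, iterations).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r = b - A x with x = 0
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:   # relative residual test
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson1d(v):
    """A v for the tridiagonal matrix tridiag(-1, 2, -1)."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def jacobi(r):
    """M^{-1} r with M = diag(A) = 2 I."""
    return [ri / 2.0 for ri in r]


b = poisson1d([1.0] * 8)             # right-hand side whose exact solution is all ones
x, iters = pcg(poisson1d, b, jacobi)
print(iters, max(abs(xi - 1.0) for xi in x))
```

Apart from the preconditioner, each iteration consists of one matrix-vector product, two dot products, and three vector updates, which are the kernels a vectorizing implementation concentrates on.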
Approximate And Incomplete Factorizations
- ICASE/LaRC Interdisciplinary Series in Science and Engineering
, 1994
Cited by 4 (2 self)
In this chapter, we give a brief overview of a particular class of preconditioners known as incomplete factorizations. They can be thought of as approximating the exact LU factorization of a given matrix A (e.g., computed via Gaussian elimination) by disallowing certain fill-ins. Unlike PDE-based preconditioners such as multigrid and domain decomposition, this class of preconditioners is primarily algebraic in nature and can, in principle, be applied to any sparse matrix. When applied to PDE problems, they are usually not optimal, in the sense that the condition number of the preconditioned system grows as the mesh size h is reduced, although usually at a slower rate than for the unpreconditioned system. On the other hand, they are often quite robust with respect to more algebraic features of the problem, such as rough and anisotropic coefficients and strong convection terms. We will describe the basic ILU and modified ILU (MILU) preconditioners. Then we will review ...
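A minimal sketch of ILU(0), in which fill is allowed only at positions where A is already nonzero, is given below in the common row-oriented (IKJ) form. The `modified=True` branch shows one standard MILU variant, in which each dropped fill-in is subtracted from the diagonal of its row so that the row sums of LU match those of A; the chapter's exact variants may differ. For readability the matrix is a dense list of lists, and `ilu0` and `lu_product` are hypothetical names.

```python
def ilu0(A, modified=False):
    """ILU(0) (or MILU(0) if modified=True) of a dense list-of-lists matrix.

    Fill is kept only where A itself is nonzero. Returns one matrix holding
    the strictly lower part of the unit lower-triangular L and the upper part
    (diagonal included) of U, as in the usual in-place LU layout.
    Assumes a nonzero diagonal.
    """
    n = len(A)
    F = [list(map(float, row)) for row in A]
    pattern = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in pattern:
                continue
            F[i][k] /= F[k][k]                  # multiplier l_ik
            for j in range(k + 1, n):
                update = F[i][k] * F[k][j]
                if (i, j) in pattern:
                    F[i][j] -= update           # ordinary elimination step
                elif modified:
                    F[i][i] -= update           # MILU: move dropped fill to the diagonal
    return F


def lu_product(F):
    """Multiply out the unit-lower L and upper U stored together in F."""
    n = len(F)
    return [[sum((F[i][k] if k < i else 1.0 if k == i else 0.0)
                 * (F[k][j] if k <= j else 0.0) for k in range(n))
             for j in range(n)] for i in range(n)]


A = [[4, -1, -1], [-1, 4, 0], [-1, 0, 4]]   # exact LU would fill (1, 2) and (2, 1)
print(ilu0(A))                  # fill dropped; diagonal entries 4.0, 3.75, 3.75
print(ilu0(A, modified=True))   # dropped fill moved to diagonal: 4.0, 3.5, 3.5
```

For a tridiagonal matrix no fill arises, so ILU(0) coincides with the exact LU factorization.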
VLUGR3: A Vectorizable Adaptive Grid Solver for PDEs in 3D. I. Algorithmic Aspects and Applications
- Appl. Numer. Math.
, 1994
Cited by 4 (1 self)
This paper describes an adaptive-grid finite-difference solver for time-dependent three-dimensional systems of partial differential equations. The robustness and efficiency of the solver, on both vector and scalar processors, are illustrated by applying the code to three example problems.
VLUGR2: A Vectorized Local Uniform Grid Refinement Code for PDEs in 2D
- Report NM-R9306
, 1993
Cited by 4 (2 self)
This paper describes an ANSI FORTRAN 77 code, VLUGR2, vectorized for the Cray Y-MP, that is based on an adaptive-grid finite-difference method for solving time-dependent two-dimensional systems of partial differential equations.
1991 Mathematics Subject Classification: Primary: 65-04. Secondary: 65M20, 65M50, 65F10.
1991 CR Categories: G1.8.
Keywords & Phrases: software, partial differential equations, method of lines, adaptive grid methods, nonsymmetric sparse linear systems, iterative solvers, vectorization.
Note: This work was supported by Cray Research, Inc., under grant CRG 92.05, via the Stichting Nationale Computerfaciliteiten (National Computing Facilities Foundation, NCF).
1. Introduction
In [16, 14, 15, 18, 17, 19], an adaptive-grid finite-difference method is studied for solving time-dependent two-dimensional systems of partial differential equations (PDEs). Among other results, a code, MOORKOP [13], has been developed that uses an implicit time-stepping method. In this paper we desc...
Are There Iterative BLAS?
, 1994
Cited by 3 (0 self)
A technique for optimizing software is proposed that involves the use of a standardized set of computational kernels common to many iterative methods for solving large sparse linear systems of equations. These kernels, referred to as "Iterative Basic Linear Algebra Subprograms" or "Iterative BLAS", are defined, and techniques for their optimization on vector computers are presented. Several sparse matrix storage formats for different classes of matrix problems are proposed that allow the fundamental operations of various iterative methods to be vectorized using these kernels.
1 Introduction
Many iterative methods perform operations that can be easily optimized on most vector computers, such as the dot product of two vectors and the updating of a vector using another vector. These operations are common in linear algebra applications, and they have been denoted as Basic Linear Algebra Subprograms, or BLAS [23]. In the BLAS library, the calling sequences of these primitive vec...
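As an illustration of how a storage format can make a sparse kernel vectorizable, the sketch below uses the classic ELLPACK-ITPACK layout for a sparse matrix-vector product; we do not claim it is one of the formats proposed in this paper. Every row is padded to the same number of stored entries, so the inner loop runs over all n rows with no dependences between iterations. It assumes a square matrix given densely for convenience; `to_ellpack` and `ellpack_matvec` are hypothetical names.

```python
def to_ellpack(A):
    """Convert a dense square matrix to ELLPACK (ITPACK) storage.

    Every row is padded to K = (max nonzeros per row) entries. The arrays are
    stored "column-major": cols[k][i] and vals[k][i] hold the k-th stored
    entry of row i, so each k gives a contiguous array of length n.
    """
    n = len(A)
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in A]
    K = max((len(r) for r in rows), default=0)
    cols = [[0] * n for _ in range(K)]      # padding: column 0 ...
    vals = [[0.0] * n for _ in range(K)]    # ... with value 0, so it adds nothing
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            cols[k][i] = j
            vals[k][i] = v
    return cols, vals


def ellpack_matvec(cols, vals, x):
    """y = A x. The inner loop over i has length n and no dependences,
    which is what makes this format attractive on vector hardware."""
    n = len(x)
    y = [0.0] * n
    for ck, vk in zip(cols, vals):
        for i in range(n):
            y[i] += vk[i] * x[ck[i]]
    return y


A = [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
cols, vals = to_ellpack(A)
print(ellpack_matvec(cols, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 15.0]
```

The trade-off is padding: if one row has many more nonzeros than the rest, most stored entries are zeros, which is one reason different matrix classes call for different formats.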
Preconditioning and Parallel Preconditioning
, 1998
Cited by 1 (1 self)
We review current methods for preconditioning systems of equations for their solution using iterative methods. We consider the solution of unsymmetric as well as symmetric systems and discuss techniques and implementations that exploit parallelism. We particularly study preconditioning techniques based on incomplete LU factorization, sparse approximate inverses, polynomial preconditioning, and block and element-by-element preconditioning. In the parallel implementation, we consider the effect of reordering.
Keywords: preconditioning, parallel computers, sparse matrices, incomplete factorization, sparse approximate inverses, block methods, element-by-element preconditioning.
AMS(MOS) subject classifications: 65F05, 65F50.
This paper is a preprint of a chapter of the book "Numerical Linear Algebra for High-Performance Computers" by Dongarra, Duff, Sorensen, and van der Vorst, to be published by SIAM Press. It also appeared as Technical Report RAL-TR-1998052 from Rutherford Apple...
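Polynomial preconditioning, one of the techniques listed above, replaces triangular solves by a fixed number of matrix-vector products. A simple instance is the truncated Neumann series: writing A = D - N with D = diag(A), the preconditioner applies z = sum_{k=0}^{m} (I - D^{-1}A)^k D^{-1} r. The sketch below assumes the series converges (for example, A strictly diagonally dominant) and uses hypothetical names; practical polynomial preconditioners often use better-chosen polynomials, such as Chebyshev or least-squares ones.

```python
def neumann_preconditioner(matvec, diag, r, degree):
    """Apply a truncated Neumann-series polynomial preconditioner.

    With A = D - N and D = diag(A), A^{-1} = sum_k (D^{-1} N)^k D^{-1}
    when the series converges. Truncating after `degree` terms gives
        z = sum_{k=0}^{degree} (I - D^{-1} A)^k D^{-1} r,
    which needs only matrix-vector products and vector updates.
    """
    term = [ri / di for ri, di in zip(r, diag)]    # D^{-1} r  (k = 0)
    z = list(term)
    for _ in range(degree):
        At = matvec(term)
        term = [t - a / d for t, a, d in zip(term, At, diag)]   # (I - D^{-1} A) term
        z = [zi + ti for zi, ti in zip(z, term)]
    return z


def matvec(v):                  # A = [[4, 1], [1, 4]]
    return [4 * v[0] + v[1], v[0] + 4 * v[1]]


print(neumann_preconditioner(matvec, [4.0, 4.0], [5.0, 5.0], degree=0))  # [1.25, 1.25]
print(neumann_preconditioner(matvec, [4.0, 4.0], [5.0, 5.0], degree=30))  # close to [1.0, 1.0] = A^{-1} r
```

With degree 0 this reduces to Jacobi preconditioning; each extra degree costs one more matrix-vector product, an operation that parallelizes far more easily than a triangular solve.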
LECTURE 9: 12/7/93 Iterative methods
, 1993
...is implemented on an MCC MPP with several unknowns per processor.
• Jacobi and Gauss-Seidel (GS) are not effective: they require $O(N)$ iterations for convergence, compared to $O(N^{1/2})$ for SOR. Under suitable orderings, all of these methods use only local information on a mesh-connected processor.
• Except for model problems, SOR requires parameter-estimation procedures [HY81]. These require global information in each iteration, implying $O(N^{1/2})$ communication per step. Hence the total SOR complexity becomes $O(N)$, losing the advantage. A local relaxation scheme was proposed by Ehrlich [Ehr81].
CS454: December 7, 1993
Local relaxation. Red points: (i + j) even. ...
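The red/black ordering mentioned in these notes is what makes SOR local: with the 5-point stencil, every neighbor of a red point (i + j even) is black, so all red points can be updated at once, then all black points. The sketch below applies red-black SOR to the model problem -(u_xx + u_yy) = f on a square grid with Dirichlet boundary values. It uses a single global relaxation parameter `omega`, not Ehrlich's local relaxation scheme, and the function and argument names are hypothetical.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Red-black SOR for the 5-point discretization of -(u_xx + u_yy) = f.

    u is an (n+2) x (n+2) grid whose outer ring holds Dirichlet boundary
    values; only interior points are updated, in place.
    """
    n = len(u) - 2
    for _ in range(sweeps):
        for color in (0, 1):             # 0: red, (i + j) even; 1: black, (i + j) odd
            # Points of one color read only neighbors of the other color,
            # so every update in this half-sweep is independent.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != color:
                        continue
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    u[i][j] += omega * (gs - u[i][j])
    return u


# Test problem: u = x^2 + y^2, so -(u_xx + u_yy) = -4; the 5-point stencil is exact here.
n = 6
h = 1.0 / (n + 1)
exact = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n + 2)] for i in range(n + 2)]
u = [[exact[i][j] if i in (0, n + 1) or j in (0, n + 1) else 0.0
      for j in range(n + 2)] for i in range(n + 2)]
f = [[-4.0] * (n + 2) for _ in range(n + 2)]
red_black_sor(u, f, h, omega=1.5, sweeps=200)
err = max(abs(u[i][j] - exact[i][j]) for i in range(1, n + 1) for j in range(1, n + 1))
print(err < 1e-10)  # True
```

Here omega is fixed by hand; as the notes point out, estimating a good omega for a general problem needs global information, which is the communication cost that local relaxation schemes try to avoid.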

