Results 1-10 of 21
Applied Numerical Linear Algebra
 Society for Industrial and Applied Mathematics
, 1997
Abstract

Cited by 532 (26 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
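The survey's running example of implementing matrix multiplication can be sketched as a blocked (tiled) product, the standard way to expose locality on parallel and cache-based machines. This is an illustrative sketch (the function name and block size `bs` are my choices, not code from the survey):

```python
# Sketch of blocked (tiled) matrix multiplication: the three outer loops
# walk over bs-by-bs blocks, so each block of A, M, and C is reused while
# it is "local" (in cache, or on one processor).  Pure Python for clarity.

def matmul_blocked(A, M, bs=2):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):              # row blocks of C
        for jj in range(0, n, bs):          # column blocks of C
            for kk in range(0, n, bs):      # inner-dimension blocks
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + bs, n)):
                            s += A[i][k] * M[k][j]
                        C[i][j] = s
    return C
```

On a real machine the block size would be tuned to the cache or local-memory size; here it only illustrates the loop structure.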
Krylov subspace methods on supercomputers
 SIAM J. SCI. STAT. COMPUT
, 1989
Abstract

Cited by 68 (4 self)
This paper presents a short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to deriving effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is the solution of the triangular systems at each step. We describe in detail a few approaches to implementing efficient forward and backward triangular solutions. Then we discuss polynomial preconditioning as an alternative to standard incomplete factorization techniques. Another efficient approach is to reorder the equations so as to improve the structure of the matrix for better parallelism or vectorization. We give an overview of these ideas and others and attempt to comment on their effectiveness or potential for different types of architectures.
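The triangular solves named above as the main bottleneck are plain forward and backward substitution. A minimal dense sketch (illustrative, not from the paper) shows why they resist vectorization: each unknown depends on the ones computed before it.

```python
# Solving M z = r with M = L U, as done once per preconditioned CG step.
# The recurrences below are inherently sequential: y[i] needs y[0..i-1],
# which is the serial bottleneck the survey discusses.

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = b[i] - sum(L[i][j] * y[j] for j in range(i))
        y[i] = s / L[i][i]
    return y

def backward_solve(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / U[i][i]
    return x
```

The techniques the paper surveys (level scheduling, reordering, polynomial preconditioning) all aim to break or avoid exactly this dependence chain.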
Vaidya's Preconditioners: Implementation And Experimental Study
, 2001
Abstract

Cited by 19 (7 self)
We describe the implementation and performance of a novel class of preconditioners. These preconditioners were proposed and theoretically analyzed by Pravin Vaidya in 1991, but no report on their implementation or performance in practice has ever been published. We show experimentally that these preconditioners have some remarkable properties. We show that within the class of diagonally dominant symmetric matrices, the cost and convergence of these preconditioners depend almost entirely on the nonzero structure of the matrix, not on its numerical values. In particular, this property leads to robust convergence behavior on difficult 3-dimensional problems that cause stagnation in incomplete-Cholesky preconditioners (more specifically, in drop-tolerance incomplete Cholesky without diagonal modification, with diagonal modification, and with relaxed diagonal modification). On such problems, we have observed cases in which a Vaidya-preconditioned solver is more than 6 times faster than an incomplete-Cholesky-preconditioned solver, when we allow similar amounts of fill in the factors of both preconditioners. We also show that Vaidya's preconditioners perform and scale similarly to or better than drop-tolerance relaxed modified incomplete Cholesky preconditioners on a wide range of 2-dimensional problems. In particular, on anisotropic 2D problems, Vaidya delivers robust convergence independently of the direction of anisotropy and the ordering of the unknowns. However, on many 3D problems in which incomplete-Cholesky-preconditioned solvers converge without stagnating, Vaidya-preconditioned solvers are much slower. We also show how the insights gained from this study can be used to design faster and more robust solvers for some difficult problems.
A Parallel Preconditioned Conjugate Gradient Package for Solving Sparse Linear Systems on a Cray Y-MP
 Appl. Num. Math
, 1991
Abstract

Cited by 14 (0 self)
In this paper we discuss current activities at Cray Research to develop general-purpose, production-quality software for the efficient solution of sparse linear systems.
VLUGR2: A Vectorized Local Uniform Grid Refinement Code for PDEs in 2D
 Report NM-R9306
, 1993
Abstract

Cited by 5 (2 self)
This paper describes an ANSI FORTRAN 77 code, VLUGR2, vectorized for the Cray Y-MP, that is based on an adaptive-grid finite-difference method to solve time-dependent two-dimensional systems of partial differential equations.
1991 Mathematics Subject Classification: Primary: 65-04. Secondary: 65M20, 65M50, 65F10. 1991 CR Categories: G1.8.
Keywords & Phrases: software, partial differential equations, method of lines, adaptive grid methods, nonsymmetric sparse linear systems, iterative solvers, vectorization.
Note: This work was supported by Cray Research, Inc., under grant CRG 92.05, via the Stichting Nationale Computerfaciliteiten (National Computing Facilities Foundation, NCF).
1. Introduction
In [16, 14, 15, 18, 17, 19] an adaptive-grid finite-difference method is studied to solve time-dependent two-dimensional systems of partial differential equations (PDEs). Among others, a code, MOORKOP [13], has been developed which uses an implicit time-stepping method. In this paper we desc...
Approximate And Incomplete Factorizations
 ICASE/LARC INTERDISCIPLINARY SERIES IN SCIENCE AND ENGINEERING
, 1994
Abstract

Cited by 4 (2 self)
In this chapter, we give a brief overview of a particular class of preconditioners known as incomplete factorizations. They can be thought of as approximating the exact LU factorization of a given matrix A (e.g. computed via Gaussian elimination) by disallowing certain fill-ins. As opposed to other PDE-based preconditioners such as multigrid and domain decomposition, this class of preconditioners is primarily algebraic in nature and can in principle be applied to any sparse matrix. When applied to PDE problems, they are usually not optimal in the sense that the condition number of the preconditioned system will grow as the mesh size h is reduced, although usually at a slower rate than for the unpreconditioned system. On the other hand, they are often quite robust with respect to more algebraic features of the problem such as rough and anisotropic coefficients and strong convection terms. We will describe the basic ILU and modified ILU (MILU) preconditioners. Then we will review ...
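The idea of "disallowing certain fill-ins" can be sketched concretely. This is a minimal illustration of the common ILU(0) variant, where elimination is restricted to the nonzero pattern of A so no fill-in is ever created; dense storage and the function name are my assumptions, not code from the chapter:

```python
# ILU(0) sketch: ordinary Gaussian elimination, but updates are applied
# only at positions that are nonzero in the original matrix A, so the
# factors have exactly the sparsity pattern of A (no fill-in).

def ilu0(A):
    n = len(A)
    # remember the original nonzero pattern; only these entries may change
    pattern = [[A[i][j] != 0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            A[i][k] /= A[k][k]          # multiplier, stored as L(i,k)
            for j in range(k + 1, n):
                if pattern[i][j]:       # update only existing nonzeros
                    A[i][j] -= A[i][k] * A[k][j]
    return A    # unit-lower L and upper U share the storage of A
```

For a matrix with a full pattern this reduces to exact LU; the approximation enters only where entries outside the pattern are dropped.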
VLUGR3: A Vectorizable Adaptive Grid Solver for PDEs in 3D. I. Algorithmic Aspects and Applications
 APPL. NUMER. MATH
, 1994
Abstract

Cited by 4 (1 self)
This paper describes an adaptive-grid finite-difference solver for time-dependent three-dimensional systems of partial differential equations. The robustness and efficiency of the solver, both for vector and scalar processors, are illustrated by the application of the code to three example problems.
Are There Iterative BLAS?
, 1994
Abstract

Cited by 3 (0 self)
A technique for optimizing software is proposed that involves the use of a standardized set of computational kernels that are common to many iterative methods for solving large sparse linear systems of equations. These kernels, referred to as "Iterative Basic Linear Algebra Subprograms" or "Iterative BLAS", are defined and techniques for their optimization on vector computers are presented. Several sparse matrix storage formats for different classes of matrix problems are proposed that allow the vectorization of fundamental operations in various iterative methods using these kernels.
1 Introduction
Many iterative methods perform operations that can be easily optimized on most vector computers, such as the dot product of two vectors and the updating of a vector using another vector. These operations are often used in linear algebra applications, and they have been denoted Basic Linear Algebra Subprograms, or BLAS [23]. In the BLAS library, the calling sequences of these primitive vec...
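The kernels named above (dot product, vector update, sparse matrix-vector product) might look like the following sketch. The function names and the choice of CSR storage are illustrative assumptions, not the paper's definitions:

```python
# Sketch of the kind of kernels an "Iterative BLAS" would standardize.
# dot and axpy are the classic level-1 BLAS operations; csr_matvec is a
# sparse matrix-vector product in compressed sparse row (CSR) format.

def dot(x, y):
    """Inner product x . y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def axpy(alpha, x, y):
    """Vector update alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def csr_matvec(vals, cols, rowptr, x):
    """y = A x, with A stored as CSR arrays (vals, cols, rowptr)."""
    y = []
    for i in range(len(rowptr) - 1):
        s = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            s += vals[k] * x[cols[k]]
        y.append(s)
    return y
```

An iterative method such as CG spends essentially all its time in these three calls, which is why standardizing and vectorizing them pays off across many solvers.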
Preconditioning and Parallel Preconditioning
, 1998
Abstract

Cited by 1 (1 self)
We review current methods for preconditioning systems of equations for their solution using iterative methods. We consider the solution of unsymmetric as well as symmetric systems and discuss techniques and implementations that exploit parallelism. We particularly study preconditioning techniques based on incomplete LU factorization, sparse approximate inverses, polynomial preconditioning, and block and element-by-element preconditioning. In the parallel implementation, we consider the effect of reordering.
Keywords: preconditioning, parallel computers, sparse matrices, incomplete factorization, sparse approximate inverses, block methods, element-by-element preconditioning.
AMS(MOS) subject classifications: 65F05, 65F50.
This paper is a preprint of a chapter of the book "Numerical Linear Algebra for High-Performance Computers" by Dongarra, Duff, Sorensen, and van der Vorst that will be published by SIAM Press. Also appeared as Technical Report RAL-TR-1998-052 from Rutherford Apple...
LECTURE 9: 12/7/93 Iterative methods
, 1993
Abstract
is implemented on an MCC MPP with several unknowns per processor.
- Jacobi and GS are not effective. They require O(N) iterations for convergence, compared to O(N^{1/2}) for SOR. Under suitable orderings, all methods make use of local information on a mesh-connected processor.
- Except for model problems, SOR requires parameter estimation procedures [HY81]. These require global information in each iteration, implying O(N^{1/2}) communication per step. Hence the total SOR complexity becomes O(N), losing the advantage. A local relaxation scheme was proposed by Ehrlich in [Ehr81].
Local relaxation
Red points ((i + j) even): u...
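The red/black splitting the notes break off at ((i + j) even vs. odd) can be sketched as one red-black Gauss-Seidel sweep for the 5-point Laplacian. This is an illustrative reconstruction, not the lecture's code:

```python
# One red-black Gauss-Seidel sweep for -Laplace(u) = f on an n-by-n grid
# with spacing h.  Points with (i + j) even ("red") are updated first,
# then the "black" points; within a color every update is independent,
# so each half-sweep parallelizes on a mesh-connected machine.

def red_black_sweep(u, f, h):
    n = len(u)
    for color in (0, 1):                      # 0 = red, 1 = black
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1]
                                      + h * h * f[i][j])
    return u
```

Each red update reads only black neighbors and vice versa, which is what lets every processor relax its local points using only local information, as the notes observe.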