Results 1-10 of 29
PETSc users manual
 Tech. Rep. ANL-95/11 - Revision 2.1.0, Argonne National Laboratory
, 2001
Cited by 278 (20 self)
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear and nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, and C++. PETSc provides many of the mechanisms needed within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper ...
Optimizing the performance of sparse matrix-vector multiplication
, 2000
"... Copyright 2000 by Eun-Jin Im ..."
Optimizing Sparse Matrix-Vector Multiplication on SMPs
 In Ninth SIAM Conference on Parallel Processing for Scientific Computing
, 1999
Cited by 36 (2 self)
We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing how the optimization techniques affect absolute performance and scalability, how they interact with one another, and how the performance benefits depend on matrix structure.
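As a hedged illustration of the register-blocking technique this abstract names, the sketch below places a plain CSR matrix-vector product next to a 2x2 block-CSR (BCSR) variant, in which each stored block updates two output entries whose running sums can stay in registers across the inner loop. The data layout and function names are our assumptions for illustration, not the authors' implementation.

```python
# Sketch (assumed layout, not the paper's code): CSR SpMV vs. a
# 2x2 register-blocked BCSR SpMV computing the same product y = A*x.

def csr_matvec(n, indptr, indices, data, x):
    """y = A*x with A stored in compressed sparse row (CSR) format."""
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def bcsr_matvec(nb, indptr, bindices, bdata, x):
    """y = A*x with A stored as 2x2 dense blocks (block CSR).
    Each block touches two rows at once, so the two running sums
    y0 and y1 can live in registers for the whole inner loop."""
    y = [0.0] * (2 * nb)
    for ib in range(nb):
        y0 = y1 = 0.0
        for k in range(indptr[ib], indptr[ib + 1]):
            j = 2 * bindices[k]
            b = bdata[k]  # one 2x2 block, stored row-major
            y0 += b[0] * x[j] + b[1] * x[j + 1]
            y1 += b[2] * x[j] + b[3] * x[j + 1]
        y[2 * ib], y[2 * ib + 1] = y0, y1
    return y
```

For a 4x4 matrix made of two dense 2x2 diagonal blocks, both routines return the same vector; the blocked version performs one index lookup per block instead of one per nonzero, which is the source of the speedup the paper studies.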
PETSc users manual
 Tech. Rep. ANL-95/11 - Revision 2.1.5, Argonne National Laboratory
, 2004
Cited by 29 (8 self)
This work was supported by the Mathematical, Information, and Computational Sciences
A Relational Approach to the Compilation of Sparse Matrix Programs
 In Proceedings of Euro-Par
, 1997
"... . We present a relational algebra based framework for compiling efficient sparse matrix code from dense DOANY loops and a specification of the representation of the sparse matrix. We present experimental data that demonstrates that the code generated by our compiler achieves performance competitive ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
We present a relational algebra based framework for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. We present experimental data that demonstrates that the code generated by our compiler achieves performance competitive with that of handwritten codes for important computational kernels.

1 Introduction

Sparse matrix computations are ubiquitous in computational science. However, the development of high-performance software for sparse matrix computations is a tedious and error-prone task, for two reasons. First, there are no standard ways of storing sparse matrices, since a variety of formats are used to avoid storing zeros, and the best choice for the format is dependent on the problem and the architecture. Second, for most algorithms, it takes a lot of code reorganization to produce an efficient sparse program that is tuned to a particular format. We illustrate these points by describing two formats - a...
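The abstract's first point - that sparse formats vary and none is standard - can be made concrete with a small sketch showing the same matrix in coordinate (COO) storage and its conversion to compressed sparse row (CSR) storage. The names and layout below are common conventions we assume for illustration, not anything specified by the paper.

```python
# Sketch (assumed conventions): the same 3x3 matrix as a COO triple list
# and converted to CSR arrays. Different formats store identical nonzeros
# with very different index structures, which is why code must be retuned
# per format.

# nonzeros of [[4,0,1],[0,3,0],[2,0,5]] as (row, col, value) triples
coo = [(0, 0, 4.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 2.0), (2, 2, 5.0)]

def coo_to_csr(n, coo):
    """Convert (row, col, val) triples into CSR (indptr, indices, data)."""
    indptr, indices, data = [0] * (n + 1), [], []
    for i, j, v in sorted(coo):        # group entries row by row
        indptr[i + 1] += 1             # count nonzeros per row
        indices.append(j)
        data.append(v)
    for i in range(n):
        indptr[i + 1] += indptr[i]     # prefix sum: counts -> row offsets
    return indptr, indices, data
```

In COO every nonzero carries an explicit row index; in CSR the row structure is compressed into the `indptr` offsets, so a row loop must be rewritten to walk `indptr[i]..indptr[i+1]` - precisely the kind of restructuring the paper's compiler automates.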
An Object-Oriented Framework for Block Preconditioning
, 1998
Cited by 21 (2 self)
General software for preconditioning the iterative solution of linear systems is greatly lagging behind the literature. This is partly because specific problems need specific matrix and preconditioner data structures in order to be solved efficiently; i.e., multiple implementations of a preconditioner with specialized data structures are required. This article presents a framework to support preconditioning with various, possibly user-defined, data structures for matrices that are partitioned into blocks. The main idea is to define data structures for the blocks, and an upper layer of software which uses these blocks transparently of their data structure. This transparency can be accomplished by using an object-oriented language. Thus various preconditioners, such as block relaxations and block incomplete factorizations, only need to be defined once, and will work with any block type. In addition, it is possible to transparently interchange various approximate or exact techniques for inverting pivot blocks, or solving systems whose coefficient matrices are diagonal blocks. This leads to a rich variety of preconditioners that can be selected. Operations with the blocks are performed with optimized libraries or fundamental data types. Comparisons with an optimized Fortran 77 code on both workstations and Cray supercomputers show that this framework can approach the efficiency of Fortran 77, as long as suitable block sizes and block types are chosen.
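The article's main idea - a preconditioner written against a block interface so that any block type plugs in unchanged - can be sketched in a few lines. The class and method names below are illustrative assumptions, not the article's API, and the preconditioner shown is a simple block Jacobi sweep.

```python
# Sketch (assumed API): a block Jacobi preconditioner that only requires
# each diagonal block to provide solve(); block storage is interchangeable.

class ScalarBlock:
    """A 1x1 diagonal block; solve() is exact division."""
    def __init__(self, a):
        self.a = a
    def solve(self, r):
        return [r[0] / self.a]

class Dense2x2Block:
    """A 2x2 diagonal block solved via the explicit inverse formula."""
    def __init__(self, a, b, c, d):
        self.m = (a, b, c, d)
    def solve(self, r):
        a, b, c, d = self.m
        det = a * d - b * c
        return [(d * r[0] - b * r[1]) / det, (-c * r[0] + a * r[1]) / det]

def block_jacobi_apply(blocks, sizes, r):
    """z = M^{-1} r with M = diag(blocks); generic over the block type."""
    z, off = [], 0
    for blk, n in zip(blocks, sizes):
        z.extend(blk.solve(r[off:off + n]))
        off += n
    return z
```

Because `block_jacobi_apply` touches blocks only through `solve()`, swapping an exact inverse for an approximate one (or a user-defined sparse block) requires no change to the preconditioner itself - the transparency the article attributes to the object-oriented design.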
A Parallel Lanczos Method for Symmetric Generalized Eigenvalue Problems
, 1997
Cited by 17 (3 self)
The Lanczos algorithm is a very effective method for finding extreme eigenvalues of symmetric matrices. It requires fewer arithmetic operations than similar algorithms, such as the Arnoldi method. In this paper, we present our parallel version of the Lanczos method for the symmetric generalized eigenvalue problem, PLANSO. PLANSO is based on a sequential package called LANSO, which implements the Lanczos algorithm with partial reorthogonalization. It is portable to all parallel machines that support MPI and easy to interface with most parallel computing packages. Through numerical experiments, we demonstrate that it achieves parallel efficiency similar to that of PARPACK, but uses considerably less time. The Lanczos algorithm is one of the most commonly used methods for finding extreme eigenvalues of large sparse symmetric matrices [2, 3, 13, 15]. A number of sequential implementations of this algorithm are freely available from various sources, for example, NETLIB (http://www.netlib.org/) and ACM TO...
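For orientation, here is a hedged sketch of the basic three-term Lanczos recurrence the paper builds on, without the partial reorthogonalization that LANSO/PLANSO actually implement. It uses plain Python lists for a small dense symmetric matrix; all names are ours, purely for illustration.

```python
# Sketch (basic recurrence only, no reorthogonalization): build Lanczos
# vectors Q and the entries (alpha, beta) of the tridiagonal matrix T
# whose extreme eigenvalues approximate those of the symmetric matrix A.
import math

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def lanczos(A, q0, m):
    """Run m Lanczos steps starting from q0; return (Q, alphas, betas)."""
    n = len(q0)
    nrm = math.sqrt(sum(v * v for v in q0))
    q = [v / nrm for v in q0]
    Q, alphas, betas = [q], [], []
    q_prev, beta = [0.0] * n, 0.0
    for _ in range(m):
        w = matvec(A, q)
        alpha = sum(a * b for a, b in zip(w, q))
        # three-term recurrence: w = A q - alpha q - beta q_prev
        w = [w[i] - alpha * q[i] - beta * q_prev[i] for i in range(n)]
        beta = math.sqrt(sum(v * v for v in w))
        alphas.append(alpha)
        if beta < 1e-12:          # invariant subspace found; stop early
            break
        betas.append(beta)
        q_prev, q = q, [v / beta for v in w]
        Q.append(q)
    return Q, alphas, betas
```

In exact arithmetic the vectors in `Q` are mutually orthonormal; in floating point that orthogonality decays, which is exactly why LANSO adds partial reorthogonalization.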
Domain Decomposition and Multi-Level Type Techniques for General Sparse Linear Systems
, 1998
Cited by 15 (15 self)
Domain-decomposition and multi-level techniques are often formulated for linear systems that arise from the solution of elliptic-type Partial Differential Equations. In this paper, generalizations of these techniques for irregularly structured sparse linear systems are considered. An interesting common approach used to derive successful preconditioners is to resort to Schur complements. In particular, we discuss a multi-level domain decomposition-type algorithm for iterative solution of large sparse linear systems based on independent subsets of nodes. We also discuss a Schur complement technique that utilizes incomplete LU factorizations of local matrices.

Key words: Schur complement techniques; incomplete LU factorization; Schwarz iterations; multi-elimination; multi-level ILU preconditioners; Krylov subspace methods.

1 Introduction

A recent trend in parallel preconditioning techniques for general sparse linear systems is to exploit ideas from domain decomposition concepts an...
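The Schur complement idea the abstract leans on can be shown in its smallest form: for a 2x2 block system, eliminate the first block of unknowns and solve the reduced system in the second block. The sketch below uses scalar "blocks" purely for illustration; in the paper the blocks are sparse matrices and the inverse is replaced by incomplete factorizations.

```python
# Sketch (scalar blocks only): solve [[a11, a12], [a21, a22]] [x1, x2] = [b1, b2]
# by forming the Schur complement S = a22 - a21 * a11^{-1} * a12.

def schur_solve(a11, a12, a21, a22, b1, b2):
    s = a22 - a21 * a12 / a11          # Schur complement of a11
    x2 = (b2 - a21 * b1 / a11) / s     # reduced (interface) solve
    x1 = (b1 - a12 * x2) / a11         # back-substitute interior unknown
    return x1, x2
```

With matrix blocks, `1/a11` becomes a solve with the local subdomain matrix, which is where the paper's incomplete LU factorizations of local matrices come in.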
Compiling Parallel Code for Sparse Matrix Applications
 In Supercomputing
, 1997
Cited by 12 (1 self)
We have developed a framework based on relational algebra for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. In this paper, we show how this framework can be used to generate parallel code, and present experimental data that demonstrates that the code generated by our Bernoulli compiler achieves performance competitive with that of handwritten codes for important computational kernels.

Keywords: parallelizing compilers, sparse matrix computations

1 Introduction

Sparse matrix computations are ubiquitous in computational science. However, the development of high-performance software for sparse matrix computations is a tedious and error-prone task, for two reasons. First, there is no standard way of storing sparse matrices, since a variety of formats are used to avoid storing zeros, and the best choice for the format is dependent on the problem and the architecture. Second, for most algorithms, it takes a lo...
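To make the compiler's job concrete, the sketch below contrasts the dense DO-ANY source loop with the restructured loop a sparse compiler would emit for CSR storage. Both are written by hand here for illustration; the Bernoulli compiler derives the second form automatically from the first plus a format specification.

```python
# Sketch (hand-written illustration of an automated transformation):
# a dense DO-ANY matvec loop and its CSR-specialized counterpart.

def dense_matvec(A, x):
    """The dense DO-ANY source loop: every (i, j) iteration is independent."""
    n = len(A)
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y

def sparse_matvec(n, indptr, indices, data, x):
    """What a sparse compiler would emit for CSR: the inner j loop is
    replaced by a walk over row i's stored nonzeros."""
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y
```

The point of the relational framework is that this rewriting - turning an enumeration over all (i, j) into a join against the stored-nonzero index sets - can be done mechanically for many formats, not just CSR.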
The Matrix Template Library: A Unifying Framework for Numerical Linear Algebra
 In Parallel Object Oriented Scientific Computing. ECOOP
, 1998
Cited by 11 (2 self)
We present a unified approach for expressing high performance numerical linear algebra routines for a class of dense and sparse matrix formats and shapes. As with the Standard Template Library [7], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures.

1 Introduction

The traditional approach to writing basic linear algebra routines is a combinatorial affair. There are typically four precision types that need to be handled (single and double precision real, single and double precision complex), several dense storage types (general, banded, packed), a multitude of sparse storage types (the Sparse BLAS Standard Proposal includes 13 [1]), as well as row and co...
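The separation of algorithm from storage that the abstract describes can be sketched with one generic matvec written against a minimal iteration interface, reused unchanged by a dense and a CSR container. MTL realizes this with C++ templates; the Python below, with interface and class names of our own choosing, is only an illustration of the idea.

```python
# Sketch (assumed interface, not MTL's): one generic algorithm over any
# matrix type that can yield the nonzeros of a row.

class DenseMatrix:
    def __init__(self, rows):
        self.rows = rows
        self.nrows = len(rows)
    def row_nonzeros(self, i):
        for j, v in enumerate(self.rows[i]):
            if v != 0.0:
                yield j, v

class CSRMatrix:
    def __init__(self, nrows, indptr, indices, data):
        self.nrows, self.indptr = nrows, indptr
        self.indices, self.data = indices, data
    def row_nonzeros(self, i):
        for k in range(self.indptr[i], self.indptr[i + 1]):
            yield self.indices[k], self.data[k]

def matvec(A, x):
    """Generic algorithm: knows nothing about how A is stored."""
    return [sum(v * x[j] for j, v in A.row_nonzeros(i))
            for i in range(A.nrows)]
```

One `matvec` covers every storage type that implements the interface, which is how generic programming avoids the combinatorial explosion of precision x storage x shape variants the introduction describes.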