Optimizing the performance of sparse matrix-vector multiplication
, 2000
"... Copyright 2000 by Eun-Jin Im ..."
Optimizing Sparse Matrix Vector Multiplication on SMPs
- In Ninth SIAM Conference on Parallel Processing for Scientific Computing
, 1999
Cited by 25 (2 self)
We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing how the optimization techniques affect absolute performance and scalability, how they interact with one another, and how the performance benefits depend on matrix structure.
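As a rough illustration of the kernel this abstract discusses, the sketch below shows a plain compressed-sparse-row (CSR) matrix-vector product next to a variant that stores dense 2x2 blocks, which is the basic idea behind register blocking. It is not code from the paper: the function names, the 2x2 block size, and the example matrix are all assumptions chosen for illustration.

```python
# Illustrative sketch only; not code from the cited paper.
# All names, the 2x2 block size, and the example matrix are assumptions.


def spmv_csr(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_bcsr_2x2(brow_ptr, bcol_idx, blocks, x):
    """Return y = A @ x for A stored as dense 2x2 blocks (block CSR).

    Each block is a tuple (a00, a01, a10, a11). The two partial sums
    y0 and y1 live in local variables for a whole block row, which
    mimics keeping them in registers.
    """
    y = [0.0] * (2 * (len(brow_ptr) - 1))
    for bi in range(len(brow_ptr) - 1):
        y0 = y1 = 0.0
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            j = 2 * bcol_idx[k]
            a00, a01, a10, a11 = blocks[k]
            x0, x1 = x[j], x[j + 1]
            y0 += a00 * x0 + a01 * x1
            y1 += a10 * x0 + a11 * x1
        y[2 * bi] = y0
        y[2 * bi + 1] = y1
    return y


# Example 4x4 matrix:
# [[4, 1, 0, 0],
#  [2, 3, 0, 0],
#  [0, 0, 5, 0],
#  [0, 0, 1, 6]]
ROW_PTR = [0, 2, 4, 5, 7]
COL_IDX = [0, 1, 0, 1, 2, 2, 3]
VALS = [4.0, 1.0, 2.0, 3.0, 5.0, 1.0, 6.0]

# Same matrix as two 2x2 blocks; the second block stores one explicit zero.
BROW_PTR = [0, 1, 2]
BCOL_IDX = [0, 1]
BLOCKS = [(4.0, 1.0, 2.0, 3.0), (5.0, 0.0, 1.0, 6.0)]

X = [1.0, 2.0, 3.0, 4.0]
Y_CSR = spmv_csr(ROW_PTR, COL_IDX, VALS, X)
Y_BCSR = spmv_bcsr_2x2(BROW_PTR, BCOL_IDX, BLOCKS, X)
```

The blocked layout trades a few stored zeros for fewer index loads per arithmetic operation. Whether that trade pays off depends on the matrix structure, as the abstract notes.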
Domain Decomposition and Multi-Level Type Techniques for General Sparse Linear Systems
, 1998
Cited by 17 (16 self)
Domain-decomposition and multi-level techniques are often formulated for linear systems that arise from the solution of elliptic-type Partial Differential Equations. In this paper, generalizations of these techniques for irregularly structured sparse linear systems are considered. An interesting common approach used to derive successful preconditioners is to resort to Schur complements. In particular, we discuss a multi-level domain decomposition-type algorithm for iterative solution of large sparse linear systems based on independent subsets of nodes. We also discuss a Schur complement technique that utilizes incomplete LU factorizations of local matrices.
Key words: Schur complement techniques; Incomplete LU factorization; Schwarz iterations; Multielimination; Multi-level ILU preconditioners; Krylov subspace methods.
1 Introduction
A recent trend in parallel preconditioning techniques for general sparse linear systems is to exploit ideas from domain decomposition concepts an...
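For readers unfamiliar with the Schur complement referred to in this abstract, the standard textbook construction can be written as follows. The block names B, F, E, C and the vectors u, y, f, g are notation assumed here for illustration and are not taken from the paper. The unknowns are assumed to be ordered so that the interior unknowns u come first and the interface unknowns y second.

```latex
\[
\begin{pmatrix} B & F \\ E & C \end{pmatrix}
\begin{pmatrix} u \\ y \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},
\qquad
S = C - E B^{-1} F,
\qquad
S\,y = g - E B^{-1} f,
\qquad
u = B^{-1} (f - F y).
\]
```

Eliminating u from the first block row gives the reduced system in S. Replacing the exact solves with B by incomplete LU solves is one way such a reduction can serve as a preconditioner rather than a direct method.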
A Relational Approach to the Compilation of Sparse Matrix Programs
- In Proceedings of EUROPAR
, 1997
Cited by 16 (4 self)
We present a relational algebra based framework for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. We present experimental data that demonstrates that the code generated by our compiler achieves performance competitive with that of hand-written codes for important computational kernels.
1 Introduction
Sparse matrix computations are ubiquitous in computational science. However, the development of high-performance software for sparse matrix computations is a tedious and error-prone task, for two reasons. First, there are no standard ways of storing sparse matrices, since a variety of formats are used to avoid storing zeros, and the best choice for the format is dependent on the problem and the architecture. Second, for most algorithms, it takes a lot of code reorganization to produce an efficient sparse program that is tuned to a particular format. We illustrate these points by describing two formats --- a...
A Parallel Lanczos Method for Symmetric Generalized Eigenvalue Problems
, 1997
Cited by 14 (3 self)
The Lanczos algorithm is a very effective method for finding extreme eigenvalues of symmetric matrices. It requires fewer arithmetic operations than similar algorithms, such as the Arnoldi method. In this paper, we present our parallel version of the Lanczos method for symmetric generalized eigenvalue problems, PLANSO. PLANSO is based on a sequential package called LANSO, which implements the Lanczos algorithm with partial re-orthogonalization. It is portable to all parallel machines that support MPI and is easy to interface with most parallel computing packages. Through numerical experiments, we demonstrate that it achieves parallel efficiency similar to that of PARPACK but uses considerably less time.
The Lanczos algorithm is one of the most commonly used methods for finding extreme eigenvalues of large sparse symmetric matrices [2, 3, 13, 15]. A number of sequential implementations of this algorithm are freely available from various sources, for example, NETLIB (http://www.netlib.org/) and ACM TO...
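To make the recurrence behind this abstract concrete, here is a minimal sketch of the basic Lanczos three-term recurrence for a standard symmetric eigenvalue problem. It is deliberately simpler than what the abstract describes: it is sequential, handles only the standard (not generalized) problem, and performs no re-orthogonalization at all, whereas LANSO and PLANSO use partial re-orthogonalization. The function names and the example matrix are assumptions for illustration.

```python
# Illustrative sketch only; not PLANSO or LANSO code.
# Names and the example matrix are assumptions.
import math


def dense_matvec(a):
    """Wrap a dense symmetric matrix (list of rows) as a matvec callable."""
    return lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def lanczos(matvec, v0, m):
    """Run up to m Lanczos steps from start vector v0.

    Returns (alphas, betas): the diagonal and off-diagonal entries of
    the symmetric tridiagonal matrix T whose eigenvalues approximate
    the extreme eigenvalues of the operator.
    """
    norm = math.sqrt(sum(c * c for c in v0))
    v = [c / norm for c in v0]
    v_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        # Three-term recurrence: subtract components along v and v_prev.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(c * c for c in w))
        if j == m - 1 or beta < 1e-12:
            break  # finished, or an invariant subspace was found
        betas.append(beta)
        v_prev, v = v, [c / beta for c in w]
    return alphas, betas


# A 3x3 symmetric tridiagonal example; starting from the first unit
# vector, the recurrence reproduces the matrix's own diagonals.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
ALPHAS, BETAS = lanczos(dense_matvec(A), [1.0, 0.0, 0.0], 3)
```

In floating-point arithmetic the Lanczos vectors gradually lose orthogonality on larger problems, which is why production packages add some form of re-orthogonalization.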
An Object-Oriented Framework for Block Preconditioning
, 1998
Cited by 14 (1 self)
General software for preconditioning the iterative solution of linear systems is greatly lagging behind the literature. This is partly because specific problems need specific matrix and preconditioner data structures in order to be solved efficiently; i.e., multiple implementations of a preconditioner with specialized data structures are required. This article presents a framework to support preconditioning with various, possibly user-defined, data structures for matrices that are partitioned into blocks. The main idea is to define data structures for the blocks, and an upper layer of software which uses these blocks transparently of their data structure. This transparency can be accomplished by using an object-oriented language. Thus various preconditioners, such as block relaxations and block incomplete factorizations, only need to be defined once, and will work with any block type. In addition, it is possible to transparently interchange various approximate or exact techniques for inverting pivot blocks, or solving systems whose coefficient matrices are diagonal blocks. This leads to a rich variety of preconditioners that can be selected. Operations with the blocks are performed with optimized libraries or fundamental data types. Comparisons with an optimized Fortran 77 code on both workstations and Cray supercomputers show that this framework can approach the efficiency of Fortran 77, as long as suitable block sizes and block types are chosen.
Compiling Parallel Code for Sparse Matrix Applications
- In Supercomputing
, 1997
Cited by 10 (1 self)
We have developed a framework based on relational algebra for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. In this paper, we show how this framework can be used to generate parallel code, and present experimental data that demonstrates that the code generated by our Bernoulli compiler achieves performance competitive with that of hand-written codes for important computational kernels.
Keywords: parallelizing compilers, sparse matrix computations
1 Introduction
Sparse matrix computations are ubiquitous in computational science. However, the development of high-performance software for sparse matrix computations is a tedious and error-prone task, for two reasons. First, there is no standard way of storing sparse matrices, since a variety of formats are used to avoid storing zeros, and the best choice for the format is dependent on the problem and the architecture. Second, for most algorithms, it takes a lo...
The Matrix Template Library: A Unifying Framework for Numerical Linear Algebra
- In Parallel Object Oriented Scientific Computing. ECOOP
, 1998
Cited by 9 (2 self)
We present a unified approach for expressing high performance numerical linear algebra routines for a class of dense and sparse matrix formats and shapes. As with the Standard Template Library [7], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures.
1 Introduction
The traditional approach to writing basic linear algebra routines is a combinatorial affair. There are typically four precision types that need to be handled (single and double precision real, single and double precision complex), several dense storage types (general, banded, packed), a multitude of sparse storage types (the Sparse BLAS Standard Proposal includes 13 [1]), as well as row and co...
Model-Based Memory Hierarchy Optimizations for Sparse Matrices
- In Workshop on Profile and Feedback-Directed Compilation
, 1998
Cited by 8 (1 self)
Sparse matrix-vector multiplication is an important computational kernel used in numerical algorithms. It tends to run much more slowly than its dense counterpart, and its performance depends heavily on both the nonzero structure of the sparse matrix and on the machine architecture. In this paper we address the problem of optimizing sparse matrix-vector multiplication for the memory hierarchies that exist on modern machines and how machine-specific or matrix-specific profiling information can be used to decide which optimizations should be applied and what parameters should be used. We also consider a variation of the problem in which a matrix is multiplied by a set of vectors. Performance is measured on a 167 MHz UltraSPARC I, 200 MHz Pentium Pro, and 450 MHz DEC Alpha 21164. Experiments show these optimization techniques to have significant payoff, although the effectiveness of each depends on the matrix structure and machine.
1 Introduction
Matrix-vector multiplication is an importa...
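The multiple-vector variation mentioned in this abstract can be sketched as below: each stored matrix entry is read once and applied to every vector, so the cost of loading the matrix is amortized over the whole set. This is an assumed, simplified illustration in CSR form, not the paper's implementation; the function name and the example data are hypothetical.

```python
# Illustrative sketch only; not code from the cited paper.
# The function name and example data are assumptions.


def spmm_csr(row_ptr, col_idx, vals, xs):
    """Return [A @ x for x in xs], reading each stored entry of A once.

    A is given in CSR form; xs is a list of equally sized vectors.
    """
    n_rows = len(row_ptr) - 1
    n_vecs = len(xs)
    ys = [[0.0] * n_rows for _ in range(n_vecs)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = vals[k], col_idx[k]
            # Reuse the loaded entry (a, j) for every vector in the set.
            for v in range(n_vecs):
                ys[v][i] += a * xs[v][j]
    return ys


# Example 2x3 matrix:
# [[1, 0, 2],
#  [0, 3, 0]]
ROW_PTR_M = [0, 2, 3]
COL_IDX_M = [0, 2, 1]
VALS_M = [1.0, 2.0, 3.0]
YS = spmm_csr(ROW_PTR_M, COL_IDX_M, VALS_M,
              [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]])
```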
Non-Standard Parallel Solution Strategies for Distributed Sparse Linear Systems
, 1999
Cited by 8 (3 self)
A number of techniques are described for solving sparse linear systems on parallel platforms. The general approach used is a domain decomposition type method in which a processor is assigned a certain number of rows of the linear system to be solved. Strategies that are discussed include non-standard graph partitioners, and a forced load-balance technique for the local iterations. A common practice when partitioning a graph is to seek to minimize the number of cut-edges and to have an equal number of equations per processor. It is shown that partitioners that take into account the values of the matrix entries may be more effective.

