Highly scalable parallel algorithms for sparse matrix factorization
- IEEE Transactions on Parallel and Distributed Systems
, 1994
Cited by 100 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithm can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge,
Improving Memory-System Performance of Sparse Matrix-Vector Multiplication
- IBM Journal of Research and Development
, 1997
Cited by 61 (0 self)
Sparse matrix-vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (a technique originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 Mflops (on a well-ordered matrix) to over 100 Mflops on a 266 Mflops machine. The techniques are applicable to other superscalar RISC processors as well and have improved performance on a Sun UltraSparc I workstation, for example. 1 Introduction Sparse matrix-vector multiplication is an important computational kernel in many iterative linear solvers (see [5], for example). Unfortunately, on many computers this kernel runs slowly relative to other numerical codes, such as dense matrix computations. This paper propos...
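For orientation, the kernel this abstract refers to can be sketched as follows. This is only a baseline illustration, not the paper's optimized code: the abstract does not name a storage format, so compressed sparse row (CSR) storage is an assumption here, and the names `csr_matvec`, `row_ptr`, `col_idx`, and `values` are hypothetical. The indirect access `x[col_idx[k]]` is the kind of irregular load that reordering for cache locality is meant to help.

```python
def csr_matvec(n_rows, row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load of x: the access pattern depends on the matrix ordering.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small illustrative matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]].
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_matvec(3, row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The blocking and prefetching techniques mentioned in the abstract would restructure the inner loop; they are not shown here.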
Fast and Effective Algorithms for Graph Partitioning and Sparse Matrix Ordering
- IBM Journal of Research and Development
, 1996
Cited by 45 (10 self)
Graph partitioning is a fundamental problem in several scientific and engineering applications. In this paper, we describe heuristics that improve the state-of-the-art practical algorithms used in graph-partitioning software in terms of both partitioning speed and quality. An important use of graph-partitioning is in ordering sparse matrices for obtaining direct solutions to sparse systems of linear equations arising in engineering and optimization applications. The experiments reported in this paper show that the use of these heuristics results in a considerable improvement in the quality of sparse-matrix orderings over conventional ordering methods, especially for sparse matrices arising in linear programming problems. In addition, our graph-partitioning-based ordering algorithm is more parallelizable than minimum-degree-based ordering algorithms, and it renders the ordered matrix more amenable to parallel factorization.
Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods
- BIT
, 2001
Cited by 22 (0 self)
Most state-of-the-art ordering schemes for sparse matrices are a hybrid of a bottom-up method such as minimum degree and a top-down scheme such as George's nested dissection. In this paper we present an ordering algorithm that achieves a tighter coupling of bottom-up and top-down methods. In our methodology, vertex separators are interpreted as the boundaries of the remaining elements in an unfinished bottom-up ordering. As a consequence, we use bottom-up techniques such as quotient graphs and special node selection strategies for the construction of vertex separators. Once all separators have been found, we use them as a skeleton for the computation of several bottom-up orderings. Experimental results show that the orderings obtained by our scheme are in general better than those obtained by other popular ordering codes.
WSMP: Watson Sparse Matrix Package Part II - direct . . .
, 2000
Cited by 16 (6 self)
This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties)
Applications of the Dulmage-Mendelsohn Decomposition and Network Flow to Graph Bisection Improvement
- SIAM J. Matrix Anal. Appl.
, 1998
Cited by 13 (2 self)
In this paper, we consider the use of the Dulmage-Mendelsohn decomposition and network flow on bipartite graphs to improve a graph bisection partition. Given a graph partition [S, B, W] with a vertex separator S and two disconnected components B and W, different strategies are considered based on the Dulmage-Mendelsohn decomposition to reduce the separator size |S| and/or the imbalance between B and W. For the case when the vertices are weighted, we relate this with the bipartite network flow problem. A further enhancement is made on partition improvement by generalizing the bipartite network to solving a general network flow problem. We demonstrate the utility of these improvement techniques on a set of sparse test matrices, where we find top level separators and nested dissection and multisection orderings. Key words. Dulmage-Mendelsohn decomposition, network flow, graph bisection, ordering algorithms, nested dissection, multisection. AMS(MOS) subject classifications. 65F05, 65...
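The partition [S, B, W] described in this abstract can be made concrete with a small check. The sketch below only verifies that S separates B from W and reports the two quantities the abstract says the improvement strategies try to reduce; it does not implement the Dulmage-Mendelsohn or network-flow improvement itself. The function name `check_separator` is hypothetical, and measuring imbalance as the unweighted difference in component sizes is an assumption.

```python
def check_separator(adj, S, B, W):
    """Return (|S|, imbalance) if S separates B from W; raise ValueError otherwise.

    adj maps each vertex to a list of its neighbours (undirected graph).
    Imbalance is taken here as abs(|B| - |W|), an unweighted assumption.
    """
    S, B, W = set(S), set(B), set(W)
    if (S | B | W) != set(adj) or (S & B) or (S & W) or (B & W):
        raise ValueError("S, B, W must partition the vertex set")
    # A valid vertex separator leaves no edge running directly between B and W.
    for u in B:
        for v in adj[u]:
            if v in W:
                raise ValueError(f"edge ({u}, {v}) connects B and W")
    return len(S), abs(len(B) - len(W))


# Path graph 0-1-2-3-4: removing vertex 2 splits it into two equal halves.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
balanced = check_separator(adj, S=[2], B=[0, 1], W=[3, 4])
skewed = check_separator(adj, S=[1], B=[0], W=[2, 3, 4])
```

Both separators have size one, but the second leaves a more unbalanced partition, which is the trade-off a bisection improvement step has to weigh.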
pPCx: Parallel Software for Linear Programming
, 1997
Cited by 5 (0 self)
We describe pPCx, a parallel variant of the PCx interior-point code for linear programming. We outline the major computational operation (parallel multifrontal Cholesky factorization) and present computational results on the IBM-SP multiprocessor. 1 Introduction PCx is a linear programming solver developed at the Optimization Technology Center at Argonne National Laboratory and Northwestern University. It implements a variant of Mehrotra's predictor-corrector algorithm [5], a primal-dual interior-point approach that has proved to be highly efficient on large-scale linear programming problems. In this paper we describe pPCx, a parallel variant of PCx developed at Cornell University, and present computational results from the IBM-SP multiprocessor system. PCx typically requires between ten and one hundred iterations to find a primal-dual solution to a feasible linear program. At each iteration, two linear systems with an identical coefficient matrix must be solved. Most of the comput...
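The remark that each iteration solves two systems with an identical coefficient matrix is why a single factorization can be reused for both solves. The sketch below illustrates that pattern with a small dense, serial Cholesky factorization; it is not the parallel multifrontal sparse method pPCx uses, and the function names, the 2x2 matrix, and the labelling of the two right-hand sides as predictor and corrector steps are illustrative assumptions.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_with_factor(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)                              # factor once (the expensive step)
x_pred = solve_with_factor(L, [2.0, 1.0])    # first right-hand side
x_corr = solve_with_factor(L, [6.0, 5.0])    # second right-hand side, same factor
```

The factorization is the dominant cost and each additional triangular solve is comparatively cheap, so sharing the factor across both systems halves the factorization work per iteration.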
The User Manual for SPOOLES: Release 2.0: An Object Oriented Software Library for Solving Sparse Linear Systems of Equations
, 1998
Cited by 3 (0 self)
Solving sparse linear systems of equations is a common and important component of a multitude of scientific and engineering applications. The SPOOLES software package provides this functionality with a collection of software objects and methods. The library provides the user various options to assemble the sparse linear system and to order the system for sparsity preservation. The user can select numerical factorization options such as pivoting for numerical stability and a drop tolerance incomplete factorization. This package can be used for applications where linear systems of the form A + σB need to be solved for various values of σ. A QR factorization capability for full rank over-determined systems is included. The library is written in ANSI C using object oriented design. Data is contained in objects. Each object has several methods to enter data into, extract data from, and perform work on the data in the objects. This release of the library contains serial factorization a...

