Highly scalable parallel algorithms for sparse matrix factorization
- IEEE Transactions on Parallel and Distributed Systems
, 1994
Cited by 100 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithm can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge,
Approximating Treewidth, Pathwidth, Frontsize, and Shortest Elimination Tree
, 1995
Cited by 43 (3 self)
Various parameters of graphs connected to sparse matrix factorization and other applications can be approximated using an algorithm of Leighton et al. that finds vertex separators of graphs. The approximate values of the parameters, which include minimum front size, treewidth, pathwidth, and minimum elimination tree height, are no more than O(log n) (minimum front size and treewidth) and O(log^2 n) (pathwidth and minimum elimination tree height) times the optimal values. In addition, we show that unless P = NP there are no absolute approximation algorithms for any of the parameters.
Hybrid scheduling for the parallel solution of linear systems
- Parallel Computing
, 2006
Cited by 42 (6 self)
In this paper, we consider the problem of designing a dynamic scheduling strategy that takes into account both workload and memory information in the context of the parallel multifrontal factorization. The originality of our approach is that we base our estimations (work and memory) on a static optimistic scenario during the analysis phase. This scenario is then used during the factorization phase to constrain the dynamic decisions. The task scheduler has been redesigned to take into account these new features. Moreover, performance has been improved because the new constraints allow the new scheduler to make optimal decisions that were forbidden or too dangerous in unconstrained formulations. Performance analysis shows that the memory estimation becomes much closer to the memory effectively used and that, even in a constrained memory environment, we decrease the factorization time with respect to the initial approach.
Highly Parallel Sparse Cholesky Factorization
- SIAM Journal on Scientific and Statistical Computing
, 1992
Cited by 36 (1 self)
We develop and compare several fine-grained parallel algorithms to compute the Cholesky factorization of a sparse matrix. Our experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, our focus is on matrices with arbitrary sparsity structure.
A column pre-ordering strategy for the unsymmetric-pattern multifrontal method
- ACM Transactions on Mathematical Software
, 2004
Cited by 36 (2 self)
A new method for sparse LU factorization is presented that combines a column pre-ordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-in and then refined during numerical factorization (while preserving the bound). Pivot rows are selected to maintain numerical stability and to preserve sparsity. The method analyzes the matrix and automatically selects one of three pre-ordering and pivoting strategies. The number of nonzeros in the LU factors computed by the method is typically less than or equal to those found by a wide range of unsymmetric sparse LU factorization methods, including left-looking methods and prior multifrontal methods.
Recent Advances in Direct Methods for Solving Unsymmetric Sparse Systems of Linear Equations
, 2001
Cited by 24 (3 self)
This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
IBM Research Division: Almaden, Austin, China, Delhi, Haifa, Tokyo, Watson, Zurich.
Anshul Gupta, IBM T.J. Watson Research Center.
During the past few years, algorithmic improve
Performance of a Fully Parallel Sparse Solver
- Int. Journal of Supercomputer Applications
, 1996
Cited by 16 (3 self)
The performance of a fully parallel direct solver for large sparse symmetric positive definite systems of linear equations is demonstrated. The solver is designed for distributed-memory, message-passing parallel computer systems. All phases of the computation, including symbolic processing as well as numeric factorization and triangular solution, are performed in parallel. A parallel Cartesian nested dissection algorithm is used to compute a fill-reducing ordering for the matrix and an appropriate partitioning of the problem across the processors. The separator ...
This research was supported by the Advanced Research Projects Agency through the Army Research Office under contract number DAAL03-91-C-0047.
Author affiliations: Department of Computer Science and NCSA, University of Illinois, 1304 West Springfield Ave., Urbana, IL 61801 (e-mail: heath@cs.uiuc.edu); Department of Computer Science, University of Tennessee, 107 Ayres Hall, Knoxville, TN 37996 (e-mail: padma@cs.utk.edu).
A Parallel Formulation of Interior Point Algorithms
- Department of Computer Science, University of Minnesota
, 1994
Cited by 16 (9 self)
In recent years, interior point algorithms have been used successfully for solving medium to large-size linear programming (LP) problems. In this paper we describe a highly parallel formulation of the interior point algorithm. A key component of the interior point algorithm is the solution of a sparse system of linear equations using Cholesky factorization. The performance of parallel Cholesky factorization is determined by (a) the communication overhead incurred by the algorithm, and (b) the load imbalance among the processors. In our parallel interior point algorithm, we use our recently developed parallel multifrontal algorithm, which has the smallest communication overhead of all parallel algorithms for Cholesky factorization developed to date. The computation imbalance depends on the shape of the elimination tree associated with the sparse system reordered for factorization. To balance the computation, we implemented and evaluated four different ordering algorithms. Among these algorithms, Kernighan-Lin and spectral nested dissection yield the most balanced elimination trees and greatly increase the amount of parallelism that can be exploited. Our preliminary implementation achieves a speedup as high as 108 on a 256-processor nCUBE 2 on moderate-size problems.
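As a rough reading of the speedup figure quoted above, parallel efficiency is conventionally defined as speedup divided by processor count. The sketch below applies that standard definition to the reported numbers; the function name is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Standard definition: efficiency = speedup / number of processors."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: speedup of 108 on 256 processors.
eff = parallel_efficiency(108, 256)
print(f"efficiency = {eff:.1%}")  # prints "efficiency = 42.2%"
```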
WSMP: Watson Sparse Matrix Package Part II - direct . . .
, 2000
Cited by 16 (6 self)
IBM Research Report, issued for early dissemination of its contents ahead of outside publication.
An Unsymmetrized Multifrontal LU Factorization
- SIAM Journal on Matrix Analysis and Applications
, 2000
Cited by 15 (4 self)
A well-known approach to computing the LU factorization of a general unsymmetric matrix A is to build the elimination tree associated with the pattern of the symmetric matrix A + A^T and use it as a computational graph to drive the numerical factorization. This approach, although very efficient on a large range of unsymmetric matrices, does not capture the unsymmetric structure of the matrices. We introduce a new algorithm which detects and exploits the structural asymmetry of the submatrices involved during the processing of the elimination tree. We show that, with the new algorithm, significant gains both in memory and in time to perform the factorization can be obtained.
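The classical approach mentioned at the start of this abstract (an elimination tree built from the pattern of A + A^T) can be sketched as follows. This is a minimal illustration using the textbook elimination-tree algorithm with path compression, not the new algorithm the paper introduces; the function name and the input format (a set of nonzero positions) are assumptions made for the example.

```python
def elimination_tree(n, nonzeros):
    """Elimination tree of the symmetrized pattern of an n-by-n matrix.

    nonzeros: iterable of (row, col) positions of structural nonzeros of A.
    Returns parent[], where parent[j] == -1 marks a root.
    """
    # Symmetrize: an entry at (i, j) or (j, i) gives an edge {i, j}.
    # For each column k, collect the rows i < k that are adjacent to k.
    upper = [set() for _ in range(n)]
    for i, j in nonzeros:
        if i != j:
            upper[max(i, j)].add(min(i, j))

    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for k in range(n):
        for i in sorted(upper[k]):
            # Climb from i toward its root, redirecting shortcuts to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k  # i was a root; k becomes its parent
                i = nxt
    return parent


# Unsymmetric pattern whose symmetrization has edges {0,1} and {0,2};
# eliminating node 0 creates fill between nodes 1 and 2.
print(elimination_tree(3, {(1, 0), (0, 2)}))  # prints [1, 2, -1]
```

Because the tree is derived from A + A^T, any asymmetry in A is invisible to it, which is the limitation the abstract describes.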

