Results 1 - 4 of 4
Elimination Forest Guided 2D Sparse LU Factorization
Abstract

Cited by 12 (7 self)
Sparse LU factorization with partial pivoting is important for many scientific applications, and delivering high performance for this problem is difficult on distributed memory machines. Our previous work developed an approach called S* that incorporates static symbolic factorization, supernode partitioning, and graph scheduling. This paper studies the properties of elimination forests and uses them to guide supernode partitioning/amalgamation and execution scheduling. The new design with 2D mapping effectively identifies dense structures without introducing too many zeros in the BLAS computation and exploits asynchronous parallelism with low buffer space cost. The implementation of this code, called S+, uses supernodal matrix multiplication, which retains BLAS3-level efficiency and avoids unnecessary arithmetic operations. The experiments show that S+ improves on our previous code substantially and can achieve up to 11.04 GFLOPS on 128 Cray T3E 450MHz nodes, which is the highest performance reported in the literature.
S+: Efficient 2D sparse LU factorization on parallel machines
SIAM J. Matrix Anal. Appl., 2001
Abstract

Cited by 6 (2 self)
Static symbolic factorization coupled with supernode partitioning and asynchronous computation scheduling can achieve high gigaflop rates for parallel sparse LU factorization with partial pivoting. This paper studies properties of elimination forests and uses them to optimize supernode partitioning/amalgamation and execution scheduling. It also proposes supernodal matrix multiplication to speed up kernel computation by retaining BLAS3-level efficiency and avoiding unnecessary arithmetic operations. The experiments show that our new design with proper space optimization, called S+, improves on our previous solution substantially and can achieve up to 10 GFLOPS on 128 Cray T3E 450MHz nodes.
Key words. Gaussian elimination with partial pivoting, LU factorization, sparse matrices, elimination forests, supernode amalgamation and partitioning, asynchronous computation scheduling
AMS subject classifications. 65F50, 65F05
PII. S0895479898337385
Efficient Sparse LU Factorization with Lazy Space Allocation
IN PROCEEDINGS OF THE NINTH SIAM CONFERENCE ON PARALLEL PROCESSING FOR SCIENTIFIC COMPUTING, 1999
Abstract

Cited by 5 (3 self)
Static symbolic factorization coupled with 2D supernode partitioning and asynchronous computation scheduling is a viable approach for sparse LU with dynamic partial pivoting. Our previous implementation, called S+, uses those techniques and achieves high gigaflop rates on distributed memory machines. This paper studies the space requirement of this approach and proposes an optimization strategy called lazy space allocation, which acquires memory on the fly only when it is necessary. This strategy can effectively control memory usage, especially when static symbolic factorization overestimates fill-ins excessively. Our experiments show that the improved S+ code, which combines this strategy with elimination-forest guided partitioning and scheduling, has sequential time and space cost competitive with SuperLU, is space scalable for solving problems of large sizes on multiple processors, and can deliver up to 10 GFLOPS on 128 Cray 450MHz T3E nodes.
Efficient Sparse Gaussian Elimination with Lazy Space Allocation
1999
Abstract
A parallel algorithm is implemented for sparse Gaussian elimination on distributed memory machines. First, we use the minimum degree ordering algorithm and the transversal algorithm to reorder the columns and rows of the matrix. Next, we implement the LU factorization of the reordered matrix by combining various techniques, such as static symbolic factorization, 2D supernode partitioning, asynchronous computation scheduling, and the new lazy space allocation strategy. This lazy space allocation strategy can effectively control memory usage, especially when static symbolic factorization overestimates fill-ins excessively. Our experiments show that the new LU code using this strategy has sequential time and space cost competitive with SuperLU, and can deliver up to 10 GFLOPS when running on 128 Cray ...