Optimizing the performance of sparse matrix-vector multiplication
Eun-Jin Im, 2000
Making Sparse Gaussian Elimination Scalable by Static Pivoting
 In Proceedings of Supercomputing
, 1998
Abstract

Cited by 43 (6 self)
We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of data structures and communication pattern for Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we design highly parallel algorithms for both sparse Gaussian elimination and triangular solve and we show that they are suitable for large-scale distributed memory machines.
Keywords: sparse unsymmetric linear systems, static pivoting, iterative refinement, MPI, 2D matrix decomposition.
1 Introduction. In our earlier work [8, 9, 22], we developed new algorithms to solve unsymmetric sparse linear systems using Gaussian elimination with partial pivoting (GEPP). The new algorithms are hi...
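The static-pivoting idea this abstract describes can be illustrated on a tiny dense example: when a diagonal pivot is too small, perturb it (a common choice is sqrt(eps) times the largest entry of A) instead of swapping rows, then repair the perturbation with iterative refinement. A minimal NumPy sketch, dense and sequential for illustration only; the threshold rule and function names here are my assumptions, not the paper's exact algorithm:

```python
import numpy as np

def lu_static_pivot(A, tau=None):
    """LU without row interchanges; any pivot smaller in magnitude than
    tau is replaced by +/-tau (the static-pivoting trick). The threshold
    sqrt(eps)*max|A| is one common choice, assumed here."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    if tau is None:
        tau = np.sqrt(np.finfo(float).eps) * np.abs(A).max()
    for k in range(n):
        if abs(A[k, k]) < tau:
            A[k, k] = tau if A[k, k] >= 0 else -tau  # perturb tiny pivot
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A  # unit-lower L and U packed together

def _tri_solve(LU, r):
    n = len(r)
    y = np.zeros(n)
    for i in range(n):                   # forward solve with unit L
        y[i] = r[i] - LU[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):         # backward solve with U
        x[i] = (y[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x

def solve_refined(A, b, iters=3):
    """Factor once with static pivoting, then recover accuracy with a
    few steps of iterative refinement against the unperturbed A."""
    LU = lu_static_pivot(A)
    x = _tri_solve(LU, b)
    for _ in range(iters):
        r = b - A @ x                    # residual uses the TRUE A
        x = x + _tri_solve(LU, r)
    return x

# a matrix whose (1,1) pivot would be disastrous without pivoting
A = np.array([[1e-20, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])
x = solve_refined(A, b)
```

Because the data structures never change at run time (no row swaps), the factorization's communication pattern is known a priori, which is the scalability advantage the abstract claims.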
A scalable sparse direct solver using static pivoting
 In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing
, 1999
Abstract

Cited by 29 (5 self)
We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of data structures and communication pattern, which makes it more scalable. We demonstrate the scalability of our algorithms on large-scale distributed memory computers.
Nested-Dissection Orderings for Sparse LU with Partial Pivoting
 SIAM J. Matrix Anal. Appl
, 2000
Abstract

Cited by 20 (4 self)
We describe the implementation and performance of a novel fill-minimization ordering technique for sparse LU factorization with partial pivoting. The technique was proposed by Gilbert and Schreiber in 1980 but never implemented and tested. Like other techniques for ordering sparse matrices for LU with partial pivoting, our new method preorders the columns of the matrix (the row permutation is chosen by the pivoting sequence during the numerical factorization). Also like other methods, the column permutation Q that we select is a permutation that minimizes the fill in the Cholesky factor of Q^T A^T A Q. Unlike existing column-ordering techniques, which all rely on minimum-degree heuristics, our new method is based on a nested-dissection ordering of A^T A. Our algorithm, however, never computes a representation of A^T A, which can be expensive. We only work with a representation of A itself. Our experiments demonstrate that the method is efficient and that it can reduce fill significantly relative to the best existing methods. The method reduces the LU running time on some very large matrices (tens of millions of nonzeros in the factors) by more than a factor of 2.
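For intuition on the structure being ordered: columns i and j of A are adjacent in the graph of A^T A exactly when they share a nonzero row, so every row of A contributes a clique. The naive sketch below (names are mine) builds this column intersection graph explicitly; the paper's contribution is precisely that this explicit construction can be too expensive, and that the ordering can be derived from A alone:

```python
from itertools import combinations

def column_intersection_graph(rows):
    """Edge set of the graph of A^T A, built naively from the rows of A.
    rows[i] = set of column indices with a nonzero in row i.
    Each row's nonzero columns form a clique, so a single dense row
    already makes this quadratic in the column count."""
    edges = set()
    for cols in rows:
        for i, j in combinations(sorted(cols), 2):
            edges.add((i, j))
    return edges

# rows of a 3 x 4 example: row 0 touches cols 0,2; row 2 touches cols 2,3
edges = column_intersection_graph([{0, 2}, {1}, {2, 3}])
```

A nested-dissection ordering of this graph (e.g. from a graph partitioner such as METIS) then serves as the column permutation Q.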
Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
 ACM Trans. Math. Softw
Abstract

Cited by 20 (1 self)
By using a combination of 32-bit and 64-bit floating point arithmetic the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach presented here can apply not only to conventional processors but also to exotic technologies such as ...
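The recipe this abstract alludes to is classical mixed-precision iterative refinement: do the expensive factorization work in 32-bit, then compute residuals and corrections in 64-bit. A dense NumPy sketch under stated assumptions (the paper targets sparse direct and Krylov solvers; `np.linalg.solve` stands in for a single reusable single-precision factorization):

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Solve Ax = b doing the heavy lifting in float32 and the
    residual/update in float64 (classical iterative refinement).
    A real solver would factor A32 once and reuse the factors;
    np.linalg.solve is a stand-in for that step."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                              # 64-bit residual
        dx = np.linalg.solve(A32, r.astype(np.float32))
        x = x + dx.astype(np.float64)
    return x

# well-conditioned example: refinement recovers double-precision accuracy
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = mixed_precision_solve(A, b)
```

Each refinement step cuts the error by roughly the single-precision unit roundoff times the condition number, so a handful of cheap 32-bit solves reach 64-bit accuracy for reasonably conditioned systems.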
Computing Row and Column Counts for Sparse QR and LU Factorization
 BIT
, 2001
Abstract

Cited by 15 (3 self)
We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose. ...
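The "column elimination tree without forming A^T A" ingredient can be sketched directly: the tree is the elimination tree of A^T A, computed by scanning the columns of A with path compression, in the style of the well-known `cs_etree` routine (with its `ata` option) from Davis's CSparse. This is my reconstruction of that standard algorithm, not the paper's own code:

```python
def col_etree(cols, n_rows):
    """Column elimination tree of A, i.e. the elimination tree of
    A^T A, from the structure of A alone. cols[k] lists the row
    indices of the nonzeros in column k; returns parent[] with -1
    marking roots."""
    n = len(cols)
    parent = [-1] * n
    ancestor = [-1] * n           # path-compression shortcuts
    prev_col = [-1] * n_rows      # last column seen containing row i
    for k in range(n):
        for i in cols[k]:
            j = prev_col[i]       # stands in for an edge of A^T A
            while j != -1 and j < k:
                jnext = ancestor[j]
                ancestor[j] = k   # compress the path toward k
                if jnext == -1:
                    parent[j] = k # j's root becomes a child of k
                j = jnext
            prev_col[i] = k
    return parent

# fully dense 3 x 3 pattern: the tree is the path 0 -> 1 -> 2
tree = col_etree([[0, 1, 2], [0, 1, 2], [0, 1, 2]], 3)
```

The `prev_col` array is what avoids A^T A: instead of enumerating all column pairs sharing a row, each row only links each new column to the last column that touched it, which suffices to produce the same tree.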
Elimination Forest Guided 2D Sparse LU Factorization
Abstract

Cited by 12 (7 self)
Sparse LU factorization with partial pivoting is important for many scientific applications and delivering high performance for this problem is difficult on distributed memory machines. Our previous work has developed an approach called S* that incorporates static symbolic factorization, supernode partitioning and graph scheduling. This paper studies the properties of elimination forests and uses them to guide supernode partitioning/amalgamation and execution scheduling. The new design with 2D mapping effectively identifies dense structures without introducing too many zeros in the BLAS computation and exploits asynchronous parallelism with low buffer space cost. The implementation of this code, called S+, uses supernodal matrix multiplication which retains the BLAS-3 level efficiency and avoids unnecessary arithmetic operations. The experiments show that S+ improves our previous code substantially and can achieve up to 11.04 GFLOPS on 128 Cray T3E 450MHz nodes, which is the highest performance reported in the literature.
An EScheduler-Based Data Dependence Analysis and Task Scheduling for Parallel Circuit Simulation
 In TCAS-II
, 2011
Abstract

Cited by 12 (9 self)
The sparse matrix solver has become the bottleneck ...