Results 1 - 10 of 45
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
 ACM Trans. Mathematical Software
, 2003
Cited by 138 (19 self)
We present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Cited by 128 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering
, 1996
Sparse Gaussian Elimination on High Performance Computers
, 1996
Cited by 42 (7 self)
This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers, using Gaussian elimination with partial pivoting. The efficiencies of the new algorithms are demonstrated for matrices from various fields and for a variety of high performance machines. In the first part we discuss optimizations of a sequential algorithm to exploit the memory hierarchies that exist in most RISC-based superscalar computers. We begin with the left-looking supernode-column algorithm by Eisenstat, Gilbert and Liu, which includes Eisenstat and Liu's symmetric structural reduction for fast symbolic factorization. Our key contribution is to develop both numeric and symbolic schemes to perform supernode-panel updates to achieve better data reuse in cache and floating-point register...
A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm
 PARALLEL PROCESSING FOR SCIENTIFIC COMPUTING. SIAM
, 1997
Cited by 37 (0 self)
In this paper we present a parallel formulation of a multilevel k-way graph partitioning algorithm that is particularly suited for message-passing libraries that have high latency. The multilevel k-way partitioning algorithm reduces the size of the graph by successively collapsing vertices and edges (coarsening phase), finds a k-way partitioning of the smaller graph, and then constructs a k-way partitioning for the original graph by projecting and refining the partition to successively finer graphs (uncoarsening phase). Our algorithm is able to achieve a high degree of concurrency, while maintaining the high quality partitions produced by the serial algorithm.
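The coarsening phase described in this abstract can be illustrated with a small serial sketch. It is not the authors' parallel formulation: the greedy heaviest-edge matching is an assumption chosen for illustration (the abstract does not say which matching is used), and the names `heavy_edge_matching` and `coarsen` are hypothetical. The graph is a dictionary mapping each vertex to its weighted neighbours.

```python
def heavy_edge_matching(adj):
    """Greedily pair each unmatched vertex with its heaviest unmatched neighbour.

    Assumption: this matching rule is chosen for illustration only.
    """
    match = {}
    for u in sorted(adj):
        if u in match:
            continue
        candidates = [(w, v) for v, w in adj[u].items() if v not in match and v != u]
        if candidates:
            _, v = max(candidates)      # heaviest edge; ties go to the larger vertex id
            match[u], match[v] = v, u
        else:
            match[u] = u                # no free neighbour: vertex stays on its own
    return match


def coarsen(adj, match):
    """Collapse matched pairs into coarse vertices, summing parallel edge weights."""
    coarse_id = {}
    next_id = 0
    for u in sorted(adj):
        if u not in coarse_id:
            coarse_id[u] = next_id
            coarse_id[match[u]] = next_id
            next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u, nbrs in adj.items():
        cu = coarse_id[u]
        for v, w in nbrs.items():
            cv = coarse_id[v]
            if cu != cv:                # edges inside a collapsed pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, coarse_id


# Path graph 0 - 1 - 2 - 3 with edge weights 1, 5, 1 (symmetric adjacency).
adj = {0: {1: 1}, 1: {0: 1, 2: 5}, 2: {1: 5, 3: 1}, 3: {2: 1}}
match = heavy_edge_matching(adj)
coarse, coarse_id = coarsen(adj, match)
print(coarse)
```

Repeating these two steps on `coarse` yields successively smaller graphs; the partitioning of the smallest graph and the projection and refinement of the uncoarsening phase are not shown.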
Performance of a fully parallel sparse solver
 Int. J. Supercomputing Appl
, 1997
PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions
 In Proceedings of Irregular'2000, Cancun, Mexico, number 1800 in Lecture Notes in Computer Science
, 2000
Cited by 18 (5 self)
We present and analyze a general algorithm which computes an efficient static scheduling of block computations for a parallel LDL^t factorization of sparse symmetric positive definite systems based on a combination of 1D and 2D block distributions. Our solver uses a supernodal fan-in approach and is fully driven by this scheduling. We give an overview of the algorithm and present performance results and comparisons with PSPASES on an IBM-SP2 with 120 MHz Power2SC nodes for a collection of irregular problems. This work is supported by the Commissariat à l'Energie Atomique CEA/CESTA under contract No. 7V1555AC, and by the GDR ARP (iHPerf group) of the CNRS. 1 Introduction. Solving large sparse symmetric positive definite systems Ax = b of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. Consequently, many parallel formulations for sparse matrix factorization have been studied and implemented; one can refer t...
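For readers unfamiliar with the factorization named in this abstract, the following is a minimal dense, serial sketch of A = L D L^t for a symmetric positive definite matrix. It says nothing about PaStiX's sparse block data structures, supernodes, or static scheduling; the function name `ldlt` and the test matrix are illustrative assumptions.

```python
def ldlt(a):
    """Dense LDL^t factorization of a symmetric positive definite matrix.

    Returns (L, d): L is unit lower triangular (list of rows) and d holds
    the diagonal of D, so that A = L * diag(d) * L^t.
    Illustrative sketch only: no pivoting, no sparsity, no blocking.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: remove contributions of already-computed columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


# Small hypothetical SPD matrix used only to exercise the sketch.
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldlt(A)
print(L, d)
```

Unlike Cholesky factorization, this form needs no square roots; once L and d are known, Ax = b is solved by a forward substitution, a diagonal scaling, and a backward substitution.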
A high performance sparse Cholesky factorization algorithm for scalable parallel computers
 Department of Computer Science, University of Minnesota
, 1994
Cited by 13 (1 self)
This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 6 GFlops on a 256-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization
 In Euro-Par'99 Parallel Processing, Lecture Notes in Computer Science
, 2000
Cited by 12 (6 self)
We present and analyze a general algorithm which computes efficient static schedulings of block computations for parallel sparse linear factorization. Our solver, based on a supernodal fan-in approach, is fully driven by this scheduling. We give an overview of the algorithms and present performance results on a 16-node IBM-SP2 with 66 MHz Power2 thin nodes for a collection of grid and irregular problems. This work is supported by the Commissariat à l'Energie Atomique CEA/CESTA under contract No. 7V1555AC, and by the GDR ARP (iHPerf group) of the CNRS. 1 Introduction. Solving large sparse symmetric positive definite systems Ax = b of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. Consequently, many parallel formulations for sparse matrix factorization have been studied and implemented; one can refer to [6] for a complete survey on high performance sparse factorization. In this paper, we focus on the block par...