Results 1-10 of 91
Hybrid scheduling for the parallel solution of linear systems
, 2004
Cited by 65 (11 self)
Research report: Hybrid scheduling for the parallel solution of linear systems
SPOOLES: An Object-Oriented Sparse Matrix Library
 In Proceedings of the 9th SIAM Conference on Parallel Processing for Scientific Computing
, 1999
Cited by 35 (0 self)
...ction and multisection. The latter two orderings depend on a domain/separator tree that is constructed using a graph partitioning method. Domain decomposition is used to find an initial separator, and a sequence of network flow problems is solved to smooth the separator. The quality of our nested dissection and multisection orderings is comparable to that of other state-of-the-art packages. Factorizations of square matrices have the form A = PLDUQ and A = PLDL^T P^T, where P and Q are permutation matrices. Square systems of the form A + σB may also be factored and solved (as found in shift-and-invert eigensolvers), as well as full-rank overdetermined linear systems, where a QR factorization is computed and the solution found by solving the seminormal equations.
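As a rough illustration of the seminormal-equations approach mentioned in this abstract, the sketch below solves a small full-rank overdetermined system using only the factor R, via R^T R x = A^T b (which follows from A = QR and the normal equations A^T A x = A^T b). This is a minimal pure-Python sketch and not SPOOLES code: the names `mgs_r_factor` and `seminormal_solve` are hypothetical, and modified Gram-Schmidt is an assumed stand-in for whatever QR method the library actually uses.

```python
def mgs_r_factor(A):
    """Return the upper-triangular R of a modified Gram-Schmidt QR of A.

    A is a dense m x n list of lists with full column rank (assumption).
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # working columns
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(v * v for v in cols[j]) ** 0.5
        q = [v / R[j][j] for v in cols[j]]
        for k in range(j + 1, n):
            R[j][k] = sum(q[i] * cols[k][i] for i in range(m))
            # Remove the component along q from the remaining column.
            cols[k] = [cols[k][i] - R[j][k] * q[i] for i in range(m)]
    return R


def seminormal_solve(A, b):
    """Solve min ||Ax - b||_2 via R^T R x = A^T b; Q is never stored."""
    m, n = len(A), len(A[0])
    R = mgs_r_factor(A)
    rhs = [sum(A[i][j] * b[i] for i in range(m)) for j in range(n)]
    # Forward substitution with R^T (lower triangular).
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(R[j][i] * y[j] for j in range(i))) / R[i][i]
    # Back substitution with R (upper triangular).
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x
```

For the line-fitting data A = [[1, 0], [1, 1], [1, 2], [1, 3]] and b = [1, 3, 5, 7], `seminormal_solve(A, b)` should return values close to [1, 2]. Discarding Q is the point of the approach: only R and the original matrix are needed at solve time.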
Making Sparse Gaussian Elimination Scalable by Static Pivoting
 In Proceedings of Supercomputing
, 1998
Cited by 33 (8 self)
We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of data structures and communication pattern for Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we design highly parallel algorithms for both sparse Gaussian elimination and triangular solve and we show that they are suitable for large-scale distributed memory machines.
Keywords: sparse unsymmetric linear systems, static pivoting, iterative refinement, MPI, 2D matrix decomposition.
1 Introduction. In our earlier work [8, 9, 22], we developed new algorithms to solve unsymmetric sparse linear systems using Gaussian elimination with partial pivoting (GEPP). The new algorithms are hi...
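The keywords above pair static pivoting with iterative refinement. The following dense pure-Python sketch shows the general idea under stated assumptions: no row exchanges are made during elimination, a pivot that is too small is replaced by a fixed threshold, and the resulting error is corrected by a few refinement steps. The threshold sqrt(eps) * ||A|| and the function names are assumptions for illustration; the abstract does not specify them, and any preprocessing (scaling or permuting large entries onto the diagonal) is omitted.

```python
import math


def lu_static_pivot(A, tiny):
    """In-place-style LU with no row exchanges; tiny pivots are perturbed."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        if abs(LU[k][k]) < tiny:
            # Replace the pivot instead of searching for a better row,
            # so the elimination pattern is known before factorization.
            LU[k][k] = tiny if LU[k][k] >= 0 else -tiny
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def lu_solve(LU, b):
    """Solve with the unit-lower L and upper U stored together in LU."""
    n = len(LU)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def solve_with_refinement(A, b, steps=5):
    """Factor a perturbed A once, then refine against the true A."""
    n = len(A)
    norm_a = max(sum(abs(v) for v in row) for row in A)  # infinity norm
    tiny = math.sqrt(2.0 ** -52) * norm_a  # assumed threshold
    LU = lu_static_pivot(A, tiny)
    x = lu_solve(LU, b)
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve(LU, r)
        x = [x[i] + d[i] for i in range(n)]
    return x
```

A matrix such as [[0, 1], [1, 1]] has a zero leading pivot, so elimination without row exchanges would divide by zero; with the perturbed pivot the factorization completes, and refinement recovers the solution of the original system.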
Hybridizing Nested Dissection and Halo Approximate Minimum Degree for Efficient Sparse Matrix Ordering
 IN PROCEEDINGS OF IRREGULAR'99, LNCS 1586
, 1999
Cited by 31 (16 self)
Minimum degree and nested dissection are the two most popular reordering schemes used to reduce fill-in and operation count when factoring and solving sparse matrices. Most of the state-of-the-art ordering packages hybridize these methods by performing incomplete nested dissection and ordering by minimum degree the subgraphs associated with the leaves of the separation tree, but most often only loose couplings have been achieved, resulting in poorer performance than could have been expected. This paper presents a tight coupling of the nested dissection and halo approximate minimum degree algorithms, which allows the minimum degree algorithm to use exact degrees on the boundaries of the subgraphs passed to it, and to yield back not only the ordering of the nodes of the subgraph, but also the amalgamated assembly subtrees, for efficient block computations. Experimental results show the performance improvement of this hybridization, both in terms of fill-in reduction and increa...
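For readers unfamiliar with minimum degree ordering, the sketch below shows the basic greedy scheme on an adjacency-set graph: repeatedly eliminate a vertex of smallest current degree and connect its remaining neighbours, counting each new edge as fill-in. This is the plain textbook variant with exact degrees, not the halo approximate minimum degree algorithm or the hybrid coupling described in the abstract; the function name and the tie-breaking rule (smallest label) are assumptions.

```python
def minimum_degree_order(adj):
    """Greedy minimum degree ordering.

    adj maps each vertex to the set of its neighbours (symmetric, no
    self-loops). Returns (elimination order, number of fill edges).
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    fill = 0
    while graph:
        # Pick the vertex of smallest current degree; break ties by label.
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v turns its neighbours into a clique.
        nbr_list = sorted(nbrs)
        for idx, a in enumerate(nbr_list):
            for b in nbr_list[idx + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
        order.append(v)
    return order, fill
```

On a star graph the leaves are eliminated before the centre, so no fill is created, whereas eliminating the centre first would connect all leaves to each other.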
Recent Advances in Direct Methods for Solving Unsymmetric Sparse Systems of Linear Equations
, 2001
A numerical evaluation of sparse direct solvers for the solution of large sparse, symmetric linear systems of equations
, 2005
The design and implementation of a new out-of-core sparse Cholesky factorization method
 ACM Transactions on Mathematical Software
Cited by 29 (3 self)
We describe a new out-of-core sparse Cholesky factorization method. The new method uses the elimination tree to partition the matrix, an advanced subtree-scheduling algorithm, and both right-looking and left-looking updates. The implementation of the new method is efficient and robust. On a 2 GHz personal computer with 768 MB of main memory, the code can easily factor matrices with factors of up to 48 GB, usually at rates above 1 Gflop/s. For example, the code can factor AUDIKW, currently the largest matrix in any matrix collection (factor size over 10 GB), in a little over an hour, and can factor a matrix whose graph is a 140-by-140-by-140 mesh in about 12 hours (factor size around 27 GB).
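The elimination tree mentioned in this abstract can be computed directly from the nonzero pattern of a symmetric matrix. The sketch below follows the standard ancestor-compression algorithm commonly attributed to Liu; it illustrates only the tree construction, not the paper's partitioning, subtree scheduling, or out-of-core machinery. The function name and the input layout (`upper[k]` lists the rows i < k with a nonzero in column k) are assumptions chosen for brevity.

```python
def elimination_tree(n, upper):
    """Return parent[] of the elimination tree; roots have parent -1.

    upper[k] is an iterable of row indices i < k such that A[i][k] != 0
    for a symmetric n x n matrix A (only the pattern matters).
    """
    parent = [-1] * n
    ancestor = [-1] * n  # compressed shortcut towards the current root
    for k in range(n):
        for i in upper[k]:
            # Climb from i towards the root of its subtree, redirecting
            # every visited node to k so later climbs are shorter.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k  # i was a root; k becomes its parent
                i = nxt
    return parent
```

For a tridiagonal pattern the tree is a chain, and for an arrow pattern (dense last column) every other node is a child of the last one. Independent subtrees of this tree correspond to parts of the factorization that do not depend on each other, which is what makes the tree useful for partitioning work.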
PT-Scotch: A tool for efficient parallel graph ordering
Cited by 28 (5 self)
The parallel ordering of large graphs is a difficult problem, because neither minimum-degree algorithms nor the best graph partitioning methods needed for nested dissection parallelize or scale well. This paper presents a set of algorithms, implemented in the PT-Scotch software package, which allow one to order large graphs in parallel, yielding orderings whose quality is equivalent to that of state-of-the-art sequential algorithms.
Analysis and comparison of two general sparse solvers for distributed memory computers
 ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
, 2001
Cited by 20 (7 self)
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for large sparse sets of linear equations on large-scale distributed-memory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algorithmic features of the two solvers and compare their performance characteristics with respect to uniprocessor speed, interprocessor communication, and memory requirements. For both solvers, preorderings for numerical stability and sparsity play an important role in achieving high parallel efficiency. We analyse the results with various ordering algorithms. Our performance analysis is based on data obtained from runs on a 512-processor Cray T3E using a set of matrices from real applications. We also use regular 3D grid problems to study the scalability of the two solvers.