## Solving unsymmetric sparse systems of linear equations with PARDISO (2004)

Venue: Future Generation Computer Systems

Citations: 96 (8 self)

### BibTeX

@ARTICLE{Gärtner04solvingunsymmetric,
  author  = {Olaf Schenk and Klaus Gärtner},
  title   = {Solving unsymmetric sparse systems of linear equations with PARDISO},
  journal = {Future Generation Computer Systems},
  year    = {2004},
  volume  = {20},
  pages   = {475--487}
}

### Abstract

Supernode partitioning for unsymmetric matrices together with complete block diagonal supernode pivoting and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. Progress in weighted graph matching algorithms helps to extend these concepts further, and an unsymmetric prepermutation of rows is used to place large matrix entries on the diagonal. Complete block diagonal supernode pivoting allows dynamic interchanges of columns and rows during the factorization process. The level-3 BLAS efficiency is retained, and an advanced two-level left–right looking scheduling scheme results in good speedup on SMP machines. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and that high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.

Key words: computational sciences, numerical linear algebra, direct solver, unsymmetric linear systems
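PARDISO itself has no SciPy binding, but the factorize-once / solve-many pattern of a sparse direct solver that the abstract describes can be sketched with SciPy's `splu` (a SuperLU wrapper) as a stand-in:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Small unsymmetric sparse system; splu stands in for a direct solver
# such as PARDISO (factorize once, then solve for many right-hand sides).
A = csc_matrix(np.array([
    [4., 1., 0.],
    [2., 5., 1.],
    [0., 3., 6.],
]))
lu = splu(A)                        # sparse LU factorization
x = lu.solve(np.array([1., 2., 3.]))
assert np.allclose(A @ x, [1., 2., 3.])
```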

### Citations

797 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
Citation Context: ...computed based on the structure of A + A^T, e.g. minimum degree or nested dissection. All experiments reported in this paper with PARDISO were conducted with a nested dissection algorithm [16]. Like other modern sparse factorization codes [4,7,10,11,14,20], PARDISO relies heavily on supernodes to efficiently utilize the memory hierarchies in the hardware. There are two main approaches in...

255 | An approximate minimum degree ordering algorithm
- Amestoy, Davis, et al.
- 1996
Citation Context: ...computed on the structure of A + A^T. SuperLU_DIST uses the multiple minimum degree [18], WSMP and PARDISO use a nested dissection ordering [13,16], and MUMPS an approximate minimum degree algorithm [3]. UMFPACK uses a column approximate minimum degree algorithm to compute a fill-in reducing preordering [8]. The third difference is that WSMP is the only solver that reduces the coefficient matrix...
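SciPy ships neither nested dissection nor AMD, but its reverse Cuthill-McKee routine can stand in to show the mechanics of computing a symmetric fill-reducing permutation on the structure of A + A^T (a hedged sketch; RCM is a different, bandwidth-oriented ordering than those named above):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Small unsymmetric matrix; the permutation is computed on the
# symmetrized structure A + A^T, as in the solvers discussed above.
A = csr_matrix(np.array([
    [4., 0., 0., 0., 2.],
    [0., 3., 1., 0., 0.],
    [1., 0., 5., 1., 0.],
    [0., 0., 0., 2., 1.],
    [0., 1., 0., 0., 6.],
]))
B = (A != 0).astype(np.int8)
S = B + B.T                      # structure of A + A^T
perm = reverse_cuthill_mckee(S.tocsr(), symmetric_mode=True)
assert sorted(perm.tolist()) == list(range(5))
```

The permutation `perm` would then be applied symmetrically (rows and columns) before factorization.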

192 | A supernodal approach to sparse partial pivoting
- Demmel, Eisenstat, et al.
- 1999
Citation Context: ...minimum degree or nested dissection. All experiments reported in this paper with PARDISO were conducted with a nested dissection algorithm [16]. Like other modern sparse factorization codes [4,7,10,11,14,20], PARDISO relies heavily on supernodes to efficiently utilize the memory hierarchies in the hardware. There are two main approaches in building these supernodes. In the first approach, consecutive rows...

150 | A fully asynchronous multifrontal solver using distributed dynamic scheduling
- Amestoy, Duff, et al.
- 2001
Citation Context: ...applications and by numerical performance comparisons with some prominent software packages for solving general sparse systems. The paper compares the serial performance of SuperLU_DIST [17], MUMPS [4], UMFPACK 3 [8] and WSMP [14] with PARDISO. This paper also contains a parallel performance comparison of PARDISO and WSMP, the general purpose sparse solver that has been shown in [14] to be the bes...

139 | Modification of the minimum-degree algorithm by multiple elimination
- Liu
- 1985
Citation Context: ...does not use it at all. Secondly, by default, SuperLU_DIST, WSMP, MUMPS and PARDISO use a symmetric permutation computed on the structure of A + A^T. SuperLU_DIST uses the multiple minimum degree [18], WSMP and PARDISO use a nested dissection ordering [13,16], and MUMPS an approximate minimum degree algorithm [3]. UMFPACK uses a column approximate minimum degree algorithm to compute a fill-in redu...

117 | An unsymmetric-pattern multifrontal method for sparse LU factorization
- Davis, Duff
- 1994
Citation Context: ...minimum degree or nested dissection. All experiments reported in this paper with PARDISO were conducted with a nested dissection algorithm [16]. Like other modern sparse factorization codes [4,7,10,11,14,20], PARDISO relies heavily on supernodes to efficiently utilize the memory hierarchies in the hardware. There are two main approaches in building these supernodes. In the first approach, consecutive rows...

84 | Solving sparse linear systems with sparse backward error
- Arioli, Demmel, Duff
- 1989
Citation Context: ...system had a known solution; runs were also performed with other choices, with similar results. The iterative refinement was stopped when the componentwise relative backward error Berr = max_i |b - Ax|_i / (|A| |x| + |b|)_i [5] was close to machine precision, or when Berr did not decrease by at least a factor of 2 during one iteration. Table 2 shows the number of steps of iterative refinement or conjugate gradient square it...
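The stopping rule in this excerpt is easy to sketch. Below, `componentwise_berr` and `iterative_refinement` are my own helper names, and `solve` stands for any approximate solver (e.g. application of perturbed LU factors); the loop stops when Berr nears machine precision or fails to halve:

```python
import numpy as np

def componentwise_berr(A, x, b):
    """Berr = max_i |b - Ax|_i / (|A||x| + |b|)_i, the backward error of [5]."""
    return np.max(np.abs(b - A @ x) / (np.abs(A) @ np.abs(x) + np.abs(b)))

def iterative_refinement(A, b, solve, max_iter=20):
    """Refine x until Berr is near machine precision or stops halving."""
    x = solve(b)
    berr = componentwise_berr(A, x, b)
    for _ in range(max_iter):
        if berr <= 10 * np.finfo(float).eps:
            break
        x_new = x + solve(b - A @ x)      # one refinement step
        berr_new = componentwise_berr(A, x_new, b)
        if berr_new > berr / 2:           # did not improve by a factor of 2
            if berr_new < berr:
                x, berr = x_new, berr_new
            break
        x, berr = x_new, berr_new
    return x, berr

# Usage: an inexact solver built from a deliberately perturbed matrix,
# standing in for perturbed LU factors
A = np.array([[3., 1.], [1., 2.]])
A_pert = A + 0.01 * np.eye(2)
b = np.array([1., 1.])
x, berr = iterative_refinement(A, b, lambda rhs: np.linalg.solve(A_pert, rhs))
```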

84 | Block sparse Cholesky algorithms on advanced uniprocessor computers
- Ng, Peyton
- 1993
Citation Context: ...minimum degree or nested dissection. All experiments reported in this paper with PARDISO were conducted with a nested dissection algorithm [16]. Like other modern sparse factorization codes [4,7,10,11,14,20], PARDISO relies heavily on supernodes to efficiently utilize the memory hierarchies in the hardware. There are two main approaches in building these supernodes. In the first approach, consecutive rows...

71 | The design and use of algorithms for permuting large entries to the diagonal of sparse matrices
- Duff, Koster
- 1999
Citation Context: ...solving general sparse linear systems, developing high performance parallel software is challenging because partial pivoting causes the computational task-dependency graph to change during execution. In [12] new permutation and scaling strategies are introduced for sparse Gaussian elimination. The goal is to preprocess the coefficient matrix so as to obtain an equivalent system with a matrix that is bet...
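SciPy's `maximum_bipartite_matching` covers the structural half of this preprocessing: it yields a row permutation that places nonzeros on the diagonal. The algorithms of [12] (implemented in MC64) additionally maximize the product of the diagonal magnitudes and produce row/column scalings, which this sketch does not attempt:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

# A has a structurally zero diagonal entry at (0, 0)
A = csr_matrix(np.array([
    [0., 2., 0.],
    [5., 0., 1.],
    [0., 1., 3.],
]))
# perm gives the matching between rows and columns; applying it as a
# row permutation makes every diagonal entry structurally nonzero
perm = maximum_bipartite_matching(A, perm_type='row')
PA = A[perm].toarray()
assert np.all(np.diag(PA) != 0)
```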

68 | CGS, a fast Lanczos-type solver for nonsymmetric linear systems
- Sonneveld
- 1989
Citation Context: ...failures, they corrupt only a low dimensional subspace, and each perturbation is a rank-one update of LU, so iterative refinement or a Krylov space method such as conjugate gradient squared (CGS) [23] with the perturbed factors L and U can compensate for such corruption with only a few extra iterations. In the numerical experiments in section 3, a CGS algorithm is used in the cases where iterative r...
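Sonneveld's CGS iteration [23] is short enough to sketch. This is an unpreconditioned dense version following the standard recurrence; the function name `cgs` and tolerances are my own choices, not PARDISO's interface, and the solver described above would apply it with the perturbed LU factors as a preconditioner:

```python
import numpy as np

def cgs(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient squared (Sonneveld) for an unsymmetric system Ax = b."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    r_tilde = r.copy()            # fixed shadow residual
    rho = r_tilde @ r
    u = r.copy()
    p = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        v = A @ p
        alpha = rho / (r_tilde @ v)
        q = u - alpha * v
        x = x + alpha * (u + q)
        r = r - alpha * (A @ (u + q))
        rho_new = r_tilde @ r
        beta = rho_new / rho
        rho = rho_new
        u = r + beta * q
        p = u + beta * (q + beta * p)
    return x

# Small unsymmetric example
A = np.array([[4., 1.], [2., 3.]])
b = np.array([1., 2.])
x = cgs(A, b)
```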

63 | An asynchronous parallel supernodal algorithm for sparse Gaussian elimination
- Demmel, Gilbert, et al.
- 1999

48 | Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
- Agarwal, Gustavson, et al.
- 1994
Citation Context: ...the factorization process. The TLB contains a finite set of pages which are known as the current working set of the computation. If the computation addresses only memory in the TLB, there is no penalty [1]. Otherwise, a TLB miss occurs, resulting in a large performance penalty. In the one-level algorithm, e.g., two leaf supernodes that are direct siblings can be factorized by two different processors. Th...

39 | The influence of relaxed supernode partitions on the multifrontal method
- Ashcraft, Grimes
Citation Context: ...ated as one supernode. These supernodes are so crucial to high performance in sparse matrix factorization that the criterion for the inclusion of rows and columns in the same supernode can be relaxed [6] to increase the size of the supernodes. This is the second approach, and it is called supernode amalgamation. In this approach, consecutive rows and columns with nearly the same but not identical structu...
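As a toy illustration (my own simplification, not PARDISO's partitioning code), consecutive columns of a lower-triangular factor's structure can be scanned and merged while they extend a dense diagonal block, with a slack parameter `max_extra` playing the role of the relaxation in [6]:

```python
def find_supernodes(col_structs, max_extra=0):
    """Group consecutive columns of L into supernodes.

    col_structs[j] is the set of row indices of nonzeros in column j of L
    (lower triangle, diagonal included). Column j joins the current
    supernode when it extends the dense diagonal block (row j is present
    in column j-1) and the remaining structures differ by at most
    max_extra entries -- a crude relaxed-amalgamation criterion.
    """
    supernodes = [[0]]
    for j in range(1, len(col_structs)):
        prev = col_structs[j - 1] - {j - 1, j}
        cur = col_structs[j] - {j}
        if j in col_structs[j - 1] and len(prev ^ cur) <= max_extra:
            supernodes[-1].append(j)
        else:
            supernodes.append([j])
    return supernodes

# Columns 0-2 form a dense 3x3 diagonal block sharing row 4 below it;
# columns 3-4 form a second supernode.
cols = [{0, 1, 2, 4}, {1, 2, 4}, {2, 4}, {3, 4}, {4}]
print(find_supernodes(cols))   # -> [[0, 1, 2], [3, 4]]
```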

31 | Recent Advances in Direct Methods for Solving Unsymmetric Sparse Systems of Linear Equations
- Gupta
- 2002
Citation Context: ...performance comparisons with some prominent software packages for solving general sparse systems. The paper compares the serial performance of SuperLU_DIST [17], MUMPS [4], UMFPACK 3 [8] and WSMP [14] with PARDISO. This paper also contains a parallel performance comparison of PARDISO and WSMP, the general purpose sparse solver that has been shown in [14] to be the best at the time of PARDISO's re...

26 | Efficient sparse LU factorization with left-right looking strategy on shared memory multiprocessors
- Schenk, Gärtner, et al.
Citation Context: ...dependency graph. While it is not claimed that this alternative approach to partial pivoting or static pivoting will always work, it is shown that the methods implemented in the PARDISO direct solver [21,22] can be successful. The evidence is provided by numerical experiments for a large set of general sparse matrices from real applications and by numerical performance comparisons with some prominent sof...

25 | A scalable sparse direct solver using static pivoting
- Li, Demmel
- 1999
Citation Context: ...ing reduces the need for partial pivoting, thereby speeding up the factorization process. Evidence of the usefulness of this preprocessing in connection with sparse direct solvers has been provided in [2,17]. However, the amount of partial pivoting after this unsymmetric preprocessing can still be a limitation for the parallel factorization step, and static pivoting with iterative refinement, as proposed ...
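Static pivoting, as proposed in [17], can be sketched on a small dense LU: instead of interchanging rows, any pivot below a threshold is replaced by ±sqrt(eps)·max|A|, and the number of perturbations is reported so that iterative refinement can clean up afterwards. `lu_static_pivot` is my own sketch, not the SuperLU_DIST implementation:

```python
import numpy as np

def lu_static_pivot(A, tau=None):
    """LU without row interchanges; tiny pivots are replaced by +/- tau
    (default sqrt(eps) * max|A|), as in static pivoting.
    Returns L, U and the number of perturbed pivots."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    tau = np.sqrt(np.finfo(float).eps) * np.abs(A).max() if tau is None else tau
    n_perturbed = 0
    for k in range(n):
        if abs(U[k, k]) < tau:                    # pivot too small: perturb it
            U[k, k] = tau if U[k, k] >= 0 else -tau
            n_perturbed += 1
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U), n_perturbed

# A near-zero leading pivot forces one perturbation
A = np.array([[1e-20, 1.], [1., 1.]])
L, U, npert = lu_static_pivot(A)
```

The factors then satisfy L·U = A with only the perturbed entry changed, which is exactly the low-rank corruption that iterative refinement or CGS compensates for.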

20 | Analysis and comparison of two general sparse solvers for distributed memory computers
- Amestoy, Duff, et al.
- 2001
Citation Context: ...ing reduces the need for partial pivoting, thereby speeding up the factorization process. Evidence of the usefulness of this preprocessing in connection with sparse direct solvers has been provided in [2,17]. However, the amount of partial pivoting after this unsymmetric preprocessing can still be a limitation for the parallel factorization step, and static pivoting with iterative refinement, as proposed ...

8 | On algorithms for finding maximum matchings in bipartite graphs
- Gupta, Ying
- 1999
Citation Context: ...maximize the product of the magnitudes of the diagonal entries for all matrices. MUMPS uses it only if the structural symmetry in the original matrix is less than 50%, WSMP uses a similar preprocessing [15] only on matrices whose structural symmetry is less than 80%, and UMFPACK 3 does not use it at all. Secondly, by default, SuperLU_DIST, WSMP, MUMPS and PARDISO use a symmetric permutation computed...

3 | UMFPACK V3.2: an unsymmetric-pattern multifrontal method with a column pre-ordering strategy
- Davis
- 2002
Citation Context: ...by numerical performance comparisons with some prominent software packages for solving general sparse systems. The paper compares the serial performance of SuperLU_DIST [17], MUMPS [4], UMFPACK 3 [8] and WSMP [14] with PARDISO. This paper also contains a parallel performance comparison of PARDISO and WSMP, the general purpose sparse solver that has been shown in [14] to be the best at the time o...

3 | Two-level scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems
- Schenk, Gärtner
Citation Context: ...dependency graph. While it is not claimed that this alternative approach to partial pivoting or static pivoting will always work, it is shown that the methods implemented in the PARDISO direct solver [21,22] can be successful. The evidence is provided by numerical experiments for a large set of general sparse matrices from real applications and by numerical performance comparisons with some prominent sof...

2 | Matrix Market, National Institute of Standards and Technology
Citation Context: ...the numerical experiments are described. The names and the sizes of the unsymmetric test matrices used in the experiments are shown in Table 1. Most of these matrices are in the public domain [9,19] and freely available. Further, all matrices were generated by real world applications. The table also contains the dimension, the number of nonzeros, and the related application area. These matrices ...

1 | University of Florida Sparse Matrix Collection, available online from http://www.cise.ufl.edu/~davis/sparse
- Davis
Citation Context: ...the numerical experiments are described. The names and the sizes of the unsymmetric test matrices used in the experiments are shown in Table 1. Most of these matrices are in the public domain [9,19] and freely available. Further, all matrices were generated by real world applications. The table also contains the dimension, the number of nonzeros, and the related application area. These matrices ...

1 | Fast and effective algorithms for solving graph partitioning and sparse matrix ordering
- Gupta
- 1997
Citation Context: ...SuperLU_DIST, WSMP, MUMPS and PARDISO use a symmetric permutation computed on the structure of A + A^T. SuperLU_DIST uses the multiple minimum degree [18], WSMP and PARDISO use a nested dissection ordering [13,16], and MUMPS an approximate minimum degree algorithm [3]. UMFPACK uses a column approximate minimum degree algorithm to compute a fill-in reducing preordering [8]. The third difference is that WSMP is ...