## SuperLU Users' Guide (1999)

Citations: 60 (3 self)

### BibTeX

@MISC{Demmel99superluusers,
  author = {James W. Demmel and John R. Gilbert and Xiaoye S. Li},
  title = {SuperLU Users' Guide},
  year = {1999}
}

### Abstract

This document describes a collection of three related ANSI C subroutine libraries for solving sparse linear systems of equations AX = B. Here A is a square, nonsingular, n × n sparse matrix, and X and B are dense n × nrhs matrices, where nrhs is the number of right-hand sides and solution vectors. Matrix A need not be symmetric or definite; indeed, SuperLU is particularly appropriate for matrices with very unsymmetric structure. All three libraries use variations of Gaussian elimination optimized to take advantage of both sparsity and the computer architecture, in particular memory hierarchies (caches) and parallelism. In this introduction we refer to all three libraries collectively as SuperLU. The three libraries within SuperLU are as follows. Detailed references are also given (see also [19]).

### Citations

929 | Accuracy and Stability of Numerical Algorithms - Higham - 1996

Citation context: …Here ‖x‖∞ ≡ max_i |x_i|. Thus, if FERR = 10^-6 then each component of x has an error bounded by about 10^-6 times the largest component of x. The algorithm used to compute FERR is an approximation; see [2, 18] for a discussion. Generally FERR is accurate to within a factor of 10 or better, which is adequate to say how many digits of the large entries of x are correct. (SuperLU DIST's algorithm for FERR is …

773 | A set of level 3 basic linear algebra subprograms - Dongarra, Du Croz, et al. - 1990

Citation context: …iffers among the three libraries, as discussed in section 1.4.) 1.3.2 Tuning Parameters for BLAS All three libraries depend on having high-performance BLAS (Basic Linear Algebra Subroutine) libraries [22, 8, 7] in order to get high performance. In particular, they depend on matrix-vector multiplication or matrix-matrix multiplication of relatively small dense matrices. The sizes of these small dense matrice…

572 | Basic Linear Algebra Subprograms for Fortran Usage - Lawson, Hanson, et al. - 1979

Citation context: …iffers among the three libraries, as discussed in section 1.4.) 1.3.2 Tuning Parameters for BLAS All three libraries depend on having high-performance BLAS (Basic Linear Algebra Subroutine) libraries [22, 8, 7] in order to get high performance. In particular, they depend on matrix-vector multiplication or matrix-matrix multiplication of relatively small dense matrices. The sizes of these small dense matrice…

469 | An Extended Set of FORTRAN Basic Linear Algebra Subprograms - Dongarra, et al. - 1988

Citation context: …iffers among the three libraries, as discussed in section 1.4.) 1.3.2 Tuning Parameters for BLAS All three libraries depend on having high-performance BLAS (Basic Linear Algebra Subroutine) libraries [22, 8, 7] in order to get high performance. In particular, they depend on matrix-vector multiplication or matrix-matrix multiplication of relatively small dense matrices. The sizes of these small dense matrice…

370 | ScaLAPACK Users’ Guide - Blackford, Choi, et al. - 1997 |

306 | A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, V4.0 - Karypis, Kumar - 1998

Citation context: …alue, the library will use the permutation vector perm_c[] as an input, which may be obtained from any other ordering algorithm. For example, the nested-dissection type of ordering codes include Metis [19], Chaco [16] and Scotch [30]. Alternatively, the users can provide their own column permutation vector. For example, it may be an ordering suitable for the underlying physical problem. Both driver rou…

269 | A column approximate minimum degree ordering algorithm - Davis, Gilbert, et al. - 2000

Citation context: …rdering, • Multiple Minimum Degree (MMD) [27] applied to the structure of A^T A, • Multiple Minimum Degree (MMD) [27] applied to the structure of A^T + A, • Column Approximate Minimum Degree (COLAMD) [4], and • Use a Pc supplied by the user as input. COLAMD is designed particularly for unsymmetric matrices when partial pivoting is needed, and does not require explicit formation of A^T A. It usually g…

237 | Users' guide for the Harwell-Boeing sparse matrix collection (Release I) - Duff, Grimes, et al. - 1992

Citation context: …w indices, and the indices indicating the beginning of each column in the coefficient and row index arrays. This storage format is called compressed column format, also known as Harwell-Boeing format [10]. Next, the two utility routines dCreate_CompCol_Matrix() and dCreate_Dense_Matrix() are called to set up the matrix structures for A and B, respectively. The routine set_default_options() sets the de…

205 | A supernodal approach to sparse partial pivoting - Demmel, Eisenstat, et al. - 1999

Citation context: …ries within SuperLU are as follows. Detailed references are also given (see also [23]). • Sequential SuperLU is designed for sequential processors with one or more layers of memory hierarchy (caches) [5]. • Multithreaded SuperLU (SuperLU MT) is designed for shared memory multiprocessors (SMPs), and can effectively use up to 16 or 32 parallel processors on sufficiently large matrices in order to speed…

143 | Modification of the minimum degree algorithm by multiple elimination - Liu - 1985

Citation context: …dering the Columns of A for Sparse Factors There is a choice of orderings for the columns of A both in the simple or expert driver, in section 1.2: • Natural ordering, • Multiple Minimum Degree (MMD) [26] applied to the structure of A^T A, • Multiple Minimum Degree (MMD) [26] applied to the structure of A^T + A, • Column Approximate Minimum Degree (COLAMD) [4], and • Use a Pc supplied by the user as i…

129 | ParMeTiS: Parallel graph partitioning and sparse matrix ordering library - Karypis, Schloegel, et al. - 1997

Citation context: …olic factorization is a newly added feature in the latest v2.1 release. It is designed tightly around the separator tree returned from a graph partitioning type of ordering (presently we use ParMeTiS [20]), and works only on power-of-two processors. We first re-distribute the graph of A onto the largest 2^q number of processors which is smaller than the total Np processors, then perform parallel symbo…

103 | SuperLU-DIST: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems - Li, Demmel - 2003

Citation context: …erLU DIST) is designed for distributed memory parallel processors, using MPI [28] for interprocess communication. It can effectively use hundreds of parallel processors on sufficiently large matrices [25, 26]. Table 1.1 summarizes the current status of the software. All the routines are implemented in C, with parallel extensions using Pthreads (POSIX threads for shared-memory programming) or MPI (for dist…

90 | ILUT: a dual threshold incomplete LU factorization - Saad - 1994

Citation context: …SuperLU version 4.0, we provide the ILU routines to be used as preconditioners for iterative solvers. Our ILU method can be considered to be a variant of the ILUTP method originally proposed by Saad [31], which combines a dual dropping strategy with numerical pivoting ("T" stands for threshold, and "P" stands for pivoting). We adapted the classic dropping strategies of ILUTP in order to incorporate s…

85 | Solving sparse linear systems with sparse backward error - Arioli, Demmel, et al. - 1989

Citation context: …ues for several machines. is an estimated bound on the error ‖x* − x‖∞/‖x‖∞, where x* is the true solution. For further details on error analysis and error bounds estimation, see [1, Chapter 4] and [2]. 2.10.3 Performance-tuning parameters SuperLU chooses such machine-dependent parameters as block size by calling an inquiry function sp_ienv(), which may be set to return different values on differen…

81 | The Chaco user's guide - Hendrickson, Leland - 1993

Citation context: …brary will use the permutation vector perm_c[] as an input, which may be obtained from any other ordering algorithm. For example, the nested-dissection type of ordering codes include Metis [18], Chaco [16] and Scotch [29]. Alternatively, the users can provide their own column permutation vector. For example, it may be an ordering suitable for the underlying physical problem. Both driver routines dgssv…

74 | The design and use of algorithms for permuting large entries to the diagonal of sparse matrices - Duff, Koster - 1999

Citation context: …quired. So SuperLU DIST uses a new scheme called static pivoting instead. In static pivoting the pivot order (Pr) is chosen before numerical factorization, using a weighted perfect matching algorithm [9], and kept fixed during factorization. Since both row and column orders (Pr and Pc) are fixed before numerical factorization, we can extensively optimize the data layout, load balance, and communicati…

73 | The Chaco user's guide, version 1.0 - Hendrickson, Leland - 1993

Citation context: …brary will use the permutation vector perm_c[] as an input, which may be obtained from any other ordering algorithm. For example, the nested-dissection type of ordering codes include Metis [19], Chaco [16] and Scotch [30]. Alternatively, the users can provide their own column permutation vector. For example, it may be an ordering suitable for the underlying physical problem. Both driver routines dgssv…

72 | Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides, Numerische Mathematik 6 - Oettli, Prager - 1964

Citation context: …al matrix-vector multiplication. The purpose of this stopping criterion is explained in the next section. 1.3.7 Error Bounds Step 7 of the expert driver algorithm computes error bounds. It is shown in [2, 29] that BERR defined in Equation 1.1 measures the componentwise relative backward error of the computed solution. This means that the computed x satisfies a slightly perturbed linear system of equations…

69 | An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination - Demmel, Gilbert, et al. - 1999

Citation context: …s to the serial SuperLU, mostly related to the matrix data structures and memory organization. All these changes are summarized in Table 3.1 and their impacts on performance are studied thoroughly in [6, 23]. In this part of the Users' Guide, we describe only the changes that the user should be aware of. Other than these differences, most of the material in chapter 2 is still applicable. Construct Parall…

40 | Symbolic factorization for sparse Gaussian elimination with partial pivoting - George, Ng - 1987

Citation context: …us memory first. To overcome this problem, we exploited the observation that the nonzero structure for L is contained in that of the Householder matrix H from the Householder sparse QR transformation [11, 12]. Furthermore, it can be shown that a fundamental supernode of L is always contained in a fundamental supernode of H. This containment property is true for any row permutation Pr in PrA = LU. Ther…

38 | Iterative refinement implies numerical stability for Gaussian elimination - Skeel - 1980

Citation context: …solution. This means that the computed x satisfies a slightly perturbed linear system of equations (A + E)x = b + f, where |Eij| ≤ BERR · |Aij| and |fi| ≤ BERR · |bi| for all i and j. It is shown in [2, 32] that one step of iterative refinement usually reduces BERR to near machine epsilon. For example, if BERR is 4 times machine epsilon, then the computed solution x is identical to the solution one woul…

36 | Sparse Gaussian Elimination on High Performance Computers - Li - 1996

Citation context: …(caches) and parallelism. In this introduction we refer to all three libraries collectively as SuperLU. The three libraries within SuperLU are as follows. Detailed references are also given (see also [23]). • Sequential SuperLU is designed for sequential processors with one or more layers of memory hierarchy (caches) [5]. • Multithreaded SuperLU (SuperLU MT) is designed for shared memory multiprocesso…

34 | Making sparse Gaussian elimination scalable by static pivoting - Li, Demmel - 1998

Citation context: …erLU DIST) is designed for distributed memory parallel processors, using MPI [28] for interprocess communication. It can effectively use hundreds of parallel processors on sufficiently large matrices [25, 26]. Table 1.1 summarizes the current status of the software. All the routines are implemented in C, with parallel extensions using Pthreads (POSIX threads for shared-memory programming) or MPI (for dist…

33 | Predicting Structure in Nonsymmetric Sparse Matrix Factorizations - Gilbert, Ng - 1994

Citation context: …puting it. 3. Reuse Pc, Pr and data structures allocated for L and U. If Pr and Pc do not change, then the work of building the data structures associated with L and U (including the elimination tree [14]) can be avoided. This is most useful when A^(2) has the same sparsity structure and similar numerical entries as A^(1). When the numerical entries are not similar, one can still use this option, but…

26 | A scalable sparse direct solver using static pivoting - Li, Demmel - 1999 |

20 | An overview of SuperLU: algorithms, implementation, and user interface - Li

Citation context: …YES must be set. Note that, when a diagonal entry is smaller than the threshold, the code will still choose an off-diagonal pivot. That is, the row permutation Pr may not be the identity. Please refer to [24] for more discussion on the symmetric mode. 2.7 Memory management for L and U In the sparse LU algorithm, the amount of space needed to hold the data structures of L and U cannot be accurately predict…

17 | The design and use of algorithms for permuting large entries to the diagonal of sparse matrices - Duff, Koster - 1999 |

14 | Computing row and column counts for sparse QR and LU factorization - Gilbert, Li, et al.

Citation context: …he L supernodes based on the size of H supernodes. Fortunately, there exists a fast algorithm (almost linear in the number of nonzeros of A) to compute the size of H and the supernodes partition in H [13]. In practice, the above static prediction is fairly tight for most problems. However, for some others, the number of nonzeros in H greatly exceeds the number of nonzeros in L. To handle this situatio…

13 | A data structure for sparse QR and LU factorizations - George, Liu, et al. - 1988

Citation context: …RT("Malloc fails for xa[]."); s = 19.0; u = 21.0; p = 16.0; e = 5.0; r = 18.0; l = 12.0; a[0] = s; a[1] = l; a[2] = l; a[3] = u; a[4] = l; a[5] = l; a[6] = u; a[7] = p; a[8] = u; a[9] = e; a[10] = u; a[11] = r; asub[0] = 0; asub[1] = 1; asub[2] = 4; asub[3] = 1; asub[4] = 2; asub[5] = 4; asub[6] = 0; asub[7] = 2; asub[8] = 0; asub[9] = 3; asub[10] = 3; asub[11] = 4; xa[0] = 0; xa[1] = 3; xa[2] = 6; xa[3]…

8 | Algorithm 674: Fortran codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation - Higham - 1989

Citation context: …= B. • dgscon(): Estimate condition number. Given the matrix A and its factors L and U, this estimates the condition number in the one-norm or infinity-norm. The algorithm is due to Hager and Higham [17], and is the same as CONDEST in sparse Matlab. • dgsequ()/dlaqgs(): Equilibrate. dgsequ first computes the row and column scalings Dr and Dc which would make each row and each column of the scaled…

5 | Parallel symbolic factorization for sparse LU with static pivoting - Grigori, Demmel, et al.

Citation context: …h is smaller than the total Np processors, then perform parallel symbolic factorization, and finally re-populate the {L\U} structure to all Np processors. The algorithm and performance were studied in [15]. To invoke parallel symbolic factorization, the user needs to set the two fields of the options argument as follows: options.ParSymbFact = YES; options.ColPerm = PARMETIS; (1) Perform row/column eq…

5 | Scotch 3.4 User's Guide - Pellegrini - 2001

Citation context: …he permutation vector perm_c[] as an input, which may be obtained from any other ordering algorithm. For example, the nested-dissection type of ordering codes include Metis [19], Chaco [16] and Scotch [30]. Alternatively, the users can provide their own column permutation vector. For example, it may be an ordering suitable for the underlying physical problem. Both driver routines dgssv and dgssvx take…

2 | Computing row and column counts for sparse QR factorization. Talk presented at - Gilbert, Ng, et al. - 1994 |

2 | A supernodal approach to incomplete LU factorization with partial pivoting - Li, Shao |

2 | Scotch and libScotch 5.1 User's Guide (version 5.1.1) - Pellegrini

Citation context: …he permutation vector perm_c[] as an input, which may be obtained from any other ordering algorithm. For example, the nested-dissection type of ordering codes include Metis [18], Chaco [16] and Scotch [29]. Alternatively, the users can provide their own column permutation vector. For example, it may be an ordering suitable for the underlying physical problem. Both driver routines dgssv and dgssvx take…

1 | Serial and parallel software packages for partitioning unstructured graphs and for computing fill-reducing orderings of sparse matrices - Karypis, Kumar

Citation context: …it formation of A^T A. It usually gives comparable orderings as MMD on A^T A, and is faster. The orderings based on graph partitioning heuristics are also popular, as exemplified in the MeTiS package [21]. The user can simply input this ordering in the permutation vector for Pc. Note that many graph partitioning algorithms are designed for symmetric matrices. The user may still apply them to the str…

1 | Iterative refinement implies numerical stability for Gaussian elimination - Skeel - 1980 |