## Multifrontal multithreaded rank-revealing sparse QR factorization

Citations: | 10 - 2 self |

### BibTeX

@MISC{Davis_multifrontalmultithreaded,

author = {Timothy A. Davis},

title = {Multifrontal multithreaded rank-revealing sparse QR factorization},

year = {}

}

### OpenURL

### Abstract

SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. The symbolic analysis and ordering phase preeliminates singletons by permuting the input matrix into the form [R11 R12; 0 A22] where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of A T A. Rank-detection is performed within each frontal matrix using Heath’s method, which does not require column pivoting. The resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.

### Citations

793 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
(Show Context)
Citation Context ...and Hager 2009] for its ordering and analysis phase, and thus can use any ordering method available to CHOLMOD: COLAMD on A, AMD 1 on the explicit pattern of A T A [Amestoy et al. 1996; 2004], METIS [=-=Karypis and Kumar 1998-=-] applied to A T A, CHOLMOD’s nested dissection ordering based on METIS, or any combination of the above, where multiple methods can be tried and the ordering with the least fill in the Cholesky facto... |

741 |
A set of Level 3 Basic Linear Algebra Subprograms
- Dongarra, Croz, et al.
- 1990
(Show Context)
Citation Context ...explicit thread-based software in SuiteSparseQR. The second opportunity arises within each frontal matrix. The factorization of a frontal matrix relies on LAPACK, which in turn uses the Level-3 BLAS [=-=Dongarra et al. 1990-=-]. Most of the vendor-supplied BLAS, such as the Intel Math Kernel Library (MKL) BLAS, the AMD ACML BLAS, the Sun Performance Library BLAS, as well as the Goto BLAS [Goto and van de Geijn 2008] can ex... |

599 |
Numerical Methods for Least Squares Problems
- Björck
- 1996
(Show Context)
Citation Context ...tion of an under-determined system Ax = b, or to find a minimum 2-norm solution of an under-determined system A T x = b. The earliest sparse direct methods operated on A one row or column at a time ([=-=Björck 1996-=-; George et al. 1988; George and Heath 1980; Heath 1982; Heath and Sorensen 1986]; see also [Davis 2006]). These methods are unable to reach a substantial fraction of the theoretical peak performance ... |

499 |
Computer solution of large sparse positive definite systems
- George, Liu
- 1981
(Show Context)
Citation Context ... in the column elimination tree, and prior to the rotation, the nonzero pattern of Ri∗ is a subset of Rj∗ (excluding i itself), assuming R has the same pattern as the Cholesky factorization of A T A [=-=George and Liu 1981-=-]. Thus, this rotation causes no fill-in in Rj∗, and causes the updated row i to take on the same nonzero pattern of Rj∗ (excluding the entries rii and rij, which have been set to zero). Let k be the ... |

352 | ScaLAPACK Users’ Guide - Blackford, Choi, et al. - 1997 |

300 | The University of Florida sparse matrix collection - Davis, Hu |

254 |
Efficiency of a Good But Not Linear Set Union Algorithm
- Tarjan
(Show Context)
Citation Context ...f L. The time taken by this step is almost O(|A|), where “almost” is theoretically at worst O(|A| log n). If CHOLMOD were to use a theoretically optimal implementation of the disjoint set-union-find [=-=Tarjan 1975-=-], this time would be O(|A|α(|A|, n)) where α is the inverse Ackermann function, a very slowly growing function. CHOLMOD uses a method that is faster in practice but theoretically bounded by O(|A| log... |

252 | An approximate minimum degree ordering algorithm
- Amestoy, Davis, et al.
- 1996
(Show Context)
Citation Context ...ation [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; =-=Amestoy et al. 1996-=-; Lu and Barlow 1996; Sun 1996; Pierce and Lewis 1997; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of these prior methods, and its u... |

215 |
The multifrontal solution of indefinite sparse symmetric linear systems
- Duff, Reid
- 1983
(Show Context)
Citation Context ...n method. In the multifrontal method, the factorization of a large sparse matrix is performed in a sequence of small dense frontal matrices; the idea was first used for symmetric indefinite matrices [=-=Duff and Reid 1983-=-] and later extended to sparse LU [Duff and Reid 1984; Davis and Duff 1997; Davis 2004] and sparse Cholesky factorization [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse ... |

181 |
Generalized nested dissection
- Lipton, Rose, et al.
- 1979
(Show Context)
Citation Context ...rmally very straightforward for any multifrontal sparse QR to adapt to any fill-reducing ordering. MA49 uses AMD on the explicit matrix A T A. [Lu and Barlow 1996] use nested dissection [George 1973; =-=Lipton et al. 1979-=-], which also requires A T A. [Pierce and Lewis 1997] and [Matstoms 1994] do not discuss the fill-reducing ordering, but their methods were developed before A T A-free methods such as COLAMD and the c... |

177 |
Nested dissection of regular finite element mesh
- George
- 1973
(Show Context)
Citation Context ...ons. It is normally very straightforward for any multifrontal sparse QR to adapt to any fill-reducing ordering. MA49 uses AMD on the explicit matrix A T A. [Lu and Barlow 1996] use nested dissection [=-=George 1973-=-; Lipton et al. 1979], which also requires A T A. [Pierce and Lewis 1997] and [Matstoms 1994] do not discuss the fill-reducing ordering, but their methods were developed before A T A-free methods such... |

129 | Sparse matrices in MATLAB: design and implementation
- Gilbert, Moler, et al.
- 1992
(Show Context)
Citation Context ...ept in Householder form. The minimum 2-norm solution is x=Q*(R’\(P’*b)). It is referred to as MA49:default. —MATLAB backslash, or x=A\b in MATLAB (R2008b or earlier). It uses the implementation from [=-=Gilbert et al. 1992-=-] of the Givens-based method of [George and Heath 1980; Heath 1982]. By default, P is found via COLMMD, but this is replaced in these experiments with AMD or COLAMD. (1) If m ≥ n, A*P=Q*R is factorize... |

117 | An unsymmetric-pattern multifrontal method for sparse LU factorization
- Davis, Duff
- 1994
(Show Context)
Citation Context ... matrix is performed in a sequence of small dense frontal matrices; the idea was first used for symmetric indefinite matrices [Duff and Reid 1983] and later extended to sparse LU [Duff and Reid 1984; =-=Davis and Duff 1997-=-; Davis 2004] and sparse Cholesky factorization [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR fa... |

99 |
Direct methods for sparse linear systems
- DAVIS
- 2006
(Show Context)
Citation Context ...ion of an under-determined system, or to find a minimum 2-norm solution of an under-determined system. The earliest sparse direct methods operated on A one row or column at a time (see an overview in =-=[3]-=-). These methods are unable to reach a substantial fraction of the theoretical peak performance of modern computers because of their irregular access of memory. The row-merging method [4] introduced t... |

89 | Intel threading building blocks - outfitting C++ for multi-core processor parallelism. O’Reilly - Reinders - 2007 |

81 |
Computing the block triangular form of a sparse matrix
- Pothen, Fan
- 1990
(Show Context)
Citation Context ...ular form (BTF). Finding singletons in A is much faster than a Dulmage-Mendelsohn decomposition such as dmperm in MATLAB [Davis 2006] or a permutation to block triangular form (BTF) [Duff 1977; 1981; =-=Pothen and Fan 1990-=-]. Finding singletons or the BTF takes far less time than the numerical factorization itself, however. Each singleton is a 1-by-1 block in BTF of A, if A is not rank-deficient. MA49 uses the BTF of [P... |

78 |
The multifrontal solution of unsymmetric sets of linear systems
- Duff, Reid
- 1984
(Show Context)
Citation Context ...on of a large sparse matrix is performed in a sequence of small dense frontal matrices; the idea was first used for symmetric indefinite matrices [Duff and Reid 1983] and later extended to sparse LU [=-=Duff and Reid 1984-=-; Davis and Duff 1997; Davis 2004] and sparse Cholesky factorization [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of mult... |

78 |
Numerical methods for solving linear least squares problems, Numerische Mathematik 7
- Golub
- 1965
(Show Context)
Citation Context ...cing column permutation, then the QR factorization is AP = QR. The solution to a least-squares problem (minx [ ] ||b − Ax||2) is given by solving R P 0 Tx = QTb, or x=P*(R\(Q’*b)) in MATLAB notation [=-=Golub 1965-=-]. If given the right-hand-side b when A is factorized, SuiteSparseQR applies the Householder reflections to b while factorizing A, giving c = QTb when the frontal matrix factorization is complete. Th... |

64 | A Storage-Efficient WY Representation for Products of Householder Transformations - Schreiber, Loan |

58 | A column pre-ordering strategy for the unsymmetric-pattern multifrontal method - Davis |

55 | Algorithm 887: Cholmod, Supernodal Sparse Cholesky Factorization and Update/Downdate
- Chen, Davis, et al.
- 2008
(Show Context)
Citation Context ...As the elimination proceeds, new cliques are formed, and these are also held as a list of the nodes in the cliques, just as each row of A represents a clique in the graph. SuiteSparseQR uses CHOLMOD [=-=Chen et al. 2009-=-; Davis and Hager 2009] for its ordering and analysis phase, and thus can use any ordering method available to CHOLMOD: COLAMD on A, AMD 1 on the explicit pattern of A T A [Amestoy et al. 1996; 2004],... |

51 | On algorithms for obtaining a maximum transversal - Duff - 1981 |

51 | Unitary triangularization of a nonsymmetric matrix
- Householder
- 1958
(Show Context)
Citation Context ... of the key direct methods for solving large sparse linear systems and least-squares problems. Typically, orthogonal transformations such as Givens rotations [Givens 1958] or Householder reflections [=-=Householder 1958-=-] are applied to A (or a permuted matrix AP), resulting in the factorization A = QR or AP = QR. The resulting factors can be used to solve a least-squares problem, to find the basic solution of an und... |

48 |
Solution of sparse linear least squares problems using Givens rotations. Linear Algebra and its Applications
- George, Heath
- 1980
(Show Context)
Citation Context ...m Ax = b, or to find a minimum 2-norm solution of an under-determined system A T x = b. The earliest sparse direct methods operated on A one row or column at a time ([Björck 1996; George et al. 1988; =-=George and Heath 1980-=-; Heath 1982; Heath and Sorensen 1986]; see also [Davis 2006]). These methods are unable to reach a substantial fraction of the theoretical peak performance of modern computers Dept. of Computer and I... |

42 | Modifying a sparse Cholesky factorization
- Davis, Hager
- 1999
(Show Context)
Citation Context ...g of the factor R computed so far, if a column is found to be linearly dependent. The structure of the update/downdate is identical to a sparse update/downdate of the Cholesky factorization of A T A [=-=Davis and Hager 1999-=-; 2001; 2009; Chen et al. 2009]. The column that needs to be removed may not be the current column being eliminated; thus the need for a dynamic restructuring of the existing R. The nonzero pattern of... |

41 | Algorithm 837: AMD, an approximate minimum degree ordering algorithm - Amestoy, Davis, et al. |

35 |
Sparse QR Factorization in MATLAB
- Matstoms
- 1994
(Show Context)
Citation Context ...rse Cholesky factorization [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [=-=Matstoms 1994-=-; 1995; Amestoy et al. 1996; Lu and Barlow 1996; Sun 1996; Pierce and Lewis 1997; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of the... |

28 | Multifrontal QR Factorization in a Multiprocessor environment
- Amestoy, Duff, et al.
- 1996
(Show Context)
Citation Context ...ation [Liu 1989]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; =-=Amestoy et al. 1996-=-; Lu and Barlow 1996; Sun 1996; Pierce and Lewis 1997; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of these prior methods, and its u... |

25 |
Predicting fill for sparse orthogonal factorization
- COLEMAN, EDENBRANDT, et al.
- 1986
(Show Context)
Citation Context ...sis phase depends solely on the nonzero pattern of A (or A22, more precisely). If the matrix A is strong Hall, the nonzero pattern of the factor R is identical to the Cholesky factorization of A T A [=-=Coleman et al. 1986-=-; George and Heath 1980]. A strong-Hall matrix is a matrix that cannot be permuted into block upper triangular form. Structurally rank-deficient matrices are never strong-Hall, but non-strongHall matr... |

25 |
Computation of plane unitary rotations transforming a general matrix to triangular form
- Givens
- 1958
(Show Context)
Citation Context ...TRODUCTION Sparse QR factorization is one of the key direct methods for solving large sparse linear systems and least-squares problems. Typically, orthogonal transformations such as Givens rotations [=-=Givens 1958-=-] or Householder reflections [Householder 1958] are applied to A (or a permuted matrix AP), resulting in the factorization A = QR or AP = QR. The resulting factors can be used to solve a least-squares... |

22 | Using OpenMP: Portable Shared Memory Parallel Programming - Chapman, vanderPas, et al. - 2007 |

21 | Dynamic supernodes in sparse Cholesky update/downdate and triangular solves
- Davis, Hager
- 2009
(Show Context)
Citation Context ... proceeds, new cliques are formed, and these are also held as a list of the nodes in the cliques, just as each row of A represents a clique in the graph. SuiteSparseQR uses CHOLMOD [Chen et al. 2009; =-=Davis and Hager 2009-=-] for its ordering and analysis phase, and thus can use any ordering method available to CHOLMOD: COLAMD on A, AMD 1 on the explicit pattern of A T A [Amestoy et al. 1996; 2004], METIS [Karypis and Ku... |

20 | Algorithm 836: COLAMD, a column approximate minimum degree ordering algorithm - Davis, Gilbert, et al. - 2004 |

20 | Sparse multifrontal rank revealing qr factorization
- Pierce, Lewis
- 1997
(Show Context)
Citation Context ... multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; Amestoy et al. 1996; Lu and Barlow 1996; Sun 1996; =-=Pierce and Lewis 1997-=-; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of these prior methods, and its unique features are discussed here. Additional backgro... |

19 |
Incremental condition estimation for sparse matrices
- Bischof, Lewis, et al.
- 1990
(Show Context)
Citation Context ...so handles rank deficient matrices, in contrast to all but one prior sparse multifrontal QR factorization method ([Pierce and Lewis 1997]). Their method uses a sparse incremental condition estimator [=-=Bischof et al. 1990-=-], and restricted column pivoting. The method is very accurate, but it requires a dynamic updating/downdating of the factor R computed so far, if a column is found to be linearly dependent. The struct... |

18 |
The multifrontal method and paging in sparse Cholesky factorization
- Liu
- 1989
(Show Context)
Citation Context ...the idea was first used for symmetric indefinite matrices [Duff and Reid 1983] and later extended to sparse LU [Duff and Reid 1984; Davis and Duff 1997; Davis 2004] and sparse Cholesky factorization [=-=Liu 1989-=-]. [Puglisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; Amestoy et al. ... |

16 |
QR Factorization of large sparse overdetermined and square matrices using a multifrontal method in a multiprocessor environment
- Puglisi
- 1993
(Show Context)
Citation Context ... first used for symmetric indefinite matrices [Duff and Reid 1983] and later extended to sparse LU [Duff and Reid 1984; Davis and Duff 1997; Davis 2004] and sparse Cholesky factorization [Liu 1989]. [=-=Puglisi 1993-=-] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; Amestoy et al. 1996; Lu and Bar... |

15 |
On general row merging schemes for sparse givens transformation
- Liu
- 1986
(Show Context)
Citation Context ...atical Software, Vol. V, No. N, M 20YY, Pages 1–26.2 · T. A. Davis because of their irregular access of memory, although they are very competitive when R remains very sparse. The row-merging method [=-=Liu 1986-=-] introduced the idea that groups of rows could be handled all at once. This idea was fully realized in the sparse multifrontal QR factorization method. In the multifrontal method, the factorization o... |

14 | Parallel sparse qr factorization on shared memory architectures, Parallel Computing 21 - Matstoms - 1995 |

13 |
A data structure for sparse QR and LU factorizations
- George, Liu, et al.
- 1988
(Show Context)
Citation Context ...der-determined system Ax = b, or to find a minimum 2-norm solution of an under-determined system A T x = b. The earliest sparse direct methods operated on A one row or column at a time ([Björck 1996; =-=George et al. 1988-=-; George and Heath 1980; Heath 1982; Heath and Sorensen 1986]; see also [Davis 2006]). These methods are unable to reach a substantial fraction of the theoretical peak performance of modern computers ... |

13 | Computing row and column counts for sparse QR and LU factorization
- GILBERT, LI, et al.
- 2001
(Show Context)
Citation Context ... A. [Pierce and Lewis 1997] and [Matstoms 1994] do not discuss the fill-reducing ordering, but their methods were developed before A T A-free methods such as COLAMD and the column-count algorithm of [=-=Gilbert et al. 2001-=-] were developed. Edlund uses optionally uses COLAMD [Edlund 2002], but does not use the column-count method in [Gilbert et al. 2001]. 2.3 Symbolic factorization CHOLMOD then performs the symbolic sup... |

13 | High-performance implementation of the level-3 - GOTO, GEIJN |

11 | Multiple-Rank modifications of a sparse Cholesky factorization - DAVIS, HAGER - 2001 |

11 |
Parallel sparse orthogonal factorization on distributed-memory multiprocessors
- Sun
- 1996
(Show Context)
Citation Context ...tended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; Amestoy et al. 1996; Lu and Barlow 1996; =-=Sun 1996-=-; Pierce and Lewis 1997; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of these prior methods, and its unique features are discussed h... |

10 |
Some extensions of an algorithm for sparse linear least squares problems
- Heath
- 1982
(Show Context)
Citation Context ...minimum 2-norm solution of an under-determined system A T x = b. The earliest sparse direct methods operated on A one row or column at a time ([Björck 1996; George et al. 1988; George and Heath 1980; =-=Heath 1982-=-; Heath and Sorensen 1986]; see also [Davis 2006]). These methods are unable to reach a substantial fraction of the theoretical peak performance of modern computers Dept. of Computer and Information S... |

9 | Multifrontal computation with the orthogonal factors of sparse matrices
- Lu, Barlow
- 1996
(Show Context)
Citation Context ...glisi 1993] first extended the multifrontal approach to sparse QR factorization; other prior implementations of multifrontal sparse QR factorization include [Matstoms 1994; 1995; Amestoy et al. 1996; =-=Lu and Barlow 1996-=-; Sun 1996; Pierce and Lewis 1997; Edlund 2002]. SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method that is an extension of these prior methods, and its unique features are d... |

6 |
On permutations to block triangular form
- Duff
- 1977
(Show Context)
Citation Context ...n to block triangular form (BTF). Finding singletons in A is much faster than a Dulmage-Mendelsohn decomposition such as dmperm in MATLAB [Davis 2006] or a permutation to block triangular form (BTF) [=-=Duff 1977-=-; 1981; Pothen and Fan 1990]. Finding singletons or the BTF takes far less time than the numerical factorization itself, however. Each singleton is a 1-by-1 block in BTF of A, if A is not rank-deficie... |

5 | Algorithm 8xx: SuiteSparseQR, a multifrontal multithreaded sparse qr factorization package - Davis - 2008 |

5 | A software package for sparse orthogonal factorization and updating - Edlund - 2002 |

3 |
A pipelined Givens method for computing the QR factorization of a sparse matrix
- Heath, T, et al.
- 1986
(Show Context)
Citation Context ...rm solution of an under-determined system A T x = b. The earliest sparse direct methods operated on A one row or column at a time ([Björck 1996; George et al. 1988; George and Heath 1980; Heath 1982; =-=Heath and Sorensen 1986-=-]; see also [Davis 2006]). These methods are unable to reach a substantial fraction of the theoretical peak performance of modern computers Dept. of Computer and Information Science and Engineering, U... |