Results 1–10 of 13
Multifrontal QR factorization in a multiprocessor environment
, 1994
Abstract

Cited by 28 (9 self)
We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix A. The algorithm is based on the multifrontal approach and makes use of Householder transformations. The tasks are distributed among processors according to an assembly tree which is built from the symbolic factorization of the matrix AᵀA. Uniprocessor issues are addressed first. We then discuss the multiprocessor implementation of the method. Parallelization of both the factorization phase and the solve phase is considered. We use relaxation of the sparsity structure of both the original matrix and the frontal matrices to improve performance. We show that, in this case, the use of Level 3 BLAS can lead to very significant performance improvements. The eight-processor Alliant FX/80 is used to illustrate our discussion. 1 ENSEEIHT-IRIT (Toulouse, France), amestoy@enseeiht.fr. 2 CERFACS (Toulouse, France), also Rutherford Appleton Lab. (England), duff@cerfac...
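The assembly-tree task distribution can be illustrated with a toy scheduler: a node (one frontal matrix) becomes ready only after all of its children are finished, and independent ready nodes may run concurrently. The sketch below is a generic illustration of that idea, not the authors' implementation; the `parent`-dictionary tree encoding and the `process` callback are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_assembly_tree(parent, process, workers=8):
    """Run process(v) on every tree node so that each node runs only
    after all of its children, with independent nodes running in
    parallel.  parent[v] is v's parent node, or None for a root."""
    n_children = {v: 0 for v in parent}
    for v, p in parent.items():
        if p is not None:
            n_children[p] += 1
    ready = [v for v, c in n_children.items() if c == 0]  # the leaves
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while ready:
            list(pool.map(process, ready))    # one wave of independent nodes
            nxt = []
            for v in ready:
                p = parent[v]
                if p is not None:
                    n_children[p] -= 1
                    if n_children[p] == 0:    # last child finished
                        nxt.append(p)
            ready = nxt
```

A real multifrontal code would additionally assemble each child's contribution block into the parent's frontal matrix before processing it; here `process` stands in for all of that per-node work.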
Incomplete Factorization Preconditioning For Linear Least Squares Problems
, 1994
Abstract

Cited by 17 (4 self)
this paper is the modified version of Gram-Schmidt orthogonalization with a rejection test applied right after the formation of the off-diagonal elements of the factor R. For a given rejection parameter τ, 0 < τ ≤ 1, the rejection test is: if |r_ij| < τ ‖a...
Multifrontal multithreaded rank-revealing sparse QR factorization
Abstract

Cited by 10 (2 self)
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix into the form [R11 R12; 0 A22], where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of AᵀA. Rank detection is performed within each frontal matrix using Heath’s method, which does not require column pivoting. The resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.
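The singleton pre-elimination step can be illustrated with a greedy sketch: repeatedly pull out columns that have a single remaining nonzero of sufficient magnitude; ordering those columns and rows first yields a leading upper-triangular block playing the role of R11. This is a simplified dense illustration of the idea, not SuiteSparseQR's own (sparse, linear-time) code; the function name and tolerance are assumptions.

```python
import numpy as np

def singleton_permutation(A, tol=1e-12):
    """Greedy column-singleton detection.  Returns row order, column
    order, and the number k of singletons found; after permuting, the
    leading k-by-k block is upper triangular with |diagonal| > tol."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    live_rows, live_cols = set(range(m)), set(range(n))
    sing_rows, sing_cols = [], []
    changed = True
    while changed:
        changed = False
        for j in sorted(live_cols):
            rows = [i for i in live_rows if abs(A[i, j]) > tol]
            if len(rows) == 1:                 # column singleton found
                sing_rows.append(rows[0]); sing_cols.append(j)
                live_rows.remove(rows[0]); live_cols.remove(j)
                changed = True
    row_order = sing_rows + sorted(live_rows)
    col_order = sing_cols + sorted(live_cols)
    return row_order, col_order, len(sing_cols)
```

Eliminating a singleton can expose new ones (its row is removed from play), which is why the sketch loops until no further singleton appears.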
Multifrontal Computation with the Orthogonal Factors of Sparse Matrices
 SIAM Journal on Matrix Analysis and Applications
, 1994
Abstract

Cited by 9 (0 self)
This paper studies the solution of the linear least squares problem for a large and sparse m-by-n matrix A with m ≥ n by QR factorization of A and transformation of the right-hand side vector b to Qᵀb. A multifrontal-based method for computing Qᵀb using Householder factorization is presented. A theoretical operation count for the K-by-K unbordered grid model problem and problems defined on graphs with √n-separators shows that the proposed method requires O(N_R) storage and multiplications to compute Qᵀb, where N_R = O(n log n) is the number of nonzeros of the upper triangular factor R of A. In order to introduce BLAS-2 operations, Schreiber and Van Loan's Storage-Efficient WY Representation [SIAM J. Sci. Stat. Comput., 10 (1989), pp. 53–57] is applied to the orthogonal factor Q_i of each frontal matrix F_i. If this technique is used, the bound on storage increases to O(n (log n)²). Some numerical results for the grid model problems as well as Harwell-Boeing problems...
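The storage-efficient (compact) WY idea can be sketched generically: with unit Householder vectors v_k (H_k = I − 2 v_k v_kᵀ), the product H_0⋯H_{n−1} can be written as Q = I − Y T Yᵀ with Y holding the vectors and T a small upper triangular matrix, so computing Qᵀb reduces to a few matrix–vector products. The following is a dense textbook sketch under that normalization, not the paper's per-frontal-matrix implementation.

```python
import numpy as np

def householder_vectors(A):
    """Householder QR of dense A: returns the unit vectors and R."""
    R = np.asarray(A, dtype=float).copy()
    m, n = R.shape
    V = []
    for k in range(min(m, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        nv = np.linalg.norm(v)
        if nv > 0:
            v = v / nv
            R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        V.append(v)
    return V, np.triu(R[:n, :])

def compact_wy(V, m):
    """Accumulate Q = I - Y T Y^T from unit Householder vectors,
    using the recurrence Q_k = Q_{k-1} H_k."""
    n = len(V)
    Y = np.zeros((m, n))
    T = np.zeros((n, n))
    for k, v in enumerate(V):
        Y[k:, k] = v
        T[:k, k] = -2.0 * (T[:k, :k] @ (Y[:, :k].T @ Y[:, k]))
        T[k, k] = 2.0
    return Y, T

def apply_qt_wy(Y, T, b):
    """Q^T b = b - Y T^T Y^T b: two GEMVs plus a triangular multiply."""
    return b - Y @ (T.T @ (Y.T @ b))
```

Grouping the reflections this way is what allows Level-2 (and, blocked, Level-3) BLAS to be used instead of one rank-1 update per reflection.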
Dealing with Dense Rows in the Solution of Sparse Linear Least Squares Problems
, 1995
Abstract

Cited by 5 (0 self)
Sparse linear least squares problems containing a few relatively dense rows occur frequently in practice. Straightforward solution of these problems can cause catastrophic fill and deliver extremely poor performance. This paper studies a scheme for solving such problems efficiently by handling dense rows and sparse rows separately. How a sparse matrix is partitioned into dense rows and sparse rows determines the efficiency of the overall solution process. A new algorithm is proposed to find a partition of a sparse matrix which leads to satisfactory or even optimal performance. Extensive numerical experiments are performed to demonstrate the effectiveness of the proposed scheme. A MATLAB implementation is included. 1 This work was supported in part by the Cornell Theory Center which receives funding from members of its Corporate Research Institute, the National Science Foundation (NSF), the Advanced Research Projects Agency (ARPA), the National Institutes of Health (NIH), New York S...
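The starting point of any such scheme is the row partition itself. A naive density-threshold heuristic is sketched below purely to illustrate what "partitioning into dense rows and sparse rows" means; the paper's contribution is precisely a better algorithm than this, so the function and its `ratio` parameter are illustrative assumptions only.

```python
import numpy as np

def split_dense_rows(A, ratio=0.2):
    """Partition row indices into 'sparse' and 'dense' by nonzero
    count: a row is dense if it fills more than `ratio` of its
    columns.  Returns (sparse_rows, dense_rows)."""
    A = np.asarray(A)
    n = A.shape[1]
    nnz = (A != 0).sum(axis=1)          # nonzeros per row
    dense = np.flatnonzero(nnz > ratio * n)
    sparse = np.flatnonzero(nnz <= ratio * n)
    return sparse, dense
```

Once split, the sparse part can be factored cheaply and the few dense rows incorporated afterwards (e.g. by updating), avoiding the catastrophic fill the abstract describes.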
Sparse Householder QR Factorization on a Mesh
, 1996
Abstract

Cited by 4 (4 self)
In this document we analyze the parallelization of QR factorization by means of Householder transformations. This parallelization is carried out on a machine with a mesh topology (a 2D torus, to be more precise). We use a cyclic distribution over the processors of the elements of the sparse matrix M we want to decompose. Each processor represents the nonzero elements of its part of the matrix by a one-dimensional doubly linked list data structure. We then describe the different procedures that constitute the parallel algorithm. As an application of QR factorization, we concentrate on the least squares problem, and finally we present an evaluation of the efficiency of this algorithm for a set of test matrices from the Harwell-Boeing sparse matrix collection.
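An element-wise cyclic distribution on a 2D mesh simply maps entry (i, j) to processor (i mod P, j mod Q). The sketch below illustrates that mapping only; the triplet input format and function name are assumptions, and a real implementation would build the per-processor linked-list structures the abstract mentions rather than Python lists.

```python
def cyclic_distribution(entries, p_rows, p_cols):
    """Distribute sparse-matrix nonzeros over a p_rows x p_cols mesh
    with an element-wise cyclic mapping.  `entries` is an iterable of
    (i, j, value) triplets; returns a dict from mesh coordinates to
    that processor's local list of nonzeros."""
    local = {(p, q): [] for p in range(p_rows) for q in range(p_cols)}
    for i, j, v in entries:
        local[(i % p_rows, j % p_cols)].append((i, j, v))
    return local
```

The cyclic mapping keeps the load roughly balanced as elimination proceeds, since consecutive rows and columns (which are touched together) land on different processors.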
Parallel Multifrontal Solution Of Sparse Linear Least Squares Problems On Distributed-Memory Multiprocessors
 Advanced Computing Research Institute, Center for Theory and Simulation in Science and Engineering, Cornell
, 1994
Abstract

Cited by 3 (0 self)
We describe the issues involved in the design and implementation of efficient parallel algorithms for solving sparse linear least squares problems on distributed-memory multiprocessors. We consider both the QR factorization method due to Golub and the method of corrected seminormal equations due to Björck. The major tasks involved are sparse QR factorization, sparse triangular solution, and sparse matrix-vector multiplication. The sparse QR factorization is accomplished by a recently introduced parallel multifrontal scheme. New parallel algorithms for solving the related sparse triangular systems and for performing sparse matrix-vector multiplications are proposed. The arithmetic and communication complexities of our algorithms on regular grid problems are presented. Experimental results on an Intel iPSC/860 machine are described. Key words. parallel algorithms, sparse matrix, orthogonal factorization, multifrontal method, least squares problems, triangular solution, distributed-me...
Exact Prediction Of QR Fill-In By Row-Merge Trees
Abstract

Cited by 1 (0 self)
Row-merge trees for forming the QR factorization of a sparse matrix A are closely related to elimination trees for the Cholesky factorization of AᵀA. Row-merge trees predict the exact fill-in (assuming no numerical cancellation) provided A satisfies the strong Hall property, but overestimate the fill-in in general. However, here a fast and simple post-processing step for row-merge trees is presented that predicts the exact fill-in for sparse QR factorization using Householder reflectors, for general matrices. Key words. row-merge trees, elimination trees, QR factorization 1. Introduction. Matrix factorizations of sparse matrices typically result in creating further nonzero entries, or fill-in. If this fill-in can be accurately predicted in advance, then the factorization can be performed in less time, as the additional memory needed can be allocated once in advance. Note that fill-in can be reduced with some matrix reordering algorithms. After that, the algorithms presented ...
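The elimination tree that row-merge trees are compared against can be computed directly from the symmetric pattern of AᵀA. Below is a textbook sketch of Liu's elimination-tree algorithm with path compression (a standard construction, not the paper's post-processing step); the `col_rows` input encoding is an assumption for the example.

```python
def elimination_tree(col_rows, n):
    """Elimination tree of a symmetric n x n sparsity pattern.
    col_rows[j] lists the row indices i < j with a nonzero in
    column j (strict upper part suffices by symmetry).
    Returns parent[], with -1 marking roots."""
    parent = [-1] * n
    ancestor = [-1] * n          # path-compressed 'virtual root' links
    for j in range(n):
        for i in col_rows.get(j, []):
            # climb from i toward the root of its current subtree,
            # redirecting ancestor links to j as we go
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:    # reached a root: j becomes its parent
                    parent[i] = j
                i = nxt
    return parent
```

For a strong Hall matrix A, the row-merge tree and this elimination tree of AᵀA describe the same fill; the paper's post-processing handles the general (non-strong-Hall) case.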
Computing sparse orthogonal factors in MATLAB
, 1998
Abstract

Cited by 1 (0 self)
In this report a new version of the multifrontal sparse QR factorization routine sqr, originally by Matstoms, for general sparse matrices is described and evaluated. In the previous version the orthogonal factor Q is discarded due to storage considerations. The new version provides Q and uses the multifrontal structure to store this orthogonal factor in a compact way. A new data class with overloaded operators is implemented in Matlab to provide easy usage of the compact orthogonal factors. This implicit way of storing the orthogonal factor also results in faster computation and application of Q and Qᵀ. Examples are given where the new version is up to four times faster when computing only R, and up to 1000 times faster when computing both Q and R, than the built-in function qr in Matlab. The sqr package is available at URL: http://www.mai.liu.se/~milun/sls/. Key words: QR factorization, sparse problems, multifrontal method, orthogonal factorization. 1 Introduction. Let A ∈ ℝ...
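The overloaded-operator idea carries over to any language with operator hooks. The toy Python class below (a hypothetical analogue for illustration, not the Matlab sqr code) stores Q implicitly as unit Householder vectors and applies Q or Qᵀ through `@` without ever forming Q explicitly.

```python
import numpy as np

class ImplicitQ:
    """Q stored implicitly as unit Householder vectors: vectors[k]
    has length m - k and acts on components k..m-1, with
    Q = H_0 H_1 ... H_{n-1} and H_k = I - 2 v_k v_k^T.
    `Q @ x` and `Q.T @ x` apply the reflections directly."""
    def __init__(self, vectors, transposed=False):
        self.vectors, self.transposed = vectors, transposed

    @property
    def T(self):
        # transposition just reverses the order of application
        return ImplicitQ(self.vectors, not self.transposed)

    def __matmul__(self, x):
        y = np.asarray(x, dtype=float).copy()
        n = len(self.vectors)
        # Q^T applies H_0 first; Q applies H_{n-1} first
        order = range(n) if self.transposed else reversed(range(n))
        for k in order:
            v = self.vectors[k]
            y[k:] -= 2.0 * v * (v @ y[k:])
        return y
```

As in the report, the implicit form is both more compact than an explicit Q and cheaper to apply, since each product costs only the work of the stored reflections.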
Advanced Computing Research Institute Theory Center Cornell University Semiannual Research Activity Report April 1992 – September 1992
Abstract
This report consists of two parts. The first part contains a short summary of the progress made in the last six months on each of the four main projects: parallelizing compilers, computational linear algebra, computational optimization, and numerical methods for partial differential equations. Included also are a list of ACRI researchers and their research interests, a list of technical reports produced in the last six months, and a list of ACRI seminars. In the second part we highlight one of the projects, the parallelizing compiler work, where we give a more detailed introduction into this area and sketch our novel approach. Contents