Results 1–10 of 13
On the solution of equality constrained quadratic programming problems arising . . .
, 1998
Multifrontal QR factorization in a multiprocessor environment
, 1994
Cited by 28 (9 self)
We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix A. The algorithm is based on the multifrontal approach and makes use of Householder transformations. The tasks are distributed among processors according to an assembly tree which is built from the symbolic factorization of the matrix AᵀA. Uniprocessor issues are first addressed. We then discuss the multiprocessor implementation of the method. Parallelization of both the factorization phase and the solve phase is considered. We use relaxation of the sparsity structure of both the original matrix and the frontal matrices to improve the performance. We show that, in this case, the use of Level 3 BLAS can lead to very significant performance improvements. The eight-processor Alliant FX/80 is used to illustrate our discussion.
1 ENSEEIHT-IRIT (Toulouse, France), amestoy@enseeiht.fr. 2 CERFACS (Toulouse, France), also Rutherford Appleton Lab. (England), duff@cerfac...
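The Householder transformations the abstract refers to can be illustrated with a minimal dense QR sketch in pure Python (illustrative only: the multifrontal method applies such reflections to small dense frontal matrices, not to the whole matrix as done here; the function name is made up).

```python
import math

def householder_qr(A):
    """Factor A (list of rows, m >= n) as A = Q R using Householder reflections."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]  # identity
    for k in range(n):
        # Build a Householder vector v that zeros R[k+1:, k]
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vnorm2 = sum(vi * vi for vi in v)
        # Apply H = I - 2 v v^T / (v^T v) to rows k..m-1 of R
        for j in range(n):
            s = sum(v[i - k] * R[i][j] for i in range(k, m))
            c = 2.0 * s / vnorm2
            for i in range(k, m):
                R[i][j] -= c * v[i - k]
        # Accumulate Q := Q H (columns k..m-1), so that A = Q R
        for i in range(m):
            s = sum(Q[i][j] * v[j - k] for j in range(k, m))
            c = 2.0 * s / vnorm2
            for j in range(k, m):
                Q[i][j] -= c * v[j - k]
    return Q, R
```
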
Finding Good Column Orderings for Sparse QR Factorization
 In Second SIAM Conference on Sparse Matrices
, 1996
Cited by 17 (0 self)
For sparse QR factorization, finding a good column ordering of the matrix to be factorized is essential. Both the amount of fill in the resulting factors and the number of floating-point operations required by the factorization are highly dependent on this ordering. A suitable column ordering of the matrix A is usually obtained by minimum degree analysis on AᵀA. The objective of this analysis is to produce low fill in the resulting triangular factor R. We observe that the efficiency of sparse QR factorization also depends on other criteria, such as the size and structure of the intermediate fill and the size and structure of the frontal matrices for the multifrontal method, in addition to the amount of fill in R. An important part of this information is lost when AᵀA is formed. However, the structural information from A is important to consider in order to find good column orderings. We show how a suitable equivalent reordering of an initial fill-reducing ordering can...
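The symbolic pattern of AᵀA that the minimum degree analysis works on can be computed directly from the column structure of A: entry (i, j) is structurally nonzero whenever columns i and j share a row, so each row of A contributes a clique of columns. A small sketch under that assumption (function name and data layout are made up):

```python
def ata_pattern(cols):
    """cols[j] = set of row indices with nonzeros in column j of A.
    Returns the set of structurally nonzero (i, j) positions of A^T A."""
    pattern = set()
    # Group columns by the rows they touch: each row of A makes a
    # clique of all the columns with a nonzero in that row.
    rows = {}
    for j, rj in enumerate(cols):
        for i in rj:
            rows.setdefault(i, []).append(j)
    for clique in rows.values():
        for a in clique:
            for b in clique:
                pattern.add((a, b))
    return pattern
```

This also shows why structural information from A is lost: two very different column structures of A can produce the same cliques, hence the same AᵀA pattern.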
Multifrontal multithreaded rank-revealing sparse QR factorization
Cited by 10 (2 self)
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel's Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix into the form [R11 R12; 0 A22], where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of AᵀA. Rank detection is performed within each frontal matrix using Heath's method, which does not require column pivoting. The resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.
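The singleton pre-elimination step can be sketched structurally: a column with a single nonzero is a pivot, and deleting its row may expose further singletons. The sketch below is an assumption-laden toy (names are made up; SuiteSparseQR's actual search also checks each candidate diagonal entry against the numerical tolerance mentioned above).

```python
def find_column_singletons(entries, m, n):
    """entries: set of (row, col) nonzero positions of an m-by-n pattern.
    Returns the (row, col) singleton pivots in the order they are found."""
    col_rows = {j: set() for j in range(n)}
    row_cols = {i: set() for i in range(m)}
    for i, j in entries:
        col_rows[j].add(i)
        row_cols[i].add(j)
    pivots = []
    changed = True
    while changed:
        changed = False
        for j in range(n):
            if len(col_rows[j]) == 1:
                i = next(iter(col_rows[j]))
                pivots.append((i, j))
                # Delete row i; this may create new singletons elsewhere.
                for jj in list(row_cols[i]):
                    col_rows[jj].discard(i)
                row_cols[i] = set()
                changed = True
    return pivots
```

Permuting the pivot rows and columns to the front yields the [R11 R12; 0 A22] form, with one pivot per row/column of R11.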
Multifrontal Computation with the Orthogonal Factors of Sparse Matrices
 SIAM Journal on Matrix Analysis and Applications
, 1994
Cited by 9 (0 self)
This paper studies the solution of the linear least squares problem for a large and sparse m-by-n matrix A with m ≥ n by QR factorization of A and transformation of the right-hand side vector b to Qᵀb. A multifrontal-based method for computing Qᵀb using Householder factorization is presented. A theoretical operation count for the K-by-K unbordered grid model problem and problems defined on graphs with √n-separators shows that the proposed method requires O(N_R) storage and multiplications to compute Qᵀb, where N_R = O(n log n) is the number of nonzeros of the upper triangular factor R of A. In order to introduce BLAS-2 operations, Schreiber and Van Loan's storage-efficient WY representation [SIAM J. Sci. Stat. Comput., 10 (1989), pp. 55–57] is applied to the orthogonal factor Q_i of each frontal matrix F_i. If this technique is used, the bound on storage increases to O(n(log n)²). Some numerical results for the grid model problems as well as Harwell-Boeing problems...
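Computing Qᵀb without ever forming Q amounts to replaying the stored Householder vectors against b. A dense analogue of that idea (the multifrontal method does this frontal matrix by frontal matrix; the representation below is an assumption for illustration):

```python
def apply_qt(vs, b):
    """vs: list of (offset, v) Householder vectors in factorization order.
    Returns (H_n ... H_1) b = Q^T b, where H = I - 2 v v^T / (v^T v)."""
    y = list(b)
    for k, v in vs:
        vnorm2 = sum(vi * vi for vi in v)
        if vnorm2 == 0.0:
            continue
        s = sum(v[i] * y[k + i] for i in range(len(v)))
        c = 2.0 * s / vnorm2
        for i in range(len(v)):
            y[k + i] -= c * v[i]
    return y
```

Storage is just the vectors themselves, which is the source of the O(N_R) bound quoted above.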
A Blocked Implementation of Level 3 BLAS for RISC Processors
, 1996
Cited by 8 (3 self)
We describe a version of the Level 3 BLAS which is designed to be efficient on RISC processors. This is an extension of previous studies by the same authors (see Amestoy, Daydé, Duff & Morère (1995), Daydé, Duff & Petitet (1994), and Daydé & Duff (1995)), where they describe a similar approach for efficient serial and parallel implementations of the Level 3 BLAS on shared and virtual shared memory multiprocessors. All our codes are written in Fortran and use loop unrolling, blocking, and copying to improve the performance. A blocking technique is used to express the BLAS in terms of operations involving triangular blocks and calls to the matrix-matrix multiplication kernel (GEMM). No manufacturer-supplied or assembler code is used. This blocked implementation uses the same blocking ideas as in Daydé et al. (1994), except that the ordering of loops is designed for efficient reuse of data held in cache and not necessarily for parallelization. A parameter which controls the bloc...
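The blocking-and-copying idea can be shown with a toy GEMM: the loops walk nb-sized blocks, and the current block of B is copied out so it is reused contiguously (a cache-friendliness sketch only; the paper's codes are Fortran, and the block size 2 here is arbitrary).

```python
def blocked_gemm(A, B, nb=2):
    """C = A * B with blocking over the k and n dimensions."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jj in range(0, n, nb):
        for kk in range(0, k, nb):
            # Copy the current block of B for contiguous reuse.
            Bblk = [B[p][jj:jj + nb] for p in range(kk, min(kk + nb, k))]
            for i in range(m):
                for p, Brow in enumerate(Bblk):
                    a = A[i][kk + p]
                    for j, bv in enumerate(Brow):
                        C[i][jj + j] += a * bv
    return C
```
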
A Parallel Sparse QR-Factorization Algorithm
, 1995
Cited by 1 (0 self)
A sparse QR-factorization algorithm for coarse-grain parallel computations is described. Initially the coefficient matrix, which is assumed to be general sparse, is reordered in an attempt to bring as many zero elements into the lower left corner as possible. Then the matrix is partitioned into large blocks of rows and Givens rotations are applied in each block. These are independent tasks and can be done in parallel. Row and column permutations are carried out within the blocks to exploit the sparsity of the matrix. The algorithm can be used for solving least squares problems either directly or combined with an appropriate iterative method (for example, preconditioned conjugate gradients). In the latter case, dropping of numerically small elements is performed during the factorization stage, which often leads to a better preservation of sparsity and a faster factorization, but this also leads to a loss of accuracy. The iterative method is used to regain the ...
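The elementary step this block-row algorithm applies many times in parallel is a single Givens rotation that zeros one entry against another. A minimal dense sketch (function name and calling convention are made up):

```python
import math

def givens_zero(A, j, i, k):
    """Rotate rows j and i of A (in place) so that A[i][k] becomes zero."""
    a, b = A[j][k], A[i][k]
    r = math.hypot(a, b)
    if r == 0.0:
        return
    c, s = a / r, b / r
    for col in range(len(A[0])):
        t = c * A[j][col] + s * A[i][col]
        A[i][col] = -s * A[j][col] + c * A[i][col]
        A[j][col] = t
    # A[i][k] is now (-b*a + a*b)/r = 0.
```

Because each rotation touches only two rows, rotations in disjoint blocks of rows are independent, which is exactly what makes the block-row partitioning parallelizable.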
Computing sparse orthogonal factors in MATLAB
, 1998
Cited by 1 (0 self)
In this report a new version of the multifrontal sparse QR factorization routine sqr, originally by Matstoms, for general sparse matrices is described and evaluated. In the previous version the orthogonal factor Q is discarded due to storage considerations. The new version provides Q and uses the multifrontal structure to store this orthogonal factor in a compact way. A new data class with overloaded operators is implemented in MATLAB to provide easy usage of the compact orthogonal factors. This implicit way of storing the orthogonal factor also results in faster computation and application of Q and Qᵀ. Examples are given where the new version is up to four times faster when computing only R, and up to 1000 times faster when computing both Q and R, than the built-in function qr in MATLAB. The sqr package is available at URL: http://www.mai.liu.se/~milun/sls/. Key words: QR factorization, sparse problems, multifrontal method, orthogonal factorization. 1 Introduction. Let A ∈ IR...
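The "data class with overloaded operators" idea translates naturally to other languages: store Q only as its Householder vectors and overload multiplication and transpose so Q @ x and Q.T @ x never form Q. A Python analogue under that assumption (class, layout, and method names are all made up, not the report's MATLAB class):

```python
class ImplicitQ:
    """Q held implicitly as Householder vectors (offset, v), Q = H_1 ... H_n."""
    def __init__(self, vs):
        self.vs = vs
        self.transposed = False

    @property
    def T(self):
        qt = ImplicitQ(self.vs)
        qt.transposed = not self.transposed
        return qt

    def __matmul__(self, x):
        y = list(x)
        # Q x applies the reflections in reverse order; Q^T x in order.
        order = self.vs if self.transposed else list(reversed(self.vs))
        for k, v in order:
            vnorm2 = sum(vi * vi for vi in v)
            if vnorm2 == 0.0:
                continue
            s = sum(v[i] * y[k + i] for i in range(len(v)))
            c = 2.0 * s / vnorm2
            for i in range(len(v)):
                y[k + i] -= c * v[i]
        return y
```
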
A Projection Method for the Solution of Rectangular Systems
, 1996
We present a general method for the linear least-squares solution of overdetermined and underdetermined systems. The method is particularly efficient when the coefficient matrix is quasi-square, that is, when the number of rows and the number of columns are almost the same. The numerical methods proposed in the literature for linear least-squares problems and minimum-norm solutions do not generally take account of this special characteristic. The proposed method is based on an LU factorization of the original quasi-square matrix A, assuming that A has full rank. In the overdetermined case, the LU factors are used to compute a basis for the null space of Aᵀ. The right-hand side vector b is then projected onto this subspace and the least-squares solution is obtained from the solution of this reduced problem. In the case of underdetermined systems, the desired solution is again obtained through the solution of a reduced system. The use of this method may lead to important savings in comput...
A Coarse-Grained Parallel QR-Factorization Algorithm for Sparse Least Squares Problems
A sparse QR-factorization algorithm SPARQR for coarse-grained parallel computations is described. The coefficient matrix, which is assumed to be general sparse, is reordered in an attempt to bring as many zero elements into the lower left corner as possible. The reordered matrix is then partitioned into block rows, and Givens plane rotations are applied in each block row. These are independent tasks and can be done in parallel. Row and column permutations are carried out within the diagonal blocks in an attempt to better preserve the sparsity of the matrix. The algorithm can be used for solving least squares problems either directly or combined with an iterative method (preconditioned conjugate gradients are used). Small nonzero elements can optionally be dropped in the latter case. This leads to a better preservation of the sparsity and, therefore, to a faster factorization. The price which has to be paid is some loss of accuracy. The iterative method is used to regain the...