Results 1 - 9 of 9
Multifrontal QR factorization in a multiprocessor environment
1994
Abstract

Cited by 28 (9 self)
We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix $A$. The algorithm is based on the multifrontal approach and uses Householder transformations. The tasks are distributed among processors according to an assembly tree built from the symbolic factorization of the matrix $A^T A$. Uniprocessor issues are addressed first. We then discuss the multiprocessor implementation of the method, considering the parallelization of both the factorization phase and the solve phase. We relax the sparsity structure of both the original matrix and the frontal matrices to improve performance, and we show that, in this case, the use of Level 3 BLAS can lead to very significant performance improvements. The eight-processor Alliant FX/80 is used to illustrate our discussion. (1) ENSEEIHT-IRIT (Toulouse, France), amestoy@enseeiht.fr. (2) CERFACS (Toulouse, France), also Rutherford Appleton Lab. (England), duff@cerfac...
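The kernel applied within each frontal matrix here is Householder QR. As a rough, dense-only sketch (assuming a small matrix stored as a Python list of rows; the function name `householder_qr` is hypothetical and this is not the authors' code), the following reduces a matrix to upper triangular $R$ and keeps the Householder vectors instead of forming $Q$. It ignores sparsity, the assembly tree, and the blocked Level 3 BLAS updates the abstract emphasizes.

```python
import math

def householder_qr(A):
    """Return (R, reflectors) for an m x n matrix A (list of rows), m >= n.

    Each reflector is (k, v, tau) with H_k = I - tau * v v^T acting on rows k..m-1,
    so that H_{n-1} ... H_1 H_0 A = R.  (Hypothetical helper for illustration.)
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]                      # work on a copy of A
    reflectors = []
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]         # part of column k on/below the diagonal
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue                               # nothing to annihilate in this column
        alpha = -norm_x if x[0] >= 0 else norm_x   # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha                              # v = x - alpha * e_1
        tau = 2.0 / sum(t * t for t in v)
        # Apply H = I - tau v v^T to the trailing columns k..n-1.
        for j in range(k, n):
            s = tau * sum(v[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * v[i - k]
        reflectors.append((k, v, tau))
    return R, reflectors
```

Storing the `(k, v, tau)` triples rather than an explicit $Q$ is the usual way to keep the orthogonal factor compact; $Q^T b$ follows by applying the same reflectors to $b$ in order.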
Multifrontal multithreaded rank-revealing sparse QR factorization
Abstract

Cited by 10 (2 self)
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix into the form $\begin{bmatrix} R_{11} & R_{12} \\ 0 & A_{22} \end{bmatrix}$, where $R_{11}$ is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without forming the pattern of $A^T A$. Rank detection is performed within each frontal matrix using Heath’s method, which does not require column pivoting. The resulting sparse QR factorization achieves a substantial fraction of the theoretical peak performance of a multicore computer.
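The singleton pre-elimination step can be illustrated with a simplified sketch. Assumptions: the sparse matrix is given column-wise as a dict of dicts `{col: {row: value}}`, `find_column_singletons` is a hypothetical name, and only column singletons are searched for; this is not SuiteSparseQR's own implementation. A column counts as a singleton when it has exactly one entry among the rows not yet claimed and that entry exceeds `tol` in magnitude. Claiming its row can create new singletons, so the scan repeats until nothing changes.

```python
def find_column_singletons(cols, tol=0.0):
    """cols: dict mapping column index -> {row index: value} (hypothetical layout).

    Returns (singleton_cols, pivot_rows) in the order they were found.
    """
    claimed_rows = set()
    singleton_cols, pivot_rows = [], []
    remaining = set(cols)
    changed = True
    while changed:                       # repeat: claiming a row may expose new singletons
        changed = False
        for j in sorted(remaining):      # sorted() copies, so discarding below is safe
            live = [(i, v) for i, v in cols[j].items() if i not in claimed_rows]
            if len(live) == 1 and abs(live[0][1]) > tol:
                i = live[0][0]
                singleton_cols.append(j)
                pivot_rows.append(i)
                claimed_rows.add(i)
                remaining.discard(j)
                changed = True
    return singleton_cols, pivot_rows
```

Permuting the returned rows and columns to the front, in the order found, yields the upper triangular $R_{11}$ block: when a column was identified, its only live entry was its pivot, so it has no entries in rows claimed later or in the rows left for $A_{22}$.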
Multifrontal Computation with the Orthogonal Factors of Sparse Matrices
SIAM Journal on Matrix Analysis and Applications, 1994
Abstract

Cited by 9 (0 self)
This paper studies the solution of the linear least squares problem for a large, sparse $m \times n$ matrix $A$ with $m \ge n$ by QR factorization of $A$ and transformation of the right-hand side vector $b$ to $Q^T b$. A multifrontal-based method for computing $Q^T b$ using Householder factorization is presented. A theoretical operation count for the $K \times K$ unbordered grid model problem and for problems defined on graphs with $\sqrt{n}$-separators shows that the proposed method requires $O(N_R)$ storage and multiplications to compute $Q^T b$, where $N_R = O(n \log n)$ is the number of nonzeros of the upper triangular factor $R$ of $A$. To introduce Level 2 BLAS operations, Schreiber and Van Loan's storage-efficient WY representation [SIAM J. Sci. Stat. Computing, 10 (1989), pp. 55-57] is applied to the orthogonal factor $Q_i$ of each frontal matrix $F_i$. If this technique is used, the storage bound increases to $O(n (\log n)^2)$. Some numerical results for the grid model problems as well as Harwell-Boeing problems...
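The storage-efficient WY idea can be sketched in a few lines. Assumptions: each reflector is written $H_j = I - \tau_j v_j v_j^T$, vectors are plain Python lists, and the helper names `wy_T` and `apply_wy` are hypothetical. The product $H_1 H_2 \cdots H_k$ is then represented as $I - V T V^T$ with $T$ upper triangular, built column by column via $T_{1:j-1,\,j} = -\tau_j\, T_{1:j-1,1:j-1} V_{1:j-1}^T v_j$ and $T_{jj} = \tau_j$. Sign and ordering conventions differ between sources, so this is one consistent choice, not necessarily the formulation used in the paper.

```python
def wy_T(vs, taus):
    """Upper triangular T with H_1 H_2 ... H_k = I - V T V^T,
    where H_j = I - tau_j v_j v_j^T and V has columns vs[0], ..., vs[k-1]."""
    k = len(vs)
    T = [[0.0] * k for _ in range(k)]
    for j in range(k):
        # w = V_{1:j-1}^T v_j: inner products of v_j with the earlier vectors
        w = [sum(a * b for a, b in zip(vs[i], vs[j])) for i in range(j)]
        for i in range(j):
            T[i][j] = -taus[j] * sum(T[i][l] * w[l] for l in range(j))
        T[j][j] = taus[j]
    return T

def apply_wy(vs, T, x):
    """Return (I - V T V^T) x using only products with V^T, T and V."""
    k = len(vs)
    y = [sum(v[r] * x[r] for r in range(len(x))) for v in vs]        # V^T x
    z = [sum(T[i][l] * y[l] for l in range(k)) for i in range(k)]    # T V^T x
    return [x[r] - sum(vs[i][r] * z[i] for i in range(k)) for r in range(len(x))]
```

Applying the product this way needs only matrix-vector operations with $V^T$, $T$, and $V$, which is what makes Level 2 BLAS usable; for $Q^T = H_k \cdots H_1$ one would use $T^T$ in place of $T$.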
The impact of high performance computing in the solution of linear systems: trends and problems
1999
Abstract

Cited by 5 (0 self)
We review the influence of the advent of high performance computing on the solution of linear equations. We will concentrate on direct methods of solution and consider both the case where the coefficient matrix is dense and the case where it is sparse. We will examine the current performance of software in this area and speculate on what advances we might expect in the early years of the next century. Keywords: sparse matrices, direct methods, parallelism, matrix factorization, multifrontal methods. AMS(MOS) subject classifications: 65F05, 65F50. (1) Current reports available at http://www.cerfacs.fr/algor/algo reports.html. Also appeared as Technical Report RAL-TR-1999-072 from Rutherford Appleton Laboratory, Oxfordshire. (2) duff@cerfacs.fr. Also at Atlas Centre, RAL, Oxon OX11 0QX, England. Rutherford Appleton Laboratory. Contents: 1 Introduction; 2 Building blocks; 3 Factorization of dense matrices; 4 Factorization of sparse matrices; 5 Parallel computation; 6 Current situation; 7 F...
Truncated Block Newton and quasi-Newton methods for sparse systems of nonlinear equations. Experiments on parallel platforms
1997
Abstract

Cited by 4 (3 self)
...this paper, we solve them concurrently with the iterative Lanczos algorithm LSQR [6]. Some limitations in speedup efficiency were detected while using this solver (see Section 5), because it is essentially a sequential procedure apart from its Level 2 BLAS kernels (sparse matrix-vector products). Other choices are possible, such as the augmented system approach used in [1], the normal equations used in [2], or the sparse QR solver [5]. Experiments with this last solver are currently in progress. 3 Inexact Newton method
Inexact Block Quasi-Newton Methods For Sparse Systems Of Nonlinear Equations.
2000
Abstract

Cited by 2 (2 self)
In this paper we present the results obtained in solving consistent sparse systems of $n$ nonlinear equations $F(x) = 0$ by a quasi-Newton method combined with a $p$-block iterative row-projection linear solver of Cimmino type, $1 \le p \ll n$. Under weak regularity conditions for $F$, it is proved that this inexact quasi-Newton method has local, linear convergence in the energy norm induced by the preconditioned matrix $HA$, where $A$ is an initial guess of the Jacobian matrix, and that it may also converge superlinearly. The matrix $H = [A_1^+, \ldots, A_i^+, \ldots, A_p^+]$, where $A_i^+ = A_i^T (A_i A_i^T)^{-1}$ is the Moore-Penrose pseudoinverse of the $m_i \times n$ block $A_i$, is the preconditioner. A simple partitioning of the Jacobian matrix was used to solve a set of nonlinear test problems with sizes ranging from 1024 to 131072 on the CRAY T3E under MPI. Key words. Sparse nonlinear problems, Inexact Newton method, Quasi-Newton, row-projection method, parallel it...
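A Cimmino-type row-projection iteration for a linear system $Ax = b$ can be sketched as $x_{k+1} = x_k + \omega \sum_{i=1}^{p} A_i^+ (b_i - A_i x_k)$, which is a Richardson iteration on the preconditioned system $HAx = Hb$. To keep the sketch self-contained, it assumes every block is a single row (so $p = m$ and $A_i^+ = a_i^T / (a_i a_i^T)$) and uses $\omega = 1/p$, i.e. the average of the row projections. The function name `cimmino` is hypothetical, and the paper's block partitioning and outer quasi-Newton loop are not reproduced.

```python
def cimmino(A, b, omega=None, iters=500, x0=None):
    """Cimmino iteration with one-row blocks (an assumption made for brevity).

    x <- x + omega * sum_i a_i^T (b_i - a_i x) / (a_i a_i^T)
    """
    m, n = len(A), len(A[0])
    omega = 1.0 / m if omega is None else omega     # average of the p = m projections
    x = [0.0] * n if x0 is None else list(x0)
    norms2 = [sum(t * t for t in row) for row in A]  # a_i a_i^T for each row block
    for _ in range(iters):
        step = [0.0] * n
        for row, bi, nr in zip(A, b, norms2):
            # (a_i a_i^T)^{-1} (b_i - a_i x): scalar because the block is one row
            r = (bi - sum(a * xj for a, xj in zip(row, x))) / nr
            for j in range(n):
                step[j] += row[j] * r                # accumulate A_i^+ applied to the residual
        x = [xj + omega * sj for xj, sj in zip(x, step)]
    return x
```

Every block's projection depends only on the current iterate, so the $p$ terms of the sum can be computed independently, which is what makes this family of solvers attractive on distributed-memory machines.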
Computing sparse orthogonal factors in MATLAB
1998
Abstract

Cited by 1 (0 self)
In this report, a new version of the multifrontal sparse QR factorization routine sqr for general sparse matrices, originally by Matstoms, is described and evaluated. In the previous version, the orthogonal factor $Q$ is discarded because of storage considerations. The new version provides $Q$ and uses the multifrontal structure to store this orthogonal factor compactly. A new data class with overloaded operators is implemented in MATLAB to make the compact orthogonal factors easy to use. This implicit storage of the orthogonal factor also results in faster computation and application of $Q$ and $Q^T$. Examples are given where the new version is up to four times faster than MATLAB's built-in function qr when computing only $R$, and up to 1000 times faster when computing both $Q$ and $R$. The sqr package is available at URL: http://www.mai.liu.se/~milun/sls/. Key words: QR factorization, sparse problems, multifrontal method, orthogonal factorization. 1 Introduction. Let $A \in \mathbb{R}$...
A Coarse-Grained Parallel QR-Factorization Algorithm for Sparse Least Squares Problems
Abstract
A sparse QR factorization algorithm, SPARQR, for coarse-grained parallel computations is described. The coefficient matrix, which is assumed to be general sparse, is reordered in an attempt to move as many zero elements as possible into the lower left corner. The reordered matrix is then partitioned into block rows, and Givens plane rotations are applied in each block row. These are independent tasks and can be done in parallel. Row and column permutations are carried out within the diagonal blocks in an attempt to better preserve the sparsity of the matrix. The algorithm can be used to solve least squares problems either directly or combined with an iterative method (preconditioned conjugate gradients are used). In the latter case, small nonzero elements can optionally be dropped. This better preserves sparsity and therefore leads to a faster factorization, at the price of some loss of accuracy. The iterative method is used to regain the...
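The block-row step can be illustrated with a small dense sketch. Assumptions: each block row is a dense list of rows, a rotation is applied only where a subdiagonal entry is nonzero, and the names `givens`, `triangularize_block`, and `factor_block_rows` are hypothetical. SPARQR's reordering, in-block permutations, and dropping of small elements are not shown. Because the block rows are independent tasks, the last helper simply maps them over a thread pool; in CPython this shows the task structure rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def triangularize_block(block):
    """Reduce one dense block row (list of rows) to upper trapezoidal form."""
    R = [row[:] for row in block]
    m, n = len(R), len(R[0])
    for j in range(min(m, n)):
        for i in range(j + 1, m):
            if R[i][j] == 0.0:
                continue                    # already zero: skip the rotation, no fill-in
            c, s = givens(R[j][j], R[i][j])
            for k in range(j, n):           # rotate rows j and i (columns < j are zero in both)
                p, q = R[j][k], R[i][k]
                R[j][k] = c * p + s * q
                R[i][k] = -s * p + c * q
            R[i][j] = 0.0                   # exact zero by construction
    return R

def factor_block_rows(blocks, workers=4):
    """Block rows are independent tasks, so they can be dispatched concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(triangularize_block, blocks))
```

Skipping rotations for entries that are already zero is the simplest way a Givens-based scheme avoids creating fill-in, which is why rotations suit sparse rows better than a full Householder update of the whole block.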