Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Abstract

Cited by 116 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
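The entries in this list build on the factorization $A = LL^T$ of a symmetric positive definite matrix. As a reference point only, here is a minimal dense, sequential Cholesky sketch in Python. The language, the function name, and the column-by-column ordering are our choices. It shows the arithmetic being distributed, not the paper's parallel sparse algorithm.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T for a dense SPD matrix A.

    A is a list of rows. Columns of L are computed left to right; this is a
    sequential dense sketch, not a parallel or sparse formulation.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract contributions of already-computed columns.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L
```

In a sparse matrix most of the `L[i][k]` products vanish. Which entries of `L` stay zero (here `L[2][0]` for the test matrix `[[4, 2, 0], [2, 5, 1], [0, 1, 3]]`) is what a symbolic factorization determines before any arithmetic is done.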
Multifrontal QR factorization in a multiprocessor environment
, 1994
Abstract

Cited by 28 (9 self)
We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix $A$. The algorithm is based on the multifrontal approach and makes use of Householder transformations. The tasks are distributed among processors according to an assembly tree, which is built from the symbolic factorization of the matrix $A^TA$. Uniprocessor issues are addressed first. We then discuss the multiprocessor implementation of the method. Parallelization of both the factorization phase and the solve phase is considered. We use relaxation of the sparsity structure of both the original matrix and the frontal matrices to improve performance. We show that, in this case, the use of Level 3 BLAS can lead to very significant performance improvements. The eight-processor Alliant FX/80 is used to illustrate our discussion.
1 ENSEEIHT-IRIT (Toulouse, France), amestoy@enseeiht.fr. 2 CERFACS (Toulouse, France), also Rutherford Appleton Lab. (England), duff@cerfac...
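The assembly tree mentioned above is typically derived from the elimination tree of $A^TA$, possibly after amalgamating nodes. That is our assumption, not something the abstract states. Below is a minimal Python sketch of Liu's elimination-tree algorithm with path compression, applied to an explicitly formed sparsity pattern of $A^TA$. The function name is hypothetical.

```python
def column_etree(rows, n):
    """Elimination tree of A^T A computed from the row structure of A.

    rows: list of rows, each a list of the column indices holding nonzeros.
    n: number of columns. Returns parent[], with -1 marking a root.
    The pattern of A^T A is formed explicitly here for clarity.
    """
    # Columns j and k are adjacent in A^T A iff some row of A touches both.
    adj = [set() for _ in range(n)]
    for cols in rows:
        for j in cols:
            for k in cols:
                if j != k:
                    adj[j].add(k)
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed links toward the current subtree root
    for k in range(n):
        for i in adj[k]:
            if i >= k:
                continue  # only entries above the diagonal are needed
            # Climb from i to the root of its subtree, compressing the path to k.
            r = i
            while ancestor[r] != -1 and ancestor[r] != k:
                nxt = ancestor[r]
                ancestor[r] = k
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = k
                parent[r] = k
    return parent
```

Consider rows `{0, 1}` and `{0, 2}`. Columns 1 and 2 never share a row, yet the result is `[1, 2, -1]`. Eliminating column 0 creates fill between columns 1 and 2, so column 2 becomes the parent of column 1.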
Sparse Multifrontal Rank Revealing QR Factorization
 SIAM J. Matrix Anal. Appl
, 1995
Abstract

Cited by 20 (0 self)
We describe an algorithm to compute a rank-revealing sparse QR factorization. We augment a basic sparse multifrontal QR factorization with an incremental condition estimator to provide an estimate of the least singular value and vector for each successive column of $R$. We remove a column from $R$ as soon as the condition estimate exceeds a tolerance, using the approximate singular vector to select a suitable column. Removing columns, or pivoting, requires a dynamic data structure and necessarily degrades sparsity. But most of the additional work fits naturally into the multifrontal factorization's use of efficient dense vector kernels, minimizing overall cost. Further, pivoting as soon as possible reduces the cost of pivot selection and data access. We present a theoretical analysis showing that our use of approximate singular vectors does not degrade the quality of our rank-revealing factorization; we achieve an exponential bound similar to that of methods that use exact singular vectors. We prov...
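The incremental condition estimator can be sketched in the spirit of Bischof's incremental condition estimation. Whether the paper uses exactly this update is an assumption.

Start from a unit vector $x$ with $\|x^TR_k\| = \delta$. Appending a column $[v;\gamma]$ to $R_k$ gives the candidate $x' = [s\,x;\,c]$ with $s^2 + c^2 = 1$. Then $\|x'^TR_{k+1}\|^2 = s^2\delta^2 + (s\,x^Tv + c\gamma)^2$, which is minimized by the smallest eigenvector of a $2\times 2$ matrix. Function names are hypothetical.

```python
import math

def _min_eig_2x2(a, b, d):
    """Smallest eigenvalue and a unit eigenvector of the symmetric [[a, b], [b, d]]."""
    if b == 0.0:
        return (a, (1.0, 0.0)) if a <= d else (d, (0.0, 1.0))
    lam_max = 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)
    lam = (a * d - b * b) / lam_max  # det / lam_max avoids cancellation
    v1, v2 = b, lam - a              # satisfies (a - lam) * v1 + b * v2 = 0
    nrm = math.hypot(v1, v2)
    return lam, (v1 / nrm, v2 / nrm)


def ice_update(x, delta, v, gamma):
    """One estimator step for R_{k+1} = [[R_k, v], [0, gamma]].

    Input: unit x with ||x^T R_k|| = delta. Output: the unit x' = [s*x; c]
    minimizing ||x'^T R_{k+1}|| over s^2 + c^2 = 1, and that minimal norm.
    """
    alpha = sum(xi * vi for xi, vi in zip(x, v))
    lam, (s, c) = _min_eig_2x2(delta * delta + alpha * alpha, alpha * gamma, gamma * gamma)
    return [s * xi for xi in x] + [c], math.sqrt(max(lam, 0.0))


def ice_smallest_singular_value(R):
    """Estimate sigma_min of upper-triangular R (list of rows), column by column.

    Returns (delta, x) with ||x|| = 1 and ||x^T R|| = delta >= sigma_min(R).
    """
    x, delta = [1.0], abs(R[0][0])
    for k in range(1, len(R)):
        column_above = [R[i][k] for i in range(k)]
        x, delta = ice_update(x, delta, column_above, R[k][k])
    return delta, x
```

Because $\delta$ is the exact norm of $x^TR$ for a unit $x$, it is always an upper bound on $\sigma_{\min}(R)$. A rank-revealing loop would call `ice_update` as each column arrives and reject the column when the resulting condition estimate exceeds the tolerance. That threshold logic is not shown here.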
Incomplete Factorization Preconditioning For Linear Least Squares Problems
, 1994
Abstract

Cited by 17 (4 self)
... this paper is the modified version of Gram-Schmidt orthogonalization with a rejection test applied right after the formation of the off-diagonal elements of the factor $R$. For a given rejection parameter $0 \le \tau < 1$, the rejection test is: if $|r_{ij}| < \tau \, \|a_j\|$ ...
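The following is a minimal Python sketch of incomplete modified Gram-Schmidt with a rejection test. The excerpt breaks off mid-formula, so the drop rule used here is an assumption: $r_{ij}$ is discarded when $|r_{ij}| < \tau\|a_j\|$, with $\|a_j\|$ the original column norm. The function name is hypothetical. The resulting $R$ satisfies $R^TR \approx A^TA$, which is what makes it usable as a preconditioner for least squares.

```python
import math

def incomplete_mgs(columns, tau):
    """Incomplete modified Gram-Schmidt with an (assumed) rejection test.

    columns: the n columns of A (lists of length m), assumed full column rank.
    tau: rejection parameter, 0 <= tau < 1; tau = 0 gives exact MGS, R^T R = A^T A.
    Returns R as an n x n list of rows.
    """
    n = len(columns)
    work = [list(c) for c in columns]  # columns updated in place
    norms = [math.sqrt(sum(t * t for t in c)) for c in columns]  # original norms
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = math.sqrt(sum(t * t for t in work[i]))
        q = [t / R[i][i] for t in work[i]]
        for j in range(i + 1, n):
            r = sum(qk * ak for qk, ak in zip(q, work[j]))
            if abs(r) < tau * norms[j]:
                continue  # rejected: r_ij stays 0 and column j is not updated
            R[i][j] = r
            work[j] = [ak - r * qk for ak, qk in zip(work[j], q)]
    return R
```

Larger values of `tau` drop more entries and give a sparser, cheaper, but less accurate $R$. With `tau = 0` nothing is dropped.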
Multifrontal Computation with the Orthogonal Factors of Sparse Matrices
 SIAM Journal on Matrix Analysis and Applications
, 1994
Abstract

Cited by 9 (0 self)
This paper studies the solution of the linear least squares problem for a large and sparse $m \times n$ matrix $A$ with $m \ge n$ by QR factorization of $A$ and transformation of the right-hand side vector $b$ to $Q^Tb$. A multifrontal-based method for computing $Q^Tb$ using Householder factorization is presented. A theoretical operation count for the $K \times K$ unbordered grid model problem and problems defined on graphs with $\sqrt{n}$-separators shows that the proposed method requires $O(N_R)$ storage and multiplications to compute $Q^Tb$, where $N_R = O(n \log n)$ is the number of nonzeros of the upper triangular factor $R$ of $A$. In order to introduce BLAS-2 operations, Schreiber and Van Loan's Storage-Efficient WY Representation [SIAM J. Sci. Stat. Computing, 10 (1989), pp. 55-57] is applied to the orthogonal factor $Q_i$ of each frontal matrix $F_i$. If this technique is used, the bound on storage increases to $O(n (\log n)^2)$. Some numerical results for the grid model problems as well as Harwell-Boeing problems...
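To make the transformation $b \mapsto Q^Tb$ concrete, here is a minimal dense sketch in Python. The language is our choice and the function names are hypothetical. It computes a Householder QR and keeps only the reflectors $(v_k, \beta_k)$ instead of forming $Q$. It then applies the reflectors to $b$ and back-solves $Rx = (Q^Tb)_{1:n}$.

The sketch does not reproduce the paper's multifrontal organization or the storage-efficient WY representation. It only shows why $Q$ itself never has to be stored.

```python
import math

def householder_qr(A):
    """Householder QR of a dense m x n matrix (list of rows, m >= n).

    Returns (R, reflectors): R is n x n upper triangular, and reflectors is a
    list of (v, beta) where H_k = I - beta v v^T acts on rows k..m-1.
    """
    m, n = len(A), len(A[0])
    W = [list(row) for row in A]
    reflectors = []
    for k in range(n):
        x = [W[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            reflectors.append((x, 0.0))  # column already zero: H_k = I
            continue
        alpha = -norm_x if x[0] >= 0.0 else norm_x  # sign choice avoids cancellation
        v = x[:]
        v[0] -= alpha
        beta = 2.0 / sum(t * t for t in v)
        for j in range(k, n):  # apply H_k to the remaining columns
            s = beta * sum(v[i] * W[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                W[k + i][j] -= s * v[i]
        reflectors.append((v, beta))
    R = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return R, reflectors

def apply_qt(reflectors, b):
    """Compute Q^T b = H_{n-1} ... H_0 b from the stored reflectors only."""
    c = list(b)
    for k, (v, beta) in enumerate(reflectors):
        s = beta * sum(v[i] * c[k + i] for i in range(len(v)))
        for i in range(len(v)):
            c[k + i] -= s * v[i]
    return c

def solve_least_squares(A, b):
    """Minimize ||A x - b||_2 via R x = (Q^T b)[:n] and back substitution."""
    R, reflectors = householder_qr(A)
    c = apply_qt(reflectors, b)
    n = len(R)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x
```

For `A = [[1, 0], [0, 1], [1, 1]]` and `b = [1, 2, 4]`, the least squares solution is $x = (4/3, 7/3)$.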
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
, 1995
Abstract

Cited by 8 (5 self)
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers, together with the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations that were widely considered to be poorly suited for large-scale parallel computers. We present some laws governing the performance and scalability properties that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as the fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) method provides many interesting insights into their behavior on parallel computer...
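One standard way to quantify scalability is through efficiency $E = W/(W + T_o)$. Here $W$ is the serial work and $T_o$ the total parallel overhead. The isoefficiency function then gives the rate at which $W$ must grow with $p$ to keep $E$ fixed. This framing is our assumption; the abstract does not name its metric. Below is a toy Python example with a hypothetical overhead model.

```python
import math

def overhead(p):
    """Hypothetical overhead model T_o(p) = 2 p log2 p, e.g. for summing W
    numbers on p processors with a tree reduction at unit communication cost.
    Illustrative only; not taken from the dissertation."""
    return 2.0 * p * math.log2(p)

def efficiency(W, p):
    """Parallel efficiency E = W / (W + T_o), with W the serial work."""
    return W / (W + overhead(p))

def isoefficiency_work(p, E):
    """Work W that keeps efficiency at E on p processors: W = (E / (1 - E)) * T_o."""
    return E / (1.0 - E) * overhead(p)
```

Under this model, $W$ must grow as $\Theta(p \log p)$ to hold $E$ fixed. A slower-growing isoefficiency function indicates a more scalable algorithm-architecture combination.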
Parallel Multifrontal Solution Of Sparse Linear Least Squares Problems On Distributed-Memory Multiprocessors
 Advanced Computing Research Institute, Center for Theory and Simulation in Science and Engineering, Cornell
, 1994
Abstract

Cited by 3 (0 self)
We describe the issues involved in the design and implementation of efficient parallel algorithms for solving sparse linear least squares problems on distributed-memory multiprocessors. We consider both the QR factorization method due to Golub and the method of corrected semi-normal equations due to Björck. The major tasks involved are sparse QR factorization, sparse triangular solution, and sparse matrix-vector multiplication. The sparse QR factorization is accomplished by a recently introduced parallel multifrontal scheme. New parallel algorithms for solving the related sparse triangular systems and for performing sparse matrix-vector multiplications are proposed. The arithmetic and communication complexities of our algorithms on regular grid problems are presented. Experimental results on an Intel iPSC/860 machine are described. Key words. parallel algorithms, sparse matrix, orthogonal factorization, multifrontal method, least squares problems, triangular solution, distributed-me...
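The corrected semi-normal equations need only $R$, not $Q$. First solve $R^TRx = A^Tb$; then apply one correction step using the residual. Both steps reduce to the triangular solves and matrix-vector products listed above.

Below is a minimal dense Python sketch; the helper names are hypothetical. To keep it self-contained, $R$ comes from a Cholesky factorization of $A^TA$. The paper instead obtains $R$ from the sparse multifrontal QR factorization, which also avoids forming $A^TA$. In exact arithmetic the two factors agree up to row signs.

```python
import math

def cholesky_upper(M):
    """Upper-triangular R with M = R^T R (M symmetric positive definite)."""
    n = len(M)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = math.sqrt(M[i][i] - sum(R[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            R[i][j] = (M[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

def solve_rt_r(R, rhs):
    """Solve R^T R x = rhs with one forward and one backward triangular solve."""
    n = len(R)
    y = [0.0] * n
    for i in range(n):  # R^T y = rhs, R^T is lower triangular
        y[i] = (rhs[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # R x = y
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

def matvec(A, x):
    """A x for A stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def matvec_t(A, y):
    """A^T y for A stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def csne(A, b, R):
    """Corrected semi-normal equations: semi-normal solve plus one correction."""
    x = solve_rt_r(R, matvec_t(A, b))                  # R^T R x = A^T b
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual b - A x
    dx = solve_rt_r(R, matvec_t(A, r))                 # R^T R dx = A^T r
    return [xi + dxi for xi, dxi in zip(x, dx)]
```

The single correction step is intended to recover the accuracy that the uncorrected semi-normal solve can lose by working with $R$ alone.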
A Stable Primal-Dual Approach for Linear Programming
Abstract

Cited by 2 (1 self)
This paper studies a primal-dual interior/exterior-point path-following approach for linear programming that is motivated by using an iterative solver rather than a direct solver for the search direction. We begin with the usual perturbed primal-dual optimality equations $F_\mu(x, y, z) = 0$. Under nondegeneracy assumptions, this nonlinear system is well-posed, i.e., it has a nonsingular Jacobian at optimality and is not necessarily ill-conditioned as the iterates approach optimality. We use a simple preprocessing step to eliminate both the primal and dual feasibility equations. This results in a single bilinear equation that maintains the well-posedness property. We then apply both a direct solution technique and a preconditioned conjugate gradient method (PCG), within an inexact Newton framework, directly to the linearized equations. This is done without forming the usual normal equations (NEQ) or augmented system, so sparsity is maintained. The work of an iteration for the PCG approach consists almost entirely in the (approximate) solution of this well-posed linearized system. Therefore, improvements depend on efficient preconditioning.
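For concreteness, assume the standard-form primal $\min\{c^Tx : Ax = b,\ x \ge 0\}$ with dual $\max\{b^Ty : A^Ty + z = c,\ z \ge 0\}$. This notation is ours; the abstract does not state it. The usual perturbed primal-dual optimality equations then read:

$$
F_\mu(x, y, z) =
\begin{pmatrix}
A^T y + z - c \\
A x - b \\
Z X e - \mu e
\end{pmatrix}
= 0,
\qquad X = \operatorname{diag}(x),\; Z = \operatorname{diag}(z),\; e = (1, \ldots, 1)^T .
$$

The first two block rows are the dual and primal feasibility equations. On this reading, the remaining complementarity equation $ZXe - \mu e = 0$ is presumably the single bilinear equation that the preprocessing step leaves behind.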
Computing sparse orthogonal factors in MATLAB
, 1998
Abstract

Cited by 1 (0 self)
In this report, a new version of the multifrontal sparse QR factorization routine sqr, originally by Matstoms, for general sparse matrices is described and evaluated. In the previous version, the orthogonal factor $Q$ is discarded due to storage considerations. The new version provides $Q$ and uses the multifrontal structure to store this orthogonal factor in a compact way. A new data class with overloaded operators is implemented in Matlab to make the compact orthogonal factors easy to use. This implicit way of storing the orthogonal factor also results in faster computation and application of $Q$ and $Q^T$. Examples are given where the new version is up to four times faster when computing only $R$, and up to 1000 times faster when computing both $Q$ and $R$, than the built-in function qr in Matlab. The sqr package is available at URL: http://www.mai.liu.se/~milun/sls/. Key words: QR factorization, sparse problems, multifrontal method, orthogonal factorization. 1 Introduction. Let $A \in \mathbb{R}$...
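The abstract describes a Matlab data class with overloaded operators for the compact $Q$. Below is a rough Python analogue that assumes nothing about sqr's actual interface; the class name `ImplicitQ` and its attributes are hypothetical. It stores $Q = H_0H_1\cdots H_{k-1}$ as Householder reflectors and overloads `@`, so that `Q @ x` and `Q.T @ x` apply the factor without ever forming it.

```python
class ImplicitQ:
    """Orthogonal factor Q = H_0 H_1 ... H_{k-1} kept as Householder reflectors.

    Each reflector is (v, beta), with H_j = I - beta v v^T acting on rows
    j..m-1. `Q @ x` and `Q.T @ x` apply Q or Q^T to a vector of length m.
    """

    def __init__(self, m, reflectors, transposed=False):
        self.m = m
        self.reflectors = reflectors
        self.transposed = transposed

    @property
    def T(self):
        # Transposition only flips the order in which reflectors are applied.
        return ImplicitQ(self.m, self.reflectors, not self.transposed)

    def _apply(self, j, x):
        v, beta = self.reflectors[j]
        s = beta * sum(v[i] * x[j + i] for i in range(len(v)))
        for i in range(len(v)):
            x[j + i] -= s * v[i]

    def __matmul__(self, x):
        if len(x) != self.m:
            raise ValueError("dimension mismatch")
        y = list(x)
        order = range(len(self.reflectors))
        # Q^T = H_{k-1} ... H_0 applies H_0 first; Q applies H_{k-1} first.
        for j in (order if self.transposed else reversed(order)):
            self._apply(j, y)
        return y
```

Each reflector costs time proportional to the length of its stored vector. Applying `Q.T @ b` therefore costs about the total length of the stored vectors, not the $O(m^2)$ of a dense $Q$.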
A Coarse-Grained Parallel QR-Factorization Algorithm for Sparse Least Squares Problems
Abstract
A sparse QR-factorization algorithm, SPARQR, for coarse-grained parallel computations is described. The coefficient matrix, which is assumed to be general sparse, is reordered in an attempt to bring as many zero elements as possible into the lower left corner. The reordered matrix is then partitioned into block rows, and Givens plane rotations are applied in each block row. These are independent tasks and can be done in parallel. Row and column permutations are carried out within the diagonal blocks in an attempt to better preserve the sparsity of the matrix. The algorithm can be used for solving least squares problems either directly or combined with an iterative method (preconditioned conjugate gradients are used). Small nonzero elements can optionally be dropped in the latter case. This leads to better preservation of sparsity and, therefore, to a faster factorization. The price to be paid is some loss of accuracy. The iterative method is used to regain the...
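Below is a minimal dense Python sketch of the per-block work; the names are hypothetical. Each block is reduced to upper-trapezoidal form by Givens rotations that skip entries already zero. The reordering, the within-block permutations, and the combination of block results are not shown. The blocks are processed in a plain loop here, although they are independent and could be dispatched to separate processors.

```python
import math

def givens_triangularize(block):
    """Reduce a dense block (list of rows) to upper-trapezoidal form with Givens
    rotations, skipping entries that are already zero.

    Rotations are orthogonal, so the result G satisfies G^T G = B^T B.
    """
    B = [list(row) for row in block]
    m, n = len(B), len(B[0])
    for j in range(min(m, n)):
        for i in range(j + 1, m):
            b = B[i][j]
            if b == 0.0:
                continue  # an existing zero saves a whole rotation
            a = B[j][j]
            r = math.hypot(a, b)
            c, s = a / r, b / r
            # Rotate rows j and i so that B[i][j] becomes zero.
            for k in range(j, n):
                bj, bi = B[j][k], B[i][k]
                B[j][k] = c * bj + s * bi
                B[i][k] = -s * bj + c * bi
    return B

def factor_block_rows(blocks):
    """Triangularize each block row independently (sequentially here)."""
    return [givens_triangularize(b) for b in blocks]
```

Skipping zero entries is where sparsity pays off. The fewer nonzeros below the diagonal of a block, the fewer rotations are needed.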