A supernodal approach to sparse partial pivoting
- SIAM Journal on Matrix Analysis and Applications
, 1999
Cited by 158 (20 self)
We investigate several ways to improve the performance of sparse LU factorization with partial pivoting, as used to solve unsymmetric linear systems. To perform most of the numerical computation in dense matrix kernels, we introduce the notion of unsymmetric supernodes. To better exploit the memory hierarchy, we introduce unsymmetric supernode-panel updates and two-dimensional data partitioning. To speed up symbolic factorization, we use Gilbert and Peierls's depth-first search with Eisenstat and Liu's symmetric structural reductions. We have implemented a sparse LU code using all these ideas. We present experiments demonstrating that it is significantly faster than earlier partial pivoting codes. We also compare performance with Umfpack, which uses a multifrontal approach; our code is usually faster.
Implementation of Interior Point Methods for Large Scale Linear Programming
- in Interior Point Methods in Mathematical Programming
, 1996
Cited by 56 (18 self)
In the past 10 years the interior point methods (IPM) for linear programming have gained extraordinary interest as an alternative to the sparse simplex-based methods. This has initiated a fruitful competition between the two types of algorithms, which has led to very efficient implementations on both sides. The significant difference between interior point and simplex-based methods is reflected not only in the theoretical background but also in the practical implementation. In this paper we give an overview of the most important characteristics of advanced implementations of interior point methods. First, we present the infeasible-primal-dual algorithm, which is widely considered the most efficient general-purpose IPM. Our discussion includes various algorithmic enhancements of the basic algorithm. The only shortcoming of the "traditional" infeasible-primal-dual algorithm lies in detecting a possible primal or dual infeasibility of the linear program. We discuss how this problem can be solve...
Coarse-Grain Parallel Programming in Jade
, 1991
Cited by 46 (4 self)
This paper presents Jade, a language which allows a programmer to easily express dynamic coarse-grain parallelism. Starting with a sequential program, a programmer augments those sections of code to be parallelized with abstract data usage information. The compiler and run-time system use this information to concurrently execute the program while respecting the program's data dependence constraints. Using Jade can significantly reduce the time and effort required to develop and maintain a parallel version of an imperative application with serial semantics. The paper introduces the basic principles of the language, compares Jade with other existing languages, and presents the performance of a sparse matrix Cholesky factorization algorithm implemented in Jade.
Sparse Gaussian Elimination on High Performance Computers
, 1996
Cited by 33 (5 self)
This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers, using Gaussian elimination with partial pivoting. The efficiencies of the new algorithms are demonstrated for matrices from various fields and for a variety of high performance machines. In the first part we discuss optimizations of a sequential algorithm to exploit the memory hierarchies that exist in most RISC-based superscalar computers. We begin with the left-looking supernode-column algorithm by Eisenstat, Gilbert and Liu, which includes Eisenstat and Liu's symmetric structural reduction for fast symbolic factorization. Our key contribution is to develop both numeric and symbolic schemes to perform supernode-panel updates to achieve better data reuse in cache and floating-point register...
MIP: Theory And Practice - Closing The Gap
- System Modelling and Optimization: Methods, Theory, and Applications
, 2000
Cited by 32 (1 self)
this paper, now include cutting-plane capabilities as well as other ideas from the backlog of accumulated theory. As suggested by the title of this paper, the gap between theory and practice is indeed closing
Dynamic supernodes in sparse Cholesky update/downdate and triangular solves
- ACM Trans. Math. Software
, 2006
Cited by 15 (7 self)
The supernodal method for sparse Cholesky factorization represents the factor L as a set of supernodes, each consisting of a contiguous set of columns of L with identical nonzero pattern. A conventional supernode is stored as a dense submatrix. While this is suitable for sparse Cholesky factorization where the nonzero pattern of L does not change, it is not suitable for methods that modify a sparse Cholesky factorization after a low-rank change to A (an update/downdate, A = A ± WW^T). Supernodes merge and split apart during an update/downdate. Dynamic supernodes are introduced, which allow a sparse Cholesky update/downdate to obtain performance competitive with conventional supernodal methods. A dynamic supernodal solver is shown to exceed the performance of the conventional (BLAS-based) supernodal method for solving triangular systems. These methods are incorporated into CHOLMOD, a sparse Cholesky factorization and update/downdate package, which forms the basis of x=A\b in MATLAB when A is sparse and symmetric positive definite.
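To make the update A + ww^T mentioned above concrete, here is a minimal dense sketch of a rank-1 Cholesky update in Python. It is not the dynamic supernodal method of the paper and does not use CHOLMOD; the function name `cholesky_rank1_update` and the list-of-lists matrix layout are assumptions chosen for illustration. It applies the textbook rotation-based update column by column and ignores sparsity entirely.

```python
import math


def cholesky_rank1_update(L, w):
    """Return L_new with L_new * L_new^T = L * L^T + w * w^T.

    L is a dense lower-triangular factor stored as a list of rows
    (hypothetical layout for this sketch); w is a vector of the same size.
    The inputs are copied and left unmodified.
    """
    n = len(w)
    L = [row[:] for row in L]
    w = list(w)
    for k in range(n):
        # New diagonal entry: length of the vector (L[k][k], w[k]).
        r = math.hypot(L[k][k], w[k])
        c = r / L[k][k]
        s = w[k] / L[k][k]
        L[k][k] = r
        # Fold w into column k, then carry the remainder to later columns.
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * w[i]) / c
            w[i] = c * w[i] - s * L[i][k]
    return L
```

For example, calling `cholesky_rank1_update([[2.0, 0.0], [1.0, 3.0]], [1.0, 2.0])` yields a factor of the matrix [[5, 4], [4, 14]]. The dense version costs O(n^2) per update; a sparse code would touch only the columns whose pattern is affected by w.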
Algorithm 8xx: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate
- ACM Trans. Math. Software
, 2006
Cited by 10 (3 self)
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AA^T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. It appears in MATLAB 7.2 as x=A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
Algorithm 8xx: a concise sparse Cholesky factorization package
- Univ. of Florida
, 2004
Cited by 8 (0 self)
The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant new method of sparse symmetric factorization that computes the factorization row-by-row but stores it column-by-column. The entire symbolic and numeric factorization consists of a total of only 53 lines of code. The package is written in C, and includes a MATLAB interface.
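The row-by-row idea in the abstract above can be illustrated with a small dense sketch in Python. This is not the LDL package's code: the name `ldlt_row_by_row` and the dense list-of-lists storage are assumptions for illustration, and the sparse column-by-column storage described in the abstract is not reproduced. Row k of L is obtained by a unit lower-triangular solve against the rows already computed.

```python
def ldlt_row_by_row(A):
    """Factor a symmetric matrix A as L * D * L^T, one row of L at a time.

    A is a dense list of rows (hypothetical layout for this sketch).
    Returns (L, D) with L unit lower triangular and D a list holding the
    diagonal. Assumes every pivot D[k] is nonzero, which holds when A is
    symmetric positive definite.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for k in range(n):
        # Solve L[:k, :k] * y = A[:k, k] by forward substitution.
        y = [A[i][k] for i in range(k)]
        for j in range(k):
            for i in range(j + 1, k):
                y[i] -= L[i][j] * y[j]
        # Row k of L is y scaled by the earlier pivots; D[k] is what remains.
        d = A[k][k]
        for i in range(k):
            L[k][i] = y[i] / D[i]
            d -= L[k][i] * y[i]
        D[k] = d
    return L, D
```

For the 2-by-2 matrix [[4, 2], [2, 10]] this gives D = [4, 9] and L = [[1, 0], [0.5, 1]]. In a sparse implementation the triangular solve would visit only the nonzeros of row k rather than all earlier columns.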
A Parallel Interior-Point Algorithm for Linear Programming on a Shared Memory Machine
, 1998
Cited by 2 (0 self)
The XPRESS interior point optimizer is an "industrial strength" code for solution of large-scale sparse linear programs. The purpose of the present paper is to discuss how the XPRESS interior point optimizer has been parallelized for a Silicon Graphics multiprocessor computer. The major computational task performed in each iteration of the interior-point method implemented in the XPRESS interior point optimizer is the solution of a symmetric and positive definite system of linear equations. Therefore, parallelization of the Cholesky decomposition and the triangular solve procedure are discussed in detail. Finally, computational results are presented to demonstrate the parallel efficiency of the optimizer. It should be emphasized that the methods discussed can be applied to the solution of large-scale sparse linear least squares problems. Acknowledgment: We appreciate the comments made by an anonymous referee appointed by CORE which helped us to improve the manuscript....
Locality of reference in sparse Cholesky factorization methods. Submitted to the Electronic Transactions on Numerical Analysis
, 2004
Cited by 2 (1 self)
This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one.
Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache-efficiency, locality of reference
AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20
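As a rough illustration of the left-looking ordering of operations discussed in this abstract, the following Python sketch computes a dense Cholesky factor one column at a time, pulling in updates from all columns to its left before scaling. It is a dense toy under stated assumptions, not the sparse supernodal code analyzed in the paper; the name `left_looking_cholesky` and the list-of-lists layout are hypothetical.

```python
import math


def left_looking_cholesky(A):
    """Return lower-triangular L with L * L^T = A, computed left-looking.

    A is a dense symmetric positive definite matrix given as a list of rows
    (hypothetical layout for this sketch). If A is not positive definite,
    math.sqrt or the division below will raise an error.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from column j of the lower triangle of A.
        col = [A[i][j] for i in range(j, n)]
        # Left-looking step: subtract contributions of columns k < j.
        for k in range(j):
            ljk = L[j][k]
            if ljk != 0.0:  # a sparse code would skip these columns outright
                for i in range(j, n):
                    col[i - j] -= L[i][k] * ljk
        # Scale the updated column by the square root of its pivot.
        pivot = math.sqrt(col[0])
        L[j][j] = pivot
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / pivot
    return L
```

Each column j reads every earlier column that has a nonzero in row j, which is the access pattern whose cache behavior the paper studies. A multifrontal code instead accumulates those contributions in dense frontal matrices and passes them up the elimination tree.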

