Results 1  10
of
272
A column approximate minimum degree ordering algorithm
, 2000
"... Sparse Gaussian elimination with partial pivoting computes the factorization PAQ = LU of a sparse matrix A, where the row ordering P is selected during factorization using standard partial pivoting with row interchanges. The goal is to select a column preordering, Q, based solely on the nonzero patt ..."
Abstract

Cited by 319 (54 self)
 Add to MetaCart
Sparse Gaussian elimination with partial pivoting computes the factorization PAQ = LU of a sparse matrix A, where the row ordering P is selected during factorization using standard partial pivoting with row interchanges. The goal is to select a column preordering, Q, based solely on the nonzero pattern of A such that the factorization remains as sparse as possible, regardless of the subsequent choice of P. The choice of Q can have a dramatic impact on the number of nonzeros in L and U. One scheme for determining a good column ordering for A is to compute a symmetric ordering that reduces fillin in the Cholesky factorization of ATA. This approach, which requires the sparsity structure of ATA to be computed, can be expensive both in
The MIT Alewife Machine: Architecture and Performance
 In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and costeffective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable a ..."
Abstract

Cited by 193 (22 self)
 Add to MetaCart
(Show Context)
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and costeffective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable and programmable. Four mechanisms combine to achieve these goals: softwareextended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for finegrain computation allows many processorsto cooperate on small problem sizes; and latency tolerance mechanisms  including block multithreading and prefetching  mask unavoidable delays due to communication; Microbenchmarks, together with over a dozen complete applications running on the 32node prototype, help analyze the behavior of the system. Analysis shows that integrating message passing with sha...
A sparse approximate inverse preconditioner for nonsymmetric linear systems
 SIAM J. SCI. COMPUT
, 1998
"... This paper is concerned with a new approach to preconditioning for large, sparse linear systems. A procedure for computing an incomplete factorization of the inverse of a nonsymmetric matrix is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner f ..."
Abstract

Cited by 192 (22 self)
 Add to MetaCart
This paper is concerned with a new approach to preconditioning for large, sparse linear systems. A procedure for computing an incomplete factorization of the inverse of a nonsymmetric matrix is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner for conjugate gradient–type methods. Some theoretical properties of the preconditioner are discussed, and numerical experiments on test matrices from the Harwell–Boeing collection and from Tim Davis’s collection are presented. Our results indicate that the new preconditioner is cheaper to construct than other approximate inverse preconditioners. Furthermore, the new technique insures convergence rates of the preconditioned iteration which are comparable with those obtained with standard implicit preconditioners.
Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers
, 1998
"... We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been dev ..."
Abstract

Cited by 187 (30 self)
 Add to MetaCart
(Show Context)
We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed. We discuss some of the main algorithmic choices and compare both implementation issues and the performance of the LDL T and LU factorizations. Performance analysis on an IBM SP2 shows the efficiency and the potential of the method. The test problems used are from the RutherfordBoeing collection and from the PARASOL end users.
An UnsymmetricPattern Multifrontal Method for Sparse LU Factorization
 SIAM J. MATRIX ANAL. APPL
, 1994
"... Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallelvector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops ..."
Abstract

Cited by 150 (27 self)
 Add to MetaCart
Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallelvector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops by using dense matrix kernels. However, no efficient LU factorization algorithm based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. We address this deficiency and present a new unsymmetricpattern multifrontal method based on dense matrix kernels. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by factorizing more than one pivot in each frontal matrix thus enabling the use of Level 2 and Level 3 BLAS. The performance is compared with the classical multifrontal method and other unsymmetric solvers on a CRAY YMP.
SuperLU DIST: A scalable distributedmemory sparse direct solver for unsymmetric linear systems
 ACM Trans. Mathematical Software
, 2003
"... We present the main algorithmic features in the software package SuperLU DIST, a distributedmemory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and sc ..."
Abstract

Cited by 144 (18 self)
 Add to MetaCart
(Show Context)
We present the main algorithmic features in the software package SuperLU DIST, a distributedmemory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on largescale distributed machines.
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
"... In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algo ..."
Abstract

Cited by 133 (28 self)
 Add to MetaCart
(Show Context)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems—both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two and threedimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for mediumsize structural engineering and linear programming problems. To the best of our knowledge,
The design and use of algorithms for permuting large entries to the diagonal of sparse matrices
 SIAM J. MATRIX ANAL. APPL
, 1999
"... ..."
CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited
 ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
, 2001
"... The initial release of CUTE, a widely used testing environment for optimization software was described in [2]. The latest version, now known as CUTEr is presented. New features include reorganisation of the environment to allow simultaneous multiplatform installation, new tools for, and interface ..."
Abstract

Cited by 86 (8 self)
 Add to MetaCart
(Show Context)
The initial release of CUTE, a widely used testing environment for optimization software was described in [2]. The latest version, now known as CUTEr is presented. New features include reorganisation of the environment to allow simultaneous multiplatform installation, new tools for, and interfaces to, optimization packages, and a considerably simplified and entirely automated installation procedure for unix systems. The SIF decoder, which used to be a part of CUTE, has become a separate tool, easily callable by various packages. It features simple extensions to the SIF test problem format and the generation of files suited to automatic differentiation packages.
Scientific Computing on Bulk Synchronous Parallel Architectures
"... We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures. ..."
Abstract

Cited by 75 (16 self)
 Add to MetaCart
(Show Context)
We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures.