Results 1-5 of 5
Highly scalable parallel algorithms for sparse matrix factorization
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 100 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithm can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
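For readers less familiar with the underlying kernel, here is a minimal sequential sketch of Cholesky factorization, A = L L^T, for a symmetric positive definite matrix in plain Python. It is not the authors' parallel algorithm; it only shows the computation that parallel formulations distribute. The column-oriented (left-looking) variant and the function name `cholesky` are illustrative choices, and skipping updates whose multiplier is zero merely hints at where sparsity saves work.

```python
import math


def cholesky(a):
    """Return lower-triangular L (list of lists) with L * L^T == a.

    `a` must be symmetric positive definite. Column j of L is formed from
    column j of `a` minus the contributions of earlier columns k < j.
    A column k with L[j][k] == 0 does not update column j, so it is
    skipped; in a sparse factorization this is where work is saved.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Copy column j of A (only rows j..n-1 are used below).
        col = [float(a[i][j]) for i in range(n)]
        for k in range(j):
            ljk = L[j][k]
            if ljk == 0.0:
                continue  # no fill contribution from column k
            for i in range(j, n):
                col[i] -= L[i][k] * ljk
        if col[j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        d = math.sqrt(col[j])
        L[j][j] = d
        for i in range(j + 1, n):
            L[i][j] = col[i] / d
    return L


if __name__ == "__main__":
    A = [[4, 2, 0], [2, 5, 3], [0, 3, 10]]
    L = cholesky(A)
    print(L[1])  # → [1.0, 2.0, 0.0]
```

In this small example, L[2][0] stays zero, so column 0 never updates column 2; sparse codes exploit exactly this kind of structural zero.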
A high performance sparse Cholesky factorization algorithm for scalable parallel computers
Department of Computer Science, University of Minnesota, 1994
Cited by 12 (1 self)
This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. However, the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates this load imbalance. Furthermore, the algorithm includes a number of enhancements that substantially improve overall performance. The new algorithm achieves up to 6 GFlops on a 256-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
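The load-imbalance argument can be illustrated with a toy model of subtree-to-subcube mapping: the root of the elimination tree is handled by all p processors, and at each branch the processor group is split in half between the two child subtrees, regardless of how much work each subtree contains. This sketch assumes a tree with at most two children per node, hypothetical per-node work estimates, and a power-of-two processor count; a node handled by q processors is assumed to spread its work evenly over them. It is not the authors' code and does not implement their subforest-to-subcube mapping; it only reports per-processor load so the imbalance becomes visible.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Elimination-tree node with a hypothetical work estimate."""
    work: float
    children: list = field(default_factory=list)  # at most two children


def subtree_to_subcube(node, procs, load):
    """Assign `node` to processor group `procs` and recurse into children.

    The node's work is shared evenly by its group. At a binary branch the
    group is halved, one half per child, independent of subtree work.
    Chains (one child) and single-processor groups keep the same group.
    """
    share = node.work / len(procs)
    for p in procs:
        load[p] += share
    if not node.children:
        return
    if len(procs) == 1 or len(node.children) == 1:
        for child in node.children:
            subtree_to_subcube(child, procs, load)
        return
    left, right = node.children  # raises if a node has more than two
    half = len(procs) // 2
    subtree_to_subcube(left, procs[:half], load)
    subtree_to_subcube(right, procs[half:], load)


if __name__ == "__main__":
    # Hypothetical tree whose left subtree holds far more work.
    root = Node(8.0, [
        Node(12.0, [Node(6.0), Node(6.0)]),
        Node(2.0, [Node(1.0), Node(1.0)]),
    ])
    load = [0.0] * 4
    subtree_to_subcube(root, [0, 1, 2, 3], load)
    print(load)  # → [14.0, 14.0, 4.0, 4.0]
    # Upper bound on efficiency: total work / (p * max load) = 36 / 56.
    print(sum(load) / (len(load) * max(load)))
```

Total work is 36, so an even split would give 9 per processor; the fixed halving leaves processors 2 and 3 mostly idle. A mapping that balances groups by work rather than by tree shape, as the abstract describes, targets this gap.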
On The LU Factorization Of Sequences Of Identically Structured Sparse Matrices Within A Distributed Memory Environment
1994
Cited by 8 (1 self)
CHAPTERS
1 INTRODUCTION
  1.1 Topic Statement
  1.2 Overview
2 BACKGROUND AND RELATED EFFORTS
  2.1 LU Factorization
  2.2 Algorithm Stability and Error Analysis
  2.3 Sparse Matrix Concepts
  2.4 Multifrontal Methods
  2.5 Factorization Sequences of Matrices
  2.6 Parallel Matrix Computations
  2.7 Multiprocessor Scheduling
3 IMPLEMENTATION PLATFORM
  3.1 Hardware Ar...
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
1995
Cited by 7 (4 self)
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers and analyzes the scalability of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations that were widely considered poorly suited to large-scale parallel computers. We present some laws governing performance and scalability that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as the fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) method provides many interesting insights into their behavior on parallel computer...
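Scalability analyses of this kind are commonly expressed through a few standard quantities. The block below restates the conventional definitions from the isoefficiency literature as a reminder, not as text from the dissertation; it assumes the problem size W is measured as serial work (so serial time equals W in unit-time operations), with p processors, parallel run time T_P, and total overhead T_o.

```latex
\begin{aligned}
T_o(W,p) &= p\,T_P - W, \\
S &= \frac{W}{T_P}, \qquad
E = \frac{S}{p} = \frac{W}{W + T_o(W,p)} = \frac{1}{1 + T_o(W,p)/W}, \\
W &= \frac{E}{1-E}\,T_o(W,p) = K\,T_o(W,p)
  \qquad \text{(isoefficiency relation, } K = E/(1-E)\text{)}.
\end{aligned}
```

The last line follows from solving E(W + T_o) = W for W. The more slowly W must grow with p to hold E fixed, the more scalable the parallel system; for instance, if T_o = \Theta(p \log p), the problem size must grow as \Theta(p \log p).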
Pfortran Reference Manual: a Parallel extension of Fortran
1999
...ing the send and receive (or put and get) with an operator results in streamlined code that is easier to reason about, reduced development time, and portability without degraded performance on message-passing systems [10]. Errors in writing explicit message-passing logic are reduced by passing some of the bookkeeping and code generation to the translator. System-dependent functionality is limited to a library of routines, facilitating portability of source programs.
The Model
IPfortran makes the important assumption that each processor knows the names of the variables on all processors. To this end, we require that all processors run the same IPfortran code; our programming model is Single-Program Multiple-Datastream, or SPMD. The programmer, using a local view of the data, is responsible for its distribution and access. IPfortran programs are written using the local approach, with explicit logic for each computational element (processor or process) with data decompositions propa...
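The SPMD assumption above (every processor runs the same program, so each knows the variable names used everywhere) is what lets a single operator stand in for a matched send and receive. The sketch below illustrates that idea in Python with threads standing in for processors. The `SPMDRuntime` class and its `fetch` method are hypothetical and are not Pfortran syntax, which the excerpt does not show; a shared-memory simulation like this also ignores the real cost of communication.

```python
import threading


class SPMDRuntime:
    """Toy stand-in for a message-passing machine (hypothetical API).

    Each simulated processor owns a private namespace (a dict). Because
    all processors run the same program, every processor knows the names
    of variables on every other processor, so a one-sided fetch by name
    is meaningful.
    """

    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.vars = [dict() for _ in range(nprocs)]
        self.barrier = threading.Barrier(nprocs)

    def fetch(self, name, src):
        # One expression replaces a send on `src` plus a receive here.
        return self.vars[src][name]


def program(rank, rt):
    """The single program every processor executes (SPMD)."""
    local = rt.vars[rank]
    local["x"] = rank * rank          # local computation on local data
    rt.barrier.wait()                 # every 'x' is written before any read
    left = (rank - 1) % rt.nprocs     # ring neighbour
    local["y"] = local["x"] + rt.fetch("x", left)


def run(nprocs):
    rt = SPMDRuntime(nprocs)
    threads = [threading.Thread(target=program, args=(r, rt))
               for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [rt.vars[r]["y"] for r in range(nprocs)]


if __name__ == "__main__":
    print(run(4))  # → [9, 1, 5, 13]
```

The barrier plays the role that matched receives play in explicit message passing: it guarantees the remote value exists before it is read, bookkeeping that the abstract says the translator takes over from the programmer.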

