Results 1-10 of 10
Highly scalable parallel algorithms for sparse matrix factorization
IEEE Transactions on Parallel and Distributed Systems, 1994
Abstract

Cited by 117 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
Fast and Effective Algorithms for Graph Partitioning and Sparse Matrix Ordering
IBM Journal of Research and Development, 1996
Abstract

Cited by 55 (11 self)
Graph partitioning is a fundamental problem in several scientific and engineering applications. In this paper, we describe heuristics that improve the state-of-the-art practical algorithms used in graph-partitioning software in terms of both partitioning speed and quality. An important use of graph partitioning is in ordering sparse matrices for obtaining direct solutions to sparse systems of linear equations arising in engineering and optimization applications. The experiments reported in this paper show that the use of these heuristics results in a considerable improvement in the quality of sparse-matrix orderings over conventional ordering methods, especially for sparse matrices arising in linear programming problems. In addition, our graph-partitioning-based ordering algorithm is more parallelizable than minimum-degree-based ordering algorithms, and it renders the ordered matrix more amenable to parallel factorization.
Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems
1995
Abstract

Cited by 8 (5 self)
A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of parallel complexity and scalability of the best of these algorithms and the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers are quite unscalable due to a high communication to computation ratio. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver.
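As background for the triangular solves this paper analyzes, the serial dense kernels can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical function names; it ignores the sparsity and the parallel distribution that are the paper's actual subject.

```python
# Dense forward and back substitution: the serial kernels whose sparse,
# parallel counterparts are analyzed in the paper. Illustrative only.

def forward_substitution(L, b):
    """Solve L y = b for y, where L is lower triangular (list of rows)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve U x = y for x, where U is upper triangular."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Example: solve L y = b, then L^T x = y (as after a Cholesky factorization).
L = [[2.0, 0.0], [1.0, 3.0]]
b = [4.0, 5.0]
y = forward_substitution(L, b)   # y = [2.0, 1.0]
LT = [[2.0, 1.0], [0.0, 3.0]]    # transpose of L
x = back_substitution(LT, y)
```

The outer loops are inherently sequential, which is the root of the high communication-to-computation ratio the abstract mentions; the paper's contribution is showing how much parallelism can nevertheless be extracted in the sparse case.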
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
1995
Abstract

Cited by 8 (5 self)
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers and the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations that were widely considered poorly suited to large-scale parallel computers. We present some laws governing the performance and scalability properties that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as the fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) method provides many interesting insights into their behavior on parallel computer...
Parallel Algorithms for Forward Elimination and Backward Substitution in Direct Solution of Sparse Linear Systems
In Proc. Supercomputing '95, 1995
Abstract

Cited by 6 (1 self)
A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of parallel complexity and scalability of the best of these algorithms and the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers are quite unscalable due to a high communication to computation ratio. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver. This work is sponsored in part by the Army High Performance Computing Research Center...
A New Approach to Parallel Sparse Cholesky Factorization on Distributed Memory Parallel Computers
1993
Abstract

Cited by 2 (1 self)
Nowadays, programming distributed memory parallel computers (DMPCs) evokes the "no pain, no gain" idea. That is, for a given problem to be solved in parallel, the message passing programming model involves distributing the data and the computations among the processors. While this is easily feasible for well-structured problems, it can become fairly hard for unstructured ones, like sparse matrix computations. In this paper, we consider a relatively new approach to implementing the Cholesky factorization on a DMPC running a shared virtual memory (SVM). The abstraction of a shared memory on top of a distributed memory allows us to introduce a large-grain factorization algorithm, synchronized with events. Several scheduling strategies are compared, and experiments conducted so far show that this approach can provide the power of DMPCs and the ease of programming with shared variables.
A Reordering and Mapping Algorithm for Parallel Sparse Cholesky Factorization
In Proc. Scalable High Performance Computing Conference, 1994
Abstract

Cited by 1 (0 self)
A judiciously chosen symmetric permutation can significantly reduce the amount of storage and computation for the Cholesky factorization of sparse matrices. On distributed memory machines, the issue of mapping data and computation onto processors is also important. Previous research on ordering for parallelism has focused on idealized measures such as execution time on an unbounded number of processors with zero communication costs. In this paper, we propose an ordering and mapping algorithm that attempts to minimize communication and performs load-balancing of work among the processors. Performance results on an Intel iPSC/860 hypercube are presented to demonstrate its effectiveness.

1 Introduction

We consider the solution of linear systems of the form Ax = b where A is large, sparse, symmetric, and positive definite. With such systems, Cholesky decomposition can be applied to factorize A as A = LL^T, where L is lower triangular, followed by forward and backward triangular solves. Du...
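As a concrete reference point for the factorization being reordered here, a minimal dense Cholesky kernel can be written in plain Python. The function name is illustrative; the paper's actual concerns, fill-reducing ordering and processor mapping, are not modeled in this dense sketch.

```python
import math

def cholesky(A):
    """Factor a symmetric positive definite A as A = L L^T, returning the
    lower-triangular factor L. Dense and serial, for illustration only;
    real sparse codes exploit the nonzero structure that a good ordering
    preserves."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)   # L = [[2.0, 0.0], [1.0, sqrt(2)]]
# Verify the factorization: L L^T should reconstruct A.
n = len(A)
R = [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
     for i in range(n)]
```

In the sparse setting, any zero entry of A that becomes nonzero in L is "fill"; the symmetric permutation discussed in the abstract is chosen precisely to keep this fill, and hence storage and work, small.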
Scheduling Strategies For Sparse Cholesky Factorization On A Shared Virtual Memory Parallel Computer
In International Conference on Parallel Processing, 1994
Abstract

Cited by 1 (1 self)
To solve a given problem on a distributed memory parallel computer (DMPC), the message passing programming model involves distributing both the data and the computations among the processors. While this is easily feasible for well-structured problems, it can become fairly hard for unstructured ones, like sparse matrix computations, unless some form of runtime support is used. In this paper, we consider a relatively new approach to implementing the Cholesky factorization on a DMPC, by using a shared virtual memory (SVM). The abstraction of a shared memory on top of a distributed memory allows us to introduce a large-grain factorization algorithm, synchronized with events. Experiments conducted so far show that some scheduling techniques enhance not only the parallelism but the SVM behavior as well, allowing interesting results.

1 INTRODUCTION

In many computational kernels, the solution to a sparse linear system is required. This system may arise in various problems, such as the discretizat...
A Clustering Algorithm For Parallel Sparse Cholesky Factorization
Abstract
This paper presents an integrated approach to two issues relevant to efficient parallel sparse Cholesky factorization: 1) matrix reordering for parallelism, and 2) mapping of data to processors. A clustering heuristic is proposed to perform a fill-preserving reordering and mapping of data onto a fixed number of processors. Performance results on a Cray T3D are presented to demonstrate its effectiveness.

Keywords: Sparse matrices; Cholesky factorization; Reordering; Distributed memory multiprocessors; Mapping heuristic.

1. Introduction

We consider the solution of linear systems of the form Ax = b where A is large, sparse, symmetric, and positive definite. Since the matrix is positive definite, pivoting is not required for numerical stability. With such systems, Cholesky decomposition can be applied to factorize A into A = LL^T, where L is lower triangular, followed by forward and backward triangular solves. During factorization, zero entries of A might become nonzero. This ...
Molecular Structure Computation from Multiple Data Sources
2000
Abstract
Elucidating the three-dimensional structure of biological molecules such as nucleic acids, proteins, and their macromolecular assemblies is fundamental to understanding their functions. It also poses a computational challenge because of the large number of parameters and the nonlinear relationships between them. This dissertation builds on a probabilistic least squares approach adapted for molecular structure estimation, using multiple sources of uncertain data.