## Highly scalable parallel algorithms for sparse matrix factorization (1994)

### Download Links

- [www.cs.umn.edu]
- [www-users.cs.umn.edu]
- [rsim.cs.illinois.edu]
- [ftp.cs.umn.edu]
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Transactions on Parallel and Distributed Systems

Citations: 117 (29 self)

### BibTeX

@ARTICLE{Gupta94highlyscalable,
  author  = {Anshul Gupta and George Karypis and Vipin Kumar},
  title   = {Highly scalable parallel algorithms for sparse matrix factorization},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  year    = {1994}
}

### Abstract

In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in the parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge,
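The abstract's contrast between dense and sparse factorization hinges on fill-in: factoring a sparse matrix creates nonzeros in positions where the original matrix had none. A minimal Python sketch (not the paper's algorithm; the 5-point grid Laplacian and the diagonal shift are illustrative assumptions) shows fill-in on a small 2-D grid problem:

```python
import numpy as np

def grid_laplacian(k):
    """SPD matrix for a k x k 5-point grid Laplacian, shifted by I
    so it is strictly positive definite (an illustrative test matrix)."""
    n = k * k
    A = np.zeros((n, n))
    for i in range(k):
        for j in range(k):
            v = i * k + j
            A[v, v] = 5.0  # 4 (grid degree) + 1 (identity shift)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < k and 0 <= nj < k:
                    A[v, ni * k + nj] = -1.0
    return A

A = grid_laplacian(6)
L = np.linalg.cholesky(A)
nnz_A = np.count_nonzero(np.tril(np.abs(A) > 1e-12))
nnz_L = np.count_nonzero(np.abs(L) > 1e-12)
# L has far more nonzeros than the lower triangle of A: this is fill-in,
# which orderings such as nested dissection try to minimize.
print(nnz_A, nnz_L)
```

Fill-reducing orderings (minimum degree, nested dissection) permute the matrix to shrink `nnz_L`; the paper's scalability results assume such an ordering has already been applied.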

### Citations

794 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context: ...&T. In all of our experiments, we used spectral nested dissection [48] to order the matrices. The factorization algorithm described in this paper will work well with any type of nested dissection. In [21, 22, 20, 31, 30], we show that nested dissection orderings with proper selection of separators can yield better quality orderings than traditional heuristics, such as the multiple minimum degree heuristic. The pe...

500 | Computer Solution of Large Sparse Positive Definite Systems
- George, Liu
- 1981
Citation Context: ...parallel factorization algorithms [40, 41, 3, 49, 50, 57, 12, 9, 28, 26, 55, 43, 5] have a lower bound of O(Np) on the total communication volume [15]. Since the overall computation is only O(N^1.5) [13], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [55, 53]. In [2], ...
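The arithmetic behind this context's scaling claim can be checked directly: with total communication volume on the order of Np and total computation on the order of N^1.5, the ratio grows linearly in p for a fixed problem size. The numbers below are purely illustrative (constants dropped):

```python
# Communication-to-computation ratio of column-based schemes, with all
# constants dropped: comm ~ N*p words, comp ~ N**1.5 flops.
N = 10**6
ratios = []
for p in (16, 64, 256, 1024):
    ratios.append((p, (N * p) / N**1.5))  # simplifies to p / sqrt(N)
for p, r in ratios:
    print(p, r)  # ratio grows linearly with p: quadrupling p quadruples it
```

This is why, as the context notes, column-based schemes scale poorly as the processor count grows.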

486 | Partitioning sparse matrices with eigenvectors of graphs
- Pothen, Simon, et al.
- 1990
Citation Context: ...regular three-dimensional grid. NUG15 is from a linear programming problem derived from a quadratic assignment problem obtained from AT&T. In all of our experiments, we used spectral nested dissection [48] to order the matrices. The factorization algorithm described in this paper will work well with any type of nested dissection. In [21, 22, 20, 31, 30], we show that nested dissection orderings with pr...

483 | Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings
- Kumar, Grama, et al.
- 1994
Citation Context: ...sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization can be implemented efficiently on distributed-memory parallel computers [8, 46, 10, 34]. We show that the parallel Cholesky factorization algorithm described here is as scalable as the best parallel formulation of dense matrix factorization on both mesh and hypercube architectures for a...

388 | A Separator Theorem for Planar Graphs
- Lipton, Tarjan
- 1979
Citation Context: ...have O(√n)-node and O(n^{2/3})-node separators, respectively. This is because the properties of separators can be generalized from grids to all such graphs within the same order of magnitude bounds [38, 37, 13]. We derive these expressions for both hypercube and mesh architectures, and also extend the results to sparse matrices resulting from three-dimensional graphs whose n-node subgraphs have O(n^{2/3})-no...

277 | A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience - Barnard, Simon - 1994 |

234 | Users’ guide for the harwell-boeing sparse matrix collection
- Duff, Grimes, et al.
- 1992
Citation Context: ...n able to achieve speedups of up to 364 on 1024 processors and 230 on 512 processors over a highly efficient sequential implementation for moderately sized problems from the Harwell-Boeing collection [6]. In [29], we have applied this algorithm to obtain a highly scalable parallel formulation of interior point algorithms and have observed significant speedups in solving linear programming problems. O...

216 | The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations
- Duff, Reid
- 1983
Citation Context: ...Algorithm for Sparse Matrix Factorization. The multifrontal algorithm for sparse matrix factorization was proposed independently, and in somewhat different forms, by Speelpenning [56] and Duff and Reid [7], and later elucidated in a tutorial by Liu [39]. In this section, we briefly describe a condensed version of multifrontal sparse Cholesky factorization. Given a sparse matrix and the associated elimi...
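The elimination tree mentioned in this context is the structure that drives multifrontal factorization: column j's parent is the first row i > j with a nonzero L[i, j]. A naive sketch below derives the tree from an already-computed factor (real codes find it symbolically, e.g. with Liu's algorithm, before factoring); the arrow matrix is a hypothetical example:

```python
import numpy as np

def elimination_tree_from_factor(L, tol=1e-12):
    """parent[j] = smallest row i > j with L[i, j] != 0, or -1 for a root.
    Naive post-hoc construction, for illustration only."""
    n = L.shape[0]
    parent = [-1] * n
    for j in range(n):
        rows = np.nonzero(np.abs(L[j + 1:, j]) > tol)[0]
        if rows.size:
            parent[j] = j + 1 + int(rows[0])
    return parent

# Arrow matrix: only the last row/column couples the others, so the
# elimination tree is a star rooted at the last column.
n = 5
A = np.eye(n) * 4.0
A[-1, :] = A[:, -1] = 1.0
A[-1, -1] = 4.0
L = np.linalg.cholesky(A)
print(elimination_tree_from_factor(L))  # → [4, 4, 4, 4, -1]
```

In the multifrontal method, independent subtrees of this tree can be factored concurrently, which is the source of the parallelism the paper exploits.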

183 | Generalized nested dissection
- Lipton, Rose, et al.
- 1979
Citation Context: ...have O(√n)-node and O(n^{2/3})-node separators, respectively. This is because the properties of separators can be generalized from grids to all such graphs within the same order of magnitude bounds [38, 37, 13]. We derive these expressions for both hypercube and mesh architectures, and also extend the results to sparse matrices resulting from three-dimensional graphs whose n-node subgraphs have O(n^{2/3})-no...

115 | Parallel algorithms for sparse linear systems
- Heath, Ng, et al.
- 1991
Citation Context: ...een mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms o...

91 | Analysis of multilevel graph partitioning
- Karypis, Kumar
- 1995
Citation Context: ...&T. In all of our experiments, we used spectral nested dissection [48] to order the matrices. The factorization algorithm described in this paper will work well with any type of nested dissection. In [21, 22, 20, 31, 30], we show that nested dissection orderings with proper selection of separators can yield better quality orderings than traditional heuristics, such as the multiple minimum degree heuristic. The pe...

90 | Analyzing Scalability of Parallel Algorithms and Architectures
- Kumar, Gupta
- 1991
Citation Context: ...our parallel algorithm based on multifrontal elimination. In Section 4, we derive expressions for the communication overhead of the parallel algorithm. In Section 5, we use the isoefficiency analysis [34, 36, 17] to determine the scalability of our algorithm and compare it with the scalability of other parallel algorithms for sparse matrix factorization. Section 6 contains experimental results on a Cray T3D p...
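Isoefficiency analysis, used in the context above, asks how fast the problem size W must grow with p to keep efficiency E = W/(W + T_o) constant. A toy numeric check with an assumed overhead function (chosen only to illustrate the method, not taken from the paper):

```python
import math

def efficiency(W, p, overhead):
    """E = W / (W + T_o(W, p)) for W units of work on p processors."""
    return W / (W + overhead(W, p))

# Hypothetical overhead function T_o = p * log2(p) * sqrt(W).
overhead = lambda W, p: p * math.log2(p) * math.sqrt(W)

# Isoefficiency: to hold E = 0.5 we need W = T_o(W, p), i.e.
# sqrt(W) = p*log2(p), so W must grow as (p*log2(p))**2.
for p in (4, 16, 64):
    W = (p * math.log2(p)) ** 2
    print(p, round(efficiency(W, p, overhead), 3))  # stays at 0.5
```

A slower-growing isoefficiency function means a more scalable algorithm, which is the metric by which the paper compares its scheme to earlier formulations.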

90 | Efficient parallel solution of linear systems
- Pan, Reif
- 1985
Citation Context: ...t algorithm [12] with column-wise partitioning of an N × N matrix of this type on p processors results in an O(Np log N) total communication volume [15] (box A). The communication volume of... In [47], Pan and Reif describe a parallel sparse matrix factorization algorithm for a PRAM-type architecture. This algorithm is not cost-optimal (i.e., the processor-time product exceeds the serial complexit...

89 | The multifrontal method for sparse matrix solution: Theory and practice
- Liu
- 1992
Citation Context: ...ultifrontal algorithm for sparse matrix factorization was proposed independently, and in somewhat different forms, by Speelpenning [56] and Duff and Reid [7], and later elucidated in a tutorial by Liu [39]. In this section, we briefly describe a condensed version of multifrontal sparse Cholesky factorization. Given a sparse matrix and the associated elimination tree, the multifrontal algorithm can be r...

78 | The multifrontal solution of unsymmetric sets of linear systems - Duff, Reid - 1984 |

76 | Parallel algorithms for dense linear algebra computations
- Gallivan, Plemmons, et al.
- 1990
Citation Context: ...sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization can be implemented efficiently on distributed-memory parallel computers [8, 46, 10, 34]. We show that the parallel Cholesky factorization algorithm described here is as scalable as the best parallel formulation of dense matrix factorization on both mesh and hypercube architectures for a...

73 | A parallel algorithm for multilevel graph partitioning and sparse matrix ordering
- Karypis, Kumar
- 1998
Citation Context: ...other three phases need to be parallelized as well. We have developed parallel algorithms for the other phases that are tailored to work in conjunction with the numerical factorization algorithm. In [32], we describe an efficient parallel algorithm for determining fill-reducing orderings for parallel factorization of sparse matrices. This algorithm, while performing the ordering in parallel, also dis...

71 | Isoefficiency: measuring the scalability of parallel algorithms and architectures
- Grama, Gupta, et al.
Citation Context: ...our parallel algorithm based on multifrontal elimination. In Section 4, we derive expressions for the communication overhead of the parallel algorithm. In Section 5, we use the isoefficiency analysis [34, 36, 17] to determine the scalability of our algorithm and compare it with the scalability of other parallel algorithms for sparse matrix factorization. Section 6 contains experimental results on a Cray T3D p...

57 | Sparse Cholesky factorization on a local memory multiprocessor
- George, Heath, et al.
- 1988
Citation Context: ...een mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms o...

55 | An efficient block-oriented approach to parallel sparse Cholesky factorization, Supercomputing '93
- Rothberg, Gupta
- 1993
Citation Context: ...een mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms o...

55 | Fast and Effective Algorithms for Graph Partitioning and Sparse-matrix
- Gupta
- 1997
Citation Context: ...&T. In all of our experiments, we used spectral nested dissection [48] to order the matrices. The factorization algorithm described in this paper will work well with any type of nested dissection. In [21, 22, 20, 31, 30], we show that nested dissection orderings with proper selection of separators can yield better quality orderings than traditional heuristics, such as the multiple minimum degree heuristic. The pe...

49 | The iPSC/2 Direct-Connect Communication Technology
- Nugent
- 1988
Citation Context: ...is shown in [35] that although this communication does not take place between the nearest neighbors on a subcube, the paths of all communications on any subcube are conflict-free with e-cube routing [45, 34] and cut-through or worm-hole flow control. This is a direct consequence of the fact that a circular shift is conflict-free on a hypercube with e-cube routing. Thus, a communication pipeline can be ma...

48 | Task scheduling for parallel sparse Cholesky factorization
- Geist, Ng
- 1989

42 | Highly parallel sparse Cholesky factorization - Gilbert, Schreiber - 1992 |

40 | Massively parallel methods for engineering and science problems - Camp, Plimpton, et al. - 1994 |

38 | Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multiprocessors
- Rothberg
- 1993
Citation Context: ...ze more than O(√N) processors for matrices arising from two-dimensional constant node-degree graphs. Recently, a number of schemes with two-dimensional partitioning of the matrix have been proposed [18, 53, 52, 18, 54, 1, 44, 58, 16, 33, 42, 4, 59]. The least total communication volume in most of these schemes is O(N√p log p) (box C). Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communica...

38 | Improved load distribution in parallel sparse Cholesky factorization, Supercomputing '94
- Rothberg, Schreiber
- 1994

38 | A fan-in algorithm for distributed sparse numerical factorization - Ashcraft, Eisenstat, et al. - 1990 |

35 | Towards a fast implementation of spectral nested dissection
- Pothen, Simon, et al.
- 1992
Citation Context: ...from the Harwell-Boeing collection of sparse matrices [6]. These results show that our algorithm can deliver good speedups on hundreds of processors for practical problems. Spectral nested dissection [50, 51, 52] was used to order these matrices. The algorithm presented in Section 3 relies on the ordering algorithm to yield a balanced elimination tree. Imbalances in the elimination tree result in a loss in th...

35 | Sparse QR Factorization in MATLAB - Matstoms - 1994 |

33 | Communication results for parallel sparse Cholesky factorization on a hypercube
- George, Liu, et al.
- 1989

31 | Scalability of sparse direct solvers
- Schreiber
- 1992

30 | Spectral nested dissection
- Pothen, Simon, et al.
- 1992
Citation Context: ...from the Harwell-Boeing collection of sparse matrices [6]. These results show that our algorithm can deliver good speedups on hundreds of processors for practical problems. Spectral nested dissection [50, 51, 52] was used to order these matrices. The algorithm presented in Section 3 relies on the ordering algorithm to yield a balanced elimination tree. Imbalances in the elimination tree result in a loss in th...

26 | Nested dissection of a regular finite-element mesh
- George
Citation Context: ...e Cholesky factorization has been used in [15] in the context of a column-based subtree-to-subcube scheme. Within very small constant factors, the analysis holds for the standard nested dissection [11] of grid graphs. We consider a cross-shaped separator (described in [15]) consisting of 2√N − 1 nodes that partitions the N-node square grid into four square subgrids of size (√N − 1)/2 ×...
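The nested dissection idea this context relies on can be sketched in a few lines: recursively split the grid with a row or column separator, numbering the two halves first and the separator last. This is a plain recursive-bisection sketch with a single-line separator, not the cross-shaped separator analyzed above or the spectral ordering used in the paper's experiments:

```python
def nested_dissection(rows, cols):
    """Nested dissection ordering of a grid whose vertices are (row, col)
    pairs. Halves are ordered first, the separator last, so separator
    vertices are eliminated at the end."""
    if len(rows) <= 2 or len(cols) <= 2:
        return [(r, c) for r in rows for c in cols]  # small block: any order
    order = []
    if len(rows) >= len(cols):              # split along the longer dimension
        mid = len(rows) // 2
        sep = [(rows[mid], c) for c in cols]
        order += nested_dissection(rows[:mid], cols)
        order += nested_dissection(rows[mid + 1:], cols)
    else:
        mid = len(cols) // 2
        sep = [(r, cols[mid]) for r in rows]
        order += nested_dissection(rows, cols[:mid])
        order += nested_dissection(rows, cols[mid + 1:])
    return order + sep                      # separator vertices come last

order = nested_dissection(list(range(7)), list(range(7)))
print(len(order), order[-1])  # all 49 vertices; a top-level separator vertex last
```

Because the two halves share no edges once the separator is removed, the subtrees of the resulting elimination tree are independent, which is what makes such orderings attractive for parallel factorization.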

23 | Performance properties of large scale parallel systems
- Gupta, Kumar
- 1993
Citation Context: ...t is the total communication overhead that actually determines the overall efficiency and speedup, and is defined as the difference between the parallel processor-time product and the serial run time [23, 34]. The communication overhead can be asymptotically higher than the communication volume. For example, a one-to-all broadcast algorithm based on a binary tree communication pattern has a total communic...

22 | A parallel solution method for large sparse systems of equations
- Blank, Lucas, et al.
- 1989

19 | The fan-both family of column-based distributed Cholesky factorisation algorithms
- Ashcraft
- 1993

19 | A comparison of three column-based distributed sparse factorization schemes
- Ashcraft, Eisenstat, et al.
- 1991

19 | Distributed multifrontal factorization using clique trees
- Pothen, Sun
- 1992

17 | Efficient parallel solutions of large sparse SPD systems on distributed-memory multiprocessors
- Sun
- 1992

16 | Communication reduction in parallel sparse Cholesky factorization on a hypercube - George, Liu, et al. - 1987 |

16 | A parallel formulation of interior point algorithms
- Karypis, Gupta, et al.
- 1994
Citation Context: ...o achieve speedups of up to 364 on 1024 processors and 230 on 512 processors over a highly efficient sequential implementation for moderately sized problems from the Harwell-Boeing collection [6]. In [29], we have applied this algorithm to obtain a highly scalable parallel formulation of interior point algorithms and have observed significant speedups in solving linear programming problems. On the Cra...

14 | Limiting Communication in Parallel Sparse Cholesky Factorization
- Hulbert, Zmijewski
- 1991

13 | Load Balancing in Parallel Sparse Matrix Computations
- Manne
- 1993

13 | The multifrontal solution of sparse linear least squares problems - Matstoms - 1991 |

12 | Effects of partitioning and scheduling sparse matrix factorization on communication and load balance
- Venugopal, Naik
- 1991

12 | SHAPE: A Parallelization Tool for Sparse Matrix Computations - Venugopal, Naik - 1992 |

11 | LU factorization algorithms on distributed-memory multiprocessor architectures
- Geist, Romine
- 1988
Citation Context: ...sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization can be implemented efficiently on distributed-memory parallel computers [8, 46, 10, 34]. We show that the parallel Cholesky factorization algorithm described here is as scalable as the best parallel formulation of dense matrix factorization on both mesh and hypercube architectures for a...

11 | Distributed sparse matrix factorization: QR and Cholesky decompositions
- Raghavan
- 1991

10 | The domain/segment partition for the factorization of sparse symmetric positive definite matrices
- Ashcraft
- 1990