## Efficient Parallel Solutions Of Large Sparse SPD Systems On Distributed-Memory Multiprocessors

Venue: Advanced Computing Research Institute, Center for Theory and Simulation in Science and Engineering, Cornell

Citations: 17 (2 self)

### BibTeX

```bibtex
@TECHREPORT{Sun_efficientparallel,
  author      = {Chunguang Sun},
  title       = {Efficient Parallel Solutions Of Large Sparse SPD Systems On Distributed-Memory Multiprocessors},
  institution = {Advanced Computing Research Institute, Center for Theory and Simulation in Science and Engineering, Cornell},
  year        = {}
}
```

### Abstract

We consider several issues involved in the solution of sparse symmetric positive definite systems by the multifrontal method on distributed-memory multiprocessors. First, we present a new algorithm for computing the partial factorization of a frontal matrix on a subset of processors, which significantly improves the performance of a previously designed distributed multifrontal algorithm. Second, new parallel algorithms for computing the sparse forward elimination and sparse backward substitution are described. The new algorithms solve the sparse triangular systems in a multifrontal fashion. Numerical experiments on an Intel iPSC/860 and an Intel iPSC/2 for a set of problems with regular and irregular sparsity structure are reported. More than 180 million flops per second are achieved during the numerical factorization for a three-dimensional grid problem on an iPSC/860 machine with 32 processors. Key words: Cholesky factorization, clique tree, distributed-memory multiprocessors, multifro...
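The partial factorization at the heart of the multifrontal method eliminates the fully summed variables of a dense frontal matrix and produces an update (Schur-complement) matrix for the parent front. A minimal serial sketch in Python/NumPy (the function name and interface are illustrative, not from the report):

```python
import numpy as np

def partial_factor(F, k):
    """Partially factor a dense frontal matrix F whose first k
    variables are fully summed. Returns the computed block column
    of the Cholesky factor and the update (Schur complement)
    matrix that is passed on to the parent front."""
    L11 = np.linalg.cholesky(F[:k, :k])        # factor the pivot block
    L21 = np.linalg.solve(L11, F[:k, k:]).T    # L21 = F21 * L11^{-T}
    update = F[k:, k:] - L21 @ L21.T           # Schur complement
    return L11, L21, update
```

Stacking L11 over L21 gives the k columns of the global Cholesky factor contributed by this front; the update matrix is assembled into the parent's frontal matrix.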

### Citations

525 citations: *Computer Solution of Large Sparse Positive Definite Systems*. George, Liu, 1981.
Context: "... a direct solution process involves four steps: (i) computation of a fill-reducing ordering, (ii) symbolic factorization, (iii) numerical factorization, and (iv) triangular solution. George and Liu [13] provide a comprehensive discussion on these four steps and their implementations on serial machines. We focus our attention on the design of efficient parallel algorithms for the numerical factorizat..."
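The last two of these steps are, algebraically, a Cholesky factorization A = L Lᵀ followed by forward elimination L y = b and backward substitution Lᵀ x = y. A dense serial sketch in Python/NumPy (our naming; the report's algorithms are sparse and parallel):

```python
import numpy as np

def cholesky_solve(A, b):
    L = np.linalg.cholesky(A)                # step (iii): A = L L^T
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                       # forward elimination: L y = b
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros(n)
    for i in reversed(range(n)):             # backward substitution: L^T x = y
        x[i] = (y[i] - L[i+1:, i] @ x[i+1:]) / L[i, i]
    return x
```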

289 citations: *Sparse matrix test problems*. Duff, Grimes, et al., 1989.
Context: "...numerical modelling of some biomechanical systems [4]. NASA4704 is from the NASA Ames collection of sparse matrices. The remaining test problems are from the Harwell-Boeing collection of sparse matrices [7]. All grid problems are ordered by the nested-dissection ordering [11]. CN28CN is ordered by the automatic nested-dissection ordering subroutine from SPARSPAK [5]. All other problems are ordered by mi..."

230 citations: *The multifrontal solution of indefinite sparse symmetric linear systems*. Duff, Reid, 1983.
Context: "... There are roughly three classes of algorithms for computing sparse Cholesky factorization on distributed-memory machines: the fan-out algorithm [12], the fan-in algorithm [2], and the multifrontal algorithm [8]. Ashcraft, Eisenstat, Liu and Sherman [3] have reported that the multifrontal method is more efficient than both the fan-out and fan-in algorithms on an iPSC/2 machine for the 63 × 63 nine-point reg..."

183 citations: *Nested dissection of a regular finite element mesh*. George, 1973.
Context: "...the NASA Ames collection of sparse matrices. The remaining test problems are from the Harwell-Boeing collection of sparse matrices [7]. All grid problems are ordered by the nested-dissection ordering [11]. CN28CN is ordered by the automatic nested-dissection ordering subroutine from SPARSPAK [5]. All other problems are ordered by the minimum degree ordering [14]. The characteristics of the test problems a..."

122 citations: *Parallel algorithms for sparse linear systems*. Heath, Ng, et al., 1991.
Context: "... Significant effort has been invested in designing efficient parallel algorithms for solving sparse symmetric positive definite systems on distributed-memory multiprocessors. Heath, Ng and Peyton [17] have recently published a survey on parallel sparse matrix factorization algorithms. There are roughly three classes of algorithms for computing sparse Cholesky factorization on distributed-memory ma..."

48 citations: *Task Scheduling for Parallel Sparse Cholesky Factorization*. Geist, Ng, 1990.
Context: "... The method described in [25] is briefly reviewed here. The cliques on a clique tree are mapped by a proportional mapping scheme [25], which is shown to be more efficient than the bin-pack mapping scheme [10] or the subtree-to-subcube mapping scheme [15]. A subtree T with root clique K is allocated a set S of consecutively numbered processors. The clique K is partitioned among the processors in S, i.e., ..."

45 citations: *A fast algorithm for reordering sparse matrices for parallel factorization*. Lewis, Peyton, et al., 1989.
Context: "...the adjacency graph of a Cholesky factor L can be organized into a tree structure called a clique tree. A discussion of clique trees and their applications in sparse matrix computations may be found in [18, 22, 24]. A maximal clique is referred to as a clique for brevity. The adjacency graph of a 6 × 6 Cholesky factor and a corresponding clique tree are shown in Fig. 1, where each clique is shown as a circ..."
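A clique tree can be viewed as a compressed form of the elimination tree, which is computable directly from the lower-triangular nonzero pattern of A. A sketch of Liu's path-compression algorithm in Python (our naming, not code from any of the cited papers):

```python
def etree(rows):
    # rows[j] lists the columns k < j with A[j][k] nonzero (lower triangle)
    n = len(rows)
    parent = [-1] * n
    ancestor = [-1] * n              # path-compressed ancestor pointers
    for j in range(n):
        for k in rows[j]:
            r = k
            while ancestor[r] not in (-1, j):
                nxt = ancestor[r]
                ancestor[r] = j      # compress the path toward j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j        # j becomes r's parent in the tree
    return parent
```

The parent array defines the elimination tree; supernodes/cliques correspond to chains of columns with nested factor patterns.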

38 citations: *A fan-in algorithm for distributed sparse numerical factorization*. Ashcraft, Eisenstat, et al., 1990.
Context: "...sparse matrix factorization algorithms. There are roughly three classes of algorithms for computing sparse Cholesky factorization on distributed-memory machines: the fan-out algorithm [12], the fan-in algorithm [2], and the multifrontal algorithm [8]. Ashcraft, Eisenstat, Liu and Sherman [3] have reported that the multifrontal method is more efficient than both the fan-out and fan-in algorithms on an iPSC/2 machine for ..."

34 citations: *Communication results for parallel sparse Cholesky factorization on a hypercube*. George, Liu, et al., 1989.
Context: "... The cliques on a clique tree are mapped by a proportional mapping scheme [25], which is shown to be more efficient than the bin-pack mapping scheme [10] or the subtree-to-subcube mapping scheme [15]. A subtree T with root clique K is allocated a set S of consecutively numbered processors. The clique K is partitioned among the processors in S, i.e., the frontal matrix FK of clique K is wrapped a..."
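Both mappings mentioned here are simple to sketch: wrap mapping deals column j of a frontal matrix to processor S[j mod |S|], and proportional mapping splits a processor set among child subtrees roughly in proportion to their workloads. Illustrative Python, with our own names and a rounding rule that is only one plausible choice:

```python
def wrap_map(ncols, procs):
    # column j of the frontal matrix is owned by procs[j % len(procs)]
    return [procs[j % len(procs)] for j in range(ncols)]

def proportional_split(work, procs):
    # contiguous slices of procs, sized roughly in proportion to each
    # child's subtree workload; assumes len(procs) >= len(work)
    total, out, start, acc = sum(work), [], 0, 0
    for w in work[:-1]:
        acc += w
        end = max(start + 1, round(acc * len(procs) / total))
        out.append(procs[start:end])
        start = end
    out.append(procs[start:])
    return out
```

For example, `wrap_map(7, [4, 5, 6])` cycles columns over processors 4, 5, 6, and `proportional_split([3, 1], list(range(8)))` gives the heavier child six of the eight processors.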

23 citations: *Solution of sparse positive definite systems on a Hypercube*. George, Heath, et al., 1989.
Context: "...survey on parallel sparse matrix factorization algorithms. There are roughly three classes of algorithms for computing sparse Cholesky factorization on distributed-memory machines: the fan-out algorithm [12], the fan-in algorithm [2], and the multifrontal algorithm [8]. Ashcraft, Eisenstat, Liu and Sherman [3] have reported that the multifrontal method is more efficient than both the fan-out and fan-in algorithms on ..."

22 citations: *Solving sparse triangular linear systems on parallel computers*. Anderson, Saad, 1989.
Context: "...Zmijewski [27] considered the use of cyclic algorithms employed in dense triangular solution algorithms for solving sparse triangular systems. Other parallel sparse triangular solution algorithms [1, 16, 21, 26] have been proposed in the context of the preconditioned conjugate gradient method. We present a parallel multifrontal triangular solution algorithm for computing the sparse forward elimination and sparse..."

14 citations: *A comparison of three column-based distributed sparse factorization schemes*. Ashcraft, Eisenstat, et al., 1990.
Context: "...algorithms for computing sparse Cholesky factorization on distributed-memory machines: the fan-out algorithm [12], the fan-in algorithm [2], and the multifrontal algorithm [8]. Ashcraft, Eisenstat, Liu and Sherman [3] have reported that the multifrontal method is more efficient than both the fan-out and fan-in algorithms on an iPSC/2 machine for the 63 × 63 nine-point regular grid. Pothen and Sun [25] have design..."

14 citations: *Performance of the Intel iPSC/860 Hypercube*. Dunigan, 1990.
Context: "...arithmetic work, the better the performance on both the iPSC/860 and iPSC/2. When more arithmetic work is involved, the impact of the communication overhead on the overall computation is reduced. Dunigan [9] has studied the performance of the Intel iPSC/2 and Intel iPSC/860. The ratio of communication speed to computation speed is calculated using the 8-byte transfer and multiply times, where the 8-byte tran..."

13 citations: *Sparse Cholesky Factorization on a Multiprocessor*. Zmijewski, 1987.
Context: "...sparse triangular solution in the context of using a fan-out algorithm to compute the sparse Cholesky factorization. The columns of L are mapped to processors by using subtree-to-subcube mapping. Zmijewski [27] considered the use of cyclic algorithms employed in dense triangular solution algorithms for solving sparse triangular systems. Other parallel sparse triangular solution algorithms [1, 16, 21, 26..."

12 citations: *Aggregation methods for solving sparse triangular systems on multiprocessors*. Saltz, 1990.
Context: "...Zmijewski [27] considered the use of cyclic algorithms employed in dense triangular solution algorithms for solving sparse triangular systems. Other parallel sparse triangular solution algorithms [1, 16, 21, 26] have been proposed in the context of the preconditioned conjugate gradient method. We present a parallel multifrontal triangular solution algorithm for computing the sparse forward elimination and sparse..."

10 citations: *User's guide for SPARSPAK-A: Waterloo sparse linear equations package*. Chu, George, et al., 1984.
Context: "...Harwell-Boeing collection of sparse matrices [7]. All grid problems are ordered by the nested-dissection ordering [11]. CN28CN is ordered by the automatic nested-dissection ordering subroutine from SPARSPAK [5]. All other problems are ordered by the minimum degree ordering [14]. The characteristics of the test problems are shown in Table 1, where N is the number of equations and |L| is the number of nonzeros in the..."

10 citations: *Some applications of clique trees to the solution of sparse linear systems*. Peyton, 1986.
Context: "...the adjacency graph of a Cholesky factor L can be organized into a tree structure called a clique tree. A discussion of clique trees and their applications in sparse matrix computations may be found in [18, 22, 24]. A maximal clique is referred to as a clique for brevity. The adjacency graph of a 6 × 6 Cholesky factor and a corresponding clique tree are shown in Fig. 1, where each clique is shown as a circ..."

9 citations: *Solution of Linear Systems with Striped Sparse Matrices*. Melhem, 1986.
Context: "...Zmijewski [27] considered the use of cyclic algorithms employed in dense triangular solution algorithms for solving sparse triangular systems. Other parallel sparse triangular solution algorithms [1, 16, 21, 26] have been proposed in the context of the preconditioned conjugate gradient method. We present a parallel multifrontal triangular solution algorithm for computing the sparse forward elimination and sparse..."

6 citations: *Compact clique tree data structures for sparse matrix factorizations*. Pothen, Sun, 1990.
Context: "...the adjacency graph of a Cholesky factor L can be organized into a tree structure called a clique tree. A discussion of clique trees and their applications in sparse matrix computations may be found in [18, 22, 24]. A maximal clique is referred to as a clique for brevity. The adjacency graph of a 6 × 6 Cholesky factor and a corresponding clique tree are shown in Fig. 1, where each clique is shown as a circ..."

5 citations: *A parallel triangular solver for a hypercube multiprocessor*. Li, Coleman, 1988.
Context: "...trapezoidal system Tz = d associated with a clique K. The factor matrix T is wrapped around a set of processors. The algorithm is a modified version of the cyclic algorithm proposed by Li and Coleman [19] for solving dense triangular systems on distributed-memory multiprocessors. In our context, we need to consider the following constraining factors: (i) in general, the set of processors on which T is..."
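The cyclic idea can be simulated serially: columns of the triangular factor are wrap-mapped onto p processors, and each column's owner finishes one solution component and contributes the corresponding updates. A plain-Python sketch (our simplification; it omits the pipelining and the constraining factors discussed here):

```python
def wrapped_forward_solve(L, b, p):
    # column-oriented forward substitution L x = b, tracking which
    # wrap-mapped processor (column owner) performs each update
    n = len(b)
    x = [float(v) for v in b]
    work = [0] * p                    # flops attributed to each processor
    for j in range(n):
        owner = j % p                 # wrap mapping: column j lives on owner
        x[j] /= L[j][j]
        for i in range(j + 1, n):
            if L[i][j]:
                x[i] -= L[i][j] * x[j]
                work[owner] += 2      # one multiply plus one subtract
    return x, work
```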

1 citation: *Sparse matrix test problems*. Chinchalker, private communication, 1992.
Context: "...solution of sparse least-squares problems arising from the particle method for turbulent combustion [6, 23]. Test problems CN28CN, TPAS and TECT3 arise from numerical modelling of some biomechanical systems [4]. NASA4704 is from the NASA Ames collection of sparse matrices. The remaining test problems are from the Harwell-Boeing collection of sparse matrices [7]. All grid problems are ordered by the nested-d..."

1 citation: *Large-scale box-constrained least squares calculations for turbulent combustion*. Coleman, Sun, 1992.
Context: "...model problems but also have practical applications. One application of 3D grids is in the numerical solution of sparse least-squares problems arising from the particle method for turbulent combustion [6, 23]. Test problems CN28CN, TPAS and TECT3 arise from numerical modelling of some biomechanical systems [4]. NASA4704 is from the NASA Ames collection of sparse matrices. The remaining test problems are f..."

1 citation: *Solving sparse triangular systems using Fortran with extensions on the NYU Ultracomputer prototype*. Greenbaum, 1986.

1 citation: *Particle methods for turbulent combustion*. Pope, private communication, 1992.
Context: "...model problems but also have practical applications. One application of 3D grids is in the numerical solution of sparse least-squares problems arising from the particle method for turbulent combustion [6, 23]. Test problems CN28CN, TPAS and TECT3 arise from numerical modelling of some biomechanical systems [4]. NASA4704 is from the NASA Ames collection of sparse matrices. The remaining test problems are f..."