## Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers (1998)

Citations: 115 (29 self)

### BibTeX

@MISC{Amestoy98multifrontalparallel,
  author = {P. R. Amestoy and I. S. Duff and J.-Y. L'Excellent},
  title = {Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers},
  year = {1998}
}

### Abstract

We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed. We discuss some of the main algorithmic choices and compare both implementation issues and the performance of the LDL^T and LU factorizations. Performance analysis on an IBM SP2 shows the efficiency and the potential of the method. The test problems used are from the Rutherford-Boeing collection and from the PARASOL end users.

### Citations

742 | A set of level 3 basic linear algebra subprograms - Dongarra, Du Croz, et al. - 1990 |

351 | ScaLAPACK Users' Guide - Blackford, Choi, et al. - 1997 |
Citation Context: ...tric or not and will be given in Section 3.4.3. 3.2.2 Description of type 3 parallelism In order to have good scalability, we perform a 2D block cyclic distribution of the root node. We use ScaLAPACK [6] or the vendor equivalent implementation (PDGETRF for unsymmetric matrices and PDPOTRF for symmetric matrices). Currently, a maximum of one root node, chosen during the analysis, is processed in paral...
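The 2D block cyclic mapping mentioned in this snippet can be illustrated in a few lines. The block size and process-grid shape below are made-up parameters for illustration, not values from the paper:

```python
# Sketch of a 2D block cyclic distribution, as used by ScaLAPACK for the
# root frontal matrix: block (bi, bj) of the matrix is owned by process
# (bi mod p, bj mod q) on a p x q process grid. nb, p, and q here are
# illustrative assumptions only.

def owner(i, j, nb=2, p=2, q=3):
    """Return the (row, col) process coordinates owning matrix entry
    (i, j) under an nb x nb block cyclic distribution on a p x q grid."""
    return ((i // nb) % p, (j // nb) % q)

if __name__ == "__main__":
    # Entries inside the same nb x nb block share one owner...
    assert owner(0, 0) == owner(1, 1) == (0, 0)
    # ...and ownership wraps cyclically every p*nb rows (q*nb columns).
    assert owner(4, 0) == owner(0, 0)
    print(owner(2, 2))  # block (1, 1) -> process (1, 1)
```

Spreading each matrix over the whole grid in this cyclic fashion is what gives the root factorization its scalability: every process holds blocks from all regions of the matrix, so work stays balanced as elimination proceeds.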

253 | A column approximate minimum degree ordering algorithm - Davis, Gilbert, et al. |
Citation Context: ...and 512 MBytes of virtual memory. (Figure 11: VAMPIR trace of an isolated type 2 symmetric factorization; variable row block sizes; Master is Process 1.) An approximate Minimum Degree (AMD) ordering [1] has been used to permute the initial matrix and all timings are given in seconds. 4.1 The theoretical speed-up of the methods The maximum theoretical speed-up obtained for each type of parallelism is...

234 | Users' Guide for the Harwell-Boeing Sparse Matrix Collection - Duff, Grimes, et al. - 1992 |
Citation Context: ...5. Throughout this paper we will show the performance of our algorithms on a set of test problems. These test problems consist of symmetric and unsymmetric problems from the Harwell-Boeing collection [13] and problems from the PARASOL end users and are shown in Table 1. 2 Multifrontal methods The multifrontal method for the solution of sparse linear equations is a direct method based on the LU factori...

216 | The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations - Duff, Reid - 1983 |
Citation Context: ...the project web site at http://www.genias.de/parasol. CERFACS and RAL with the collaboration of ENSEEIHT-IRIT are developing the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the perfo...

133 | Balancing Domain Decomposition - Mandel - 1993 |

111 | LAPACK: a portable linear algebra library for high-performance computers - Anderson, Bai, et al. - 1990 |
Citation Context: ...ization algorithms, we want to keep the possibility of postponing the elimination of fully-summed variables. Note that classical blocked algorithms for the LU and LL^T factorizations of full matrices [5] are quite efficient, but this is not at all the case for the LDL^T factorization. We will briefly compare kernels involved in the blocked algorithms. We then show how we have exploited the frontal matr...

100 | Algorithm 679: A set of level 3 basic linear algebra subprograms - Dongarra, Du Croz, et al. - 1990 |
Citation Context: ...ored by rows. During LU factorization, a KJI-SAXPY blocked algorithm is used [2, 7] to compute the LU factor associated with the block of fully summed rows (matrices A and C). The Level 3 BLAS kernel [10, 11] DTRSM is used to compute the off-diagonal block of L (overwriting matrix B). Updating the matrix E is then a simple call to the Level 3 BLAS kernel DGEMM. (Figure 6: Structure of a type 1 nod...)
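The blocked elimination sketched in this snippet (factor the fully summed block A, triangular-solve for the off-diagonal block, then update the trailing matrix E) amounts to forming a Schur complement in E. A minimal pure-Python sketch, with a made-up 3x3 front and no pivoting (the real code pivots for numerical stability):

```python
# Sketch of the partial factorization of a frontal matrix
#     F = [ A  C ]      A: fully summed pivot block
#         [ B  E ]      E: trailing block (contribution to the parent)
# Eliminating A's variables leaves the Schur complement E - B*inv(A)*C in
# place of E; MUMPS computes this with DTRSM/DGEMM-style BLAS 3 kernels.
# The data below is illustrative and no pivoting is performed.

def partial_lu(front, k):
    """Eliminate the first k (fully summed) variables of `front` in place
    with unpivoted Gaussian elimination; the trailing (n-k) x (n-k) block
    then holds the Schur complement."""
    n = len(front)
    for p in range(k):
        for i in range(p + 1, n):
            m = front[i][p] / front[p][p]      # multiplier (entry of L)
            for j in range(p + 1, n):
                front[i][j] -= m * front[p][j]
            front[i][p] = m                    # store L below the diagonal
    return front

front = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 1.0],
         [2.0, 1.0, 6.0]]
partial_lu(front, 2)    # eliminate the 2 fully summed variables
schur = front[2][2]     # 1x1 trailing block: 5.0 == E - B*inv(A)*C here
```

The same arithmetic done blockwise is why the dense BLAS 3 kernels apply: the triangular solve produces the block row/column of the factors, and the trailing update is a single matrix-matrix multiply.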

78 | The multifrontal solution of unsymmetric sets of linear systems - Duff, Reid - 1984 |
Citation Context: ...the project web site at http://www.genias.de/parasol. CERFACS and RAL with the collaboration of ENSEEIHT-IRIT are developing the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the perfo...

77 | VAMPIR: Visualization and analysis of MPI resources - Nagel, Arnold, et al. - 1996 |
Citation Context: ...algorithm is illustrated in Figure 9, where program activity is represented in black, inactivity in grey, and messages by lines between processes. The figure is a trace record generated by the VAMPIR [21] package from PALLAS. We see that, on this example, the master processor is relatively more loaded than the slaves. Figure 9: VAMPIR trace of an isolated type 2 unsymmetric factorization (Master is Pr...

56 | LAPACK Users' Guide (second edition) - Anderson, Bai, et al. - 1995 |

49 | Vectorization of a multiprocessor multifrontal code - Amestoy, Duff - 1989 |
Citation Context: ...CERFACS and RAL with the collaboration of ENSEEIHT-IRIT are developing the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the performance of earlier versions of the code can be found in...

48 | Task scheduling for parallel sparse Cholesky factorization - Geist, Ng - 1989 |
Citation Context: ...(Figure 1: Decomposition of the assembly tree into levels.) The tree is processed from the bottom to the top, level by level (see Figure 1). Level L0 is determined using Algorithm 1 [18] and is illustrated in Figure 2. Then for i > 0, a node belongs to Li if all its children belong to Lj, j ≤ i-1. First, nodes of level L0 (and associated subtrees) are mapped. This first step...
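The level rule quoted here (a node joins level Li once all its children lie in levels below i) is simply level(v) = 1 + max over its children. A sketch under the assumption that L0 is already given (the paper determines it with the Geist-Ng Algorithm 1, which is not reproduced here); the example tree is invented:

```python
# Sketch of the level decomposition of an assembly tree: nodes in a given
# set L0 get level 0, and every other node's level is one more than the
# maximum level of its children. How L0 itself is chosen (Algorithm 1 in
# the paper) is not modelled; here it is simply an input.

def levels(children, l0):
    """Map each node to its level. `children[v]` lists v's children;
    nodes in `l0` are the roots of the subtrees treated sequentially."""
    level = {}

    def visit(v):
        if v in level:
            return level[v]
        if v in l0:
            level[v] = 0
        else:
            level[v] = 1 + max(visit(c) for c in children[v])
        return level[v]

    for v in children:
        visit(v)
    return level

# Invented assembly tree: node 6 is the root, with children 4 and 5.
children = {0: [], 1: [], 2: [], 3: [], 4: [0, 1], 5: [2, 3], 6: [4, 5]}
print(levels(children, l0={0, 1, 2, 3}))
# -> {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2}
```

Processing the tree level by level means that when a node is reached, all the contribution blocks from its children are already available.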

41 | Parallel implementation of multifrontal schemes - Duff - 1986 |
Citation Context: ...CERFACS and RAL with the collaboration of ENSEEIHT-IRIT are developing the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the performance of earlier versions of the code can be found in...

27 | The Rutherford-Boeing Sparse Matrix Collection - Duff, Grimes, et al. - 1997 |

26 | Memory management issues in sparse multifrontal methods on multiprocessors - Amestoy, Duff - 1993 |



14 | Subroutine Library: a Catalogue of Subroutines - Harwell |
Citation Context: ...ps. This gives rise to so-called node parallelism. A version of the multifrontal code for shared memory computers was developed by [2] and was included in Release 12 of the Harwell Subroutine Library [19] as code MA41. This was the basis for Version 1.0 of MUMPS that was released in May 1997. 3 Description of the main implementation issues The current version of MUMPS ("MUltifrontal Massively Parallel...

10 | Use of level 3 BLAS in LU factorization in a multiprocessing environment on three vector multiprocessors - Daydé, Duff - 1991 |
Citation Context: ...of Figure 6, where A is the block of fully summed variables to eliminate. Note that, in the code, the frontal matrix is stored by rows. During LU factorization, a KJI-SAXPY blocked algorithm is used [2, 7] to compute the LU factor associated with the block of fully summed rows (matrices A and C). The Level 3 BLAS kernel [10, 11] DTRSM is used to compute the off-diagonal block of L (overwriting matrix B...


9 | Memory allocation issues in sparse multiprocessor multifrontal methods - Amestoy, Duff - 1993 |
Citation Context: ...CERFACS and RAL with the collaboration of ENSEEIHT-IRIT are developing the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the performance of earlier versions of the code can be found in...

8 | A block implementation of level 3 BLAS for RISC processors - Daydé, Duff - 1996 |
Citation Context: ...trailing part of L_off has to be updated after each step of the blocked factorization, to allow for a stability test for choosing the pivot. To update the matrix E, we have applied the ideas used by [8] to design efficient and portable Level 3 BLAS kernels. Blocking of the updating is done in the following way. At each step, a block of columns of E (E_k in Figure 7) is updated. In our first...


6 | MPI: A Message Passing Interface Standard - Dongarra - 1994 |
Citation Context: ...on from the analysis phase. When we try to send contribution blocks, factorized blocks, ..., we first check to see if there is room in the send buffer. Our module provides an equivalent of MPI_BSEND [9] with the advantage that messages are directly packed in the buffer and problems occurring when the buffer is full are overcome. Note that messages are never sent when the destination is identical to...
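The buffered-send module described here (check for room before packing, never send through the buffer to oneself) can be mimicked with a toy buffer class. The class, method names, and sizes are illustrative assumptions, not the actual MUMPS interface, and no real MPI calls are made:

```python
# Toy model of an MPI_BSEND-like send buffer: before sending, check that
# the message fits; pack it directly into the buffer; refuse (rather than
# fail or block) when the buffer is full; and handle self-sends locally
# without going through the buffer at all. Names and sizes are invented.

class SendBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []          # packed messages awaiting delivery
        self.used = 0

    def try_send(self, src, dest, payload: bytes):
        if src == dest:            # never sent via the buffer
            return "local"
        if self.used + len(payload) > self.capacity:
            return "full"          # caller retries once receives drain it
        self.pending.append((dest, payload))
        self.used += len(payload)
        return "sent"

    def drain_one(self):
        """Simulate the network delivering the oldest pending message."""
        dest, payload = self.pending.pop(0)
        self.used -= len(payload)
        return dest, payload

buf = SendBuffer(capacity=8)
print(buf.try_send(0, 1, b"abcd"))    # sent
print(buf.try_send(0, 2, b"efghij"))  # full  (4 + 6 > 8)
print(buf.try_send(3, 3, b"xy"))      # local (self-send bypasses buffer)
```

Returning "full" instead of blocking is the point of the design in the snippet: in an asynchronous factorization a stalled send could deadlock two processes waiting on each other, so the sender is free to process incoming messages and retry.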

5 | The influence of vector and parallel computers in the solution of large sparse linear equations - Duff - 1987 |

4 | An integrated programming environment for parallel sparse matrix solvers - Amestoy, Duff, et al. - 1998 |
Citation Context: ...and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the performance of earlier versions of the code can be found in [4]. We discuss some important aspects of multifrontal methods in Section 2 and describe the main implementation issues for distributed memory machines in Section 3. We consider a performance analysis of...

4 | Développement d'une approche multifrontale pour machines à mémoire distribuée et réseau hétérogène de stations de travail - Espirat - 1996 |
Citation Context: ...veloping the direct solver based on a multifrontal approach originally developed by [14, 15] and extended to shared memory computers by [2, 3, 12] and subsequently to a prototype version using PVM by [16]. The integration of this direct code into the PARASOL Library and comments on the performance of earlier versions of the code can be found in [4]. We discuss some important aspects of multifrontal me...


1 | RALPAR - RAL Mesh Partitioning Program, Version 2.0 - Fowler, Greenough - 1998 |
Citation Context: ...c solver. example combining nested dissection and minimum degree. We are experimenting with such reorderings and plan to incorporate some code for this, developed from the RALPAR partitioning package [17], within our analysis phase. This is also the topic of a collaboration with Roman and Pellegrini (LaBRI, Bordeaux) and will not be addressed further in this paper. However, we see that a significant s...


1 | The use of vector and parallel computers in the solution of large sparse linear equations - Duff - 1986 |