## The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. LAPACK Working Note 118 CS-97-247 (1997)

Citations: | 28 - 5 self |

### BibTeX

@MISC{Dongarra97thedesign,
    author = {Jack Dongarra},
    title = {The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. LAPACK Working Note 118 CS-97-247},
    year = {1997}
}

### Abstract

This paper describes the design and implementation of three core factorization routines (LU, QR, and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk, and the factorization routines transfer submatrix panels into memory. The 'left-looking' column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and use high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation for the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster.
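
As a rough illustration of the left-looking, panel-at-a-time idea described in the abstract, the following Python sketch factors a symmetric positive definite matrix one column panel at a time. It is not the ScaLAPACK implementation: the function name `left_looking_cholesky_ooc`, the `disk` array that stands in for the on-disk matrix (it could be a `numpy.memmap`), and the `panel_width` parameter are assumptions made for this example, and NumPy's `cholesky` and `solve` stand in for the in-core kernels.

```python
import numpy as np


def left_looking_cholesky_ooc(disk, panel_width):
    """Overwrite the lower triangle of `disk` (n x n, SPD) with its Cholesky factor.

    `disk` is a hypothetical stand-in for the matrix stored on disk. Only the
    current column panel and one previously factored panel are held in memory
    at any time.
    """
    n = disk.shape[0]
    for j0 in range(0, n, panel_width):
        j1 = min(j0 + panel_width, n)
        w = j1 - j0
        # "Read" the current column panel (rows j0..n-1) into memory.
        panel = np.array(disk[j0:, j0:j1], dtype=float)
        # Left-looking update: apply every previously factored panel.
        # Earlier panels are only read here, never written back.
        for k0 in range(0, j0, panel_width):
            k1 = k0 + panel_width
            left = np.array(disk[j0:, k0:k1], dtype=float)
            panel -= left @ left[:w].T
        # In-core factorization of the updated panel.
        l_diag = np.linalg.cholesky(panel[:w])
        panel[:w] = l_diag
        if j1 < n:
            # Solve X @ l_diag.T = B for the rows below the diagonal block.
            panel[w:] = np.linalg.solve(l_diag, panel[w:].T).T
        # "Write" the finished panel back to disk exactly once.
        disk[j0:, j0:j1] = panel
    return disk
```

In this sketch each column panel is written once and earlier panels are only read, which is the access pattern that makes the left-looking variant attractive when writes to disk are expensive.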

### Citations

96 | Locality of reference in LU decomposition with partial pivoting
- Toledo
- 1997
Citation Context ... is

$$P_2 P_1 \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} L_{11} & 0 \\ \tilde{L}_{21} & L_{22} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}, \qquad \tilde{L}_{21} = P_2 L_{21}. \tag{7}$$

Note that the above is the recursively-partitioned LU factorization proposed by Toledo [21] if $k$ is chosen to be $n/2$. A right-looking variant results if $k = n_0$ is always chosen, where most of the computation is the updating of $\tilde{A}_{22} \leftarrow \tilde{A}_{22} - L_{21} U_{12}$. A left-looking variant results if $k = n - n_0$...
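
The role of the split size k in the quoted passage can be sketched in Python. This is an illustrative dense, in-memory version under stated assumptions, not code from the paper: the names `lu_partitioned`, `_factor`, `recursive`, `right_looking`, and `left_looking` are hypothetical, rows are swapped across the whole matrix so that the permutations reach both the left and right blocks, and the input is assumed square and nonsingular.

```python
import numpy as np


def lu_partitioned(a, choose_k, n0=2):
    """Return (perm, lu) with a[perm] == L @ U, where L and U are packed in `lu`."""
    lu = np.array(a, dtype=float)
    perm = np.arange(lu.shape[0])
    _factor(lu, perm, 0, lu.shape[1], choose_k, n0)
    return perm, lu


def _factor(lu, perm, c0, c1, choose_k, n0):
    """Factor columns c0..c1-1 (rows c0 and below) in place."""
    n = c1 - c0
    if n <= n0:
        # Base case: unblocked LU with partial pivoting on a narrow panel.
        for j in range(c0, c1):
            p = j + int(np.argmax(np.abs(lu[j:, j])))
            if p != j:
                lu[[j, p]] = lu[[p, j]]      # swap whole rows
                perm[[j, p]] = perm[[p, j]]
            lu[j + 1:, j] /= lu[j, j]
            lu[j + 1:, j + 1:c1] -= np.outer(lu[j + 1:, j], lu[j, j + 1:c1])
        return
    k = choose_k(n, n0)                      # number of columns in the left block
    m = c0 + k
    _factor(lu, perm, c0, m, choose_k, n0)   # factor the left k columns
    l11 = np.tril(lu[c0:m, c0:m], -1) + np.eye(k)
    lu[c0:m, m:c1] = np.linalg.solve(l11, lu[c0:m, m:c1])   # U12 = L11^{-1} A12
    lu[m:, m:c1] -= lu[m:, c0:m] @ lu[c0:m, m:c1]           # A22 <- A22 - L21 U12
    _factor(lu, perm, m, c1, choose_k, n0)   # factor the updated right block


def recursive(n, n0):
    return n // 2       # k = n/2


def right_looking(n, n0):
    return n0           # k = n0


def left_looking(n, n0):
    return n - n0       # k = n - n0
```

All three rules perform the same arithmetic in exact terms; they differ in when the trailing columns are updated, which is what determines the data movement discussed in the passage.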

79 | A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
- Choi, Dongarra, et al.
- 1995
Citation Context ...e software has been ported to run on IBM SP, Compaq Alpha cluster, SGI multiprocessors, and Beowulf Linux clusters. The implementation is based on modular software building blocks such as the PBLAS [3, 4, 16] (Parallel Basic Linear Algebra Subprograms), and the BLACS [9, 10] (Basic Linear Algebra Communication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core ...

64 | The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations
- Toledo, Gustavson
- 1996
Citation Context ...mensional cyclically-shifted block layout that achieves good load balance even when operating on narrow block rows or block columns was proposed in MIOS (Matrix Input-Output Subroutines) used in SOLAR [22]. However, this scheme is more complex to implement (SOLAR does not yet use this scheme). Moreover, another data redistribution is required to maintain compatibility with in-core ScaLAPACK software. ...

32 | PB-BLAS: A Set of Parallel Block Basic Linear Algebra Subroutines. Concurrency: Practice and Experience
- Choi, Dongarra, et al.
- 1996
Citation Context ...e software has been ported to run on IBM SP, Compaq Alpha cluster, SGI multiprocessors, and Beowulf Linux clusters. The implementation is based on modular software building blocks such as the PBLAS [3, 4, 16] (Parallel Basic Linear Algebra Subprograms), and the BLACS [9, 10] (Basic Linear Algebra Communication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core ...

31 | Facilitating implementation
- Au, Choi
- 1999
Citation Context ... ← Ã22 − L21U12. A left-looking variant results if k = n − n0. The in-core ScaLAPACK factorization routines for LU, QR and Cholesky factorization use a right-looking variant for good load balancing [5]. Other work has shown [8, 15] that for an out-of-core factorization, a left-looking variant generates less I/O volume compared to the right-looking variant. Toledo [22] shows that the recursively-part...

25 | Two dimensional Basic Linear Algebra Communication Subprograms
- Dongarra, van de Geijn
- 1991
Citation Context ...GI multiprocessors, and Beowulf Linux clusters. The implementation is based on modular software building blocks such as the PBLAS [3, 4, 16] (Parallel Basic Linear Algebra Subprograms), and the BLACS [9, 10] (Basic Linear Algebra Communication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are ...

23 | The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines - Choi, Dongarra, et al. - 1994 |

22 | Algorithmic Redistribution Methods for Block Cyclic Decompositions, doctoral thesis
- Petitet
- 1996
Citation Context ...e software has been ported to run on IBM SP, Compaq Alpha cluster, SGI multiprocessors, and Beowulf Linux clusters. The implementation is based on modular software building blocks such as the PBLAS [3, 4, 16] (Parallel Basic Linear Algebra Subprograms), and the BLACS [9, 10] (Basic Linear Algebra Communication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core ...

19 | A fast solution method for three-dimensional many-particle problems of linear elasticity
- Fu, KJ, et al.
- 1998
Citation Context ... resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternative in some cases [11], a dense matrix formulation is still necessary in complicated geometry or FMM version is not available. This development effort has the objective of producing portable software that achieves high per...

18 | Solution of elastic scattering problems in linear acoustics using h-p boundary element method
- Demkowicz, Karafiat, et al.
- 1992
Citation Context ...s arise from modeling effect of RF heating of plasmas in fusion applications [1, 13, 14] and modeling high resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternative in some cases [11], a dense matrix formulation is still necessary in complicated geometry or FMM version is not available....

15 | Two dimensional basic linear algebra communication subprograms
- Dongarra, van de Geijn, et al.
- 1993
Citation Context ...GI multiprocessors, and Beowulf Linux clusters. The implementation is based on modular software building blocks such as the PBLAS [3, 4, 16] (Parallel Basic Linear Algebra Subprograms), and the BLACS [9, 10] (Basic Linear Algebra Communication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are ...

14 | Massively parallel computation for acoustical scattering problems using boundary element methods
- Geng, Oden, van de Geijn
- 1996
Citation Context ...s arise from modeling effect of RF heating of plasmas in fusion applications [1, 13, 14] and modeling high resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternative in some cases [11], a dense matrix formulation is still necessary in complicated geometry or FMM version is not available....

12 | The application of parallel computation to integral equation models of electromagnetic scattering
- Cwik, van de Geijn, et al.
- 1994
Citation Context ...s arise from modeling effect of RF heating of plasmas in fusion applications [1, 13, 14] and modeling high resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternative in some cases [11], a dense matrix formulation is still necessary in complicated geometry or FMM version is not available....

11 | POOCLAPACK: Parallel out-of-core linear algebra package
- Reiley, van de Geijn
- 1999
Citation Context ...and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are reported in the literature [2, 15, 18, 19]. A recent work [17] describes out-of-core Cholesky factorization using PLAPACK on the CRAY3 T3E and HP Exemplar. Our work is built upon the portable ScaLAPACK library and includes the LU, QR and Cholesky methods. Since...

8 | Load-balanced LU and QR factor and solve routines for scalable processors with scalable I/O
- Brunet, Pederson, et al.
- 1994
Citation Context ...unication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are reported in the literature [2, 15, 18, 19]. A recent work [17] describes out-of-core Cholesky factorization using PLAPACK on the CRAY3 T3E and HP Exemplar. Our work is built upon the portable ScaLAPACK library and includes the LU, QR and Cho...

8 | Out of core dense solvers on Intel parallel supercomputers
- Scott
- 1992
Citation Context ...unication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are reported in the literature [2, 15, 18, 19]. A recent work [17] describes out-of-core Cholesky factorization using PLAPACK on the CRAY3 T3E and HP Exemplar. Our work is built upon the portable ScaLAPACK library and includes the LU, QR and Cho...

7 | Anatomy of a parallel out-of-core dense linear solver
- Klimkowski, van de Geijn
- 1995
Citation Context ...unication Subprograms). Proven and highly efficient ScaLAPACK factorization routines are used for in-core computations. Earlier out-of-core dense linear algebra efforts are reported in the literature [2, 15, 18, 19]. A recent work [17] describes out-of-core Cholesky factorization using PLAPACK on the CRAY3 T3E and HP Exemplar. Our work is built upon the portable ScaLAPACK library and includes the LU, QR and Cho...

7 | Parallel I/O and solving out-of-core systems of linear equations
- Scott
- 1993

3 | Key concepts for parallel out-of-core
- Dongarra, Hammarling, et al.
- 1996
Citation Context ...ds. Since pivoting is required in LU factorization, the current algorithm mainly uses variable width column panels whereas [17] is based on decomposition by square submatrices. Our work improves upon [8] in performing parallel I/O based on in-core ScaLAPACK block-cyclic distribution. Moreover, the current implementation has more efficient handling of pivoting by storing partially pivoted factors on d...
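
The block-cyclic distribution mentioned in this context can be illustrated along one dimension with a short Python sketch. The function names are hypothetical, indices are zero-based, and the first block is assumed to live on process 0; ScaLAPACK applies this kind of mapping independently to rows and columns of a two-dimensional process grid.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map zero-based global index g to (owning process, local index).

    nb is the block size; blocks are dealt out to processes round-robin.
    """
    block = g // nb                  # which global block g falls in
    proc = block % nprocs            # blocks cycle over the processes
    local = (block // nprocs) * nb + g % nb
    return proc, local


def block_cyclic_global(proc, local, nb, nprocs):
    """Inverse mapping: recover the global index from (process, local index)."""
    local_block, offset = divmod(local, nb)
    return (local_block * nprocs + proc) * nb + offset
```

For example, with block size 2 on 3 processes, process 0 holds global indices 0, 1, 6, 7, and so on, so a narrow column panel still touches several processes.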

2 | Second-order radio frequency kinetic theory with applications to flow drive and heating in tokamak plasmas
- Jaeger, Berry, et al.
Citation Context ... Therefore, it is natural to develop parallel out-of-core solvers to tackle large dense linear systems. Large dense problems arise from modeling effect of RF heating of plasmas in fusion applications [1, 13, 14] and modeling high resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternat...

1 | Wave-induced momentum transport and flow drive in tokamak plasmas, Phys
- Berry, Jaeger, et al.
- 1999
Citation Context ... Therefore, it is natural to develop parallel out-of-core solvers to tackle large dense linear systems. Large dense problems arise from modeling effect of RF heating of plasmas in fusion applications [1, 13, 14] and modeling high resolution three-dimensional wave scattering problems using the boundary element formulation [6, 7, 12, 20]. Although a fast multipole formulation (FMM) may be an efficient alternat...

1 | PVM implementation of the symmetric-galerkin method
- Semeraro, Gray
- 1997