## Sangdo-Dong, Dongjak-Ku, Seoul 156-743, Korea

### BibTeX

@MISC{Choi_sangdo-dong,

author = {Jaeyoung Choi and Jack J. Dongarra and David W. Walker},

title = {Sangdo-Dong, Dongjak-Ku, Seoul 156-743, Korea},

year = {}

}


### Abstract

PB-BLAS: A set of parallel block basic linear algebra subprograms

### Citations

742 | A set of level 3 basic linear algebra subprograms - Dongarra, Du Croz, et al. - 1990
Citation Context: ...levels. This can be done by partitioning the matrix or matrices into blocks and by performing the computation with matrix-matrix operations on the blocks. Another extended set of BLAS (Level 3 BLAS) [6] was proposed for that purpose. The Level 3 BLAS have been successfully used as the building blocks of a number of applications, including LAPACK [7], a software library that uses block-partitioned a... |
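The block-partitioned computation the excerpt describes can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the paper's code: the product is accumulated block by block, which is exactly the access pattern a Level 3 BLAS routine such as DGEMM is designed to exploit for cache and data reuse.

```python
def matmul_blocked(A, B, nb):
    """C = A * B computed block by block; A is m x k, B is k x n (lists of rows).

    Each (i0, j0, l0) iteration performs one block-times-block update,
    the operation the Level 3 BLAS express as a single GEMM call.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, nb):              # block rows of C
        for j0 in range(0, n, nb):          # block columns of C
            for l0 in range(0, k, nb):      # accumulate A-block * B-block
                for i in range(i0, min(i0 + nb, m)):
                    for l in range(l0, min(l0 + nb, k)):
                        a_il = A[i][l]
                        for j in range(j0, min(j0 + nb, n)):
                            C[i][j] += a_il * B[l][j]
    return C
```

The block size `nb` here is arbitrary; in the libraries discussed it is tuned to the machine's memory hierarchy.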

532 | Basic linear algebra subprograms for Fortran usage - Lawson, Hanson, et al. - 1979
Citation Context: ...In 1973, Hanson, Krogh and Lawson [1] described the advantages of adopting a set of basic routines for problems in linear algebra. The first set of basic linear algebra subprograms (Level 1 BLAS) [2] defines operations on one or two vectors. LINPACK [3] and EISPACK [4] are built on top of the Level 1 BLAS. An extended set of BLAS (Level 2 BLAS) [5] was proposed to support the development of softwa... |
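The "operations on one or two vectors" the excerpt mentions are the Level 1 BLAS. A minimal Python sketch of two of the classic ones, the axpy update and the dot product (a hypothetical illustration, not the reference Fortran routines):

```python
def axpy(alpha, x, y):
    """y <- alpha*x + y, the Level 1 BLAS _AXPY operation (two vectors)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Inner product of two vectors, the Level 1 BLAS _DOT operation."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

Because each element is touched only once per call, Level 1 routines do O(n) work on O(n) data, which is what motivated the higher-level BLAS discussed later in the excerpt.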

446 | An extended set of Fortran basic linear algebra subroutines - Dongarra, Du Croz, et al. - 1988
Citation Context: ...of basic linear algebra subprograms (Level 1 BLAS) [2] defines operations on one or two vectors. LINPACK [3] and EISPACK [4] are built on top of the Level 1 BLAS. An extended set of BLAS (Level 2 BLAS) [5] was proposed to support the development of software that would be portable and efficient, particularly on vector-processing machines. These routines perform computations on a matrix and one or two vec... |
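The Level 2 BLAS described here operate on a matrix and one or two vectors. A hedged Python sketch of the central case, the matrix-vector product y = Ax (the pattern of the Fortran routine DGEMV, not the routine itself):

```python
def gemv(A, x):
    """y = A x for an m x n matrix A (list of rows) and a length-n vector x.

    This is the core computation of the Level 2 BLAS routine _GEMV.
    """
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
```

On vector machines each row (or column) sweep maps onto one long vector operation, which is why this level was proposed for them.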

305 | LINPACK Users' Guide - Dongarra, Bunch, et al. - 1979
Citation Context: ...the advantages of adopting a set of basic routines for problems in linear algebra. The first set of basic linear algebra subprograms (Level 1 BLAS) [2] defines operations on one or two vectors. LINPACK [3] and EISPACK [4] are built on top of the Level 1 BLAS. An extended set of BLAS (Level 2 BLAS) [5] was proposed to support the development of software that would be portable and efficient, particularly... |

159 | ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers - Choi, Dongarra, et al. - 1992 |

59 | PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers. Concurrency: Practice and Experience - Choi, Dongarra, et al. - 1994 |

34 | The design of scalable software libraries for distributed memory concurrent computers - Choi, Dongarra, et al. - 1993
Citation Context: ...distribution provides a simple, yet general-purpose way of distributing a block-partitioned matrix on distributed memory concurrent computers. In the block cyclic distribution, described in detail in [8,9], an M x N matrix is partitioned into blocks of size r x c, and blocks separated by a fixed stride in the column and row directions are assigned to the same processor. If the stride in the column and ... |

34 | The Design of a Parallel, Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal and Bidiagonal Form. Numerical Algorithms - Choi, Dongarra, et al. - 1995 |

31 | Facilitating implementation - Au, Choi - 1999 |

30 | A Parallel Triangular Solver for a Distributed-Memory Multiprocessor - Li, Coleman - 1988
Citation Context: ...on a column of processors, IBPOS, starting at IAROW, as shown in Figure 5 (c). The implementation of the linear triangular system solver is a two-dimensional block version of Li and Coleman's method [21]. Since A and B are distributed block cyclically, all computations in [21] are changed to block computations using the routines DTRSM and DGEMM. If SIDE = 'Left', (Q - 1) blocks of B are rotated column-... |
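Within each block, the triangular solve the excerpt discusses reduces to ordinary forward substitution. A minimal Python sketch of that kernel, under simplified assumptions (serial, unblocked, one right-hand side), not the distributed algorithm of Li and Coleman:

```python
def forward_solve(L, b):
    """Solve L x = b for a nonsingular lower triangular matrix L (list of rows)
    by forward substitution, the serial kernel behind a TRSM-style solver."""
    n = len(L)
    x = [0.0] * n
    for i in range(n):
        # subtract the contributions of the unknowns already computed
        s = b[i] - sum(L[i][j] * x[j] for j in range(i))
        x[i] = s / L[i][i]
    return x
```

In the block version, the division by `L[i][i]` becomes a small triangular solve (DTRSM) and the subtraction becomes a block matrix product (DGEMM), as the excerpt notes.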

28 | The Multicomputer Toolbox Approach to Concurrent BLAS and LACS - Falgout, Skjellum, et al. - 1992 |

28 | Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers - Choi, Dongarra, et al. - 1993 |

25 | LAPACK block factorization algorithms on the Intel iPSC/860 - Dongarra, Ostrouchov - 1990
Citation Context: ...e the effectiveness of the Level 3 BLAS in [6]. However, we use the right-looking version of the algorithm, since it minimizes data communication and distributes the computation across all processors [22]. Cholesky factorization factors a symmetric, positive definite matrix A into the product of a lower triangular matrix L and its transpose, i.e., A = LL^T. It is assumed that the lower triangular port... |
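The right-looking ordering the excerpt refers to factors the current column, then immediately updates the entire trailing submatrix. A scalar (unblocked) Python sketch of that ordering, offered only as an illustration; the PB-BLAS version works on blocks, with the scale and update steps becoming DTRSM and DGEMM/DSYRK calls:

```python
import math

def cholesky_right_looking(A):
    """In-place right-looking Cholesky of a symmetric positive definite
    matrix A (list of rows): the lower triangle is overwritten by L, A = L L^T."""
    n = len(A)
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])       # factor the diagonal entry
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]             # scale the panel below it
        for j in range(k + 1, n):          # right-looking trailing update:
            for i in range(j, n):          # touch everything to the "right" now
                A[i][j] -= A[i][k] * A[j][k]
    return A
```

A left-looking variant would instead delay each update until a column is factored; the right-looking form does the update eagerly, which is what spreads the work across all processors in the distributed setting.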

20 | A Proposal for Standard Linear Algebra Subprograms - Hanson, Krogh, et al. - 1973
Citation Context: ...s useful to imagine the processors arranged as a P x Q mesh or template. The processor at position (p, q) (0 ≤ p < P, 0 ≤ q < Q) in the template is assigned the blocks indexed by (p + i·P, q + j·Q), (1) where i = 0, ..., ⌊(Mb − p − 1)/P⌋, j = 0, ..., ⌊(Nb − q − 1)/Q⌋, and Mb x Nb is the size in blocks of the matrix (Mb = ⌈M/r⌉, Nb = ⌈N/c⌉). Blocks are scattered in this way so that good load balance can be maintained... |
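Equation (1)'s assignment rule can be checked with a short Python helper (hypothetical code, with P, Q, Mb, Nb as in the excerpt): every block lands on exactly one processor, and blocks a stride of P (resp. Q) apart share an owner.

```python
def assigned_blocks(p, q, P, Q, Mb, Nb):
    """Blocks (p + i*P, q + j*Q) owned by processor (p, q) of a P x Q template,
    for an Mb x Nb grid of blocks, per equation (1).  Assumes 0 <= p < P <= Mb
    and 0 <= q < Q <= Nb."""
    return [(p + i * P, q + j * Q)
            for i in range((Mb - p - 1) // P + 1)   # i = 0, ..., floor((Mb-p-1)/P)
            for j in range((Nb - q - 1) // Q + 1)]  # j = 0, ..., floor((Nb-q-1)/Q)
```

Equivalently, block (I, J) is owned by processor (I mod P, J mod Q), which is the "fixed stride" view of the block cyclic distribution given earlier.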

16 | Basic linear algebra communication subprograms - Anderson, Benzoni, et al. - 1991 |

15 | Level 3 BLAS for distributed memory concurrent computers - Choi, Dongarra, et al. - 1992 |

12 | Basic matrix subprograms for distributed memory systems - Elster - 1990 |

9 | A look at scalable linear algebra libraries - Dongarra, van de Geijn, et al. - 1992
Citation Context: ...distribution provides a simple, yet general-purpose way of distributing a block-partitioned matrix on distributed memory concurrent computers. In the block cyclic distribution, described in detail in [8,9], an M x N matrix is partitioned into blocks of size r x c, and blocks separated by a fixed stride in the column and row directions are assigned to the same processor. If the stride in the column and ... |

2 | The parallelization of level 2 and 3 BLAS operations on distributed memory machines - Aboelaze, Chrisochoides, et al. - 1991 |

1 | A proposal for a user-level, message passing interface in a distributed memory environment, Technical Report TM-12231, Oak Ridge National Laboratory - Dongarra, Hempel, et al. - 1993
Citation Context: ...CK. A set of ScaLAPACK routines for performing LU, QR and Cholesky factorizations [19] and for reducing matrices to Hessenberg, tridiagonal and bidiagonal form have been implemented with the PB-BLAS [20]. The PB-BLAS are currently available for all arithmetic data types, i.e. single and double precision, real and complex. The PB-BLAS routines are available through netlib under the scalapack directory... |