A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
, 1992
Cited by 151 (33 self)
This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-based interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.
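As a rough illustration of the square block scattered (block-cyclic) decomposition mentioned in this abstract, the sketch below maps a global matrix index to its owning process and local index. The function names, zero-based indexing, and the assumption that the first block lives on process 0 are illustrative choices, not the ScaLAPACK interface.

```python
def block_cyclic_owner(i, j, block, nprow, npcol):
    """Return (process row, process column) owning global entry (i, j).

    Assumes square blocks of size `block`, zero-based indices, and that
    block (0, 0) is stored on process (0, 0).
    """
    return (i // block) % nprow, (j // block) % npcol


def global_to_local(g, block, nprocs):
    """Map a global index along one dimension to (owner, local index)."""
    blk = g // block                      # which global block holds g
    owner = blk % nprocs                  # blocks are dealt out cyclically
    local = (blk // nprocs) * block + g % block
    return owner, local


if __name__ == "__main__":
    # Print the owner of every entry of an 8 x 8 matrix on a 2 x 3 grid.
    for i in range(8):
        print([block_cyclic_owner(i, j, 2, 2, 3) for j in range(8)])
```

Dealing blocks out cyclically, rather than giving each process one contiguous slab, keeps every process busy as a factorization shrinks the active part of the matrix.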
Software libraries for linear algebra computations on high performance computers
- SIAM REVIEW
, 1995
Cited by 66 (17 self)
This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under development. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. The use of such algorithms helps reduce the message startup costs on distributed memory concurrent computers. Other key ideas in our approach are the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as computational building blocks, and the use of Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks. Together the distributed BLAS and the BLACS can be used to construct highe...
The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form
, 1995
Cited by 30 (5 self)
This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms. These routines are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines. Results are presented that demonstrate the scalability of the reduction routines. The building blocks most commonly used in ScaLAPACK are the sequential BLAS, the Parallel BLAS (PBLAS) and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms consists of a series of steps, in each of which one block column (or panel), and/or block row, of the matrix is reduced, followed by an update of the portion of the matrix that has not been factorized so far. This latter phase is performed usin...
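The panel-then-update structure described in this abstract can be sketched on a single process. The example below swaps in a simpler algorithm with the same shape, a right-looking blocked LU factorization without pivoting, rather than the Hessenberg, tridiagonal, or bidiagonal reductions the paper covers. The function name `blocked_lu` and the list-of-lists storage are assumptions made for illustration; this is not a ScaLAPACK routine.

```python
def blocked_lu(a, nb):
    """In-place right-looking blocked LU without pivoting (illustrative only).

    `a` is a square list of lists of floats. On return it holds the unit
    lower triangular factor L below the diagonal and U on and above it.
    """
    n = len(a)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # Step 1: factor the current panel (columns k .. k+kb-1).
        for j in range(k, k + kb):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, k + kb):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: block row of U, via forward substitution with the unit
        # lower triangular diagonal block.
        for j in range(k, k + kb):
            for i in range(j + 1, k + kb):
                for c in range(k + kb, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: update the trailing submatrix that is not yet factorized.
        for i in range(k + kb, n):
            for c in range(k + kb, n):
                s = 0.0
                for p in range(k, k + kb):
                    s += a[i][p] * a[p][c]
                a[i][c] -= s
    return a
```

Step 3 is a matrix-matrix product, which is why block-partitioned algorithms of this shape can spend most of their time in Level 3 BLAS-style operations.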
An Object Oriented Design for High Performance Linear Algebra on Distributed Memory Architectures
, 1993
Cited by 26 (10 self)
We describe the design of ScaLAPACK++, an object oriented C++ library for implementing linear algebra computations on distributed memory multicomputers. This package, when complete, will support distributed matrix operations for symmetric, positive-definite, and non-symmetric cases. In ScaLAPACK++ we have employed object oriented design methods to enhance scalability, portability, flexibility, and ease-of-use. We illustrate some of these points by describing the implementation of basic algorithms and comment on tradeoffs between elegance, generality, and performance.
Algorithmic redistribution methods for block cyclic decompositions
- IEEE Trans. on PDS
, 1996
Cited by 22 (2 self)
To my parents. Acknowledgments: The writer expresses gratitude and appreciation to the members of his dissertation committee, Michael Berry, Charles Collins, Jack Dongarra, Mark Jones and David Walker for their encouragement and participation throughout my doctoral experience. Special appreciation is due to Professor Jack Dongarra, Chairman, who provided sound guidance, support and appropriate commentaries during the course of my graduate study. I also would like to thank Yves Robert and R. Clint Whaley for many useful and instructive discussions on general parallel algorithms and message passing software libraries. Many valuable comments for improving the presentation of this document were received from L. Susan Blackford. Finally, I am grateful to the Department of Computer Science at the University of Tennessee for allowing me to do this doctoral research work here. A special debt of gratitude is owed to Joanne Martin, IBM POWERparallel Division, for awarding me an IBM Corporation Fellowship covering the tuition as well as a stipend for the 1994-96 academic years. This work was also supported
The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications
, 1994
Cited by 19 (11 self)
In this paper, we consider what is required to develop parallel algorithms for engineering applications on message-passing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was accomplished with attention to portability, high performance and reusability of the underlying algorithms. Emerging from this work are several key results: first, a methodology for explicit parallelization of algorithms and for the evaluation of parallel algorithms in the distributed-memory context; second, a set of portable, reusable numerical algorithms constituting a "Multicomputer Toolbox," suitable for use on both existing and future medium-grain concurrent computers; third, a working prototype simulation system, Cdyn, for distillation problems, that can be enhanced (with additional work) to address more complex flowsheeting problems in chemical engineering; fourth, ideas for how to a...
The Design of Linear Algebra Libraries for High Performance Computers
, 1993
Cited by 13 (1 self)
This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under development. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. The use of such algorithms helps reduce the message startup costs on distributed memory concurrent computers. Other key ideas in our approach are the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as computational building blocks, and the use of Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks. Together the distributed BLAS and the BLACS can be used to construct ...
A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies
, 1995
Cited by 7 (0 self)
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used and thus we offer a taxonomy for this family of related algorithms. All these algorithms are represented in the data distribution independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility condition result shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance...
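To make the operation C = αAB + βC on a two-dimensional process grid concrete, here is a minimal single-process simulation. It uses a broadcast-style outer-product formulation (in the spirit of SUMMA) with a contiguous block partition of C. Both are assumptions chosen for brevity and are not necessarily among the algorithms or data distributions the paper presents. The name `gemm_on_grid` is hypothetical.

```python
def gemm_on_grid(alpha, a, b, beta, c, prow, pcol):
    """Return alpha*A*B + beta*C, computed block by block.

    C is split into a prow x pcol grid of contiguous blocks, one per
    simulated process. A is m x k, B is k x n, C is m x n (lists of lists).
    """
    m, k, n = len(a), len(b), len(b[0])

    def bounds(size, parts, p):
        # Half-open index range [lo, hi) owned by part p.
        return p * size // parts, (p + 1) * size // parts

    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for pr in range(prow):
        r0, r1 = bounds(m, prow, pr)
        for pc in range(pcol):
            c0, c1 = bounds(n, pcol, pc)
            # Simulated process (pr, pc): at step l it uses the pieces of
            # column l of A and row l of B that a row broadcast and a
            # column broadcast would deliver, then applies a rank-1 update.
            for l in range(k):
                a_col = [a[i][l] for i in range(r0, r1)]
                b_row = [b[l][j] for j in range(c0, c1)]
                for ii, i in enumerate(range(r0, r1)):
                    for jj, j in enumerate(range(c0, c1)):
                        out[i][j] += alpha * a_col[ii] * b_row[jj]
    return out
```

Because each simulated process touches only its own block of C, the result is the same for any grid shape, including rectangular grids and rectangular matrices.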
A Comparison of Parallel Programming Paradigms and Data Distributions for a Limited Area Numerical Weather Forecast Routine
- Proceedings of the 9th ACM International Conference on Supercomputing, ACM
, 1995
Cited by 7 (6 self)
In this paper the impact of parallel programming paradigms and data distributions on the performance of a parallel finite difference application is investigated. The finite difference application is one of the kernel routines of a limited area numerical weather forecast model that is in use for producing routine weather forecasts at several European meteorological institutes. Results are shown for CRAY T3D and MasPar systems.
1 Introduction
The HIRLAM (HIgh Resolution Limited Area Model) system [1] is a production code written in Fortran 77. This state-of-the-art limited area numerical weather forecast system has been optimized for efficient execution on vector machines. However, even the computer power of vector architectures limits the model resolution to values that are unsatisfactory from a physics point of view. Therefore lower resolutions are enforced, since the weather forecasts must be available within a reasonable amount of time. These considerations focused current investig...
The Parallel Mathematical Libraries Project (PMLP) -- A Next Generation Scalable, Sparse, Object-Oriented, Mathematical Library Suite
- In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing
, 1999
The Parallel Mathematics Libraries Project (PMLP), a joint effort of Intel, Lawrence Livermore National Laboratory, the Russian Federal Nuclear Laboratory (VNIIEF), and Mississippi State University (MSU), constitutes a concerted effort to create a supportable, comprehensive "Sparse Object-Oriented Mathematical Library suite." With overall design and software validation work at MSU, and most software development and testing at VNIIEF, this international collaboration brings object-oriented programming techniques and C++ to the task of providing linear and nonlinear algebraic-oriented algorithms for scientists and engineers. Language bindings for C, Fortran-77, and C++ are provided, offering the widest possible applicability. PMLP differs from other major library efforts in its systematic use of software engineering and design, including efforts to provide high performance, portability, and usability. In addition, important contributions of this effort in design principles such as storage-format independence and data-distribution independence, which contribute to performance, ease of use, application interoperability, and portability, will be highlighted. Finally, we will also provide an initial set of benchmarked results.

