The Multicomputer Toolbox Approach to Concurrent BLAS
- Proc. Scalable High Performance Computing Conf. (SHPCC
, 1993
Cited by 27 (8 self)
Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on general P × Q logical process grids are presented, and experiments run demonstrating their performance characteristics. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy. Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Submitted to Concurrency: Practice & Experience. Address correspondence to: Mississippi State University, Engineering Research Center, PO Box 6176, Mississippi State, MS 39762. 601-325-8435. tony@cs.msstate.edu. Falgout, Skjellum, Smith & Still --- The Multicomputer Toolbo...
The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications
, 1994
Cited by 19 (11 self)
In this paper, we consider what is required to develop parallel algorithms for engineering applications on message-passing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was accomplished with attention to portability, high performance and reusability of the underlying algorithms. Emerging from this work are several key results: first, a methodology for explicit parallelization of algorithms and for the evaluation of parallel algorithms in the distributed-memory context; second, a set of portable, reusable numerical algorithms constituting a "Multicomputer Toolbox," suitable for use on both existing and future medium-grain concurrent computers; third, a working prototype simulation system, Cdyn, for distillation problems, that can be enhanced (with additional work) to address more complex flowsheeting problems in chemical engineering; fourth, ideas for how to a...
The Data-Distribution-Independent Approach to Scalable Parallel Libraries
, 1995
Parallel Block-Diagonal-Bordered Sparse Linear Solvers for Electrical Power System Applications
, 1995
Cited by 11 (3 self)
This thesis presents research into parallel linear solvers for block-diagonal-bordered sparse matrices. The block-diagonal-bordered form identifies parallelism that can be exploited for both direct and iterative linear solvers. We have developed efficient parallel block-diagonal-bordered sparse direct methods based on both LU factorization and Cholesky factorization algorithms, and we have also developed a parallel block-diagonal-bordered sparse iterative method based on the Gauss-Seidel method. Parallel factorization algorithms for block-diagonal-bordered form matrices require a specialized ordering step coupled to an explicit load balancing step in order to generate this matrix form and to distribute the computational workload uniformly for an irregular matrix throughout a distributed-memory multi-processor. Matrix orderings are performed using a diakoptic technique based on node-tearing nodal analysis. Parallel Gauss-Seidel algorithms for block-diagonal-bordered form matrices require a two-part matrix ordering technique: first to partition the matrix into block-diagonal-bordered form, again using the node-tearing diakoptic techniques, and then to multi-color the data in the last diagonal block using graph coloring techniques. The ordered matrices have extensive parallelism, while maintaining the strict precedence relationships in the Gauss-Seidel algorithm. Empirical
A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies
, 1995
Cited by 7 (0 self)
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used and thus we offer a taxonomy for this family of related algorithms. All these algorithms are represented in the data distribution independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility condition result shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance...
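As a rough illustration of the update C = αAB + βC on a two-dimensional process grid, the sketch below simulates a P × Q grid sequentially in pure Python. It uses a generic outer-product (broadcast-based) formulation, which is an assumption chosen for brevity and not necessarily any of the algorithms classified in the paper. The names grid_matmul, cyclic_owner, row_owner, and col_owner are hypothetical.

```python
def cyclic_owner(index, nprocs):
    """Hypothetical cyclic (scatter) distribution: global index -> owning process."""
    return index % nprocs


def grid_matmul(alpha, A, B, beta, C, P, Q, row_owner, col_owner):
    """Simulate C = alpha*A*B + beta*C on a P x Q logical process grid.

    A is M x K, B is K x N, C is M x N (lists of lists); they may be rectangular.
    row_owner(i, P) and col_owner(j, Q) decide which simulated process
    holds entry C[i][j]; any assignment of indices to processes works here.
    """
    M, K, N = len(A), len(B), len(B[0])

    # Each simulated process (p, q) stores only the C entries it owns,
    # already scaled by beta.
    local = {(p, q): {} for p in range(P) for q in range(Q)}
    for i in range(M):
        for j in range(N):
            local[row_owner(i, P), col_owner(j, Q)][i, j] = beta * C[i][j]

    # Outer-product loop: step k stands in for broadcasting column k of A
    # along process rows and row k of B along process columns.
    for k in range(K):
        a_col = [A[i][k] for i in range(M)]
        b_row = B[k]
        for block in local.values():
            for (i, j) in block:
                block[i, j] += alpha * a_col[i] * b_row[j]

    # Gather the distributed pieces back into one matrix for inspection.
    result = [[0.0] * N for _ in range(M)]
    for block in local.values():
        for (i, j), value in block.items():
            result[i][j] = value
    return result
```

Because ownership is supplied through the two mapping functions, the same routine returns the same product for any grid shape or distribution, which is the property the abstract describes as not requiring a specific data distribution for correctness.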
The Multicomputer Toolbox: Current and Future Directions
- Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer
, 1993
Cited by 6 (1 self)
The Multicomputer Toolbox is a set of "first-generation" scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented design; C-based strategies for classes of distributed data structures (including distributed matrices and vectors) as well as uniform calling interfaces are defined. At a high level in the Toolbox, data-distribution-independence (DDI) support is provided. DDI is needed to build scalable libraries, so that applications do not have to redistribute data before calling libraries. Data-distribution-independent mapping functions implement this capability. Data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided. Underlying the system is a "performance and portability layer," which includes interfaces to sequent...
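The idea of a data-distribution-independent mapping function can be sketched as follows. This is a minimal Python illustration, not the Toolbox's C interface: the function names block_map, cyclic_map, and distributed_dot are hypothetical, and the two distributions shown (block and cyclic) are common textbook choices assumed here for the example.

```python
def block_map(i, n, p):
    """Block distribution: global index i of n items over p processes.

    Returns (owning process, local index on that process).
    """
    size = -(-n // p)  # ceil(n / p) items per process
    return i // size, i % size


def cyclic_map(i, n, p):
    """Cyclic (scatter) distribution: items are dealt out round-robin."""
    return i % p, i // p


def distributed_dot(x, y, p, mapping):
    """Dot product written against a mapping function only.

    Each simulated process accumulates the terms it owns; the final sum
    stands in for a global reduction. The routine never assumes a
    particular distribution, so callers need not redistribute their data.
    """
    n = len(x)
    partial = [0.0] * p
    for i in range(n):
        owner, _local = mapping(i, n, p)
        partial[owner] += x[i] * y[i]
    return sum(partial)
```

Calling distributed_dot with block_map or with cyclic_map changes only which simulated process accumulates each term, not how the routine is written or called.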
Driving Issues in Scalable Libraries: Poly-Algorithms, Data Distribution Independence, Redistribution, Local Storage Schemes
, 1995
Cited by 4 (1 self)
In this paper we describe our perspective of the issues and strategies involved in state-of-the-art scalable parallel library research and development. We divide the discussion into four key areas: data distribution independence, issues in redistribution, local storage schemes, and the role of poly-algorithms.
Parallel Differential-Algebraic Equation Solvers for Power System Transient Stability Analysis
- Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer
, 1993
Cited by 1 (1 self)
Real-time or faster-than-real-time power system transient stability simulations will have significant impact on the future design and operations of both individual electrical utility companies and large interconnected power systems. The analysis involves solution of extremely large systems of differential and algebraic equations. Differential-Algebraic Equation (DAE) solvers have been used to solve problems similar in nature to the transient stability analysis (TSA) problem. This paper discusses whether existing DAE solvers can be used for the transient stability analysis application. We also discuss our research in developing a scalable, parallel DAE solver for use by the power system community and in related applications [13].
An Improved Algorithm for Parallel Sparse LU Decomposition on a Distributed-Memory Multiprocessor
- Fifth SIAM Conference on Applied Linear Algebra
, 1994
Cited by 1 (0 self)
In this paper we present a new parallel algorithm for the LU decomposition of a general sparse matrix. Among its features are matrix redistribution at regular intervals and a dynamic pivot search strategy that adapts itself to the number of pivots produced. Experimental results obtained on a network of 400 transputers show that these features considerably improve the performance. 1 Introduction This paper presents an improved version of the parallel algorithm for the LU decomposition of a general sparse matrix developed by van der Stappen, Bisseling, and van de Vorst [9]. The LU decomposition of a matrix $A = (A_{ij},\ 0 \le i, j < n)$ produces a unit lower triangular matrix $L$, an upper triangular matrix $U$, a row permutation vector $\pi$ and a column permutation vector $\rho$, such that $A_{\pi_i, \rho_j} = (LU)_{ij}$, for $0 \le i, j < n$. (1) We assume that A is sparse and nonsingular and that it has an arbitrary pattern of nonzeros, with all elements having the same (small) probability of being nonzero. A re...
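To make relation (1) concrete, the following sketch computes L, U, a row permutation and a column permutation for a small dense matrix and checks the relation. It is a sequential, dense illustration only: it uses plain complete pivoting (largest remaining entry), which is an assumption made for simplicity and is not the sparse, dynamic pivot search strategy of the paper. The function names lu_full_pivot and satisfies_eq1 are hypothetical.

```python
from fractions import Fraction


def lu_full_pivot(A):
    """Dense LU with row and column permutations (complete pivoting).

    Returns (L, U, pi, rho) with L unit lower triangular, U upper
    triangular, and A[pi[i]][rho[j]] == (L*U)[i][j]. Exact rational
    arithmetic is used so the relation can be checked without rounding.
    """
    n = len(A)
    W = [[Fraction(x) for x in row] for row in A]  # working copy
    pi = list(range(n))   # row permutation vector
    rho = list(range(n))  # column permutation vector
    for k in range(n):
        # Choose the largest-magnitude entry of the trailing submatrix.
        r, c = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(W[ij[0]][ij[1]]))
        if W[r][c] == 0:
            raise ValueError("matrix is singular")
        # Bring the pivot to position (k, k), recording both swaps.
        W[k], W[r] = W[r], W[k]
        pi[k], pi[r] = pi[r], pi[k]
        for row in W:
            row[k], row[c] = row[c], row[k]
        rho[k], rho[c] = rho[c], rho[k]
        # Eliminate below the pivot; multipliers are stored in place.
        for i in range(k + 1, n):
            W[i][k] /= W[k][k]
            for j in range(k + 1, n):
                W[i][j] -= W[i][k] * W[k][j]
    L = [[W[i][j] if j < i else Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    U = [[W[i][j] if j >= i else Fraction(0) for j in range(n)]
         for i in range(n)]
    return L, U, pi, rho


def satisfies_eq1(A, L, U, pi, rho):
    """Check A[pi[i]][rho[j]] == (LU)[i][j] for all 0 <= i, j < n."""
    n = len(A)
    return all(sum(L[i][k] * U[k][j] for k in range(n)) == A[pi[i]][rho[j]]
               for i in range(n) for j in range(n))
```

A sparse code would instead select pivots that also limit fill-in, which is where strategies such as the one described in the abstract differ from this dense sketch.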
Decomposition of Space-Time Domains: Accelerated Waveform Methods, with Application to Semiconductor Device Simulation
, 1994
In this paper, we present accelerated waveform methods for solving time-dependent problems from the viewpoint of domain-based parallelism. We show that the waveform approach is a methodology for decomposing the space-time computational domain of time-dependent problems and circumventing the serial time-stepping bottleneck that arises when parallelizing standard sequential methods. Experimental results using waveform methods to solve the time-dependent semiconductor drift-diffusion equations are presented and demonstrate that waveform techniques are superior to standard techniques in different parallel environments. 1 Introduction Standard solution methods for numerically solving time-dependent problems typically begin by discretizing the problem on a uniform time grid and then sequentially solving for successive time points. The initial time discretization imposes a serialization to the solution process and limits parallel speedup to the speedup available from parallelizing the proble...

