Results 1–10 of 11
The Multicomputer Toolbox Approach to Concurrent BLAS
Proc. Scalable High Performance Computing Conf. (SHPCC)
, 1993
"... Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in generalpurpose CBLAS. Algorithms for dense matrixvector and matrixmatrix multiplication on ..."
Abstract

Cited by 28 (8 self)
Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on general P × Q logical process grids are presented, and experiments are run demonstrating their performance characteristics. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy. Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Submitted to Concurrency: Practice & Experience. Address correspondence to: Mississippi State University, Engineering Research Center, PO Box 6176, Mississippi State, MS 39762. 601-325-8435. tony@cs.msstate.edu. Falgout, Skjellum, Smith & Still, The Multicomputer Toolbo...
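The abstract's dense matrix-vector kernel on a P × Q logical process grid can be sketched serially: each grid process (p, q) owns one block of A and the matching slice of x, computes a local partial product, and a row-wise reduction sums the partials. The sketch below simulates this in plain Python (no real message passing); the helper names are ours, not the Toolbox's API.

```python
# Dense y = A*x on a logical P x Q process grid, simulated serially.
# Block partitioning and the row-wise sum-reduction mirror the scheme
# the abstract describes; this is an illustration, not the CBLAS code.

def block_ranges(n, parts):
    """Split range(n) into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def grid_matvec(A, x, P, Q):
    m, n = len(A), len(A[0])
    rows, cols = block_ranges(m, P), block_ranges(n, Q)
    y = [0.0] * m
    for p in range(P):            # grid row p
        for q in range(Q):        # process (p, q) owns block A[rows[p]][cols[q]]
            for i in rows[p]:
                # local partial product with the owned slice of x; the
                # serial accumulation stands in for the row-wise reduction
                y[i] += sum(A[i][j] * x[j] for j in cols[q])
    return y

A = [[1, 2, 3], [4, 5, 6]]
print(grid_matvec(A, [1, 1, 1], 2, 2))   # → [6.0, 15.0]
```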
The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications
, 1994
"... In this paper, we consider what is required to develop parallel algorithms for engineering applications on messagepassing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was ..."
Abstract

Cited by 19 (11 self)
In this paper, we consider what is required to develop parallel algorithms for engineering applications on message-passing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was accomplished with attention to portability, high performance, and reusability of the underlying algorithms. Several key results emerge from this work: first, a methodology for the explicit parallelization of algorithms and for the evaluation of parallel algorithms in the distributed-memory context; second, a set of portable, reusable numerical algorithms constituting a "Multicomputer Toolbox," suitable for use on both existing and future medium-grain concurrent computers; third, a working prototype simulation system, Cdyn, for distillation problems, which can be enhanced (with additional work) to address more complex flowsheeting problems in chemical engineering; fourth, ideas for how to a...
Parallel Block-Diagonal-Bordered Sparse Linear Solvers for Electrical Power System Applications
, 1995
"... This thesis presents research into parallel linear solvers for blockdiagonalbordered sparse matrices. The blockdiagonalbordered form identifies parallelism that can be exploited for both direct and iterative linear solvers. We have developed efficient parallel blockdiagonalbordered sparse dire ..."
Abstract

Cited by 11 (3 self)
This thesis presents research into parallel linear solvers for block-diagonal-bordered sparse matrices. The block-diagonal-bordered form identifies parallelism that can be exploited for both direct and iterative linear solvers. We have developed efficient parallel block-diagonal-bordered sparse direct methods based on both LU factorization and Cholesky factorization algorithms, and we have also developed a parallel block-diagonal-bordered sparse iterative method based on the Gauss-Seidel method. Parallel factorization algorithms for block-diagonal-bordered form matrices require a specialized ordering step coupled to an explicit load-balancing step in order to generate this matrix form and to distribute the computational workload of an irregular matrix uniformly throughout a distributed-memory multiprocessor. Matrix orderings are performed using a diakoptic technique based on node-tearing nodal analysis. Parallel Gauss-Seidel algorithms for block-diagonal-bordered form matrices require a two-part matrix ordering technique: first to partition the matrix into block-diagonal-bordered form, again using the node-tearing diakoptic techniques, and then to multicolor the data in the last diagonal block using graph-coloring techniques. The ordered matrices have extensive parallelism, while maintaining the strict precedence relationships in the Gauss-Seidel algorithm. Empirical
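The serial kernel underlying the abstract's iterative solver is the Gauss-Seidel sweep; in the block-diagonal-bordered ordering the independent diagonal blocks (and, after multicoloring, same-colored unknowns) could be swept concurrently. A minimal serial sketch of the kernel, on a toy diagonally dominant system (our example, not the thesis's code):

```python
# One-process Gauss-Seidel sweeps on a dense representation. In the
# thesis's parallel version, independent diagonal blocks and same-colored
# unknowns would be updated concurrently; here the kernel is serial.

def gauss_seidel(A, b, x, sweeps=50):
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            # update x[i] using the most recent values of all other unknowns
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant test system 4x + y = 6, x + 3y = 7; solution (1, 2).
x = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0], [0.0, 0.0])
print([round(v, 6) for v in x])   # → [1.0, 2.0]
```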
A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies
, 1995
"... In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on twodimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coh ..."
Abstract

Cited by 10 (0 self)
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used, and thus offer a taxonomy for this family of related algorithms. All of these algorithms are expressed in the data-distribution-independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility condition shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation-compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance...
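For reference, the operation every algorithm in the paper's taxonomy computes is the general matrix-multiply-update C ← αAB + βC. A serial sketch of those semantics on plain nested lists (the paper's contribution, which this does not attempt, is performing it on a 2D process grid independently of the data distribution):

```python
# Reference semantics of C <- alpha*A*B + beta*C on nested lists.
# Serial illustration only; dimension check enforces the usual
# inner-dimension compatibility condition.

def gemm(alpha, A, B, beta, C):
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must agree"
    for i in range(m):
        for j in range(n):
            C[i][j] = alpha * sum(A[i][p] * B[p][j] for p in range(k)) \
                      + beta * C[i][j]
    return C

C = [[1.0, 1.0], [1.0, 1.0]]
gemm(2.0, [[1, 0], [0, 1]], [[3, 4], [5, 6]], 1.0, C)
print(C)   # → [[7.0, 9.0], [11.0, 13.0]]
```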
The Multicomputer Toolbox: Current and Future Directions
 Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer
, 1993
"... The Multicomputer Toolbox is a set of "firstgeneration " scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an objectoriented design; Cbas ..."
Abstract

Cited by 6 (1 self)
The Multicomputer Toolbox is a set of "first-generation" scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented design; C-based strategies for classes of distributed data structures (including distributed matrices and vectors) as well as uniform calling interfaces are defined. At a high level in the Toolbox, data-distribution-independence (DDI) support is provided. DDI is needed to build scalable libraries, so that applications do not have to redistribute data before calling libraries. Data-distribution-independent mapping functions implement this capability. Data-distribution-independent algorithms are sometimes more efficient than their fixed-data-distribution counterparts, because redistribution of data can be avoided. Underlying the system is a "performance and portability layer," which includes interfaces to sequent...
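A mapping function of the kind the DDI support describes answers, for any global index, "which process owns it, and at which local index?" The block-cyclic map below is a standard example of such a function; it is our illustration, not the Toolbox's actual interface.

```python
# One-dimensional block-cyclic mapping: global index -> (owner, local index).
# A data-distribution-independent library would accept any such mapping
# function rather than hard-coding one; this block-cyclic map is just a
# common concrete instance.

def block_cyclic_owner(g, block, nprocs):
    """Map global index g to (process, local index) for block size `block`
    dealt cyclically over `nprocs` processes."""
    blk, offset = divmod(g, block)       # which block, position within it
    proc = blk % nprocs                  # blocks are dealt round-robin
    local = (blk // nprocs) * block + offset
    return proc, local

# Block size 2 over 3 processes: indices 0..5 land one block per process.
print([block_cyclic_owner(g, 2, 3) for g in range(6)])
# → [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```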
Driving Issues in Scalable Libraries: Poly-Algorithms, Data Distribution Independence, Redistribution, Local Storage Schemes
, 1995
"... In this paper we describe our perspective of the issues and strategies involved in stateoftheart scalable parallel library research and development. We divide the discussion into four key areas: data distribution independence, issues in redistribution, local storage schemes, and the role of poly ..."
Abstract

Cited by 4 (1 self)
In this paper we describe our perspective on the issues and strategies involved in state-of-the-art scalable parallel library research and development. We divide the discussion into four key areas: data distribution independence, issues in redistribution, local storage schemes, and the role of polyalgorithms.
Parallel DifferentialAlgebraic Equation Solvers for Power System Transient Stability Analysis
 Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer
, 1993
"... Realtime or fasterthanrealtime power system transient stability simulations will have significant impact on the future design and operations of both individual electrical utility companies and large interconnected power systems. The analysis involves solution of extremely large systems of differ ..."
Abstract

Cited by 1 (1 self)
Real-time or faster-than-real-time power system transient stability simulations will have a significant impact on the future design and operation of both individual electric utility companies and large interconnected power systems. The analysis involves the solution of extremely large systems of differential and algebraic equations. Differential-Algebraic Equation (DAE) solvers have been used to solve problems similar in nature to the transient stability analysis (TSA) problem. This paper discusses the possibility of using existing DAE solvers for the transient stability analysis application. We also discuss our research in developing a scalable, parallel DAE solver for use by the power system community and in related applications [13].
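At the heart of any DAE solver is an implicit time step that advances the differential variables while enforcing the algebraic constraints. A sketch on the toy semi-explicit system y' = -z, 0 = z - y (so y(t) = e^(-t)); production solvers like those the paper discusses use variable-order BDF with Newton iterations, whereas backward Euler on this linear system can be solved in closed form at each step (the toy system and step scheme are our choices for illustration):

```python
# Backward Euler on the semi-explicit DAE  y' = -z,  0 = z - y.
# Each implicit step solves  y_new = y + h*(-z_new),  z_new = y_new,
# which collapses to  y_new = y / (1 + h)  for this linear toy problem.

import math

def backward_euler_dae(y0, h, steps):
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h)    # closed-form solve of the coupled implicit step
    return y

h, T = 0.01, 1.0
y = backward_euler_dae(1.0, h, int(T / h))
print(abs(y - math.exp(-1.0)) < 5e-3)   # → True (first-order accurate)
```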
An Improved Algorithm for Parallel Sparse LU Decomposition on a Distributed-Memory Multiprocessor
 Fifth SIAM Conference on Applied Linear Algebra
, 1994
"... In this paper we present a new parallel algorithm for the LU decomposition of a general sparse matrix. Among its features are matrix redistribution at regular intervals and a dynamic pivot search strategy that adapts itself to the number of pivots produced. Experimental results obtained on a network ..."
Abstract

Cited by 1 (0 self)
In this paper we present a new parallel algorithm for the LU decomposition of a general sparse matrix. Among its features are matrix redistribution at regular intervals and a dynamic pivot search strategy that adapts itself to the number of pivots produced. Experimental results obtained on a network of 400 transputers show that these features considerably improve the performance.

1 Introduction

This paper presents an improved version of the parallel algorithm for the LU decomposition of a general sparse matrix developed by van der Stappen, Bisseling, and van de Vorst [9]. The LU decomposition of a matrix A = (A_ij : 0 ≤ i, j < n) produces a unit lower triangular matrix L, an upper triangular matrix U, a row permutation vector π and a column permutation vector ρ, such that

    A_{π_i, ρ_j} = (LU)_{ij},   for 0 ≤ i, j < n.   (1)

We assume that A is sparse and nonsingular and that it has an arbitrary pattern of nonzeros, with all elements having the same (small) probability of being nonzero. A re...
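Equation (1) can be demonstrated concretely with a small dense LU factorization that tracks the row and column permutation vectors π and ρ. The sketch below uses full pivoting on a dense 2×2 matrix purely to verify the identity A[π_i][ρ_j] = (LU)_ij; it is not the paper's parallel sparse algorithm.

```python
# Dense LU with full (row and column) pivoting, tracking permutation
# vectors pi and rho so that A[pi[i]][rho[j]] == (L*U)[i][j], i.e.
# equation (1). Serial dense sketch for illustration only.

def lu_full_pivot(A):
    n = len(A)
    M = [row[:] for row in A]            # working copy, overwritten by L\U
    pi, rho = list(range(n)), list(range(n))
    for k in range(n):
        # choose the largest remaining element as the pivot
        p, q = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(M[ij[0]][ij[1]]))
        M[k], M[p] = M[p], M[k]
        pi[k], pi[p] = pi[p], pi[k]
        for row in M:                    # swap columns k and q
            row[k], row[q] = row[q], row[k]
        rho[k], rho[q] = rho[q], rho[k]
        for i in range(k + 1, n):        # eliminate below the pivot
            M[i][k] /= M[k][k]
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]
    return M, pi, rho

A = [[0.0, 2.0], [3.0, 1.0]]
M, pi, rho = lu_full_pivot(A)
# Reassemble L*U for the 2x2 case (L unit lower, U upper, both stored in M)
LU = [[M[0][0], M[0][1]],
      [M[1][0] * M[0][0], M[1][0] * M[0][1] + M[1][1]]]
ok = all(abs(A[pi[i]][rho[j]] - LU[i][j]) < 1e-12
         for i in range(2) for j in range(2))
print(ok)   # → True
```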
Decomposition of Space-Time Domains: Accelerated Waveform Methods, with Application to Semiconductor Device Simulation
, 1994
"... In this paper, we present accelerated waveform methods for solving timedependent problems from the viewpoint of domainbased parallelism. We show that the waveform approach is a methodology for decomposing the spacetime computational domain of timedependent problems and circumventing the serial t ..."
Abstract
In this paper, we present accelerated waveform methods for solving time-dependent problems from the viewpoint of domain-based parallelism. We show that the waveform approach is a methodology for decomposing the space-time computational domain of time-dependent problems and circumventing the serial time-stepping bottleneck that arises when parallelizing standard sequential methods. Experimental results using waveform methods to solve the time-dependent semiconductor drift-diffusion equations are presented and demonstrate that waveform techniques are superior to standard techniques in different parallel environments.

1 Introduction

Standard solution methods for numerically solving time-dependent problems typically begin by discretizing the problem on a uniform time grid and then sequentially solving for successive time points. The initial time discretization imposes a serialization on the solution process and limits parallel speedup to the speedup available from parallelizing the proble...
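The waveform idea can be sketched on a toy coupled system: each sweep integrates every component over the whole time window using the previous iterate's waveform for the coupling terms, so the per-component integrations are independent and could run in parallel, sidestepping the serial time-stepping bottleneck. The toy system, forward Euler substeps, and Jacobi coupling below are our choices for illustration, not the paper's accelerated methods.

```python
# Jacobi waveform relaxation on  y1' = -y1 + y2,  y2' = -y2,
# with y1(0) = 0, y2(0) = 1 (exact: y2 = e^-t, y1 = t*e^-t).
# Each sweep integrates both components over the WHOLE window using
# the previous sweep's waveform for the coupling term y2.

import math

def waveform_relaxation(h, steps, sweeps):
    y1 = [0.0] * (steps + 1)      # initial guess: constant waveforms
    y2 = [1.0] * (steps + 1)
    for _ in range(sweeps):
        new1, new2 = [0.0], [1.0]
        for k in range(steps):    # forward Euler substeps; coupling uses old y2
            new1.append(new1[k] + h * (-new1[k] + y2[k]))
            new2.append(new2[k] + h * (-new2[k]))
        y1, y2 = new1, new2
    return y1, y2

y1, y2 = waveform_relaxation(0.001, 1000, 8)
# Both exact solutions equal e^-1 at t = 1.
print(abs(y1[-1] - math.exp(-1)) < 1e-3,
      abs(y2[-1] - math.exp(-1)) < 1e-3)   # → True True
```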