Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations
, 1999

Cited by 10 (5 self)
The finite element method has grown, in the past 40 years, to be a popular method for the simulation of physical systems in science and engineering. The finite element method is used in a wide array of industries. In fact, just about any enterprise that makes a physical product can, and probably does, use finite element technology. The success of the finite element method is due in large part to its ability to allow the use of accurate formulations of partial differential equations (PDEs) on arbitrarily general physical domains with complex boundary conditions. Additionally, the rapid growth in the computational power available in today's computers, for an ever more affordable price, has made finite element technology...
A Dense Complex Symmetric Indefinite Solver for the Fujitsu AP3000
, 1999

Cited by 4 (1 self)
This paper describes the design, implementation and performance of a parallel direct dense symmetric-indefinite solver routine. Such a solver is required for the large complex systems arising from electromagnetic field analysis, such as are generated from the AccuField application. The primary target architecture for the solver is the Fujitsu AP3000, a distributed memory machine based on the UltraSPARC processor. The routine is written entirely in terms of the DBLAS Distributed Library, recently extended for complex precision. It uses the Bunch-Kaufman diagonal pivoting method and is based on the LAPACK algorithm, with several modifications required for efficient parallel implementation and one modification to reduce the amount of symmetric pivoting. Currently the routine uses a standard BLAS computational interface and can use either the MPI, BLACS or VPPLib communication interfaces (the latter is only available under the APruntime V2.0 system for the AP3000). The routine outperforms its e...
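The Bunch-Kaufman diagonal pivoting method named above chooses, at each elimination step, either a 1x1 pivot (possibly after a symmetric interchange) or a 2x2 pivot block. A minimal serial sketch of that pivot test follows, in the form used by LAPACK-style unblocked symmetric-indefinite factorization. It is not the paper's distributed routine and omits the authors' modification to reduce symmetric pivoting, which the abstract does not describe; the function name `bunch_kaufman_pivot` and the list-of-lists storage are assumptions for illustration.

```python
import math

# Standard Bunch-Kaufman constant (1 + sqrt(17)) / 8, about 0.6404.
ALPHA = (1 + math.sqrt(17)) / 8


def bunch_kaufman_pivot(A, k):
    """Choose the pivot for elimination step k of a symmetric matrix A.

    Hypothetical helper for illustration. A is a full symmetric list of lists
    (real or complex entries); only the trailing block A[k:, k:] is examined.
    Returns (size, r): size is 1 or 2; r is the row/column to interchange
    with k (1x1 pivot) or the partner row/column of a 2x2 pivot.
    """
    n = len(A)
    absakk = abs(A[k][k])
    # Largest magnitude below the diagonal in column k, and its row.
    imax, colmax = k, 0.0
    for i in range(k + 1, n):
        if abs(A[i][k]) > colmax:
            imax, colmax = i, abs(A[i][k])
    if max(absakk, colmax) == 0.0:
        return 1, k  # column is entirely zero: nothing to eliminate
    if absakk >= ALPHA * colmax:
        return 1, k  # diagonal is large enough: 1x1 pivot, no interchange
    # Largest off-diagonal magnitude in row imax of the trailing block
    # (includes A[imax][k] = colmax, so rowmax > 0 here).
    rowmax = max(abs(A[imax][j]) for j in range(k, n) if j != imax)
    if absakk * rowmax >= ALPHA * colmax * colmax:
        return 1, k  # 1x1 pivot on A[k][k] is still acceptable
    if abs(A[imax][imax]) >= ALPHA * rowmax:
        return 1, imax  # 1x1 pivot after interchanging k and imax
    return 2, imax  # 2x2 pivot block built from rows/columns k and imax
```

For example, `[[0.0, 1.0], [1.0, 0.0]]` has a zero diagonal, so no 1x1 pivot is safe and the sketch returns the 2x2 choice `(2, 1)`. Python's `abs()` accepts complex entries, so the same test runs on complex symmetric matrices of the kind the abstract targets.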
Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations

Cited by 4 (0 self)
In this paper we report the development of a very high performance parallel eigensolver based on the portable ScaLAPACK library, and its application to electronic structure calculations in the MPQuest code. This work was done on ASCI Red, a supercomputer based on over 4600 dual-processor Pentium Pro nodes at Sandia National Laboratories. We report sustained performance in the code of 605 GFlops and peak performance in the eigensolver of 684 GFlops. This is comparable to performance obtained from MPLinpack on a similarly sized problem. For a smaller problem we have sustained performance of 420 GFlops in the application and peak performance in the eigensolver of 563 GFlops. The impact of this work on the specific application is important, but the significant improvements developed for a portable eigensolver and other libraries will also benefit a number of applications.
1 Introduction
MPQuest is a parallel electronic structure program which is used extensively in productio...
Optimal Load Balancing Techniques for Block-Cyclic Decompositions for Matrix Factorization

Cited by 3 (1 self)
In this paper, we present a new load balancing technique, called panel scattering, which is generally applicable to parallel block-partitioned dense linear algebra algorithms, such as matrix factorization. Here, the panels formed in such computations are divided across their length and evenly (re)distributed among all processors. It is shown how this technique can be efficiently implemented for the general block-cyclic matrix distribution, requiring only the collective communication primitives that are required for block-cyclic parallel BLAS. In most situations, panel scattering yields optimal load balance and cell computation speed across all stages of the computation. It also has the advantage of naturally yielding good memory access patterns. Compared with traditional methods, which minimize communication costs at the expense of load balance, it has a small (in some situations negative) increase in communication volume costs. It does, however, incur extra communication startup costs, but only by a factor not exceeding 2. To maximize load balance, storage block sizes should be kept small; furthermore, in many situations of interest, there will be a small or even negative communication penalty for doing so. Results are given on the Fujitsu AP+ parallel computer, comparing the performance of panel scattering with previously established methods for LU, LL^T (Cholesky) and QR factorization. These results are consistent with a detailed performance model for LU factorization.
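The abstract describes panel scattering only at a high level. The sketch below is a simplified illustration, not the authors' implementation: it shows the standard 1-D block-cyclic index mapping (block size `nb`, `p` processes) and compares how many rows of a trailing panel each process holds under that mapping with an even split of the same panel across all processes, which is the load-balance goal panel scattering aims for. All function names are hypothetical, and the 2-D process grids and redistribution communication of the real technique are not modelled.

```python
def block_cyclic_owner(g, nb, p):
    """Process that owns global row g in a 1-D block-cyclic layout."""
    return (g // nb) % p


def block_cyclic_local_index(g, nb, p):
    """Position of global row g within its owner's local storage."""
    return (g // (nb * p)) * nb + g % nb


def rows_per_process(start, stop, nb, p):
    """Rows of the panel [start, stop) held by each process, block-cyclic."""
    counts = [0] * p
    for g in range(start, stop):
        counts[block_cyclic_owner(g, nb, p)] += 1
    return counts


def scattered_panel_counts(length, p):
    """Even split of a panel of `length` rows over p processes.

    A simplified picture of panel scattering: every process gets
    either floor(length / p) or ceil(length / p) rows.
    """
    base, extra = divmod(length, p)
    return [base + (1 if i < extra else 0) for i in range(p)]
```

With `nb = 4` and `p = 2`, the panel covering rows 10 to 15 is held as `[2, 4]` rows under the block-cyclic layout, whereas an even split gives `[3, 3]`. Shrinking `nb` reduces the block-cyclic imbalance, which agrees with the abstract's advice to keep storage block sizes small.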
An efficient implementation of parallel eigenvalue computation for massively parallel processing
 PARALLEL COMPUTING
, 2001

Cited by 1 (1 self)
This article describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., of order 10,000. This is important for practical applications in which many diagonalizations of such matrices are required. The routine was evaluated on 1024 processors of the HITACHI SR2201, giving a speedup of about 25 times over the ScaLAPACK library on the same 1024 processors.
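The abstract names Householder tridiagonalization as the reduction step but gives no implementation details. For reference, a minimal serial, unblocked sketch of that reduction for a real symmetric matrix follows. The function names are hypothetical, each reflector is formed explicitly for clarity (O(n^4) work overall), and production libraries such as LAPACK's `dsytrd` apply reflectors in blocked form instead. Because T = Q^T A Q with Q orthogonal, T has the same eigenvalues as A; which tridiagonal eigensolver the authors then apply is not stated in the abstract.

```python
import math


def matmul(X, Y):
    """Dense product of two matrices stored as lists of lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def householder_tridiagonalize(A):
    """Return a tridiagonal T = Q^T A Q for a real symmetric matrix A.

    Serial and unblocked: each reflector H = I - 2 u u^T is built
    explicitly and applied as T <- H T H, so total cost is O(n^4).
    """
    n = len(A)
    T = [[float(a) for a in row] for row in A]
    for k in range(n - 2):
        # Entries of column k below the diagonal, to be reduced to one entry.
        x = [T[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # already zero below the subdiagonal
        # Give alpha the sign opposite to x[0] to avoid cancellation in v[0].
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x[:]
        v[0] -= alpha  # |v[0]| >= norm_x > 0, so v is nonzero
        norm_v = math.sqrt(sum(t * t for t in v))
        # Unit vector u, zero-padded so H acts only on rows/columns k+1..n-1.
        u = [0.0] * (k + 1) + [t / norm_v for t in v]
        H = [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
             for i in range(n)]
        T = matmul(matmul(H, T), H)  # H is symmetric and orthogonal
    return T
```

Each step zeroes one column (and, by symmetry, one row) beyond the subdiagonal while leaving earlier columns untouched, since H is the identity on indices 0..k.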