Performance evaluation of MPI implementations and MPI based parallel ELLPACK solvers
- In 2nd MPI Developers Conference, 1996
Abstract - Cited by 3 (0 self)
In this study, we are concerned with the parallelization of finite element mesh generation and its decomposition, and with the parallel solution of the sparse algebraic equations obtained from the parallel discretization of second-order elliptic partial differential equations (PDEs) using finite difference and finite element techniques. For this we use the Parallel ELLPACK (//ELLPACK) problem solving environment (PSE), which supports PDE computations on several MIMD platforms. We have considered the ITPACK library of stationary iterative solvers, which we have parallelized and integrated into the //ELLPACK PSE. This Parallel ITPACK package has been implemented using the MPI, PVM, PICL, PARMACS, nCUBE Vertex and Intel NX message passing communication libraries. It performs very efficiently on a variety of hardware and communication platforms. To study the efficiency of three MPI library implementations, the performance of the Parallel ITPACK solvers was measured on several distributed memory architectures and on clusters of workstations for a testbed of elliptic boundary value PDE problems. We present a comparison of these MPI library implementations with PVM and the native communication libraries, based on their performance on these tests. Moreover, we have implemented in MPI a parallel mesh generator that concurrently produces a semi-optimal partitioning of the mesh to support various domain decomposition solution strategies across the above platforms. The results indicate that the MPI overhead varies among the implementations without significantly affecting the algorithmic speedup, even on clusters of workstations.

