Performance Evaluation of Some MPI Implementations on Workstation Clusters (0)

by Natawut Nupairoj, Lionel M Ni

Results 1 - 10 of 10

Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations

by Mohammad Banikazemi, Jayanthi Sampathkumar, Sandeep Prabhu, Dhabaleswar K. Panda - In HCW’99, the 8th Heterogeneous Computing Workshop, 1999
Cited by 29 (0 self)
Abstract: Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and due to the multiplicity of vendors and platforms, the NOW environments are being gradually redefined as Heterogeneous Networks of Workstations (HNOW). Having an accurate model for the communication in HNOW systems is crucial for design and evaluation of efficient communication layers for such systems. In this paper we present a model for point-to-point communication in HNOW systems and show how it can be used for characterizing the performance of different collective communication operations. In particular, we show how the performance of broadcast, scatter, and gather operations can be modeled and analyzed. We also verify the accuracy of our proposed model by using an experimental HNOW testbed. Furthermore, it is shown how this model can be used for comparing the performance of different collective communication algorithms. We also show how the effect of heterogeneity on the performance of collective communication operations can be predicted.

P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems

by F. J. Seinstra, D. Koelma - IEEE Transactions on Parallel and Distributed Systems, 2002
Cited by 8 (7 self)
One of the most fundamental problems automatic parallelization tools are confronted with is to find an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message passing programs often significantly depend on the memory layout of data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P-3PC, or the 'Parameterized model based on the Three Paths of Communication') that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP) P-3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well.

Reducing Communication by Honoring Multiple Alignments

by David A. Garza-Salazar, Wim Böhm - In Proceedings of the 9th ACM International Conference on Supercomputing (ICS'95), 1995
Cited by 5 (2 self)
Data Decomposition involves the mapping of array elements to processors of a Distributed Memory Machine with the goal to obtain the best possible performance of a program by keeping communication costs low while exploiting parallelism. Data decomposition is typically divided into two subproblems: alignment and partitioning. Alignment deals with the relative allocation of different arrays. Partitioning is concerned with the actual distribution of the array elements among processors. Conflicting alignments may cause communication. This paper presents a technique for reducing communication by honoring multiple alignments and applies this approach in a distributed memory implementation of the strict functional language Sisal. Multiple alignment leads to recomputation and replication of array elements, which is safe in a functional, and hence side effect free, setting. We present performance improvements of up to 80% for one dimensional arrays, and up to 50% for two dimensional arrays, comp...

Performance evaluation of MPI implementations and MPI based parallel ELLPACK solvers

by S. Markus, S. B. Kim, K. Pantazopoulos, A. L. Ocken, E. N. Houstis, P. Wu, S. Weerawarana, D. Maharry - In 2nd MPI Developers Conference, 1996
Cited by 3 (0 self)
In this study, we are concerned with the parallelization of finite element mesh generation and its decomposition, and the parallel solution of sparse algebraic equations which are obtained from the parallel discretization of second order elliptic partial differential equations (PDEs) using finite difference and finite element techniques. For this we use the Parallel ELLPACK (//ELLPACK) problem solving environment (PSE) which supports PDE computations on several MIMD platforms. We have considered the ITPACK library of stationary iterative solvers which we have parallelized and integrated into the //ELLPACK PSE. This Parallel ITPACK package has been implemented using the MPI, PVM, PICL, PARMACS, nCUBE Vertex and Intel NX message passing communication libraries. It performs very efficiently on a variety of hardware and communication platforms. To study the efficiency of three MPI library implementations, the performance of the Parallel ITPACK solvers was measured on several distributed memory architectures and on clusters of workstations for a testbed of elliptic boundary value PDE problems. We present a comparison of these MPI library implementations with PVM and the native communication libraries, based on their performance on these tests. Moreover, we have implemented in MPI a parallel mesh generator that concurrently produces a semi-optimal partitioning of the mesh to support various domain decomposition solution strategies across the above platforms. The results indicate that the MPI overhead varies among the various implementations without significantly affecting the algorithmic speedup even on clusters of workstations.

Calculators and computers

by Aiichiro Nakano, Timothy Campbell - In R. Jensen, Early Childhood Mathematics, NCTM Research Interpretation Project, 1993
"... PARALLEL ..."
Cited by 3 (0 self)
Abstract not found

Performance of Parallel Communication and Spawning Primitives on a Linux Cluster

by D. J. Johnston, M. Fleury, M. Lincoln, A. C. Downton
Cited by 1 (0 self)
The Linux cluster considered in this paper, formed from shuttle box XPC nodes with 2 GHz Athlon processors connected by dual Gb Ethernet switches, is relatively easily constructed, but, while effective as a throughput engine, may result in disappointing results when running explicitly parallel software if weakly-performing communication mechanisms and process spawning are selected. This paper carefully compares the implementations of communication and spawning primitives in MPICH-2, openMosix, and Linux Remote Procedure Call, forking, and various lower-level communication mechanisms. The test selection compares the provision of both a message-passing library, and a single system image software package, with direct use of lower-level primitives. The information in the paper will be of interest to those considering the use of one of the well-known packages, or directly writing their own distributed applications, or constructing a distributed language by layering on top of an existing set of parallel primitives. The results expose a ranking in terms of process spawning and a similar ranking of communication software performance. They reveal poor performance in certain circumstances, well below the hardware specification, which it is as well that the developer is aware of. In general, the paper emphasizes the importance of efficient transport software to cluster machines.

Experiencing the Message Passing Interface standard and Parallel Image Processing

by E. M. P. A. Mahieu, Afdeling Experimentele, 1996
this article some features of load balancing algorithms are described: process checkpoint and restart mechanisms. The parallel SOIM algorithm uses all computing power for execution, which means the introduction of load balancing would have too much influence on the results. Also the communication cost of the algorithm is too expensive for load balancing. Although integration of load balancing might be more efficient when the following constraints are fulfilled:

Validation of an Indirect Network Module for PROTEUS (CMSC 818 I/J Project Report)

by Robert Bennett, Mike Beynon
Introduction and Motivations. Simulation plays an important role in performance evaluation of parallel systems (hardware and software), algorithms and applications. There are various reasons for using a simulator instead of real machines. Rapid prototyping of a cache coherency protocol could be facilitated by a simulator that supports shared memory. Such a task could be difficult if the right hardware and software resources are not available. Likewise, implementation of a novel parallel file system can potentially involve hacking low level code that one might not have access to. Simulators are not just limited to systems. One of the motivating factors for this project was to provide an environment in which detailed (fine-grained) timings and analysis can be done without uncontrollable perturbations on real machines. Initially we intended to simulate both the network and the disk subsystem of the IBM SP-2 [5] to do performance studies of a runtime library. In the end, we settled

A Network Processor Based Message Manager for MPI

by Chamath Indika Keppitiyagama, 1997
We have implemented a system called MPI-NP II, which is an MPI specific messaging system for the Myrinet System Area Networks (SAN). It consists of a low-level message manager executing on the LANai processor of the Myrinet Network Interface Card (NIC), a thin host interface layer, and LAM-MPI, a public domain version of MPI.

Object-Oriented Parallel Programming with Objective Linda

by Bernd Freisleben, Thilo Kielmann
In this paper we present Objective Linda, a coordination model in which object orientation is combined with uncoupled, generative communication in order to enable object-oriented parallel programming in networked computing resources. Objective Linda provides suitable abstractions for structuring large software systems, supports interoperability between different programming languages and parallel architectures, and simplifies the development of parallel applications. Its use is illustrated by presenting programming examples, a prototype implementation is described, and measurements for evaluating the implementation efficiency and the performance of parallel applications are presented.