Results 1 - 6 of 6
The Design and Implementation of a Region-Based Parallel Language
, 2001
Cited by 16 (5 self)
This is to certify that I have examined this copy of a doctoral dissertation by
An MPI Library which uses Polling, Interrupts and Remote Copying for the Fujitsu AP1000+
- In Proceedings of International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN’96). IEEE
, 1996
Cited by 4 (2 self)
A complete implementation of MPI for the Fujitsu AP1000+ is presented. The library can employ a number of different mechanisms in implementing the send and receive message passing operations. The method of detecting the arrival of new messages can be realized through interrupt-driven and polling techniques. Transferring message data is achieved by either sending the message data directly to the receiver "in-place", or using a rendezvous method which allows the use of a fast noncopying nonblocking remote-fetching operation. The MPI library exhibits good performance compared to the native message passing library, and allows the user to decide at runtime which mechanisms will be used in order to achieve the best performance on a per-application basis.
keywords: Message passing, MPI, Performance evaluation, Portable programming, NAS parallel benchmark
1 Introduction
MPI (Message Passing Interface) is a standard for message passing operations, resulting from a collaborative process of a...
Performance Modeling and Evaluation of MPI
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 2001
Cited by 3 (0 self)
Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray-Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). Our objective is to identify a performance model suitable for MPI performance characterization and to compare the performance of MPI communications on several platforms.
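As a rough illustration of the kind of estimate LogGP supports, the sketch below evaluates the commonly cited LogGP cost of one point-to-point message, o + (k - 1)G + L + o for a k-byte message. This is the textbook form of the model, not necessarily the exact formulation used in the paper; the class name, function name, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """LogGP parameters (times in microseconds; values below are made up)."""
    L: float  # network latency
    o: float  # CPU overhead per message, paid by sender and by receiver
    g: float  # minimum gap between consecutive message injections
    G: float  # gap per byte for long messages (inverse bandwidth)
    P: int    # number of processors


def point_to_point_time(m: LogGP, k: int) -> float:
    """Estimated end-to-end time of a single k-byte message.

    Textbook LogGP form: sender overhead, (k - 1) per-byte gaps,
    network latency, then receiver overhead.
    """
    if k < 1:
        raise ValueError("message size must be at least one byte")
    return m.o + (k - 1) * m.G + m.L + m.o


# Hypothetical parameter values, chosen only to make the arithmetic easy.
params = LogGP(L=5.0, o=2.0, g=4.0, G=0.01, P=16)

if __name__ == "__main__":
    for size in (1, 1001, 100001):
        print(size, "bytes:", point_to_point_time(params, size), "us")
```

For small messages the estimate is dominated by L and o; for large ones the (k - 1)G term takes over, which is why the model distinguishes the two regimes.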
The OCCOMM Benchmarking Guide Version 1.2
, 1996
Cited by 1 (0 self)
The regular partitioning of grid based finite difference models for distribution onto parallel processors leads to a characteristic nearest neighbour boundary exchange communications pattern. In general the data structures to be exchanged are not contiguous in memory. OCCOMM is a low-level communications kernel benchmark which determines the performance of various message passing techniques applied to contiguous, single-strided and double-strided data structures, commonly found in ocean, and other regular grid based, models. The OCCOMM code is freely available and we encourage interested parties to run it on systems of their choice. This report is a guide to obtaining, building and running the OCCOMM code.
1 Quick Start
Here is a minimal set of steps for downloading and running the OCCOMM benchmark program. More detailed instructions are given in the following sections.
1) Get occomm.tar or get and gunzip occomm.tar.gz from http://www.dkrz.de/dkrz/parallel/occomm/home-english.htm...
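To make the non-contiguity concrete, the following sketch packs a single-strided boundary (one column of a row-major grid) into a contiguous buffer and unpacks it on the other side. It is not taken from the OCCOMM code; the function names and the 3x4 grid are hypothetical, and no message passing library is involved.

```python
def pack_strided(buf, start, count, stride):
    """Gather `count` elements spaced `stride` apart into a contiguous list."""
    return [buf[start + i * stride] for i in range(count)]


def unpack_strided(buf, start, stride, packed):
    """Scatter a contiguous list back into `buf` at the given stride."""
    for i, value in enumerate(packed):
        buf[start + i * stride] = value


# Hypothetical 3x4 grid stored row-major in a flat list: values 0..11.
nrows, ncols = 3, 4
grid = list(range(nrows * ncols))

# A row is contiguous (stride 1); a column is single-strided (stride ncols).
north_row = pack_strided(grid, start=0, count=ncols, stride=1)
east_column = pack_strided(grid, start=ncols - 1, count=nrows, stride=ncols)

# The neighbour would write the received column into its own west edge.
neighbour = [0] * (nrows * ncols)
unpack_strided(neighbour, start=0, stride=ncols, packed=east_column)

if __name__ == "__main__":
    print("north row:", north_row)
    print("east column:", east_column)
```

Explicit packing like this is only one of several possible techniques; a message passing library may also describe the stride to the library directly, for example with a derived datatype in MPI.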
The CCLRC HPCI Centre at Daresbury Laboratory
, 1996
Parallel software packages which may be of use in scientific and engineering applications of the type carried out on the parallel computing facilities at EPCC and Daresbury Laboratory are surveyed. For each package, a brief description is given along with other useful information such as availability, contact addresses and systems supported.
keywords: parallel computing, software packages, scientific applications.
This report is available from http://www.dl.ac.uk/TCSC/HPCI/
© 1996, Daresbury Laboratory. We do not accept any responsibility for loss or damage arising from the use of information contained in any of our reports or in any communication about our tests or investigations.
Contents
1 Introduction
1.1 Criteria for inclusion
1.2 Package areas
1.3 Individual entries...
Communication Characterization of a Cray T3D
, 1997
In order to develop efficient applications for large scale multiprocessors, the communication patterns of the target architecture should be well understood. By understanding these communication patterns, a faster running application can be developed. Some applications require the smallest latency possible, whereas others might benefit from higher throughput. The Cray T3D is a multiprocessor that has a high speed 3D torus interconnect, and special communications hardware. By utilizing this hardware properly, a program has the potential to minimize its communications overhead. This study investigates three different libraries that provide communication on the T3D: PVM, MPI, and SMA. Both PVM and MPI are portable, whereas SMA is native. The latency, bandwidth, and collective communication costs of the communication primitives implemented in these libraries are characterized and compared. These results are also briefly contrasted with the results of related studies on the IBM SP2 and the C...
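One common way to characterize latency and bandwidth, sketched here under stated assumptions, is to time messages of several sizes and fit the linear model T(n) = latency + n / bandwidth. This is not necessarily the method used in the study; the function name is hypothetical and the timings below are synthetic, not measurements from any machine.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of T(n) = latency + n / bandwidth.

    `sizes` are message sizes in bytes, `times` are one-way times in
    microseconds. Returns (latency in us, bandwidth in bytes per us).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx                    # time per byte
    latency = mean_y - slope * mean_x    # intercept at n = 0
    return latency, 1.0 / slope


# Synthetic timings generated from latency = 10 us and 0.02 us per byte.
sizes = [0, 1000, 2000, 4000]
times = [10.0 + 0.02 * s for s in sizes]

latency, bandwidth = fit_latency_bandwidth(sizes, times)

if __name__ == "__main__":
    print("latency (us):", latency)
    print("bandwidth (bytes/us):", bandwidth)
```

With real measurements the fit is usually done separately for small and large messages, since a single straight line rarely describes both regimes well.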

