The Design Of A Standard Message Passing Interface For Distributed Memory Concurrent Computers
- Parallel Computing, 1994
Cited by 68 (1 self)
This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
- IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 65 (7 self)
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model. Index Terms: collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
Efficient Algorithms for All-to-All Communications in Multi-Port Message-Passing Systems
- IEEE Transactions on Parallel and Distributed Systems, 1997
Cited by 60 (0 self)
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time and on the communication bandwidth. In the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the i-th block of processor j with the j-th block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tunability of our index algorithms on the IBM SP-1 parallel system. In the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred. Index Terms: all-to-all broadcast, all-to-all personalized communication, complete exchange, concatenation operation, distributed-memory system, index operation, message-passing system, multiscatter/gather, parallel system.
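The end result of the two operations defined in this abstract can be illustrated with a minimal sketch. It simulates all processors inside one Python process and shows only what each processor holds afterwards; it is not the paper's multi-port algorithms and says nothing about communication rounds. The function names `index_operation` and `concatenation_operation` are hypothetical, chosen here for illustration.

```python
def index_operation(blocks):
    """Simulate the index operation (all-to-all personalized communication).

    blocks[j][i] is the i-th block initially held by processor j.
    Afterwards, processor i holds at position j the block that processor j
    originally held at position i, which is a transpose of the block matrix.
    """
    n = len(blocks)
    return [[blocks[j][i] for j in range(n)] for i in range(n)]


def concatenation_operation(items):
    """Simulate the concatenation operation (all-to-all broadcast).

    items[j] is the single block initially held by processor j.
    Afterwards, every processor holds all n blocks in processor order.
    """
    n = len(items)
    return [list(items) for _ in range(n)]


# Three simulated processors; block "bJI" is block I on processor J.
blocks = [[f"b{j}{i}" for i in range(3)] for j in range(3)]
after = index_operation(blocks)
everyone = concatenation_operation(["x", "y", "z"])
```

In this sketch, processor 0 ends up holding block 0 of every processor, which is why the index operation is also called a complete exchange.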
MPI: A Message Passing Interface
- 1993
Cited by 52 (0 self)
This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.
The Design and Evolution of Zipcode
- Parallel Computing, 1994
Cited by 20 (9 self)
Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and identified needs. Features that were originally unique to Zipcode were its simultaneous support of static process groups, communication contexts, and virtual topologies, forming the "mailer" data structure. Point-to-point and collective operations reference the underlying group, and use contexts to avoid mixing up messages. Recently, we have added "gather-send" and "receive-scatter" semantics, based on persistent Zipcode "invoices," both as a means to simplify message passing and as a means to reveal more potential runtime optimizations. Key features in Zipcode appear in the forthcoming MPI standard. Keywords: Static Process Groups, Contexts, Virtual Topologies, Point-to-Point Communica...
The IBM External User Interface for Scalable Parallel Systems
- Parallel Computing, 1994
Cited by 10 (4 self)
The IBM External User Interface (EUI) for scalable parallel systems is a parallel programming library designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM 9076 SP1, was announced in February 1993. This paper examines several aspects of the design and development of the EUI.
1 Introduction
The IBM External User Interface (EUI) for scalable parallel systems is an application programming interface that was designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM Scalable POWERparallel System 9076 SP1, was announced in February 1993. The design of the EUI is aimed at providing a scalable and efficient parallel programming environment over a wide range of parallel products from IBM. The EUI is a library of coordination and communication routines that can be invoked from within FORTRAN or C application programs. Over the past several years, a large number of programming environments and communication l...
The Semantics of Blocking and Nonblocking Send and Receive Primitives
- Proceedings of the 8th International Parallel Processing Symposium (IPPS), 1994
Cited by 9 (0 self)
Current message-passing parallel computers provide send and receive primitives with a wide variety of blocking, synchronization, selectivity and ordering properties. Unfortunately, the interactions between the different properties of the send and receive primitives can be extremely complex, and as a result, the precise semantics of these primitives are not well understood. In this paper we present formal models for message-passing systems that provide both synchronous and asynchronous sends, both blocking and nonblocking sends and receives, and a variety of ordering properties. In addition, the receive primitives are very general in that they can specify the desired source and/or tag value of a message. Our models apply to all message-passing programs, including ones with errors, and they apply to parallel computers with arbitrary amounts of buffering. To the best of our knowledge, this is the first time that such rich message-passing models have been defined formally. In addition to p...
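The selectivity mentioned in this abstract, where a receive names a desired source and/or tag, can be illustrated with a minimal sketch. It is not the paper's formal model: the wildcard `ANY`, the dictionary message layout, the helper names `matches` and `receive`, and the rule that the earliest matching message wins are all assumptions made here for illustration.

```python
ANY = None  # hypothetical wildcard meaning "match any source" or "match any tag"


def matches(message, source=ANY, tag=ANY):
    """Return True if the message satisfies the receive's source and tag selectors."""
    source_ok = source is ANY or message["source"] == source
    tag_ok = tag is ANY or message["tag"] == tag
    return source_ok and tag_ok


def receive(pending, source=ANY, tag=ANY):
    """Remove and return the earliest pending message that matches, or None.

    Assumption: messages are matched in arrival order; real systems may
    offer weaker or stronger ordering properties.
    """
    for position, message in enumerate(pending):
        if matches(message, source, tag):
            return pending.pop(position)
    return None


pending = [
    {"source": 0, "tag": 7, "data": "a"},
    {"source": 1, "tag": 7, "data": "b"},
    {"source": 0, "tag": 9, "data": "c"},
]
first = receive(pending, source=1)    # selects on source only
second = receive(pending, tag=9)      # selects on tag only
third = receive(pending)              # wildcard on both source and tag
missing = receive(pending, source=5)  # nothing left to match
```

Because a selective receive can skip over earlier messages, the order in which a program consumes messages may differ from their arrival order, which is one reason the interaction of these properties is hard to reason about informally.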
The MPI Message Passing Interface Standard
Cited by 5 (0 self)
The diverse message passing interfaces provided on parallel and distributed computing systems have caused difficulty in movement of application software from one system to another and have inhibited the commercial development of tools and libraries for these systems. The Message Passing Interface (MPI) Forum has developed a de facto interface standard which was finalised in Q1 of 1994. Major parallel system vendors and software developers were involved in the definition process, and the first implementations of MPI are already appearing. This article presents an overview of the MPI initiative and the standard interface, in particular those aspects which merge demonstrated research with common practice.
1 Introduction
The message passing paradigm is the most generally applicable and efficient programming model for parallel machines with distributed memory and has been used widely in parallel and distributed computing systems for some years. The development of parallel computing has bee...
On-the-Fly Topological Sort - A Basis for Interactive Debugging and Live Visualization of Parallel Programs
- In PADD ’93: Proceedings of the 1993 ACM/ONR Workshop on Parallel and Distributed Debugging, 1993
Cited by 5 (0 self)
This paper presents an optimal technique for on-the-fly ordering and matching of event data records that are being produced by a number of distinct processors. This is essential for effective interactive debugging and live visualization of parallel programs. The technique involves on-the-fly construction of the causality graph of the execution of the program. A sliding window over the graph is maintained by discarding portions of the graph as soon as they are no longer required for ensuring correct order of subsequent program events. The sort places an event record into the causality graph when it is received, places it into the output stream as soon as possible, that is, as soon as all of its predecessors in the causal order have been placed into the output, and discards the event record as soon as possible, that is, as soon as all of its successors in the causal order notice that it has been output. This technique is optimal in terms of the amount of space required for the sort, and in terms of the ...
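The output rule described in this abstract, emitting an event as soon as all of its causal predecessors have been emitted, can be illustrated with a minimal sketch. It is not the paper's algorithm: the class name `OnTheFlySorter` and its methods are hypothetical, each event is assumed to arrive with the list of its predecessor ids, and the sliding-window discard step is deliberately omitted, so this version keeps every emitted id and is not space-optimal.

```python
from collections import defaultdict


class OnTheFlySorter:
    """Emit events in an order consistent with their causal predecessors."""

    def __init__(self):
        self.emitted = set()                 # ids already placed in the output
        self.unmet = {}                      # waiting event id -> count of unmet predecessors
        self.blocked_on = defaultdict(list)  # predecessor id -> events waiting for it
        self.output = []                     # the ordered output stream

    def receive(self, event_id, predecessors):
        """Accept an event record; emit it now or hold it until it is safe."""
        missing = [p for p in predecessors if p not in self.emitted]
        if not missing:
            self._emit(event_id)
            return
        self.unmet[event_id] = len(missing)
        for p in missing:
            self.blocked_on[p].append(event_id)

    def _emit(self, event_id):
        """Output an event, then release any waiting events it unblocks."""
        ready = [event_id]
        while ready:
            current = ready.pop()
            self.emitted.add(current)
            self.output.append(current)
            for successor in self.blocked_on.pop(current, []):
                self.unmet[successor] -= 1
                if self.unmet[successor] == 0:
                    del self.unmet[successor]
                    ready.append(successor)


# A receive event arrives before the two events it causally depends on.
sorter = OnTheFlySorter()
sorter.receive("recv", ["send", "a1"])  # held: neither predecessor seen yet
sorter.receive("a1", [])                # emitted immediately
sorter.receive("send", [])              # emitted, which releases "recv"
```

Adding the discard step would require each record to know when all of its successors have observed its output, which this sketch does not track.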
Repeatable and Portable Message-Passing Programs
- In Proc. of the Symposium on the Principles of Distributed Computing (PODC), 1994
Cited by 2 (0 self)
A fundamental issue in the use of message-passing systems is the creation of repeatable and portable programs. Repeatable program behavior is critical for debugging message-passing programs, while portability is essential for efficient software development. This paper makes two main contributions. First, it defines a set of program executions (called safe executions) that are guaranteed to be repeatable and portable. Safe program executions are defined for applications that utilize both blocking and nonblocking send and receive primitives, synchronous and asynchronous sends, and receives that select on the basis of source and/or tag values. To the best of our knowledge, this is the first time that conditions for repeatable and portable executions have been created for such rich message-passing models. Second, this paper gives precise characterizations of safe executions. The safety of an execution is shown to depend on the message-ordering properties of the underlying communication sys...

