MPI: A Message-Passing Interface Standard
1994
Cited by 250 (0 self)
process naming to allow libraries to describe their communication in terms suitable to their own data structures and algorithms,
• The ability to "adorn" a set of communicating processes with additional user-defined attributes, such as extra collective operations. This mechanism should provide a means for the user or library writer effectively to extend a message-passing notation.
In addition, a unified mechanism or object is needed for conveniently denoting communication context, the group of communicating processes, to house abstract process naming, and to store adornments.
5.1.2 MPI's Support for Libraries
The corresponding concepts that MPI provides, specifically to support robust libraries, are as follows:
• Contexts of communication,
• Groups of processes,
• Virtual topologies,
• Attribute caching,
• Commun...
Fortran M: A Language for Modular Parallel Programming
Journal of Parallel and Distributed Computing, 1992
Cited by 131 (24 self)
Fortran M is a small set of extensions to Fortran 77 that supports a modular approach to the design of message-passing programs. It has the following features.
(1) Modularity. Programs are constructed by using explicitly declared communication channels to plug together program modules called processes. A process can encapsulate common data, subprocesses, and internal communication.
(2) Safety. Operations on channels are restricted so as to guarantee deterministic execution, even in dynamic computations that create and delete processes and channels. Channels are typed, so a compiler can check for correct usage.
(3) Architecture Independence. The mapping of processes to processors can be specified with respect to a virtual computer with size and shape different from that of the target computer. Mapping is specified by annotations that influence performance but not correctness.
(4) Efficiency. Fortran M can be compiled efficiently for uniprocessors, shared-memory computers, distributed-m...
The Design Of A Standard Message Passing Interface For Distributed Memory Concurrent Computers
Parallel Computing, 1994
Cited by 68 (1 self)
This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 65 (7 self)
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.
Index Terms: Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
Efficient Algorithms for All-to-All Communications in Multi-Port Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems, 1997
Cited by 60 (0 self)
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time and on the communication bandwidth.
In the index operation among n processors, each processor initially has n blocks of data, and the goal is to exchange the i-th block of processor j with the j-th block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tunability of our index algorithms on the IBM SP-1 parallel system.
In the concatenation operation among n processors, each processor initially has one block of data, and the goal is to concatenate the n blocks of data from the n processors and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.
Index Terms: All-to-all broadcast, all-to-all personalized communication, complete exchange, concatenation operation, distributed-memory system, index operation, message-passing system, multiscatter/gather, parallel system.
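The input/output behavior of the two operations defined in this abstract can be sketched in a few lines of Python. This is only a sequential model of the semantics (what each processor holds before and after), not the multi-port algorithms the paper presents; the function names and the list-of-lists representation are assumptions made for illustration.

```python
def index_exchange(blocks):
    """Model of the index (all-to-all personalized) operation.

    blocks[j][i] is the i-th block initially held by processor j.
    Afterwards, processor i holds in slot j what used to be block i
    of processor j, i.e. the block matrix is transposed.
    """
    n = len(blocks)
    return [[blocks[j][i] for j in range(n)] for i in range(n)]


def concatenate(blocks):
    """Model of the concatenation (all-to-all broadcast) operation.

    blocks[j] is the single block initially held by processor j.
    Afterwards, every processor holds the full list of n blocks.
    """
    n = len(blocks)
    return [list(blocks) for _ in range(n)]


if __name__ == "__main__":
    before = [["a0", "a1"], ["b0", "b1"]]
    print(index_exchange(before))
    print(concatenate(["x", "y", "z"]))
```

Applying `index_exchange` twice returns the original layout, which is a quick sanity check on the transpose interpretation.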
MPI: A Message Passing Interface
1993
Cited by 52 (0 self)
This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.
The Multicomputer Toolbox Approach to Concurrent BLAS
Proc. Scalable High Performance Computing Conf. (SHPCC), 1993
Cited by 27 (8 self)
Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on general P × Q logical process grids are presented, and experiments are run to demonstrate their performance characteristics.
This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy. Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Submitted to Concurrency: Practice & Experience. Address correspondence to: Mississippi State University, Engineering Research Center, PO Box 6176, Mississippi State, MS 39762. 601-325-8435. tony@cs.msstate.edu.
Falgout, Skjellum, Smith & Still --- The Multicomputer Toolbo...
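To make the idea of a matrix-vector product on a P × Q logical process grid concrete, here is a minimal sequential sketch. It assumes a simple contiguous block distribution of rows over P process rows and columns over Q process columns; the excerpt does not say which data distributions the paper uses, so `block_ranges` and `grid_matvec` are hypothetical names and the layout is an assumption.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal (start, stop) slices."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def grid_matvec(A, x, P, Q):
    """Compute y = A x as a P x Q grid of processes would (simulated serially).

    Process (p, q) owns the block of A with rows in row range p and
    columns in column range q, multiplies it by its piece of x, and the
    partial results are summed across each process row.
    """
    m, n = len(A), len(x)
    y = [0] * m
    for r0, r1 in block_ranges(m, P):          # one iteration per process row
        partials = []
        for c0, c1 in block_ranges(n, Q):      # one iteration per process column
            partials.append(
                [sum(A[i][j] * x[j] for j in range(c0, c1)) for i in range(r0, r1)]
            )
        # Reduction across the process row combines the Q partial vectors.
        for local_i, i in enumerate(range(r0, r1)):
            y[i] = sum(part[local_i] for part in partials)
    return y


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(grid_matvec(A, [1, 0, -1], P=2, Q=2))
```

The result does not depend on P and Q; only the partitioning of the work changes, which is the property a logical grid abstraction relies on.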
The Design and Evolution of Zipcode
Parallel Computing, 1994
Cited by 20 (9 self)
Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and identified needs. Features originally unique to Zipcode were its simultaneous support of static process groups, communication contexts, and virtual topologies, which together form the "mailer" data structure. Point-to-point and collective operations reference the underlying group, and use contexts to avoid mixing up messages. Recently, we have added "gather-send" and "receive-scatter" semantics, based on persistent Zipcode "invoices," both to simplify message passing and to reveal more potential runtime optimizations. Key features in Zipcode appear in the forthcoming MPI standard.
Keywords: Static Process Groups, Contexts, Virtual Topologies, Point-to-Point Communica...
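The role of a communication context in keeping messages apart can be illustrated with a toy mailbox. This is a sketch of the general idea only, not Zipcode's or MPI's API: the `Mailbox` class, its methods, and the string context labels are all hypothetical.

```python
class Mailbox:
    """Toy receive queue in which every message carries a context label."""

    def __init__(self):
        self._pending = []  # list of (context, source, payload) in arrival order

    def send(self, context, source, payload):
        """Deposit a message tagged with the sender's context."""
        self._pending.append((context, source, payload))

    def recv(self, context, source):
        """Return the oldest message matching both context and source.

        Messages from other contexts are skipped, so a library's traffic
        cannot be consumed by application code (or vice versa).
        """
        for k, (ctx, src, _payload) in enumerate(self._pending):
            if ctx == context and src == source:
                return self._pending.pop(k)[2]
        raise LookupError("no matching message")


if __name__ == "__main__":
    box = Mailbox()
    box.send("library", 0, "pivot row")
    box.send("application", 0, "user data")
    # The application receive skips the earlier library message.
    print(box.recv("application", 0))
    print(box.recv("library", 0))
```

Without the context field, the first receive from source 0 would have returned the library's message, which is the kind of mix-up the abstract refers to.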
The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications
1994
Cited by 19 (11 self)
In this paper, we consider what is required to develop parallel algorithms for engineering applications on message-passing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was accomplished with attention to portability, high performance and reusability of the underlying algorithms. Emerging from this work are several key results: first, a methodology for explicit parallelization of algorithms and for the evaluation of parallel algorithms in the distributed-memory context; second, a set of portable, reusable numerical algorithms constituting a "Multicomputer Toolbox," suitable for use on both existing and future medium-grain concurrent computers; third, a working prototype simulation system, Cdyn, for distillation problems, that can be enhanced (with additional work) to address more complex flowsheeting problems in chemical engineering; fourth, ideas for how to a...
Document for a Standard Message-Passing Interface
1997
Cited by 13 (0 self)
Introduction
Current Status: No votes
Collective communication capabilities are here for MPI-2, covering these areas:
• Extension of MPI collective operations to intercommunicators
• Extension of MPI collective operations to in-place buffers.
• Two-phase collective communication of a limited form and a limited set of operations.
• A generalized all-to-all collective operation
6.2 Two-phase Collective Communication
Current Status: no votes
In some applications, better performance can be achieved by separating the initiation and completion of a collective operation. For example, in some numerical applications, better performance can be achieved by overlapping other work (both computation and communication) with an MPI_Allreduce. At the same time, the full generality of non-blocking collectiv
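The split between initiating and completing a collective operation, as described in this entry, can be sketched with Python's standard `concurrent.futures`. This is an analogy under stated assumptions, not the interface proposed in the draft: `allreduce_begin` and `allreduce_end` are hypothetical names, and a single worker thread stands in for the communication subsystem.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_begin(executor, contributions):
    """Phase 1: start the reduction and return a handle immediately."""
    return executor.submit(sum, contributions)


def allreduce_end(handle, nprocs):
    """Phase 2: wait for completion; every process receives the same total."""
    total = handle.result()
    return [total] * nprocs


def demo():
    contributions = [1.0, 2.0, 3.0, 4.0]  # one value per simulated process
    with ThreadPoolExecutor(max_workers=1) as pool:
        handle = allreduce_begin(pool, contributions)
        # Other work proceeds here while the reduction is in flight.
        local_work = sum(i * i for i in range(10))
        results = allreduce_end(handle, len(contributions))
    return local_work, results


if __name__ == "__main__":
    print(demo())
```

The point of the two calls is that the code between them does not depend on the reduction result, so it can overlap with the collective.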

