Monitors, Messages, and Clusters: the p4 Parallel Programming System
Cited by 105 (10 self)
p4 is a portable library of C and Fortran subroutines for programming parallel computers. It is the current version of a system that has been in use since 1984. It includes features for explicit parallel programming of shared-memory machines, distributed-memory machines (including heterogeneous networks of workstations), and clusters, by which we mean shared-memory multiprocessors communicating via message passing. We discuss here the design goals, history, and system architecture of p4 and describe briefly a diverse collection of applications that have demonstrated the utility of p4.

1 Introduction

p4 is a library of routines designed to express a wide variety of parallel algorithms portably, efficiently, and simply. The goal of portability requires it to use widely accepted models of computation rather than specific vendor implementations of those models. The goal of efficiency requires it to use models of computation relatively close to those provided by the machines themselves and t...
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers
, 1993
Cited by 69 (2 self)
In this paper, we present a new practical processor self-scheduling scheme, Trapezoid Self-Scheduling, for arbitrary parallel nested loops in shared-memory multiprocessors. Generally, loops are the richest source of parallelism in parallel programs. By dynamically allocating loop iterations to processors, one may achieve load balancing among processors at the expense of run-time scheduling overhead. By linearly decreasing the chunk size at run time, the proposed trapezoid self-scheduling approach obtains the best tradeoff between scheduling overhead and balanced workload. Due to its simplicity and flexibility, this approach can be efficiently implemented in any parallel compiler. The small and predictable number of chores also allows efficient management of memory in a static fashion. Our experiments conducted on a 96-node Butterfly GP1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.

Keywords: Chunk size, Cr...
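The idea of linearly decreasing chunk sizes can be illustrated with a short sketch. The abstract says only that the chunk size decreases linearly at run time; the function name `trapezoid_chunks`, the default first chunk of about N/(2P) iterations, and the last chunk of 1 are assumptions made here for illustration and should not be read as the paper's exact formulation.

```python
import math


def trapezoid_chunks(total_iterations, num_processors, first=None, last=1):
    """Return chunk sizes that shrink linearly from `first` toward `last`.

    Assumption (not from the abstract): `first` defaults to about
    total_iterations / (2 * num_processors), and `last` defaults to 1.
    """
    if total_iterations <= 0:
        return []
    if num_processors < 1 or last < 1:
        raise ValueError("num_processors and last must be at least 1")
    if first is None:
        first = max(last, math.ceil(total_iterations / (2 * num_processors)))

    # A trapezoid with parallel sides `first` and `last` and area
    # `total_iterations` has roughly this many unit-width columns (chunks).
    n_chunks = math.ceil(2 * total_iterations / (first + last))
    step = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0

    chunks = []
    remaining = total_iterations
    size = float(first)
    while remaining > 0:
        # Never hand out less than `last` or more than what is left.
        take = min(remaining, max(last, int(size)))
        chunks.append(take)
        remaining -= take
        size -= step  # linear decrease between successive chunks
    return chunks


if __name__ == "__main__":
    # Each idle processor would grab the next chunk from this sequence.
    print(trapezoid_chunks(1000, 4))
```

Early chunks are large, which keeps the number of scheduling operations small. Late chunks are small, which evens out the finishing times of the processors.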
The ANL/GMD Macros (PARMACS) in FORTRAN for Portable Parallel Programming using the Message Passing Programming Model User's Guide and Reference Manual
- GMD, Postfach 1316, D-5205 Sankt Augustin 1
, 1991
Cited by 34 (2 self)
A macro package for expressing message passing functions within parallel FORTRAN programs is presented. It makes the user program fully portable among all parallel computers where the macros are implemented. The definitions of all macros are included. A simple example program demonstrates the usage of the package.

Contents
1 Background and Purpose of the Macros
2 Using the Macros
3 Definition of the Basic Macros
3.1 Macro Specific Declarations
3.2 Process Initialization and Termination
3.3 Node Process Creation
3.4 Process Identification of Itself and the Host
3.5 Message Passing
3.5.1 Asynchronous Communication (SEND and RECV)
3.5.2 Synchronous Communication (SENDR and RECVR)
3.6 Conversion of Data Repres...
Design And Implementation Of PVM Version 3
, 1994
Cited by 7 (1 self)
There is a growing trend toward distributed computing -- writing programs that run across multiple networked computers -- to speed up computation, solve larger problems or withstand machine failures. A programming model commonly used to write distributed applications is message-passing, in which a program is decomposed into distinct subprograms that communicate and synchronize with one another by explicitly sending and receiving blocks of data. PVM (Parallel Virtual Machine) is a generic message-passing system composed of a programming library and manager processes. It ties together separate physical machines (possibly of different types), providing communication and control between the subprograms and detection of machine failures. The resulting virtual machine appears as a single, manageable resource. PVM is portable to a wide variety of machine architectures and operating systems, including workstations, supercomputers, PCs and multiprocessors. In this paper I describe the design,...
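The message-passing model described in this abstract, with subprograms that cooperate only by explicitly sending and receiving blocks of data, can be sketched in a few lines. This is not the PVM API: threads and `queue.Queue` mailboxes from the Python standard library stand in for separate processes and network transport, and the names `worker` and `run_master` are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive blocks of numbers, send back each block's sum, stop on None."""
    while True:
        block = inbox.get()        # explicit receive; waits for a message
        if block is None:          # sentinel message: no more work
            break
        outbox.put(sum(block))     # explicit send of the result


def run_master(blocks):
    """Send each block to one worker and collect the replies in order."""
    to_worker = queue.Queue()
    from_worker = queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for block in blocks:
        to_worker.put(block)       # the two sides share data only via messages
    to_worker.put(None)
    results = [from_worker.get() for _ in blocks]
    t.join()
    return results


if __name__ == "__main__":
    print(run_master([[1, 2, 3], [4, 5]]))
```

The two sides share no variables other than the mailboxes, so all communication and synchronization happens through the send and receive calls, which is the property the abstract attributes to message-passing programs.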
Force User's Manual
, 1987
Cited by 4 (2 self)
CONTENTS
I. Introduction
II. Description of the Force Macros
  A. Macros Specifying Program Structure
  B. Variable Declarations
  C. Parallel Execution
  D. Synchronization
III. Restrictions on the Force Macros
IV. How to Invoke the Force
  A. Flex/32 (Flexible Computer Corp.)
  B. Multimax (Encore Computer Corp.)
  C. Balance (Sequent Computer Corp.)
  D. Alliant FX/Series (Alliant Computer Systems Corp.)
  E. Cray 2 (Cray Research, Inc.)
  F. Cray Y-MP (Cray Research, Inc.)
  G. Convex C220 (Convex Computer, Inc.)
V. Sample Program Listing
VI. References

I. Introduction

The principle of global parallelism in parallel programming was introduced by Jordan [1], through a set of FORTRAN macros called the Force macros. These macros support the construction of programs to be executed in parallel by a "Force of processes." The number of processes is left unspecified at compile time, but is potentially quite large. The Force provides a FORTRAN-styl
A Study of Backoff Barrier Synchronization
, 1989
Shared-memory multiprocessors commonly use shared variables for synchronization. Simulations of real parallel applications show that large-scale cache-coherent multiprocessors suffer significant amounts of invalidation traffic due to synchronization.

