A high-performance, portable implementation of the MPI message passing interface standard
- Parallel Computing, 1996
Cited by 651 (37 self)
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.
Monitors, Messages, and Clusters: the p4 Parallel Programming System
Cited by 105 (10 self)
p4 is a portable library of C and Fortran subroutines for programming parallel computers. It is the current version of a system that has been in use since 1984. It includes features for explicit parallel programming of shared-memory machines, distributed-memory machines (including heterogeneous networks of workstations), and clusters, by which we mean shared-memory multiprocessors communicating via message passing. We discuss here the design goals, history, and system architecture of p4 and describe briefly a diverse collection of applications that have demonstrated the utility of p4.
1 Introduction
p4 is a library of routines designed to express a wide variety of parallel algorithms portably, efficiently and simply. The goal of portability requires it to use widely accepted models of computation rather than specific vendor implementations of those models. The goal of efficiency requires it to use models of computation relatively close to those provided by the machines themselves and t...
User's Guide for mpich, a Portable Implementation of MPI Version 1.2.1
- 1996
Cited by 101 (10 self)
Contents:
1 Introduction
2 Linking and running programs
2.1 Scripts to Compile and Link Applications
2.1.1 Fortran 90 and the MPI module
2.2 Compiling and Linking without the Scripts
2.3 Running with mpirun
2.3.1 SMP Clusters
2.3.2 Multiple Architectures
2.4 More detailed control
3 Special features of different systems
3.1 Workstation clusters
3.1.1 Checking your machines list
3.1.2 Using the Secure Shell
3.1.3 Using the Secure Server
...
Parallel Execution of Prolog Programs: A Survey
Cited by 53 (23 self)
Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of non-determinism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this paper is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The paper describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory
MPI: A Message Passing Interface
- 1993
Cited by 52 (0 self)
This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.
Productive Parallel Programming: The PCN Approach
- Scientific Programming, 1992
Cited by 39 (6 self)
We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
Keywords: PCN, program composition, parallel programming, reuse, templates.
1 Introduction
After many years as academic curiosities, computers combining hundreds or thousands of powerful microprocessors have overtaken vector processors and become essential tools for scientists and...
Understanding the behavior and performance of non-blocking communications in MPI
- In Proceedings of Euro-Par 2004: Parallel Processing, LNCS 3149, 2004
Cited by 38 (0 self)
The behavior and performance of MPI non-blocking message passing operations are sensitive to implementation specifics, as they are heavily dependent on available system-level buffers. In this paper we investigate the behavior of non-blocking communication primitives provided by popular MPI implementations and propose strategies for these primitives that can reduce processor synchronization overheads. We also demonstrate the improvements in the performance of a parallel Structured Adaptive Mesh Refinement (SAMR) application using these strategies.
TAU: A Portable Parallel Program Analysis Environment for pC++
- 1994
Cited by 37 (7 self)
The realization of parallel language systems that offer high-level programming paradigms to reduce the complexity of application development, scalable runtime mechanisms to support variable size problem sets, and portable compiler platforms to provide access to multiple parallel architectures, places additional demands on the tools for program development and analysis. The need for integration of these tools into a comprehensive programming environment is even more pronounced and will require more sophisticated use of the language system technology (i.e., compiler and runtime system). Furthermore, the environment requirements of high-level support for the programmer, large-scale applications, and portable access to diverse machines also apply to the program analysis tools. In this paper, we discuss τ (TAU, Tuning and Analysis Utilities), a first prototype for an integrated and portable program analysis environment for pC++, a parallel object-oriented language system. τ is integrated w...
An Experimental Evaluation of the Parallel I/O Systems of the IBM SP and Intel Paragon Using a Production Application
- 1996
Cited by 24 (12 self)
We present the results of an experimental evaluation of the parallel I/O systems of the IBM SP and Intel Paragon using a real three-dimensional parallel application code. This application, developed by scientists at the University of Chicago, simulates the gravitational collapse of self-gravitating gaseous clouds. It performs parallel I/O by using library routines that we developed and optimized separately for the SP and Paragon. The I/O routines perform two-phase I/O and use the parallel file systems PIOFS on the SP and PFS on the Paragon. We studied the I/O performance for two different sizes of the application. In the small case, we found that I/O was much faster on the SP. In the large case, open, close, and read operations were only slightly faster, and seeks were significantly faster, on the SP, whereas writes were slightly faster on the Paragon. The communication required within our I/O routines was faster on the Paragon in both cases. The highest read bandwidth ...

