A high-performance, portable implementation of the MPI message passing interface standard
Parallel Computing, 1996
Cited by 651 (37 self)
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.
User's Guide for mpich, a Portable Implementation of MPI Version 1.2.1
1996
Cited by 101 (10 self)
1 Introduction
2 Linking and running programs
2.1 Scripts to Compile and Link Applications
2.1.1 Fortran 90 and the MPI module
2.2 Compiling and Linking without the Scripts
2.3 Running with mpirun
2.3.1 SMP Clusters
2.3.2 Multiple Architectures
2.4 More detailed control
3 Special features of different systems
3.1 Workstation clusters
3.1.1 Checking your machines list
3.1.2 Using the Secure Shell
3.1.3 Using the Secure Server ...
Sowing MPICH: A Case Study in the Dissemination of a Portable Environment for Parallel Scientific Computing
IJSA, 1996
Cited by 16 (10 self)
MPICH is an implementation of the MPI specification for a standard message-passing library interface. In this article we focus on the lessons learned from preparing MPICH for diverse parallel computing environments. These lessons include how to prepare software for configuration in unknown environments; how to structure software to absorb contributions by others; how to automate the preparation of man pages, Web pages, and other documentation; how to automate prerelease testing for both correctness and performance; and how to manage the inevitable problem reports with a minimum of resources for support.
Scalable compression and replay of communication traces in massively parallel environments
In IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS), 2007
Cited by 7 (0 self)
Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code and system complexity as well as the time to execute such codes. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides near constant-size communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques of MPI events and present results of our implementation. Given this novel capability, we discuss its impact on communication tuning and beyond.
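To make the idea of lossless, structure-preserving trace compression concrete, here is a minimal sketch of how repeated runs of events on one node might be folded into (block, count) pairs. It is not the paper's algorithm: the event record layout, the `compress` and `expand` names, and the `max_block` parameter are all assumptions made for illustration, and inter-node merging is not shown.

```python
from typing import List, Tuple

# Hypothetical event record: (operation name, peer offset relative to own rank).
Event = Tuple[str, int]
Folded = List[Tuple[Tuple[Event, ...], int]]


def compress(events: List[Event], max_block: int = 4) -> Folded:
    """Greedily fold consecutive repeats of a short block into (block, count)."""
    out: Folded = []
    i, n = 0, len(events)
    while i < n:
        best_len, best_count = 1, 1
        for length in range(1, max_block + 1):
            block = events[i:i + length]
            if len(block) < length:
                break  # not enough events left for a block of this length
            count = 1
            while events[i + count * length:i + (count + 1) * length] == block:
                count += 1
            # Keep the candidate that covers the most events.
            if count > 1 and count * length > best_len * best_count:
                best_len, best_count = length, count
        out.append((tuple(events[i:i + best_len]), best_count))
        i += best_len * best_count
    return out


def expand(folded: Folded) -> List[Event]:
    """Inverse of compress: reproduce the original event list exactly."""
    events: List[Event] = []
    for block, count in folded:
        events.extend(list(block) * count)
    return events


# Illustrative trace: a send/recv pair repeated in a loop, then a barrier.
trace = [("send", 1), ("recv", -1)] * 1000 + [("barrier", 0)]
folded = compress(trace)
```

In this sketch the folded form has two entries however many times the loop runs, and `expand(folded)` gives back the original list. Recording peers as offsets rather than absolute ranks is an assumption that would let traces from different ranks look alike, which is one plausible starting point for merging across nodes.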
Performance Analysis of PC-CLUMP based on SMP-Bus Utilization
Cited by 1 (0 self)
PC-CLUMP (Cluster of Multiprocessors) is one of the most cost-effective commodity-based platforms for HPC applications. Increasing the number of CPUs per SMP node yields a very compact system and a very low network-interface cost per processor for a fixed number of CPUs in the system. However, the performance of the SMP bus on such an SMP-PC node is relatively poor compared with that of SMP workstations. Therefore, an application program must achieve a high cache-hit ratio to reduce SMP-bus accesses and obtain higher performance. In this study, we analyze the performance of PC-CLUMP based on the SMP-bus access ratio on several benchmarks representative of HPC applications. We measure the ratio of SMP-bus accesses to total memory access instructions using the performance counters available in modern CPUs. From the results of the analysis, we can estimate the best numbers of CPUs per node and nodes per system. Moreover, we confirm that combining parallel descriptions in OpenMP and MPI is well suited to programming on PC-CLUMP, although several problems remain in data-process mapping that are related to the cache-hit ratio of the applications.
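The metric described above can be written down in a few lines. The sketch below only illustrates the ratio of bus accesses to memory access instructions; the function name and the counter readings are hypothetical and do not come from the paper.

```python
def smp_bus_access_ratio(bus_accesses: int, memory_access_instructions: int) -> float:
    """Return SMP-bus accesses per memory access instruction."""
    if bus_accesses < 0:
        raise ValueError("bus_accesses must be non-negative")
    if memory_access_instructions <= 0:
        raise ValueError("memory_access_instructions must be positive")
    return bus_accesses / memory_access_instructions


# Hypothetical counter readings, not measurements from the paper.
ratio = smp_bus_access_ratio(bus_accesses=2_500, memory_access_instructions=100_000)
```

A lower ratio means more memory accesses are served from cache, which is the property the abstract says an application needs on this kind of node.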
A Performance and Portability Study of Parallel Applications Using a Distributed Computing Testbed
A case study was conducted to examine the performance and portability of parallel applications, with an emphasis on data transfer among the processors in heterogeneous environments. Several parallel test programs using MPICH, a Message Passing Interface (MPI) library, and the Linda parallel environment were developed to analyze communication performance and portability. These programs implement loosely and tightly synchronized communication models in which each processor exchanges data with two other processors. This data-exchange pattern mimics communication in certain parallel applications using striped partitioning of the computational domain. Tests were performed on an isolated, distributed computing testbed, a live development network, and a symmetrical multi-processing computer system. All network configurations used asynchronous transfer mode (ATM) network technologies. The testbed used in the study was a heterogeneous network consisting of various workstations and networking eq...
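The two-neighbor exchange pattern mentioned in this abstract can be illustrated without any MPI installation. The sketch below simulates one exchange step in a single process. It assumes a periodic (ring) arrangement so that every rank has exactly two partners; the abstract does not state this, and the function names are hypothetical.

```python
from typing import List, Tuple


def ring_neighbors(rank: int, size: int) -> Tuple[int, int]:
    """Return the (left, right) partners of a rank, assuming a periodic ring."""
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return (rank - 1) % size, (rank + 1) % size


def exchange(local_values: List[int]) -> List[Tuple[int, int]]:
    """Simulate one step in which each rank receives both partners' values."""
    size = len(local_values)
    received = []
    for rank in range(size):
        left, right = ring_neighbors(rank, size)
        received.append((local_values[left], local_values[right]))
    return received


# One value per simulated rank; the values are arbitrary illustration data.
halo = exchange([10, 20, 30, 40])
```

In a real message-passing program each rank would compute only its own partners and exchange boundary data with them; the loop over ranks here stands in for that parallel step.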
User's Guide for
1 Introduction
2 Linking and running programs
2.1 Scripts to Compile and Link Applications
2.2 Running with mpirun
2.3 More detailed control
3 Special features of different systems
3.1 Workstation clusters
3.1.1 Checking your machines list
3.1.2 Using the Secure Shell
3.1.3 Using the Secure Server
3.1.4 Heterogeneous networks and the ch_p4 device
3.1.5 Using special switches
3.2 Computational grids: the globus device
3.3 MPPs ...
Message Passing Interface (MPI)
Device Interface (ADI), which provides a small yet efficient interface to the specific hardware and software of a particular manufacturer. Computer vendors need only implement this interface on their systems to get the benefits of peak efficiency. As software, MPICH promotes the adoption of the MPI Standard by providing a starting point for vendors' proprietary implementations. The portability of MPICH means that it can be used on all current parallel environments, including parallel computers and clusters of workstations. It narrows the performance gap between user-level (MPI) programs and the capabilities of the hardware. MPICH currently runs both in a native mode and over TCP/IP sockets:
- IBM SPx
- Intel Paragon
- Cray T3D, Cray YMP, Cray PVP, Cray C90
- Meiko CS2
- TMC CM5
- NCUBE NCUBE-2
- Convex Exemplar
- SGI (Power) Challenge
- Sun Multiprocessors
- 486's running Linux or FreeBSD
- Sun, HP, SGI, IBM (RS/6000) and DEC (Alpha) works...
Performance Evaluation of a . . .
2000
In this paper we review our experiences while testing a shared memory machine consisting of two Intel Pentium III processors. This includes installation of the Linux operating system and the mpich message passing library, performance tests for some numerical algorithms and some general remarks.
Vshmem: Shared-Memory OS-Support for Multicore-based HPC systems
As a result of the huge performance potential of multi-core microprocessors, HPC infrastructures are rapidly integrating them into their architectures in order to expedite the performance growth of next-generation HPC systems. However, as the number of cores per processor increases to hundreds or thousands, they pose revolutionary challenges to various aspects of the software stack. In our research, we endeavor to investigate novel solutions to the problem of extracting high performance. In this paper, we advocate the use of virtualization as an alternative to traditional operating systems for next-generation multicore-based HPC systems. In particular, we investigate an efficient mechanism for shared-memory communication between HPC applications executing within virtual machine (VM) instances that are co-located on the same hardware platform. This system, called Vshmem, implements a low-latency IPC mechanism that allows the programmer to selectively share memory regions between user-space processes residing in co-located virtual machines. Our contributions addressed ...

