PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems
Cited by 2 (0 self)
Abstract. Parallel programming models on large-scale systems require a scalable system for managing the processes that make up the execution of a parallel program. The process-management system must be able to launch millions of processes quickly when starting a parallel program and must provide mechanisms for the processes to exchange the information needed to enable them to communicate with each other. MPICH2 and its derivatives achieve this functionality through a carefully defined interface, called PMI, that allows different process managers to interact with the MPI library in a standardized way. In this paper, we describe the features and capabilities of PMI. We describe both PMI-1, the current generation of PMI used in MPICH2 and all its derivatives, and PMI-2, the second generation of PMI, which eliminates various shortcomings in PMI-1. Together with the interface itself, we also describe a reference implementation for both PMI-1 and PMI-2 in a new process-management framework within MPICH2, called Hydra, and compare their performance in running MPI jobs with thousands of processes.
On Scalability for MPI Runtime Systems
- In: IEEE International Conference on Cluster Computing
Cited by 1 (1 self)
Abstract. The future of high performance computing, as currently foretold, will gravitate toward machines with hundreds of thousands to a million nodes, harnessing the computing power of billions of cores. While the hardware part is well covered, the software infrastructure at that scale is vague. However, no matter what the infrastructure will be, efficiently running parallel applications on such large machines will require optimized runtime environments that are scalable and resilient. More particularly, considering a future where the Message Passing Interface (MPI) remains a major programming paradigm, MPI implementations will have to seamlessly adapt to launching and managing large-scale applications on resources several orders of magnitude larger than today. In this paper, we present a modified version of the Open MPI runtime that has been adapted towards a scalability goal. We evaluate its performance and compare it with two widely used runtime systems, the default version of Open MPI and MPICH2, using various underlying launching systems. The performance evaluation demonstrates a significant improvement over the state of the art. We also discuss the basic requirements for an exascale-ready parallel runtime.
RDMA-Based Job Migration Framework for MPI over InfiniBand
Cited by 1 (0 self)
Abstract. Coordinated checkpoint and recovery is a common approach to achieving fault tolerance on large-scale systems. The traditional mechanism dumps the process images of all the processes involved in the parallel job to a local disk or a central storage area. When a failure occurs, the processes are restarted and restored to the latest checkpoint image. However, this kind of approach is unable to provide the scalability required by increasingly large-sized jobs, since it puts a heavy I/O burden on the storage subsystem, and resubmitting a job during the restart phase incurs a lengthy queuing delay. In this paper, we enhance the fault tolerance of MVAPICH2 [1], an open-source high performance MPI-2 implementation, by using a proactive job migration scheme. Instead of checkpointing all the processes of the job and saving their process images to stable storage, we transfer the processes running on a health-deteriorating node to a healthy spare node, and resume these processes from the spare node. RDMA-based process image transmission is designed to take advantage of high performance communication in InfiniBand. Experimental results show that the Job Migration scheme can achieve a speedup of 4.49 times over the Checkpoint/Restart scheme in handling a node failure for a 64-process application running on 8 compute nodes. To the best of our knowledge, this is the first such job migration design for InfiniBand-based clusters.
Designing and Evaluating MPI-2 Dynamic Process Management Support for
Dynamic process management is a feature of MPI-2 that allows an MPI process to create new processes and manage communication with these processes. The dynamic creation of processes allows application writers to develop multiscale applications or master/worker-based programs. Although several MPI implementations support this feature, we are not aware of any studies on the issues in designing the dynamic process management interface or on benchmarking the dynamic process framework. In this paper we design an MPI-2 dynamic process management interface over InfiniBand. We consider two alternative designs using the Unreliable Datagram (UD) and Reliable Connection (RC) transport modes of InfiniBand with two job startup models. In our evaluations we found that a UD-based design allows for much higher spawn rates with existing job launch frameworks. We also design a set of microbenchmarks to evaluate the performance of our design and other MPI libraries. Finally, we provide an evaluation of the dynamic process framework using a redesigned ray-tracing application. Keywords: MPI-2, Dynamic Process Management, InfiniBand
Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure
The runtime environment of MPI implementations plays a key role in launching the application, providing out-of-band communications, enabling I/O forwarding and bootstrapping of the connections of high-speed networks, and controlling the correct termination of the parallel application. In order to enable all these

