Program transformation and runtime support for threaded MPI execution on shared-memory machines
ACM Transactions on Programming Languages and Systems, 2000
Cited by 11 (1 self)
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing performance portability of MPI code running on multiprogrammed shared memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. Compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address space sharing among threads. The experiments on an SGI Origin 2000 show that TMPI, our MPI prototype using the proposed techniques, is competitive with SGI’s native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.
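The idea of replacing globals with node-specific data can be illustrated with a small sketch. This is a Python analogue, not the C transformation the paper describes: each thread stands in for one MPI node, and `threading.local` plays the role of node-specific storage. The names `_node_data`, `node_main`, and `run_nodes` are hypothetical.

```python
import threading

# Conceptually, the untransformed program had a single global:
#     counter = 0
# which every node-as-thread would share by accident.

# Assumption: per-thread storage stands in for "node-specific data".
_node_data = threading.local()


def node_main(rank, steps, results):
    """Body of one simulated MPI node, run as a thread."""
    _node_data.counter = 0           # private copy of the former global
    for _ in range(steps):
        _node_data.counter += rank   # touches only this node's copy
    results[rank] = _node_data.counter


def run_nodes(num_nodes, steps):
    """Start one thread per simulated node and collect each node's counter."""
    results = [None] * num_nodes
    threads = [
        threading.Thread(target=node_main, args=(rank, steps, results))
        for rank in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_nodes(4, 10)` gives each simulated node its own counter, so no node's updates leak into another's.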
A multi-threaded Message Passing Interface (MPI) architecture: performance and program issues
Cited by 10 (0 self)
This paper discusses a multi-threaded software architecture for Message-Passing Interface (MPI) software specification. The architecture is thread-safe, allows for concurrent communication over several communications media (multi-fabric communication), efficiently utilizes available hardware concurrency over a wide range of target platforms, and allows for concurrent communication and computation within the limits imposed by the hardware. The architecture is developed in the framework of the MPICH software architecture, a well-known MPI implementation used worldwide. The proposed architecture adopts wide portability of the MPICH design and remedies some of its deficiencies such as inefficient multifabric communication and non-thread-safety. The paper also considers the issues concerning development of high-performance portable message-passing systems for general-purpose architectures. The contributions of the paper are: improving architecture and addressing thread safety of modern reli...
Issues in developing a thread-safe MPI implementation
In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI Users’ Group Meeting, 2006
Cited by 6 (2 self)
The MPI-2 Standard has carefully specified the interaction between MPI and user-created threads, with the goal of enabling users to write multithreaded programs while also enabling MPI implementations to deliver high performance. In this paper, we describe and analyze what the MPI Standard says about thread safety and what it implies for an implementation. We classify the MPI functions based on their thread-safety requirements and discuss several issues to consider when implementing thread safety in MPI. We use the example of generating new context ids (required for creating new communicators) to demonstrate how a simple solution for the single-threaded case cannot be used when there are multiple threads and how a naïve thread-safe algorithm can be expensive. We then present an algorithm for generating context ids that works efficiently in both single-threaded and multithreaded cases.
Thread safety in an MPI implementation: Requirements and analysis
Parallel Computing, 2007
Cited by 6 (2 self)
The MPI-2 Standard has carefully specified the interaction between MPI and user-created threads. The goal of this specification is to allow users to write multithreaded MPI programs while also allowing MPI implementations to deliver high performance. However, a simple reading of the thread-safety specification does not reveal what its implications are for an implementation and what implementers must be aware (and careful) of. In this paper, we describe and analyze what the MPI Standard says about thread safety and what it implies for an implementation. We classify the MPI functions based on their thread-safety requirements and discuss several issues to consider when implementing thread safety in MPI. We use the example of generating new context ids (required for creating new communicators) to demonstrate how a simple solution for the single-threaded case does not naturally extend to the multithreaded case and how a naïve thread-safe algorithm can be expensive. We then present an algorithm for generating context ids that works efficiently in both single-threaded and multithreaded cases. Key words: Message Passing Interface (MPI), thread safety, MPI implementation, multithreaded programming
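The abstract does not spell out the context-id scheme, so the following is only a sketch of one plausible single-threaded approach, assuming each process keeps a bitmask of free ids and the processes agree on an id through a bitwise-AND reduction. The reduction is simulated here over a plain list of per-process masks; `MAX_CONTEXT_IDS`, `new_mask`, and `allocate_context_id` are hypothetical names, not part of any MPI library.

```python
from functools import reduce

MAX_CONTEXT_IDS = 32  # assumption: a small fixed pool of ids


def new_mask():
    """Return a mask with every bit set, meaning every id is free."""
    return (1 << MAX_CONTEXT_IDS) - 1


def allocate_context_id(masks):
    """Pick the lowest id that is free on every simulated process.

    `masks` holds one free-id bitmask per process and is updated in place.
    The reduce() call stands in for a collective bitwise-AND reduction.
    """
    common = reduce(lambda a, b: a & b, masks)
    if common == 0:
        raise RuntimeError("no context id is free on all processes")
    ctx = (common & -common).bit_length() - 1  # index of lowest set bit
    for i in range(len(masks)):
        masks[i] &= ~(1 << ctx)                # mark the id as used
    return ctx
```

With several threads per process, the read of the mask, the reduction, and the update would no longer form one uninterrupted step, which is one way to see why the single-threaded scheme does not carry over directly.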
Toward Efficient Support for Multithreaded MPI Communication
2008
Cited by 5 (2 self)
To make the most effective use of parallel machines that are being built out of increasingly large multicore chips, researchers are exploring the use of programming models comprising a mixture of MPI and threads. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We describe how we have structured our implementation to support all four approaches and enable one to be selected at build time. We present performance results with a message-rate benchmark to demonstrate the performance implications of the different approaches.
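The difference between coarse-grain and fine-grain critical sections can be sketched on a toy data structure. This is an illustration under stated assumptions, not the paper's implementation: the per-destination send counters and the names `CoarseCounters`, `FineCounters`, and `hammer` are hypothetical, and the lock-free variant is omitted because it needs atomic primitives that plain Python does not expose.

```python
import threading


class CoarseCounters:
    """Coarse grain: one lock guards every per-destination counter."""

    def __init__(self, num_dests):
        self._lock = threading.Lock()
        self._sent = [0] * num_dests

    def record_send(self, dest):
        with self._lock:                 # all threads serialize here
            self._sent[dest] += 1

    def totals(self):
        with self._lock:
            return list(self._sent)


class FineCounters:
    """Fine grain: one lock per destination, so unrelated sends do not contend."""

    def __init__(self, num_dests):
        self._locks = [threading.Lock() for _ in range(num_dests)]
        self._sent = [0] * num_dests

    def record_send(self, dest):
        with self._locks[dest]:          # only same-destination sends serialize
            self._sent[dest] += 1

    def totals(self):
        out = []
        for dest, lock in enumerate(self._locks):
            with lock:
                out.append(self._sent[dest])
        return out


def hammer(counters, num_threads, sends_per_thread, num_dests):
    """Drive `counters` from several threads and return the final totals."""
    def worker(tid):
        for i in range(sends_per_thread):
            counters.record_send((tid + i) % num_dests)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters.totals()
```

Both classes produce the same totals; the fine-grain version trades extra bookkeeping for less contention, which mirrors the complexity-versus-granularity trade-off the abstract describes.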
A Windows NT Kernel-Mode Device Driver for PCI Myrinet LANai 4.x Interface Adapters
1997
Cited by 2 (1 self)
Device Interface (ADI) hides the specifics of the communication media. The ADI provides a hardware-independent set of calls to MPI, so the actual communication details remain hidden from the higher MPI layers. The Myrinet library is the library developed in this project and the driver is the Windows NT Memory Mapped Kernel Mode Device Driver for PCI Myrinet LANai 4.x Interface Adapter. In the Windows NT version of MPICH on BDM/Myrinet, the ADI-2 Myrinet device is implemented by means of the functions provided in the BDM library (Henley et al. 1996). In turn, the BDM library either calls the Myrinet library developed in this project or communicates directly with the BDM MCP, which is, again, possible through calls to the Myrinet library functions that perform mapping of the LANai adapter address space. An experimental run of the described system was held in the environment specified in Section 1.2 of this chapter. The results of the experiments for 8Kbyte packets were: latency 36s and ...
Concurrency, Multi-Threading, and Message Passing
1996
Device Interface) that defines a conceptual set of services which should be provided by any lower-level communication mechanism for the MPIR layer), and Device (the implementation of the ADI services for the particular platform) (Gropp et al. 1995). A part of the Device layer in the current MPICH implementation is called the Channel Device Interface; this sub-layer is actually an interface between the ADI and lower-level device code; the services of the lower-level device layer are further conceptualized and reduced to a small set of basic message-passing routines that should be implemented for a target platform (Gropp and Lusk 1995b; Gropp and Lusk 1995a; Gropp and Lusk 1996a; Gropp and Lusk 1996b). All devices which use the Channel Device Interface are called "channel devices." One can choose any layer (MPIR, ADI, or Device) and implement everything below it in order to port MPICH to a target platform. The MPICH design and implementation is still being improved and corrected. User b...
Test Suite for Evaluating Performance of Multithreaded MPI Communication
As parallel systems are commonly being built out of increasingly large multicore chips, application programmers are exploring the use of hybrid programming models combining MPI across nodes and multithreading within a node. Many MPI implementations, however, are just starting to support multithreaded MPI communication, often focussing on correctness first and performance later. As a result, both users and implementers need some measure for evaluating the multithreaded performance of an MPI implementation. In this paper, we propose a number of performance tests that are motivated by typical application scenarios. These tests cover the overhead of providing the MPI_THREAD_MULTIPLE level of thread safety for user programs, the amount of concurrency in different threads making MPI calls, the ability to overlap communication with computation, and other features. We present performance results with this test suite on several platforms (Linux cluster,

