Results 1 -
4 of
4
Implementation and performance analysis of non-blocking collective operations for MPI
- SC07
, 2007
"... Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance lib ..."
Abstract
-
Cited by 21 (12 self)
- Add to MetaCart
Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance library for implementing non-blocking collective MPI communication operations. LibNBC provides non-blocking versions of all MPI collective operations, is layered on top of MPI-1, and is portable to nearly all parallel architectures. To measure the performance characteristics of our implementation, we also present a microbenchmark for measuring both latency and overlap of computation and communication. Experimental results demonstrate that the blocking performance of the collective operations in our library is comparable to that of collective operations in other highperformance MPI implementations. Our library introduces a very low overhead between the application and the underlying MPI and thus, in conjunction with the potential to overlap communication with computation, offers the potential for optimizing real-world applications.
A Case for Standard Non-Blocking Collective Operations
"... Abstract. In this paper we make the case for adding standard nonblocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be simult ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
Abstract. In this paper we make the case for adding standard nonblocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be simultaneously realized, we present an application programming interface for non-blocking collective operations in MPI. Microbenchmark and application-based performance results demonstrate that non-blocking collective operations offer not only improved convenience, but improved performance as well, when compared to manual use of threads with blocking collectives. 1
An approach to formalization and analysis of message passing libraries
- IN: PROCEEDINGS OF THE 12TH INTL. WORKSHOP ON FORMAL METHODS FOR INDUSTRIAL CRITICAL SYSTEMS (FMICS
, 2007
"... Message passing using libraries implementing the Message Passing Interface (MPI) standard is the dominant communication mechanism in high performance computing (HPC) applications. Yet, the lack of an implementation independent formal semantics for MPI is a huge void that must be filled, especially ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Message passing using libraries implementing the Message Passing Interface (MPI) standard is the dominant communication mechanism in high performance computing (HPC) applications. Yet, the lack of an implementation independent formal semantics for MPI is a huge void that must be filled, especially given the fact that MPI will be implemented on novel hardware platforms in the near future. To help reason about programs that use MPI for communication, we have developed a formal TLA+ semantic definition of the point to point communication operations to augment the existing standard. The proposed semantics includes 42 MPI functions, including all 35 point to point operations, many of which have not been formally modeled previously. We also present a framework to extract models from SPMD-style C programs, so that designers may understand the semantics of MPI by exercising short, yet pithy, communication scenarios written in C/MPI. In this paper, we describe (i) the TLA+ MPI model features, such as handling the explicit memory for each process to facilitate the modeling of C pointers, and some of the widely used MPI operations, (ii) the model extraction framework and the simplifications made to the model that help facilitate explicit-state model checking of formal semantic definitions, (iii) a customized model checker for MPI that performs much faster model checking, and features a dynamic partial-order reduction algorithm whose correctness is directly based on the formal semantics, and (iv) an error trail replay facility in the Visual Studio environment. Our effort has helped identify a few omissions in the MPI reference standard document. These benefits suggest that a formal semantic definition and exploration approach as described here must accompany every future effort in creating parallel and distributed programming libraries.
Test suite for evaluating performance of MPI implementations that support MPI THREAD MULTIPLE
, 2007
"... Abstract. MPI implementations that support the highest level of thread safety for user programs, MPI THREAD MULTIPLE, are becoming widely available. Users often expect that different threads can execute independently and that the MPI implementation can provide the necessary level of thread safety wi ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract. MPI implementations that support the highest level of thread safety for user programs, MPI THREAD MULTIPLE, are becoming widely available. Users often expect that different threads can execute independently and that the MPI implementation can provide the necessary level of thread safety with only a small overhead. The MPI Standard, however, requires only that no MPI call in one thread block MPI calls in other threads; it makes no performance guarantees. Therefore, some way of measuring an implementation’s performance is needed. In this paper, we propose a number of performance tests that are motivated by typical application scenarios. These tests cover the overhead of providing the MPI THREAD MULTIPLE level of thread safety for user programs, the amount of concurrency in different threads making MPI calls, the ability to overlap communication with computation, and other features. We present performance results with this test suite on several platforms

