MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks
, 2000
Cited by 28 (0 self)
The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the pertinence of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (OpenMP fine-grain parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on 1) the level of shared memory model parallelization, 2) the communication patterns, and 3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance for selecting one model. With the hybrid model used here, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.
Assessing Performance of Hybrid MPI/OpenMP Programs on SMP Clusters
, 2001
Cited by 8 (0 self)
Computational experiences with hybrid message passing and multithreading techniques on SMP clusters generally show poorer performance than pure message passing approaches.
BSP Algorithms Design for Hierarchical Supercomputers
- Submitted for publication
, 2002
Cited by 4 (0 self)
In recent years there has been a trend towards using standard workstation components to construct parallel computers, due to the enormous costs involved in designing and manufacturing special-purpose hardware. In particular, we can expect to see a large population of SMP clusters emerging in the next few years. These are local-area networks of workstations, each containing around four parallel processors with a single shared memory. Using such machines effectively will be a major headache for programmers and compiler-writers. Here we consider how well suited the BSP model might be for these two-tier architectures, and whether it would be useful to extend the model to allow for non-uniform communication behaviour.
Integrating MPI and the Nanothreads Programming Model
- In Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP 2002), Las Palmas
, 2002
Cited by 3 (2 self)
This paper presents a prototype runtime system that integrates MPI, used on distributed memory systems, and the Nanothreads Programming Model (NPM), a programming model for shared memory multiprocessors. This integration does not alter the independence of the two models, since the runtime system is based on a multilevel design that supports each of them individually but offers the capability to combine their advantages. Existing MPI codes can be executed without any changes, codes for shared memory machines can be used directly, and the concurrent use of both models is easy. A major feature of the runtime system is portability, as it is based exclusively on calls to MPI and Nthlib, a user-level threads library that has been ported to several operating systems. The runtime system supports the hybrid programming model (MPI+OpenMP) and also provides a solution for better load balancing in MPI applications. Moreover, it extends the API and the multiprogramming functionality of the NPM on clusters of multiprocessors and can support an extension of the OpenMP standard on distributed memory multiprocessors.
Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
- In 7th International Workshop on High-Level Parallel Programming Models and Supportive
, 2002
Cited by 2 (0 self)
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical architectures. Using the NAS parallel benchmarks, we first illustrate the lack of portable performance on state-of-the-art scalable parallel systems despite the use of two portable programming models, MPI and OpenMP. Then we present a dynamic compilation and execution framework that provides the desired portability through the use of program slices. These slices are used to select the optimal program decomposition on each architecture. Currently, our framework uses a simple incremental algorithm, which effectively identifies single or multi-level program decompositions that maximize performance. This algorithm can be used as a rule of thumb for automatic multilevel parallelization. The effectiveness of the approach is demonstrated on the NAS benchmarks running on two architectural platforms.
GENERIC PROGRAMMING FOR HIGH-PERFORMANCE SCIENTIFIC COMPUTING
, 2002
By Lie-Quan Lee. Generic programming is an important paradigm for software development, with an emphasis on reusability and performance, qualities that would seemingly make this paradigm especially suited for application to scientific computing. We apply generic programming to the development of a message passing framework (the Generic Message Passing library) for parallel computing in hybrid execution architectures (i.e., those having both shared and distributed memory). Although GMP supports both shared-memory and distributed-memory execution, it explicitly separates its programming and execution models, presenting a uniform message-based programming interface to enable source-code portability of parallel programs. At the same time, the implementation of GMP fully exploits the architectural characteristics of its execution target for maximum run-time performance. GMP is specifically designed to seamlessly integrate with modern generic C++ libraries such as the C++ Standard Library. C++ objects with complex data
Understanding performance of SMP clusters running MPI programs
- Future Generation Computer Systems
, 2001
CLUsters of MultiProcessors (CLUMPS) have a hybrid memory model, with message passing between nodes and shared memory inside nodes. We examine the performance of Myrinet clusters of SMP PCs when using a Single Memory Model (SMM) based on the MPICH-PM/CLUMP library of the RWCP, which can directly use the MPI programs written for a cluster of uniprocessors. The specificities of the communication patterns with the SMM approach are detailed. PC clusters with 2-way and 4-way nodes are considered and compared.

