Early Experiments with the OpenMP/MPI Hybrid Programming Model (2008)

by E. L. Lusk, A. Chan
Venue: In IWOMP


Results 1 - 10 of 25 citing documents

Case study for running HPC applications in public clouds

by Qiming He, Shujia Zhou, Ben Kobler, Dan Duffy, Tom McGlynn - in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, ser. HPDC ’10, 2010
"... ABSTRACT Cloud computing is emerging as an alternative computing platform to bridge the gap between scientists' growing computational demands and their computing capabilities. A scientist who wants to run HPC applications can obtain massive computing resources 'in the cloud' quickly ..."
Abstract - Cited by 36 (0 self)
ABSTRACT Cloud computing is emerging as an alternative computing platform to bridge the gap between scientists' growing computational demands and their computing capabilities. A scientist who wants to run HPC applications can obtain massive computing resources 'in the cloud' quickly (in minutes), as opposed to the days or weeks it normally takes under traditional business processes. Due to the popularity of Amazon EC2, most HPC-in-the-cloud research has been conducted using EC2 as a target platform. Previous work has not investigated how results might depend upon the cloud platform used. In this paper, we extend previous research to three public cloud computing platforms. In addition to running classical benchmarks, we also port a 'full-size' NASA climate prediction application into the cloud, and compare our results with those from dedicated HPC systems. Our results show that 1) virtualization technology, which is widely used by cloud computing, adds little performance overhead; 2) most current public clouds are not designed for running scientific applications, primarily due to their poor networking capabilities. However, a cloud with a moderately better network (vs. EC2) will deliver a significant performance improvement. Our observations will help to quantify the improvement of using fast networks for running HPC in the cloud, and indicate a promising trend of HPC capability in future private science clouds. We also discuss techniques that will help scientists to best utilize public cloud platforms despite current deficiencies.

Citation Context

...while EC2 underperforms NASA systems by 50+%. 4.4 Discussions In this section, we discuss other technical and non-technical factors that may affect HPC applications in the cloud. 4.4.1 Parallel programming paradigm In previous sections, our case studies include pure MPI (via distributed-memory communication) and pure OpenMP (via shared-memory communication) implementations. In order to overcome network problems posed by current public clouds, one may consider combining these two parallel programming paradigms. We use the NPB-MZ benchmark to demonstrate the effectiveness of using hybrid MPI+OpenMP [19] to improve HPC application scalability in the cloud. Firstly, we use the IBM-cloud (the one with the slowest network) and the NPB-MPI benchmark to show that a pure MPI implementation does not scale up from a single server. We chose the BT, LU and SP benchmarks in order to compare with results from the hybrid approach. Due to the special requirements of NPB-MPI benchmarks, we choose 4, 9, 16 and 25 MPI processes for BT and SP, and 8, 16 and 32 MPI processes for LU, respectively. Figure 8 shows performance starts degrading when the cluster size is larger than a single node, which can hold up to 8 MPI process...
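The excerpt above describes the hybrid MPI+OpenMP remedy for weak cloud networks: use OpenMP threads for on-node work and keep the number of MPI ranks that touch the network small. Below is a minimal, hypothetical C sketch of that pattern under a one-rank-per-node assumption; it is not the NPB-MZ code the authors ran, and the problem size and loop body are illustrative.

```c
/* Minimal hybrid MPI+OpenMP sketch: OpenMP threads parallelize the local
 * work, and MPI is called only outside the parallel region.
 * Illustrative only -- not the NPB-MZ code discussed above. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask only for FUNNELED support: the master thread makes all MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1000000;          /* local problem size (illustrative) */
    double local_sum = 0.0;

    /* Compute phase: shared-memory parallelism inside the node. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n; i++)
        local_sum += (double)(rank * n + i);

    /* Communication phase: outside the parallel region, master thread only,
     * so the slow inter-node network carries one message per rank rather
     * than one per core. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.0f (ranks=%d, threads/rank=%d)\n",
               global_sum, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

Launched with one rank per node and OMP_NUM_THREADS set to the core count, each reduction places one message per node on the network instead of one per core, which is the trade-off the excerpt evaluates.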

An Evaluation of User-Level Failure Mitigation Support in MPI

by Wesley Bland, Aurelien Bouteiller, Thomas Herault, Joshua Hursey, George Bosilca, Jack J. Dongarra
"... As the scale of computing platforms becomes increasingly extreme, the requirements for application fault tolerance are increasing as well. Techniques to address this problem by improving the resilience of algorithms have been developed, but they currently receive no support from the programming mod ..."
Abstract - Cited by 16 (3 self)
As the scale of computing platforms becomes increasingly extreme, the requirements for application fault tolerance are increasing as well. Techniques to address this problem by improving the resilience of algorithms have been developed, but they currently receive no support from the programming model, and without such support, they are bound to fail. This paper discusses the failure-free overhead and recovery impact aspects of the User-Level Failure Mitigation proposal presented in the MPI Forum. Experiments demonstrate that fault-aware MPI has little or no impact on performance for a range of applications, and produces satisfactory recovery times when there are failures.
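The abstract above evaluates the User-Level Failure Mitigation (ULFM) proposal. As a rough illustration of the style of fault handling it enables, the sketch below switches a communicator to MPI_ERRORS_RETURN, checks a collective's return code, and shrinks to the surviving processes. The MPIX_ calls and the mpi-ext.h header follow the ULFM proposal as shipped in some MPI implementations and should be treated as assumptions; a complete recovery protocol involves more steps (failure acknowledgement, agreement) than shown here.

```c
/* Hedged sketch of the error-handling pattern ULFM enables. MPIX_Comm_revoke
 * and MPIX_Comm_shrink are the ULFM proposal's MPIX_ extensions (e.g., in
 * ULFM-enabled Open MPI builds), not base-standard MPI, so treat these
 * names and the header below as assumptions. */
#include <mpi.h>
#include <mpi-ext.h>   /* MPIX_ ULFM extensions, where the MPI library provides them */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    /* Without this, the default MPI_ERRORS_ARE_FATAL aborts the whole job
     * before the application can attempt any recovery. */
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

    int value = 1, sum = 0;
    int rc = MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, comm);
    if (rc != MPI_SUCCESS) {
        /* A peer failed during the collective. Revoke the communicator so
         * the other survivors' pending and future calls on it also return
         * errors and reach this same recovery path (compressed here), then
         * shrink to a communicator containing only the survivors. */
        MPIX_Comm_revoke(comm);
        MPI_Comm survivors;
        MPIX_Comm_shrink(comm, &survivors);
        comm = survivors;
        /* Application-level recovery (checkpoint restart, ABFT, ...) would
         * continue on 'comm' from here. */
    }

    MPI_Finalize();
    return 0;
}
```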

High Performance Computing Using MPI and OpenMP on Multi-core Parallel Systems

by Haoqiang Jin, Dennis Jespersen, Piyush Mehrotra, Rupak Biswas, Lei Huang, Barbara Chapman
"... The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems—distributed memory across nodes and shared memory with non-uniform memory access within each node—p ..."
Abstract - Cited by 8 (0 self)
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems—distributed memory across nodes and shared memory with non-uniform memory access within each node—poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems – a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
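A short, hypothetical C sketch of the node-topology discovery a hybrid code on such systems can perform: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED (an MPI-3 feature that postdates this paper) groups the ranks sharing a node's memory, and OpenMP reports the thread count available to each rank, making the "distributed across nodes, shared within a node" structure explicit.

```c
/* Sketch: report the node-level layout a hybrid MPI+OpenMP run sees.
 * Assumes an MPI-3 library; the output format is illustrative. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, world_rank, world_size, node_rank, node_size;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Communicator containing only the ranks on this shared-memory node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    if (node_rank == 0)
        printf("rank %d: %d ranks on this node, %d OpenMP threads each, "
               "%d ranks total\n",
               world_rank, node_size, omp_get_max_threads(), world_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```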

Comparing fork/join and MapReduce

by Robert Stewart, Jeremy Singer - Department of Computer Science, Heriot-Watt University, 2012
"... This paper provides an empirical comparison of fork/join and MapReduce, which are two popular parallel execution models. We use the Java fork/join framework for fork/join, and the Hadoop platform for MapReduce. We want to evaluate these two parallel platforms in terms of scalability and pro-grammabi ..."
Abstract - Cited by 2 (0 self)
This paper provides an empirical comparison of fork/join and MapReduce, which are two popular parallel execution models. We use the Java fork/join framework for fork/join, and the Hadoop platform for MapReduce. We want to evaluate these two parallel platforms in terms of scalability and programmability. Our set task is the creation and execution of a simple concordance benchmark application, taken from phase I of the SICSA multicore challenge. We find that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the memory limitations of a single multicore server. On the other hand, Hadoop has high startup latency (tens of seconds), but it scales to large input data sizes (>100MB) on a compute cluster. Thus we learn that each platform has its advantages, for different kinds of inputs and underlying hardware execution layers. This leads us to consider the possibility of a hybrid approach that features the best of both parallel platforms. We implement a prototype grep application using a hybrid multi-threaded Hadoop combination, and discuss its potential.

Citation Context

...essing models. Attempts to combine eager work distribution and multicore parallelism are emerging for a number of language implementations. The hybrid combination of MPI and OpenMP is investigated in [4] for parallel C programming, and a two-layered parallelism model for Haskell is investigated in the implementation of CloudHaskell [5]. Our first implementation of concordance uses Java fork/join (Sec...

Portable Explicit Threading and Concurrent Programming for MPI Applications

by Tobias Berka, Helge Hagenauer
"... New applications for parallel computing in today’s data centers, such as online analytical processing, data mining or information retrieval, require support for concurrency. Due to online query processing and multi-user operation, we need to concurrently maintain and analyze the data. While the Port ..."
Abstract - Cited by 1 (1 self)
New applications for parallel computing in today’s data centers, such as online analytical processing, data mining or information retrieval, require support for concurrency. Due to online query processing and multi-user operation, we need to concurrently maintain and analyze the data. While the Portable Operating System Interface (POSIX) defines a thread interface that is widely available, and while modern implementations of the Message Passing Interface (MPI) support threading, this combination is lacking in safety, security and reliability. The development of such parallel applications is therefore complex, difficult and error-prone. In response to this, we propose an additional layer of middleware for threaded MPI applications designed to simplify the development of concurrent parallel programs. We formulate a list of requirements and sketch a design rationale for such a library. Based on a prototype implementation, we evaluate the run-time overhead to estimate the overhead caused by the additional layer of indirection.

Citation Context

...earch issue [8] [9]. Based on this thread support, MPI has been combined with existing multi-threaded parallel programming models. The combination of OpenMP and MPI is still an ongoing research issue [10] but not a new idea [11]. But these are hybrid programming models and do not provide us with the concurrency support we require. The MPI Forum currently leads an open discussion of new features for MP...
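As a rough illustration of the POSIX-threads-plus-MPI combination this entry builds on, the sketch below requests MPI_THREAD_MULTIPLE, verifies the level actually provided, and lets a worker thread issue an MPI call. The proposed middleware layer itself is not shown, and the worker's broadcast is purely illustrative.

```c
/* Sketch: POSIX threads combined with a threaded MPI library.
 * Request full thread support and verify it before worker threads call MPI. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg)
{
    int rank = *(int *)arg;
    /* With MPI_THREAD_MULTIPLE, any thread may call MPI; here each rank's
     * worker thread participates in a broadcast rooted at rank 0. */
    int token = rank;
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;

    /* Ask for full multi-threaded MPI; abort if the library cannot provide it. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI library lacks MPI_THREAD_MULTIPLE support\n");
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_t tid;
    pthread_create(&tid, NULL, worker, &rank);
    pthread_join(tid, NULL);

    MPI_Finalize();
    return 0;
}
```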

MT-MPI: Multithreaded MPI for many-core environments

by Min Si, Antonio J. Peña, Pavan Balaji, Masamichi Takagi, Yutaka Ishikawa - in Proceedings of the 28th ACM International Conference on Supercomputing (ICS)
"... Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardware threads. To uti-lize such architectures, application programmers are increas-ingly looking at hybrid programming models, where multi-ple threads interact with the MPI library (frequently called “MPI ..."
Abstract - Cited by 1 (0 self)
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardware threads. To utilize such architectures, application programmers are increasingly looking at hybrid programming models, where multiple threads interact with the MPI library (frequently called “MPI+X” models). A common mode of operation for such applications uses multiple threads to parallelize the computation, while one of the threads also issues MPI operations (i.e., MPI FUNNELED or SERIALIZED thread-safety mode). In MPI+OpenMP applications, this is achieved, for example, by placing MPI calls in OpenMP critical sections or outside the OpenMP parallel regions. However, such a model often means that the OpenMP threads are active only during the parallel computation phase and idle during the MPI calls, resulting in wasted computational resources. In this paper, we present MT-MPI, an internally multithreaded MPI implementation that transparently coordinates with the threading runtime system to share idle threads with the application. It is designed in the context of OpenMP and requires modifications to both the MPI implementation and the OpenMP runtime in order to share appropriate information between them. We demonstrate the benefit of such internal parallelism for various aspects of MPI processing, including derived datatype communication, shared-memory communication, and network I/O operations.

Citation Context

...cesses, thus causing some unevenness in MT-MPI’s parallelization. 5. RELATED WORK The hybrid MPI+OpenMP programming model has been extensively used and studied in the past. For instance, Lusk and Chan [10] explored the performance of such a model on a typical Linux cluster, a large-scale system from SiCortex, and an IBM Blue Gene/P system. The authors concluded that some applications performed better w...
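The MT-MPI abstract above describes the common FUNNELED/SERIALIZED usage pattern in which OpenMP threads compute but sit idle while one thread performs MPI communication. The hypothetical C sketch below reproduces that pattern (it is not MT-MPI itself, which modifies the MPI and OpenMP runtimes); the halo exchange and problem size are illustrative, and the explicit barrier marks where the idle time that MT-MPI reclaims appears.

```c
/* Sketch of the MPI+OpenMP pattern MT-MPI targets: threads parallelize the
 * computation, only the master thread issues MPI (MPI_THREAD_FUNNELED),
 * and the other threads wait idle at the barrier during communication. */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    enum { N = 1 << 20 };            /* local array size (illustrative) */
    static double local[N], halo = 0.0;
    int provided, rank, size;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        /* Compute phase: all threads work on the local data. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            local[i] = rank + i * 1e-6;

        /* Communication phase: master thread only; the other threads have
         * nothing to do until the barrier below. */
        #pragma omp master
        {
            int right = (rank + 1) % size, left = (rank + size - 1) % size;
            MPI_Sendrecv(&local[N - 1], 1, MPI_DOUBLE, right, 0,
                         &halo, 1, MPI_DOUBLE, left, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        #pragma omp barrier   /* threads idle here while MPI runs */
    }

    MPI_Finalize();
    return 0;
}
```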

A Spatiotemporal Data Aggregation Technique for Performance Analysis of Large-scale Execution Traces

by Damien Dosimont, Robin Lamarche-Perrin, Lucas Mello Schnorr, Guillaume Huard, Jean-Marc Vincent, 2014
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract - Cited by 1 (1 self)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. The multidisciplinary open archive HAL is intended for the deposit and dissemination of research-level scientific documents, whether published or not, originating from French or foreign teaching and research institutions, and from public or private laboratories.

Citation Context

...[garbled table excerpt comparing trace visualization tools (Pajé [7], LTTng Eclipse Viewer [8], KPTrace Viewer [9], Jumpshot [10], Vampir [4], Ocelotl [11], [12]) by visualization type and aggregation technique; the cited reference [10] is Jumpshot, listed as a timeline view with pixel-guided aggregation]...

SEEKING PRODUCTIVITY AND PERFORMANCE

by David Barkai
"... In this note we propose two projects: (1) creating a hierar-chical programming model from current models; and (2) extracting application primitives from the “13 dwarfs”. The first topic addresses the need for a unified and managea-ble framework for very large-scale concurrent execution. This is the ..."
Abstract
In this note we propose two projects: (1) creating a hierarchical programming model from current models; and (2) extracting application primitives from the “13 dwarfs”. The first topic addresses the need for a unified and manageable framework for very large-scale concurrent execution. This is the productivity part: less complexity will drive better mapping of algorithms to architecture, which will also contribute to better performance. The second topic focuses mostly on the processor and the node with the aim of laying the groundwork for software and silicon optimized kernels. While it is understood that application primitives are outside the scope of IESP, the motivation for introducing it here is that it is a companion issue and that increasing the efficiency of each processor provides high return for science, at all levels of system size.

The Application Perspective: Seeking Productivity and Performance

by David Barkai
"... Abstract—In this note we propose two projects: (1) Creating a hierarchical programming model from current models, and (2) Extracting application primitives from the ”13 dwarfs”. The first topic addresses the need for a unified and manageable framework for very large scale concurrent execution. This ..."
Abstract
Abstract—In this note we propose two projects: (1) creating a hierarchical programming model from current models, and (2) extracting application primitives from the “13 dwarfs”. The first topic addresses the need for a unified and manageable framework for very large-scale concurrent execution. This is the productivity part: less complexity will drive better mapping of algorithms to architecture, which will also contribute to better performance. The second topic focuses mostly on the processor and the node with the aim of laying the groundwork for software and silicon optimized kernels. While it is understood that application primitives are outside the scope of IESP, the motivation for introducing it here is that it is a companion issue and that increasing the efficiency of each processor provides high return for science, at all levels of system size.

Citation Context

...bility of concurrency, led to various experiments in “hybrid” implementations combining MPI with OpenMP or other shared-memory schemes. These met with varying degrees of success (see [2], [3], [4], [5]). It is stipulated that the use of OpenMP would not have been required if we had 'layered-MPI' to define such a hierarchy to help manage the decomposition of the application. A layered model is ...

MPI at Exascale

by Rajeev Thakur, Pavan Balaji, Darius Buntinas, David Goodell, Torsten Hoefler, Sameer Kumar, Ewing Lusk, Jesper Larsson Träff
"... With petascale systems already available, researchers are devoting their attention to the issues needed to reach the next major level in performance, namely, exascale. Explicit message passing using the Message Passing Interface (MPI) is the most commonly used model for programming petascale systems ..."
Abstract
With petascale systems already available, researchers are devoting their attention to the issues needed to reach the next major level in performance, namely, exascale. Explicit message passing using the Message Passing Interface (MPI) is the most commonly used model for programming petascale systems today. In this paper, we investigate what is needed to enable MPI to scale to exascale, both in the MPI specification and in MPI implementations, focusing on issues such as memory consumption and performance. We also present results of experiments related to MPI memory consumption at scale on the IBM Blue Gene/P at Argonne National Laboratory.

Citation Context

...e shared-memory model (X) for accessing data within an address space that spans multiple cores or even multiple nodes. Options for X include the following. • OpenMP. This option has been well studied [15, 17, 25, 27]. One limitation is that the shared memory is usually restricted to the address space of a single physical node. • PGAS languages such as UPC [8] or CoArray Fortran [24]. This option is not as well un...
