Case study for running HPC applications in public clouds
- in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, ser. HPDC ’10, 2010
Abstract
- Cited by 36 (0 self)
ABSTRACT Cloud computing is emerging as an alternative computing platform to bridge the gap between scientists' growing computational demands and their computing capabilities. A scientist who wants to run HPC applications can obtain massive computing resources 'in the cloud' quickly (in minutes), as opposed to the days or weeks it normally takes under traditional business processes. Due to the popularity of Amazon EC2, most HPC-in-the-cloud research has been conducted using EC2 as a target platform. Previous work has not investigated how results might depend upon the cloud platform used. In this paper, we extend previous research to three public cloud computing platforms. In addition to running classical benchmarks, we also port a 'full-size' NASA climate prediction application into the cloud, and compare our results with those from dedicated HPC systems. Our results show that 1) virtualization technology, which is widely used by cloud computing, adds little performance overhead; 2) most current public clouds are not designed for running scientific applications, primarily due to their poor networking capabilities. However, a cloud with a moderately better network (vs. EC2) would deliver a significant performance improvement. Our observations help to quantify the improvement of using fast networks for running HPC-in-the-cloud, and indicate a promising trend of HPC capability in future private science clouds. We also discuss techniques that will help scientists best utilize public cloud platforms despite current deficiencies.
An Evaluation of User-Level Failure Mitigation Support in MPI
Abstract
- Cited by 16 (3 self)
As the scale of computing platforms becomes increasingly extreme, the requirements for application fault tolerance are increasing as well. Techniques to address this problem by improving the resilience of algorithms have been developed, but they currently receive no support from the programming model, and without such support, they are bound to fail. This paper discusses the failure-free overhead and recovery impact aspects of the User-Level Failure Mitigation proposal presented in the MPI Forum. Experiments demonstrate that fault-aware MPI has little or no impact on performance for a range of applications, and produces satisfactory recovery times when there are failures.
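The shrink-and-continue recovery style that User-Level Failure Mitigation enables can be illustrated without real MPI. The sketch below is a toy Python simulation (all names are illustrative, not from the paper or the MPI standard): a "collective" sum fails when a participating rank has died, and the application responds by excluding the failed rank and retrying on the survivors, loosely analogous to shrinking the communicator and re-issuing the collective.

```python
# Toy simulation of ULFM-style "shrink and continue" recovery.
# No real MPI is used; the rank failure is simulated.
class FailedRank(Exception):
    pass

def reduce_sum(ranks, values, failed):
    """Simulated collective: raises if any participating rank has failed."""
    for r in ranks:
        if r in failed:
            raise FailedRank(r)
    return sum(values[r] for r in ranks)

def fault_tolerant_sum(nranks, values, failed):
    """Retry the collective on the surviving ranks after each failure."""
    ranks = list(range(nranks))
    while True:
        try:
            return reduce_sum(ranks, values, failed)
        except FailedRank as e:
            # Recovery step: exclude the failed rank ("shrink") and retry.
            ranks.remove(e.args[0])

values = {0: 1, 1: 2, 2: 3, 3: 4}
print(fault_tolerant_sum(4, values, failed={2}))  # 7
```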
High Performance Computing Using MPI and OpenMP on Multi-core Parallel Systems
Abstract
- Cited by 8 (0 self)
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems—distributed memory across nodes and shared memory with non-uniform memory access within each node—poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems – a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
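The hybrid structure this abstract describes can be sketched with standard-library stand-ins: OS processes play the role of distributed-memory MPI ranks, and threads within each process play the role of OpenMP. A minimal, hypothetical Python analogue (not from the paper):

```python
# Hybrid "MPI+OpenMP"-style decomposition, sketched with stdlib stand-ins:
# processes = distributed-memory ranks, threads = shared-memory workers.
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def node_work(chunk):
    # "OpenMP" level: threads share the node's memory and split the chunk.
    with ThreadPoolExecutor(max_workers=2) as threads:
        halves = [chunk[: len(chunk) // 2], chunk[len(chunk) // 2 :]]
        return sum(threads.map(sum, halves))

def hybrid_sum(data, nranks=2):
    # "MPI" level: each rank owns a disjoint slice of the data.
    chunks = [data[r::nranks] for r in range(nranks)]
    with Pool(nranks) as ranks:
        return sum(ranks.map(node_work, chunks))

if __name__ == "__main__":
    print(hybrid_sum(list(range(8))))  # 28
```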
Comparing fork/join and MapReduce
- Department of Computer Science, Heriot-Watt University, 2012
Abstract
- Cited by 2 (0 self)
This paper provides an empirical comparison of fork/join and MapReduce, two popular parallel execution models. We use the Java fork/join framework for fork/join, and the Hadoop platform for MapReduce. We evaluate these two parallel platforms in terms of scalability and programmability. Our set task is the creation and execution of a simple concordance benchmark application, taken from phase I of the SICSA multicore challenge. We find that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the memory limitations of a single multicore server. On the other hand, Hadoop has high startup latency (tens of seconds), but it scales to large input data sizes (>100MB) on a compute cluster. Thus we learn that each platform has its advantages for different kinds of inputs and underlying hardware execution layers. This leads us to consider the possibility of a hybrid approach that features the best of both parallel platforms. We implement a prototype grep application using a hybrid multi-threaded Hadoop combination, and discuss its potential.
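The paper's benchmark uses the Java fork/join framework; purely as an illustration of the same divide-and-conquer pattern, here is a minimal concordance (word-count) sketch in Python, with concurrent.futures standing in for fork/join. All names are illustrative, not from the paper.

```python
# Divide-and-conquer concordance sketch: fork the text into chunks,
# count words per chunk in parallel, then join by merging the counts.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_words(chunk):
    """Per-chunk task: count word occurrences in one slice of text."""
    return Counter(chunk.split())

def concordance(text, workers=4):
    """Fork: split into roughly equal chunks; join: merge partial counts."""
    words = text.split()
    if not words:
        return Counter()
    step = max(1, len(words) // workers)
    chunks = [" ".join(words[i:i + step]) for i in range(0, len(words), step)]
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # join step: reduce the partial counts
    return total

if __name__ == "__main__":
    print(concordance("a b a c a b")["a"])  # 3
```

As the abstract notes, this style is bounded by a single machine's memory, whereas the MapReduce version distributes both the data and the counting across a cluster.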
Portable Explicit Threading and Concurrent Programming for MPI Applications
Abstract
- Cited by 1 (1 self)
New applications for parallel computing in today’s data centers, such as online analytical processing, data mining or information retrieval, require support for concurrency. Due to online query processing and multi-user operation, we need to concurrently maintain and analyze the data. While the Portable Operating System Interface (POSIX) defines a thread interface that is widely available, and while modern implementations of the Message Passing Interface (MPI) support threading, this combination is lacking in safety, security and reliability. The development of such parallel applications is therefore complex, difficult and error-prone. In response to this, we propose an additional layer of middleware for threaded MPI applications designed to simplify the development of concurrent parallel programs. We formulate a list of requirements and sketch a design rationale for such a library. Based on a prototype implementation, we evaluate the run-time overhead caused by the additional layer of indirection.
MT-MPI: Multithreaded MPI for many-core environments
- in Proceedings of the 28th ACM International Conference on Supercomputing (ICS ’14), 2014
Abstract
- Cited by 1 (0 self)
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardware threads. To utilize such architectures, application programmers are increasingly looking at hybrid programming models, where multiple threads interact with the MPI library (frequently called “MPI+X” models). A common mode of operation for such applications uses multiple threads to parallelize the computation, while one of the threads also issues MPI operations (i.e., the MPI_THREAD_FUNNELED or MPI_THREAD_SERIALIZED thread-safety mode). In MPI+OpenMP applications, this is achieved, for example, by placing MPI calls in OpenMP critical sections or outside the OpenMP parallel regions. However, such a model often means that the OpenMP threads are active only during the parallel computation phase and idle during the MPI calls, resulting in wasted computational resources. In this paper, we present MT-MPI, an internally multithreaded MPI implementation that transparently coordinates with the threading runtime system to share idle threads with the application. It is designed in the context of OpenMP and requires modifications to both the MPI implementation and the OpenMP runtime in order to share appropriate information between them. We demonstrate the benefit of such internal parallelism for various aspects of MPI processing, including derived datatype communication, shared-memory communication, and network I/O operations.
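The funneled mode the abstract describes (many threads compute, one thread communicates) can be sketched without real MPI. Below is a pure-Python analogue using threading; the names and the final "communication" step are illustrative stand-ins, not from the paper.

```python
# FUNNELED-pattern sketch: worker threads parallelize the computation,
# and only the main thread performs the final "communication" step.
import threading

def funneled_sum(data, nthreads=4):
    partials = [0] * nthreads
    def work(tid):
        # Each thread computes a partial result over its strided slice.
        partials[tid] = sum(data[tid::nthreads])
    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Only the main thread "communicates" the combined result; in a real
    # MPI+OpenMP code this would be an MPI call issued outside the
    # parallel region. The workers are idle during this step, which is
    # exactly the wasted capacity MT-MPI aims to reclaim.
    return sum(partials)

print(funneled_sum(list(range(10))))  # 45
```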
A Spatiotemporal Data Aggregation Technique for Performance Analysis of Large-scale Execution Traces
, 2014
- Cited by 1 (1 self)
SEEKING PRODUCTIVITY AND PERFORMANCE
Abstract
In this note we propose two projects: (1) creating a hierarchical programming model from current models; and (2) extracting application primitives from the “13 dwarfs”. The first topic addresses the need for a unified and manageable framework for very large-scale concurrent execution. This is the productivity part: less complexity will drive better mapping of algorithms to architecture, which will also contribute to better performance. The second topic focuses mostly on the processor and the node with the aim of laying the groundwork for software and silicon optimized kernels. While it is understood that application primitives are outside the scope of IESP, the motivation for introducing it here is that it is a companion issue and that increasing the efficiency of each processor provides high return for science, at all levels of system size.
The Application Perspective - Seeking Productivity and Performance
Abstract
Abstract—In this note we propose two projects: (1) creating a hierarchical programming model from current models, and (2) extracting application primitives from the “13 dwarfs”. The first topic addresses the need for a unified and manageable framework for very large-scale concurrent execution. This is the productivity part: less complexity will drive better mapping of algorithms to architecture, which will also contribute to better performance. The second topic focuses mostly on the processor and the node with the aim of laying the groundwork for software and silicon optimized kernels. While it is understood that application primitives are outside the scope of IESP, the motivation for introducing it here is that it is a companion issue and that increasing the efficiency of each processor provides high return for science, at all levels of system size.
MPI at Exascale
Abstract
With petascale systems already available, researchers are devoting their attention to the issues needed to reach the next major level in performance, namely, exascale. Explicit message passing using the Message Passing Interface (MPI) is the most commonly used model for programming petascale systems today. In this paper, we investigate what is needed to enable MPI to scale to exascale, both in the MPI specification and in MPI implementations, focusing on issues such as memory consumption and performance. We also present results of experiments related to MPI memory consumption at scale on the IBM Blue Gene/P at Argonne National Laboratory.