The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
In Proceedings of the LACSI Symposium, Santa Fe, 2003
Cited by 109 (10 self)
Abstract:
As high-performance clusters continue to grow in size and popularity, issues of fault tolerance and reliability are becoming limiting factors on application scalability. To address these issues, we present the design and implementation of a system for providing coordinated checkpointing and rollback recovery for MPI-based parallel applications. Our approach integrates the Berkeley Lab BLCR kernel-level process checkpoint system with the LAM implementation of MPI through a defined checkpoint/restart interface. Checkpointing is transparent to the application, allowing the system to be used for cluster maintenance and scheduling reasons as well as for fault tolerance. Experimental results show negligible communication performance impact due to the incorporation of the checkpoint support capabilities into LAM/MPI.
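The coordinated checkpointing the abstract describes can be sketched in miniature: all processes reach a quiescent point (no messages in flight), each saves its local state, and only then does any process resume. This is an illustrative simulation using threads, not LAM/MPI's or BLCR's actual interface; class and method names are invented for the sketch.

```python
import threading

class CoordinatedCheckpoint:
    """Sketch of coordinated checkpoint/rollback: every process
    quiesces at a barrier, saves its local state, then resumes.
    Real systems (e.g. LAM/MPI with BLCR) drain in-flight MPI
    messages before invoking the kernel-level checkpointer."""

    def __init__(self, nprocs):
        self.barrier = threading.Barrier(nprocs)
        self.snapshots = {}
        self.lock = threading.Lock()

    def checkpoint(self, rank, state):
        self.barrier.wait()      # quiesce: no messages in flight past here
        with self.lock:
            self.snapshots[rank] = dict(state)  # save local process state
        self.barrier.wait()      # everyone has saved before anyone resumes

def worker(cp, rank):
    state = {"rank": rank, "step": rank * 10}
    cp.checkpoint(rank, state)

cp = CoordinatedCheckpoint(4)
threads = [threading.Thread(target=worker, args=(cp, r)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The two barriers are the essential coordination: the first guarantees a globally consistent cut, the second prevents a fast process from sending post-checkpoint messages to one still saving.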
Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon’s EC2
In The 1st Workshop on Cloud Computing and its Applications (CCA), 2008
Cited by 86 (0 self)
Abstract:
In this article we describe the application of HPC standard benchmark tests to Amazon’s EC2 cloud computing system, in order to explore the utility of EC2 for modest HPC-style applications. Based on these benchmarks we find that the EC2 cloud system is emerging as a credible solution for supporting responsive, on-demand, small-sized HPC applications. We illustrate this point with a demonstration of a low-order coupled atmosphere-ocean simulation running in parallel on an EC2 system. This demonstration highlights one significant way in which cloud computing could impact traditional HPC computing paradigms that, together with related ease-of-use technologies (described only briefly in this paper), could shift the manner in which many HPC systems are deployed.
Performance Analysis of MPI Collective Operations
In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), Workshop 15, 2005
Cited by 78 (5 self)
Abstract:
Previous studies of application usage show that the performance of collective communications is critical for high performance computing, yet it is often overlooked in comparison with point-to-point performance. In this paper we attempt to analyze and improve collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP. The predictions from the models were compared to the experimentally gathered data, and our findings were used to optimize the implementation of collective operations in the FT-MPI library.
Performance Evaluation of Adaptive MPI
2006
Cited by 51 (19 self)
Abstract:
Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. CHARM++ is an early language/system that supports migratable objects. This paper describes Adaptive MPI (or AMPI), an MPI implementation and extension that supports processor virtualization. AMPI implements virtual MPI processes (VPs), several of which may be mapped to a single physical processor. AMPI includes a powerful runtime support system that takes advantage of the degree of freedom afforded by being able to assign VPs to processors. With this runtime system, AMPI supports such features as automatic adaptive overlapping of communication and computation, automatic load balancing, the flexibility of running on an arbitrary number of processors, and checkpoint/restart support. It also inherits communication optimizations from the CHARM++ framework. This paper describes AMPI, illustrates its performance benefits through a series of benchmarks, and shows that AMPI is a portable and mature MPI implementation that offers various performance benefits to dynamic applications.
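The degree of freedom AMPI exploits, assigning many VPs to few processors, can be illustrated with a simple greedy load balancer. This mirrors the idea (not AMPI's actual algorithm or API) of migrating VPs so per-processor load evens out; all names here are illustrative.

```python
def map_vps(vp_loads, nprocs):
    """Greedily map virtual MPI processes (VPs) to physical processors:
    place each VP, heaviest first, on the currently least-loaded
    processor.  Returns the assignment and the resulting per-processor
    loads.  A sketch of measurement-based load balancing, not AMPI's
    actual strategy."""
    assign = {p: [] for p in range(nprocs)}
    load = [0.0] * nprocs
    for vp, cost in sorted(enumerate(vp_loads), key=lambda x: -x[1]):
        p = load.index(min(load))   # least-loaded processor so far
        assign[p].append(vp)
        load[p] += cost
    return assign, load

# Six VPs with measured costs, mapped onto two processors.
assign, load = map_vps([5, 3, 3, 2, 2, 1], 2)
```

Because VPs are migratable objects, such a mapping can be recomputed and applied at runtime as measured costs drift, which is what makes the virtualization adaptive rather than static.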
Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem
In Proceedings of the International Symposium on Cluster Computing and the Grid, 2006
Cited by 51 (10 self)
Abstract:
This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both for intranode communication using shared memory and for internode communication using high-performance networks, and it is natively multimethod-enabled. Nemesis has been integrated in MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
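The shared-memory intranode path in a subsystem like Nemesis is typically built on lock-free single-producer/single-consumer queues. A toy sketch of that structure (illustrative only; a real implementation lives in shared memory and relies on memory fences and cache-line padding rather than Python semantics):

```python
class SPSCRing:
    """Single-producer/single-consumer ring buffer: the producer
    writes only `tail`, the consumer writes only `head`, so the two
    sides never contend on the same index and no lock is needed."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # next slot to read  (consumer-owned)
        self.tail = 0   # next slot to write (producer-owned)
        self.cap = capacity

    def enqueue(self, msg):
        if (self.tail + 1) % self.cap == self.head:
            return False                    # full: one slot kept empty
        self.buf[self.tail] = msg
        self.tail = (self.tail + 1) % self.cap
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None                     # empty
        msg = self.buf[self.head]
        self.head = (self.head + 1) % self.cap
        return msg

q = SPSCRing(4)
q.enqueue("hdr")
q.enqueue("payload")
```

Keeping the fast path this small is what yields the very low intranode latencies such subsystems report.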
High Performance Parallel Computing with Cloud and Cloud Technologies
Cited by 50 (14 self)
Abstract:
We present our experiences in applying, developing, and evaluating cloud and cloud technologies. First, we present our experience in applying Hadoop and DryadLINQ to a series of data- and compute-intensive applications and then compare them with a novel MapReduce runtime developed by us, named CGL-MapReduce, and with MPI. Preliminary applications are developed for particle physics, bioinformatics, clustering, and matrix multiplication. We identify the basic execution units of the MapReduce programming model and categorize the runtimes according to their characteristics. MPI versions of the applications are used where the contrast in performance needs to be highlighted. We discuss the structure of the applications and their mapping to parallel architectures of different types, and look at the performance of these applications. Next, we present a performance analysis of MPI parallel applications on virtualized resources.
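The "basic execution units" of the MapReduce model that the abstract mentions are the map and reduce phases joined by a shuffle that groups values by key. A minimal in-process sketch (not Hadoop's, DryadLINQ's, or CGL-MapReduce's API):

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy MapReduce: apply `mapper` to each record to emit
    (key, value) pairs, shuffle them into per-key groups, then
    fold each group with `reducer`."""
    groups = defaultdict(list)
    for rec in records:                      # map + shuffle
        for k, v in mapper(rec):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}  # reduce

# Word count, the canonical example.
counts = map_reduce(
    ["mpi cloud", "cloud hadoop cloud"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda k, vs: sum(vs),
)
```

The runtimes the paper compares differ chiefly in how the shuffle and the data movement between these units are implemented, which is where the performance contrasts with MPI arise.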
The design and implementation of checkpoint/restart process fault tolerance for Open MPI
In Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS), in conjunction with IPDPS, 2007
Cited by 47 (5 self)
Abstract:
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementations that incorporated fault tolerance capabilities have been limited by lack of modularity, scalability, and usability. This paper presents the design and implementation of an infrastructure to support checkpoint/restart fault tolerance in the Open MPI project. We identify the general capabilities required for distributed checkpoint/restart and realize these capabilities as extensible frameworks within Open MPI’s modular component architecture. Our design features an abstract interface for providing and accessing fault tolerance services without sacrificing performance, robustness, or flexibility. Although our implementation includes support for some initial checkpoint/restart mechanisms, the framework is meant to be extensible and to encourage experimentation with alternative techniques within a production-quality MPI implementation.
Challenges in Parallel Graph Processing
Parallel Processing Letters, 2006
Cited by 45 (4 self)
Abstract:
Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems. In this paper we present the inter-relationships between graph problems, software, and parallel hardware in the current state of the art and discuss how those issues present inherent challenges in solving large-scale graph problems. The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.
Virtual Machine Aware Communication Libraries for High Performance Computing
Cited by 39 (2 self)
Abstract:
As the size and complexity of modern computing systems keep increasing to meet the demanding requirements of High Performance Computing (HPC) applications, manageability is becoming a critical concern to achieve both high performance and high productivity computing. Meanwhile, virtual machine (VM) technologies have become popular in both industry and academia due to various features designed to ease system management and administration. While a VM-based environment can greatly help manageability on large-scale computing systems, concerns over performance have largely blocked the HPC community from embracing VM technologies. In this paper, we follow three steps to demonstrate the ability to achieve near-native performance in a VM-based environment for HPC. First, we propose Inter-VM Communication
Implementation and shared-memory evaluation of MPICH2 over the Nemesis communication subsystem
In Recent Advances in PVM and MPI, Lecture Notes in Computer Science
Cited by 31 (2 self)
Abstract:
This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks as well as application benchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.