Results 1 - 10 of 34
BigSim: A parallel simulator for performance prediction of extremely large parallel machines
- In 18th Intl. Parallel and Distr. Proc. Symp. (IPDPS)
, 2004
Cited by 25 (5 self)
We present a parallel simulator, BigSim, for predicting performance of machines with a very large number of processors. The simulator provides the ability to make performance predictions for machines such as Blue Gene/L, based on actual execution of real applications. We present this capability using case studies of some application benchmarks. Such a simulator is useful to evaluate the performance of specific applications on such machines even before they are built. A sequential simulator may be too slow or infeasible. However, a parallel simulator faces problems of causality violations. We describe our scheme, which is based on ideas from parallel discrete event simulation and utilizes the inherent determinacy of many parallel applications. We also explore techniques for optimizing such parallel simulations of machines with a large number of processors on existing machines with fewer processors.
Performance Evaluation of Adaptive MPI
, 2006
Cited by 15 (1 self)
Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. CHARM++ is an early language/system that supports migratable objects. This paper describes Adaptive MPI (or AMPI), an MPI implementation and extension that supports processor virtualization. AMPI implements virtual MPI processes (VPs), several of which may be mapped to a single physical processor. AMPI includes a powerful runtime support system that takes advantage of the degree of freedom afforded by allowing it to assign VPs to processors. With this runtime system, AMPI supports such features as automatic adaptive overlapping of communication and computation, automatic load balancing, flexibility of running on an arbitrary number of processors, and checkpoint/restart support. It also inherits communication optimization from the CHARM++ framework. This paper describes AMPI, illustrates its performance benefits through a series of benchmarks, and shows that AMPI is a portable and mature MPI implementation that offers various performance benefits to dynamic applications.
Proactive fault tolerance in MPI applications via task migration
- In International Conference on High Performance Computing
, 2006
Cited by 11 (0 self)
Failures are likely to be more frequent in systems with thousands of processors. Therefore, schemes for dealing with faults become increasingly important. In this paper, we present a fault tolerance solution for parallel applications that proactively migrates execution from processors where failure is imminent. Our approach assumes that some failures are predictable, and leverages the features in current hardware devices supporting early indication of faults. We use the concepts of processor virtualization and dynamic task migration, provided by Charm++ and Adaptive MPI (AMPI), to implement a mechanism that migrates tasks away from processors which are expected to fail. To demonstrate the feasibility of our approach, we present performance data from experiments with existing MPI applications. Our results show that proactive task migration is an effective technique to tolerate faults in MPI applications.
Scaling applications to massively parallel machines using Projections performance analysis tool
- In Future Generation Computer Systems Special Issue on: Large-Scale System Performance Modeling and Analysis
, 2005
Cited by 7 (4 self)
Some of the most challenging applications to parallelize scalably are the ones that present a relatively small amount of computation per iteration. Multiple interacting performance challenges must be identified and solved to attain high parallel efficiency in such cases. We present case studies involving NAMD, a parallel classic molecular dynamics application for large biomolecular systems, and CPAIMD, a Car-Parrinello ab initio molecular dynamics application, and efforts to scale them to a large number of processors. Both applications are implemented in Charm++, and the performance analysis was carried out using Projections, the performance visualization/analysis tool associated with Charm++. We showcase a series of optimizations facilitated by Projections. The resultant performance of NAMD led to a Gordon Bell award at SC2002 with unprecedented speedup on 3,000 processors with teraflops-level peak performance. We also explore techniques for applying the performance visualization/analysis tool on future-generation extreme-scale parallel machines and discuss the scalability issues with Projections.
An architecture for reconfigurable iterative MPI applications in dynamic environments
- Proc. of the Sixth International Conference on Parallel Processing and Applied Mathematics (PPAM’2005), number 3911 in LNCS
, 2005
Cited by 7 (5 self)
With the proliferation of large-scale dynamic execution environments such as grids, the need for efficient and scalable application adaptation strategies for long-running parallel and distributed applications has emerged. Message passing interfaces were initially designed with a traditional machine model in mind, which assumes homogeneous and static environments. It is inevitable that long-running message passing applications will require support for dynamic reconfiguration to maintain high performance under varying load conditions. In this paper we describe a framework that provides iterative MPI applications with reconfiguration capabilities. Our approach is based on integrating MPI applications with a middleware that supports process migration and large-scale distributed application reconfiguration. We present our architecture for reconfiguring MPI applications, and verify our design with a heat diffusion application in a dynamic setting.
Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters
, 2006
Cited by 5 (1 self)
Recent shifts in high-performance computing have increased the use of clusters built around cheap commodity processors. A typical cluster consists of individual nodes, containing one or several processors, connected together with a high-bandwidth, low-latency interconnect. There are many benefits to using clusters for computation, but also some drawbacks, including a tendency to exhibit low Mean Time To Failure (MTTF) due to the sheer number of components involved. Recently, a number of fault-tolerance techniques have been proposed and developed to mitigate the inherent unreliability of clusters. These techniques, however, fail to address the issue of detecting non-obvious faults, particularly Byzantine faults. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module for detecting Byzantine hardware errors at run time as part of the Charm++ parallel programming framework.
MSA: Multiphase specifically shared arrays
- In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing
, 2004
Cited by 5 (4 self)
Shared address space (SAS) parallel programming models have faced difficulty scaling to a large number of processors. Further, although in some cases SAS programs are easier to develop, in other cases they face difficulties due to a large number of race conditions. We contend that a multi-paradigm programming model comprising a distributed-memory model with a disciplined form of shared-memory programming may constitute a “complete” and powerful parallel programming system. Optimized coherence mechanisms based on the specific access pattern of a shared variable show significant performance benefits over general DSM coherence protocols. We present MSA, a system that supports such specifically shared arrays that can be shared in read-only, write-many, and accumulate modes. These simple modes scale well and are general enough to capture the majority of shared memory access patterns. MSA does not support a general read-write access mode, but a single array can be shared in read-only mode in one phase and write-many in another. MSA coexists with the message-passing paradigm (MPI) and the processor virtualization-based message-driven paradigm (Charm++). We present the model, its implementation, programming examples and preliminary performance results.
Performance modeling and programming environments for petaflops computers and the blue gene machine
- in Computer Science in
, 2004
Cited by 5 (5 self)
We present a performance modeling and programming environment for petaflops computers and the Blue Gene machine. It consists of a parallel simulator, BigSim, for predicting performance of machines with a very large number of processors, and BigNetSim, an ongoing effort to incorporate a pluggable module of a detailed contention-based network model. It provides the ability to make performance predictions for machines such as BlueGene/L. We also explore the programming environments for several planned applications on the machines including Finite Element Method (FEM) simulation.
Maestro-VC: On-Demand Secure Cluster Computing Using
- In 7th LCI International Conference on Linux Clusters
, 2006
Cited by 4 (2 self)
On-demand computing is the name given to technology which enables an infrastructure where computing cycles are treated as a commodity, and where such a commodity can be accessed upon request. In this way the goals of on-demand computing overlap with and are similar to those of Grid computing: both enable the pooling of global computing resources to solve complex computational problems.
A Survey of Virtualization Techniques Focusing on Secure On-Demand Cluster Computing
, 2005
Cited by 4 (1 self)
Virtualization, a technique once used to multiplex the resources of high-priced mainframe hardware, is seeing a resurgence in applicability with the increasing computing power of commodity computers. By inserting a layer of software between the machine and traditional operating systems, this technology allows access to a shared computing medium in a manner that is secure, resource-controlled, and efficient. These properties are attractive in the field of on-demand computing, where the fine-grained subdivision of resources provided by virtualized systems allows potentially higher utilization of computing resources.

