Adaptive MPI
- In Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 03)
, 2003
Cited by 51 (11 self)
Processor virtualization is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports processor virtualization.
NAMD: Biomolecular Simulation on Thousands of Processors
- In Proceedings of SC 2002
, 2002
Cited by 43 (6 self)
NAMD is a fully featured, production molecular dynamics program for high performance simulation of large biomolecular systems. We have previously, at SC2000, presented scaling results for simulations with cutoff electrostatics on up to 2048 processors of the ASCI Red machine, achieved with an object-based hybrid force and spatial decomposition scheme and an aggressive measurement-based predictive load balancing framework. We extend this work by demonstrating similar scaling on the much faster processors of the PSC Lemieux Alpha cluster, and for simulations employing efficient (order N log N) particle mesh Ewald full electrostatics.
A Taxonomy of Market-Based Resource Management Systems for Utility-Driven Cluster Computing
, 2004
Cited by 33 (10 self)
In utility-driven cluster computing, cluster systems need to know the specific needs of different users so as to allocate resources according to their needs. They are also vital in supporting service-oriented Grid computing that harnesses resources distributed worldwide based on users' objectives. Market-based resource management systems make use of real-world market concepts and behavior to assign resources to users. This paper outlines a taxonomy that describes how market-based resource management systems can support utility-driven cluster computing. The taxonomy is used to survey existing market-based resource management systems to better understand how they can be utilized.
Bigsim: A parallel simulator for performance prediction of extremely large parallel machines
- In 18th Intl. Parallel and Distr. Proc. Symp. (IPDPS)
, 2004
Cited by 25 (5 self)
We present a parallel simulator, BigSim, for predicting the performance of machines with a very large number of processors. The simulator provides the ability to make performance predictions for machines such as Blue-Gene/L, based on actual execution of real applications. We present this capability using case studies of some application benchmarks. Such a simulator is useful for evaluating the performance of specific applications on such machines even before they are built. A sequential simulator may be too slow or infeasible. However, a parallel simulator faces problems of causality violations. We describe our scheme, which is based on ideas from parallel discrete event simulation and utilizes the inherent determinacy of many parallel applications. We also explore techniques for optimizing such parallel simulations of machines with a large number of processors on existing machines with fewer processors.
A framework for collective personalized communication
- In Proceedings of IPDPS’03
, 2003
Cited by 22 (1 self)
This paper explores collective personalized communication. For example, in all-to-all personalized communication (AAPC), each processor sends a distinct message to every other processor. However, for many applications, the collective communication pattern is many-to-many, where each processor sends a distinct message to a subset of processors. In this paper we first present strategies that reduce per-message cost to optimize AAPC. We then present performance results of these strategies in both all-to-all and many-to-many scenarios. These strategies are implemented in a flexible, asynchronous library with a non-blocking interface, and a message-driven runtime system. This allows the collective communication to run concurrently with the application, if desired. As a result, the computational overhead of the communication is substantially reduced, at least on machines such as PSC Lemieux, which sport a coprocessor capable of remote DMA. We demonstrate the advantages of our framework with performance results on several benchmarks and applications.
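As a reading aid, the two communication patterns named in this abstract can be sketched as plain data movement. This is a minimal single-process illustration, not the paper's library or its optimization strategies; the function names `all_to_all_personalized` and `many_to_many` and the buffer layout are assumptions made for the example.

```python
def all_to_all_personalized(send):
    """AAPC: send[i][j] is the distinct message processor i holds for processor j.

    Returns recv, where recv[j][i] is the message processor j got from processor i.
    A processor sends nothing to itself, so the diagonal stays None.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for src in range(p):
        for dst in range(p):
            if src != dst:
                recv[dst][src] = send[src][dst]
    return recv


def many_to_many(send_map, p):
    """Many-to-many: send_map[i] is {dst: message} for only a subset of processors."""
    recv = [dict() for _ in range(p)]
    for src, outgoing in send_map.items():
        for dst, msg in outgoing.items():
            recv[dst][src] = msg
    return recv


P = 4  # hypothetical processor count
send = [[f"{i}->{j}" for j in range(P)] for i in range(P)]
recv = all_to_all_personalized(send)
sparse = many_to_many({0: {1: "a", 3: "b"}, 2: {1: "c"}}, P)
```

In the dense case every processor issues P - 1 messages, which is why per-message cost dominates for small messages; the sparse case shows how a many-to-many pattern touches only some destination processors.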
Object-Based Adaptive Load Balancing for MPI Programs
, 2000
Cited by 19 (4 self)
Parallel Computational Science and Engineering (CSE) applications often exhibit irregular structure and dynamic load patterns. Many such applications have been developed using procedural languages (e.g. Fortran) in the message-passing parallel programming paradigm (e.g. MPI) for distributed memory machines. Incorporating dynamic load balancing techniques at the application level involves significant changes to the design and structure of applications. On the other hand, traditional run-time systems for MPI do not support dynamic load balancing. Object-based parallel programming languages, such as Charm++, support efficient dynamic load balancing using object migration for irregular and dynamic applications, as well as to deal with external factors that cause load imbalance. However, converting legacy MPI applications to such object-based paradigms is cumbersome. This paper describes an implementation of MPI, called Adaptive MPI (AMPI), that supports dynamic load balancing and multithreading for MPI applications. Our approach and implementation are based on the user-level migrating threads and load balancing capabilities provided by the Charm++ framework. Conversion from legacy codes to this platform is straightforward even for large legacy codes. We have converted the component codes ROCFLO and ROCSOLID of a Rocket Simulation application to AMPI. Our experience shows that with minimal overhead and effort, one can incorporate dynamic load balancing capabilities in legacy Fortran-MPI codes.
Algorithmic challenges in computational molecular biophysics
- Journal of Computational Physics
, 1999
Cited by 19 (1 self)
A perspective of biomolecular simulations today is given, with illustrative applications and an emphasis on algorithmic challenges, as reflected by the work of a multidisciplinary team of investigators from five institutions. Included are overviews and recent descriptions of algorithmic work in long-time integration for molecular dynamics; fast electrostatic evaluation; crystallographic refinement approaches; and implementation of large, computation-intensive programs on modern architectures. Expected future developments of the field are also discussed. © 1999 Academic Press. Key Words: biomolecular simulations; molecular dynamics; long-time integration; fast electrostatics; crystallographic refinement; high-performance platforms.
Supporting Dynamic Parallel Object Arrays
- In Proceedings of ACM 2001 Java Grande/ISCOPE Conference
, 2001
Cited by 18 (10 self)
We present efficient support for generalized arrays of parallel data-driven objects. Methods can be invoked on any individual array element from any processor, and the elements can participate in reductions and broadcasts. Individual elements can be created or deleted dynamically at any time. Most importantly, the elements can migrate from processor to processor at any time. The paper discusses support for message delivery and collective operations in the face of such dynamic behavior. The migration capabilities of array elements have proven extremely useful, for example, in implementing flexible load balancing strategies and for adaptively exploiting workstation clusters.
Adapting to Load on Workstation Clusters
- In The Seventh Symposium on the Frontiers of Massively Parallel Computation
, 1999
Cited by 18 (6 self)
Desktop workstations represent a largely untapped source of computational power for parallel computing. Two of the main problems in utilizing these workstations are developing strategies for migrating load so that partially loaded workstations can contribute CPU cycles to the computation, and making dynamically migratable application programs easy to write. This paper describes object arrays, a construct which makes dynamically migratable applications easier to write, and a simple strategy for migrating load on a workstation cluster.
Run-time Support for Adaptive Load Balancing
Cited by 15 (6 self)
Many parallel scientific applications have dynamic and irregular computational structure. However, most such applications exhibit persistence of computational load and communication structure. This allows us to embed a measurement-based automatic load balancing framework in the run-time systems of parallel languages that are used to build such applications. In this paper, we describe such a framework built for the Converse [4] interoperable runtime system. This framework is composed of mechanisms for recording application performance data, a mechanism for object migration, and interfaces for plug-in load balancing strategy objects. Interfaces for strategy objects allow easy implementation of novel load balancing strategies that could use application characteristics on the entire machine, or only a local neighborhood. We present the performance of a few strategies on a synthetic benchmark and also the impact of automatic load balancing on an actual application.
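The idea of a plug-in strategy fed by measured loads can be illustrated with a small sketch. Everything here is an assumption for illustration: the names `greedy_strategy` and `proc_loads`, the dictionary of measured per-object loads, and the greedy heuristic itself are not taken from the paper or from the Converse or Charm++ APIs.

```python
import heapq


def greedy_strategy(object_loads, num_procs):
    """Hypothetical strategy: heaviest object first, onto the least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (accumulated load, processor id)
    heapq.heapify(heap)
    mapping = {}
    # Sort by descending measured load; break ties by object name for determinism.
    for obj, load in sorted(object_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        proc_load, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return mapping


def proc_loads(object_loads, mapping, num_procs):
    """Total measured load per processor under a given object-to-processor mapping."""
    totals = [0.0] * num_procs
    for obj, proc in mapping.items():
        totals[proc] += object_loads[obj]
    return totals


# Hypothetical measurements recorded during earlier iterations (arbitrary time units).
measured = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0, "f": 1.0}
old_mapping = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
new_mapping = greedy_strategy(measured, 2)
# Objects whose processor changed are the ones a runtime would have to migrate.
to_migrate = sorted(o for o in measured if old_mapping[o] != new_mapping[o])
```

Because load is assumed to persist between iterations, measurements from the recent past stand in for a prediction of the near future; a strategy restricted to a local neighborhood would receive only a subset of `measured` rather than the whole-machine view used here.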

