Adaptive Computing on the Grid Using AppLeS, 2003
Cited by 90 (7 self)
Ensembles of distributed, heterogeneous resources, also known as Computational Grids, are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single execution. However, achieving this performance potential in dynamic, heterogeneous environments is challenging. Recent experience with distributed applications indicates that adaptivity is fundamental to achieving application performance in dynamic Grid environments. The AppLeS (Application Level Scheduling) project provides a methodology, application software, and software environments for adaptively scheduling and deploying applications in dynamic, heterogeneous, multi-user Grid environments. In this paper, we discuss the AppLeS project and outline our results.
Design and Evaluation of a Resource Selection Framework for Grid Applications, 2002
Cited by 80 (7 self)
While distributed, heterogeneous collections of computers ("Grids") can in principle be used as a computing platform, in practice the problems of first discovering and then configuring resources to meet application requirements are difficult. We present a general-purpose resource selection framework that addresses these problems by defining a resource selection service for locating Grid resources that match application requirements. At the heart of this framework is a simple but powerful declarative language based on a technique called set matching, which extends the Condor matchmaking framework to support both single resource and multiple resource selection. This framework also provides an open interface for loading application-specific mapping modules to personalize the resource selector. We present results obtained when this framework is applied in the context of a computational astrophysics application, Cactus. These results demonstrate the effectiveness of our technique.
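To make the distinction between single-resource and set matching concrete, here is a minimal Python sketch. It is not the paper's declarative language or Condor's ClassAd syntax; the `Resource` fields, the `set_match` function, and the greedy selection rule are hypothetical, chosen only to show the two levels of constraint: each resource must pass a per-resource test, and the selected set must also satisfy an aggregate requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical attributes; a real resource description would be far richer.
    name: str
    cpu_speed: float  # relative speed, higher is faster
    memory_mb: int


def set_match(resources: List[Resource], min_cpu_speed: float,
              total_memory_mb: int) -> Optional[List[Resource]]:
    """Pick resources that each meet min_cpu_speed and that together
    provide at least total_memory_mb. Returns None if no such set exists.

    Greedy rule (an illustrative assumption): take the largest-memory
    candidates first, which tends to keep the selected set small.
    """
    # Per-resource constraint: keep only individually acceptable resources.
    candidates = [r for r in resources if r.cpu_speed >= min_cpu_speed]
    candidates.sort(key=lambda r: r.memory_mb, reverse=True)

    chosen: List[Resource] = []
    memory = 0
    for r in candidates:
        if memory >= total_memory_mb:
            break
        chosen.append(r)
        memory += r.memory_mb

    # Set constraint: the aggregate must satisfy the requirement.
    return chosen if memory >= total_memory_mb else None


pool = [
    Resource("a", 2.0, 512),
    Resource("b", 1.0, 2048),
    Resource("c", 1.5, 1024),
    Resource("d", 3.0, 256),
]
match = set_match(pool, min_cpu_speed=1.5, total_memory_mb=1200)
print([r.name for r in match])  # ['c', 'a']
```

Note that host `b` has the most memory but is excluded by the per-resource speed test, so no single resource satisfies the request and a set is required.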
Customized Dynamic Load Balancing for a Network of Workstations, 1997
Cited by 67 (0 self)
In this paper we show that different load balancing schemes are best for different applications under varying program and system parameters. Therefore, application-driven customized dynamic load balancing becomes essential for good performance. We present a hybrid compile-time and run-time modeling and decision process which selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a run-time library for load balancing. © 1997 Academic Press.
ECO: Efficient Collective Operations for Communication on Heterogeneous Networks. In International Parallel Processing Symposium, 1995
Cited by 50 (4 self)
PVM and other distributed computing systems have enabled the use of networks of workstations for parallel computation, but their approach of treating a network as a collection of point-to-point connections does not promote efficient communication, particularly collective communication. ECO is a package that solves this problem with programs that analyze the network and establish efficient communication patterns, which are then used by a library of collective operations. The analysis is done off-line, so after the one-time cost of analyzing the network is paid, the execution of application programs is not delayed. This paper gives performance results from using ECO to implement the collective communication in CHARMM, a widely used macromolecular dynamics package. ECO facilitates the development of data parallel applications by providing a simple interface to routines which use the available heterogeneous networks efficiently. This approach gives a naive programmer the abili...
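The kind of communication pattern ECO derives can be illustrated with a broadcast that respects network structure. The sketch below is a simplification under stated assumptions, not ECO's actual analysis: it presumes the offline analysis has already grouped hosts into subnets (the hypothetical `subnet_of` mapping), then has the root send one message to a single leader in each remote subnet, which forwards to its local peers. Compared with a flat broadcast over point-to-point connections, each slow inter-subnet link is crossed only once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def subnet_aware_broadcast(root: str, hosts: List[str],
                           subnet_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (sender, receiver) messages for a broadcast from root.

    A leader's incoming message from the root is listed before any of its
    forwarding messages, so every sender holds the data before it sends.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for h in hosts:
        groups[subnet_of[h]].append(h)

    plan: List[Tuple[str, str]] = []
    for subnet, members in groups.items():
        # The root leads its own subnet; elsewhere the first member leads.
        leader = root if subnet == subnet_of[root] else members[0]
        if leader != root:
            plan.append((root, leader))  # single inter-subnet message
        for m in members:
            if m != leader:
                plan.append((leader, m))  # intra-subnet forwarding
    return plan


subnets = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
plan = subnet_aware_broadcast("a1", list(subnets), subnets)
cross = sum(1 for s, r in plan if subnets[s] != subnets[r])
print(len(plan), cross)  # 4 1
```

A flat broadcast from `a1` would send three separate messages across the link to subnet `B`; this plan sends one.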
Application Level Fault Tolerance in Heterogeneous Networks of Workstations. Journal of Parallel and Distributed Computing, 1997
Cited by 48 (0 self)
We have explored methods for checkpointing and restarting processes within the Distributed object migration environment (Dome), a C++ library of data parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System level checkpointing methods, although transparent to the user, were rejected because they lack support for heterogeneity. We have implemented application level checkpointing which places the checkpoint and restart mechanisms within Dome's C++ objects. Application level checkpointing has been implemented with a library-based technique for the programmer and a more transparent preprocessor-based technique. Dome's implementation of checkpointing successfully checkpoints and restarts processes on different numbers of machines and different architectures. Results from executing Dome programs across a NOW with realistic failure rates have been experimentally determined and are compared with theoretical results. The overhead of checkpoi...
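A minimal sketch of the application-level idea, assuming a Dome-like distributed vector whose data is block-distributed across processes. The functions `block_distribute`, `checkpoint`, and `restart` are hypothetical and are not Dome's C++ API. Two properties from the abstract show up directly: the checkpoint stores global state in an architecture-neutral text format (JSON here), so it can be read back on a different architecture, and restart redistributes that state over however many processes are now available.

```python
import json
from typing import List, Tuple


def block_distribute(data: List[int], nprocs: int) -> List[List[int]]:
    """Split data into nprocs contiguous blocks whose sizes differ by at most one."""
    q, r = divmod(len(data), nprocs)
    blocks, start = [], 0
    for p in range(nprocs):
        size = q + (1 if p < r else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def checkpoint(blocks: List[List[int]], iteration: int) -> str:
    """Gather the distributed blocks and serialize them with the loop position."""
    flat = [x for block in blocks for x in block]
    return json.dumps({"iteration": iteration, "data": flat})


def restart(blob: str, nprocs: int) -> Tuple[int, List[List[int]]]:
    """Restore state and redistribute it over a possibly different process count."""
    state = json.loads(blob)
    return state["iteration"], block_distribute(state["data"], nprocs)


blocks = block_distribute(list(range(10)), 4)  # running on 4 processes
saved = checkpoint(blocks, iteration=7)
iteration, new_blocks = restart(saved, 3)      # resuming on 3 processes
print(iteration, new_blocks)  # 7 [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because the checkpoint captures the object's logical state rather than a process image, the restarted run is free to choose a different distribution.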
Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments, 2003
Cited by 42 (1 self)
In heterogeneous and dynamic environments, efficient execution of parallel computations can require mappings of tasks to processors whose performance is both irregular (because of heterogeneity) and time-varying (because of dynamicity). While adaptive domain decomposition techniques have been used to address heterogeneous resource capabilities, temporal variations in those capabilities have seldom been considered. We propose a conservative scheduling policy that uses information about expected future variance in resource capabilities to produce more efficient data mapping decisions. We first present techniques, based on time series predictors that we developed in previous work, for predicting CPU load at some future time point, average CPU load for some future time interval, and variation of CPU load over some future time interval. We then present a family of stochastic scheduling algorithms that exploit such predictions of future availability and variability when making data mapping decisions. Finally, we describe experiments in which we apply our techniques to an astrophysics application. The results of these experiments demonstrate that conservative scheduling can produce execution times that are both significantly faster and less variable than other techniques.
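A minimal Python sketch of variance-aware data mapping. The formulas are assumptions for illustration, not the paper's predictors or its stochastic scheduling algorithms: a host's conservative load is its predicted mean CPU load plus `k` predicted standard deviations, its usable CPU fraction is taken as `1 / (1 + load)`, and data units are split in proportion to that fraction. The inputs `mean_load` and `load_sd` are assumed to come from some time series predictor.

```python
from typing import List


def conservative_mapping(total_units: int, mean_load: List[float],
                         load_sd: List[float], k: float = 1.0) -> List[int]:
    """Divide total_units of data among hosts using conservative capacity estimates.

    Assumed model: effective load = mean + k * sd, and a host delivers
    1 / (1 + effective load) of a dedicated CPU. Setting k = 0 ignores variance.
    """
    if len(mean_load) != len(load_sd):
        raise ValueError("mean_load and load_sd must have the same length")
    capacity = [1.0 / (1.0 + m + k * s) for m, s in zip(mean_load, load_sd)]
    total_capacity = sum(capacity)

    # Fractional shares proportional to capacity, then rounded down.
    shares = [total_units * c / total_capacity for c in capacity]
    alloc = [int(s) for s in shares]

    # Give the remaining units to hosts with the largest fractional remainders.
    leftover = total_units - sum(alloc)
    order = sorted(range(len(shares)), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Two hosts with the same predicted mean load; the second is far less stable.
print(conservative_mapping(10, [0.5, 0.5], [0.0, 1.0], k=0.0))  # [5, 5]
print(conservative_mapping(10, [0.5, 0.5], [0.0, 1.0], k=1.0))  # [6, 4]
```

With `k = 0` the two hosts look identical; accounting for predicted variability shifts work toward the steadier host, which is the intuition behind scheduling conservatively.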
MIST: PVM with Transparent Migration and Checkpointing. In 3rd Annual PVM Users' Group Meeting, 1995
Cited by 36 (0 self)
We are currently involved in research to enable PVM to take advantage of shared networks of workstations (NOWs) more effectively. In such a computing environment, it is important to utilize workstations unobtrusively and recover from machine failures. Towards this goal, we have enhanced PVM with transparent task migration, checkpointing, and global scheduling. These enhancements are part of the MIST project which takes an open systems approach in developing a cohesive, distributed parallel computing environment. This open systems approach promotes plug-and-play integration of independently developed modules, such as Condor, DQS, AVS, Prospero, XPVM, PIOUS, Ptools, etc. Transparent task migration, in conjunction with a global scheduler, facilitates the use of shared NOWs by allowing parallel jobs to unobtrusively utilize nodes that are currently unused. PVM tasks can be moved onto nodes that are otherwise idle, and moved off when the node is no longer free. Experiments show that migrati...
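The policy described above (move tasks onto idle nodes, move them off when a node is no longer free) can be sketched as one pass of a global scheduler. This is a hypothetical illustration: the `reschedule` function, the one-task-per-node rule, and the dictionaries standing in for task placement and node status are assumptions, and the sketch only plans moves; it does not perform PVM task migration or checkpointing.

```python
from typing import Dict, Tuple


def reschedule(placement: Dict[str, str],
               node_is_free: Dict[str, bool]
               ) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Plan migrations for one scheduling pass.

    placement maps task -> node; node_is_free maps node -> True when the node
    is idle (e.g. its owner is away). Tasks on nodes that are no longer free
    are moved to free nodes that currently host no task. A task stays put if
    no such node is available.
    """
    occupied = set(placement.values())
    spare = [n for n, free in node_is_free.items() if free and n not in occupied]

    new_placement = dict(placement)
    moves: Dict[str, Tuple[str, str]] = {}
    for task, node in placement.items():
        if not node_is_free[node] and spare:
            target = spare.pop(0)
            new_placement[task] = target
            moves[task] = (node, target)
    return new_placement, moves


placement = {"t1": "n1", "t2": "n2"}
status = {"n1": True, "n2": False, "n3": True, "n4": False}
print(reschedule(placement, status))
# ({'t1': 'n1', 't2': 'n3'}, {'t2': ('n2', 'n3')})
```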
Compile-time Scheduling Algorithms for Heterogeneous Network of Workstations. The Computer Journal, 1997
Cited by 36 (1 self)
In this paper, we study the problem of scheduling parallel loops at compile time for a heterogeneous network of workstations. We consider heterogeneity in various aspects of parallel programming: program, processor, memory and network. A heterogeneous program has parallel loops with different amounts of work in each iteration; heterogeneous processors have different speeds; heterogeneous memory refers to the different amounts of user-available memory on the machines; and a heterogeneous network has different costs of communication between processors. We propose a simple yet comprehensive model for use in compiling for a network of processors, and develop compiler algorithms for generating optimal and...
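One ingredient of such a scheduler, balancing a heterogeneous loop across processors of different speeds, can be sketched as follows. This is a simplified greedy illustration, not the paper's algorithms: it assumes per-iteration work estimates (`work`) and relative processor speeds (`speeds`) are known at compile time, and it ignores the memory and network heterogeneity the paper also models. Each processor receives a contiguous block of iterations whose total work is close to its speed-proportional share.

```python
from typing import List, Tuple


def partition_loop(work: List[float], speeds: List[float]) -> List[Tuple[int, int]]:
    """Return one half-open iteration range (start, end) per processor.

    Each processor's block is extended while the work assigned so far stays
    within the cumulative speed-proportional target of processors 0..p;
    the last processor takes whatever remains.
    """
    total_work = sum(work)
    total_speed = sum(speeds)
    ranges: List[Tuple[int, int]] = []
    start, assigned, target = 0, 0.0, 0.0
    for p, speed in enumerate(speeds):
        target += total_work * speed / total_speed
        end = start
        if p == len(speeds) - 1:
            end = len(work)
        else:
            # Extend the block while it stays within the cumulative target.
            while end < len(work) and assigned + work[end] <= target:
                assigned += work[end]
                end += 1
        ranges.append((start, end))
        start = end
    return ranges


# Uniform iterations, processors with relative speeds 1:2:3.
print(partition_loop([1.0] * 12, [1.0, 2.0, 3.0]))  # [(0, 2), (2, 6), (6, 12)]
# Non-uniform iterations on two equal processors: 4 units of work each.
print(partition_loop([4.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0]))  # [(0, 1), (1, 5)]
```

The second call shows why program heterogeneity matters: an equal split by iteration count would give the first processor 6 units of work and the second only 2.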
A Performance Oriented Migration Framework for the Grid, 2003
Cited by 33 (2 self)
At least three factors make existing migration systems less suitable for Grid systems, especially when the goal is to improve the response times of individual applications: the separate policies these systems employ for suspending and for migrating executing applications, the use of pre-defined conditions for suspension and migration, and the lack of knowledge of the applications' remaining execution times. In this paper we describe a migration framework for performance-oriented Grid systems that implements tightly coupled policies for both suspension and migration of executing applications. The suspension and migration policies take into account both the load changes on systems and the remaining execution times of the applications, thereby accounting for both system load and application characteristics. The main goal of our migration framework is to improve the response times of individual applications. We also present some results that demonstrate the usefulness of our migration system.
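The coupling between system load, remaining execution time, and the decision to move an application can be sketched as a single comparison. This is an illustrative assumption rather than the framework's actual policy: `pick_host`, the per-host `effective_speed` estimates (work units per second under current load), and the per-host `migration_cost` in seconds are hypothetical inputs. The sketch shows that the same load situation can justify migrating a long-running application but not one that is nearly finished.

```python
import math
from typing import Dict, Tuple


def pick_host(remaining_work: float, current_host: str,
              effective_speed: Dict[str, float],
              migration_cost: Dict[str, float]) -> Tuple[str, float]:
    """Return (host, predicted seconds to finish) for the best choice.

    Staying costs remaining_work / speed on the current host; moving to another
    host adds that host's migration cost. A suspended or saturated current host
    (speed <= 0) is treated as never finishing.
    """
    current_speed = effective_speed[current_host]
    best_host = current_host
    best_time = remaining_work / current_speed if current_speed > 0 else math.inf
    for host, speed in effective_speed.items():
        if host == current_host or speed <= 0:
            continue
        t = migration_cost[host] + remaining_work / speed
        if t < best_time:  # strict: ties favor staying (or the earlier candidate)
            best_host, best_time = host, t
    return best_host, best_time


speeds = {"A": 1.0, "B": 4.0, "C": 2.0}  # A is the current, heavily loaded host
costs = {"B": 30.0, "C": 10.0}
print(pick_host(100.0, "A", speeds, costs))  # ('B', 55.0)
print(pick_host(20.0, "A", speeds, costs))   # ('A', 20.0)
```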

