The AppLeS Project: A Status Report
1997
Cited by 114 (9 self)
Fast networks have made it possible to aggregate distributed CPU, memory, storage, and data to provide the potential for application performance superior to that attainable on any single system. However, achieving such performance on these metacomputing systems has proved to be difficult. Experience with the I-WAY [DFP + ss] and other metacomputing platforms demonstrates that effective application scheduling is critical to the achievement of performance for metacomputing applications. Currently, application developers develop customized application schedules to achieve performance on a metacomputer. Such application-centric schedules promote the performance of the application by evaluating system performance in terms of application resource requirements. To formalize and generalize the, as yet, ad hoc notion of application-centric scheduling emerging from the practices of metacomputing application developers [EMRP, SAR, GWP93], we are developing metacomputing scheduling agents calle...
Scheduling From the Perspective of the Application
1996
Cited by 86 (13 self)
Metacomputing is the aggregation of distributed and high-performance resources on coordinated networks. With careful scheduling, resource-intensive applications can be implemented efficiently on metacomputing systems at the sizes of interest to developers and users. In this paper, we focus on the problem of scheduling applications on metacomputing systems. We introduce the concept of application-centric scheduling in which everything about the system is evaluated in terms of its impact on the application. Application-centric scheduling is used by virtually all metacomputer programmers to achieve performance on metacomputing systems. We describe two successful metacomputing applications to illustrate this approach, and describe AppLeS scheduling agents which generalize the application-centric scheduling approach. Finally, we show preliminary results which compare AppLeS-derived schedules with conventional strip and blocked schedules for a two-dimensional Jacobi code.
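To make the comparison between a conventional strip schedule and an application-centric one concrete, here is a minimal sketch. It is not the AppLeS algorithm: the function names, the relative-speed figures, and the cost model (sweep time equals the slowest processor's strip time, communication ignored) are all assumptions chosen for illustration.

```python
def uniform_strips(n_rows, n_procs):
    """Conventional strip schedule: contiguous strips of near-equal size."""
    base, extra = divmod(n_rows, n_procs)
    return [base + (1 if i < extra else 0) for i in range(n_procs)]


def weighted_strips(n_rows, speeds):
    """Hypothetical application-centric variant: strip sizes proportional
    to each processor's relative speed (an assumed, measured input)."""
    total = sum(speeds)
    shares = [n_rows * s / total for s in speeds]   # ideal fractional rows
    rows = [int(x) for x in shares]                 # round down first
    leftover = n_rows - sum(rows)
    # Hand the remaining rows to the largest fractional remainders.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - rows[i], reverse=True)
    for i in order[:leftover]:
        rows[i] += 1
    return rows


def estimated_sweep_time(rows, speeds, n_cols):
    """Assumed cost model: one Jacobi sweep takes as long as the slowest
    processor needs for its strip; communication cost is ignored."""
    return max(r * n_cols / s for r, s in zip(rows, speeds))


speeds = [1.0, 1.0, 2.0]             # hypothetical relative speeds
even = uniform_strips(100, 3)        # [34, 33, 33]
tuned = weighted_strips(100, speeds) # [25, 25, 50]
```

Under this simplified model the uniform schedule is limited by the slowest machine, while the weighted schedule balances the per-processor strip times.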
The Utility of Exploiting Idle Workstations for Parallel Computation
1997
Cited by 73 (5 self)
In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of k workstations available? This provides an estimate of the opportunity for parallel computation. Second, how stable is a cluster of free machines and how does the stability vary with the size of the cluster? This indicates how frequently a parallel computation might have to stop to adapt to changes in processor availability. Third, what is the distribution of workstation idle-times? This information is useful for selecting workstations to place computation on. Fourth, how much benefit can a user expect? To state this in concrete terms, if I have a pool of size S, how big a parallel machine should I expect to get for free by harvesting idle machines? Finally, how much benefit can be achieved on a real machine and how hard does a parallel programmer ha...
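The first two questions above can be illustrated with a small sketch over an availability trace. The trace format (one list of idle flags per sampling interval) and both function names are assumptions made here, not the paper's methodology, and the run-length figure is only a crude stand-in for cluster stability because it does not track which machines are idle.

```python
def fraction_with_k_idle(trace, k):
    """Fraction of sampling intervals in which at least k workstations
    are idle. `trace` is a hypothetical list of snapshots, each a list
    of booleans (True = idle)."""
    if not trace:
        return 0.0
    hits = sum(1 for snapshot in trace if sum(snapshot) >= k)
    return hits / len(trace)


def mean_run_length(trace, k):
    """Average number of consecutive intervals with at least k idle
    workstations; a rough proxy for how long a k-node cluster lasts."""
    runs, current = [], 0
    for snapshot in trace:
        if sum(snapshot) >= k:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


# Hypothetical trace: 3 workstations observed over 4 intervals.
trace = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [True, True, False],
]
availability = fraction_with_k_idle(trace, 2)
stability = mean_run_length(trace, 2)
```

For this toy trace, a 2-node cluster is available in three of the four intervals, in runs of length 1 and 2.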
Managing Checkpoints for Parallel Programs
In Workshop on Job Scheduling Strategies for Parallel Processing (IPPS '96)
Cited by 51 (1 self)
Checkpointing is a valuable tool for any scheduling system to have. With the ability to checkpoint, schedulers are not locked into a single allocation of resources to jobs, but instead can stop running jobs and re-allocate resources without sacrificing any completed computations. Checkpointing techniques are not new, but they have not been widely available on parallel platforms. We have implemented CoCheck, a system for checkpointing message-passing parallel programs. Parallel programs tend to be large in terms of their aggregate memory utilization, so the size of their checkpoint is also large. Because of this, checkpoints must be handled carefully to avoid overloading the system when checkpoints take place. Today's distributed file systems do not handle this situation well. We therefore propose the use of checkpoint servers which are specifically designed to move checkpoints from the checkpointing process, across the interconnection network, and onto stable storage. A scheduling s...
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
ACM Transactions on Computer Systems, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
Adaptive Scheduling of Master/Worker Applications on Distributed Computational Resources
2001
MIST: PVM with Transparent Migration and Checkpointing
In 3rd Annual PVM Users' Group Meeting, 1995
Cited by 36 (0 self)
We are currently involved in research to enable PVM to take advantage of shared networks of workstations (NOWs) more effectively. In such a computing environment, it is important to utilize workstations unobtrusively and recover from machine failures. Towards this goal, we have enhanced PVM with transparent task migration, checkpointing, and global scheduling. These enhancements are part of the MIST project which takes an open systems approach in developing a cohesive, distributed parallel computing environment. This open systems approach promotes plug-and-play integration of independently developed modules, such as Condor, DQS, AVS, Prospero, XPVM, PIOUS, Ptools, etc. Transparent task migration, in conjunction with a global scheduler, facilitates the use of shared NOWs by allowing parallel jobs to unobtrusively utilize nodes that are currently unused. PVM tasks can be moved onto nodes that are otherwise idle, and moved off when the node is no longer free. Experiments show that migrati...
PVM and MPI: A comparison of features
Calculateurs Paralleles, 1996
Cited by 27 (0 self)
This paper compares PVM and MPI features, pointing out the situations where one may be favored over the other. Application developers can determine where their application most likely will run and if it requires particular features supplied by only one or the other of the APIs. MPI is expected to be faster within a large multiprocessor. It has many more point-to-point and collective communication options than PVM. This can be important if an algorithm is dependent on the existence of a special communication option. MPI also has the ability to specify a logical communication topology. PVM is better when applications will be run over heterogeneous networks. It has good interoperability between different hosts. PVM allows the development of fault-tolerant applications that can survive host or task failures. Because the PVM model is built around the virtual machine concept (not present in the MPI model), it provides a powerful set of dynamic resource manager and process control functions. Each API has its unique strengths and this will remain so into the foreseeable future. One area of future research is to study the feasibility of creating a programming environment that allows access to the virtual machine features of PVM and the message passing features of MPI.
Implementation of Gang-Scheduling on Workstation Cluster
1996
Cited by 22 (6 self)
The goal of this paper is to determine how efficiently we can implement an adequate parallel programming environment on a workstation cluster without modifying the existing operating system. We have implemented a runtime environment for parallel programs and gang-scheduling on a workstation cluster. In this paper, we report the techniques used to implement gang-scheduling on a workstation cluster and the problems we faced. The most important technique is "network preemption", and a unique feature of our approach is that the gang-scheduling is also written in a parallel language. Our evaluation shows that gang-scheduling on workstation clusters can be practical. 1 Introduction Workstation clusters are attracting attention as an alternative to parallel machines [1, 2, 13]. If a workstation cluster can be made to imitate a parallel machine, then it would be a cost-effective and familiar-to-use parallel execution environment. To prove this, we have implemented a parallel program execution e...

