Results 1 - 10 of 16
Exploiting Process Lifetime Distributions for Dynamic Load Balancing
- ACM Transactions on Computer Systems
, 1996
Cited by 290 (30 self)
We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies, and with a non-preemptive load balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memory-transfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35-50%, compared to non-preemptive migration. 1 Introduction Most systems that perform load balancing use remote execution (i.e. non-preemptive migration) based on a priori knowledge of process behavior, often in the form of a list of process names eligible for migration. Althoug...
Utopia: a Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems
, 1993
Process migration
- ACM Computing Surveys
, 2000
Cited by 62 (1 self)
A process is an operating system abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution. Several implementations
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM Transactions on Computer Systems
, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
REXEC: A Decentralized, Secure Remote Execution Environment for Parallel and Sequential Programs
- 4th Workshop on Communication, Architecture, and Applications for Network-based Parallel Computing
, 2000
Cited by 27 (1 self)
Bringing clusters of computers into the mainstream as general-purpose computing systems requires that better facilities for transparent remote execution of parallel and sequential applications be developed. While much research has been done in this area, most of this work remains inaccessible for clusters built using contemporary hardware and operating systems. Implementations are either too old and/or not publicly available, require use of operating systems which are not supported by modern hardware, or simply do not meet the functional requirements demanded by practical use in real world settings. To address these issues, we designed REXEC, a decentralized, secure remote execution facility. It provides high availability, scalability, transparent remote execution, dynamic cluster configuration, decoupled node discovery and selection, a well-defined failure and cleanup model, parallel and distributed program support, and strong authentication and encryption. The system is implemented and is currently installed and in use on a 32-node cluster of 2-way SMPs running the Linux 2.2.5 operating system.
Bypass: A Tool for Building Split Execution Systems
- In Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing
, 2000
Cited by 25 (6 self)
Split execution is a common model for providing a friendly environment on a foreign machine. In this model, a remotely executing process sends some or all of its system calls back to a home environment for execution. Unfortunately, hand-coding split execution systems for experimentation and research is difficult and error-prone. We have built a tool, Bypass, for quickly producing portable and correct split execution systems for unmodified legacy applications. We demonstrate Bypass by using it to transparently connect a POSIX application to a simple data staging system based on the Globus toolkit. 1. Introduction The split execution model allows a process running on a foreign machine to behave as if it were running on its home machine. Split execution generally involves three software components: an application, an agent, and a shadow. Figure 1 shows these components. [Figure 1 labels: Kernel, Agent, Application, Shadow, Local System Calls, Trapped System Calls] Other...
Market-based Cluster Resource Management
, 2001
Cited by 17 (0 self)
Resource management in high-performance cluster computer systems is a challenging problem. Resources must be allocated amongst competing applications of varying levels of importance, and aggregate resource demand needs to be controlled to keep the system in a comfortable regime of operation. Effectively performing these tasks requires knowledge of user valuations of the resources being allocated and a feedback signal that causes users to back off the system when it is overloaded. Unfortunately, current approaches to cluster resource management provide little, if any, means for users to express resource valuations and to influence their resource allocations. In addition, while feedback signals are provided, there are no associated incentives for users to pay attention to and respond to them. As a result, traditional systems are incapable of delivering the maximum possible value to users. The thesis of this work is that...
MARS: Adaptive scheduling of parallel applications in a multi-user heterogeneous environment
, 1996
Cited by 6 (4 self)
This paper presents a multithreaded system used to schedule parallel applications on heterogeneous multiuser parallel architectures. The approach is based on idle-cycle stealing and on adaptive parallelism, which dynamically adjusts the degree of parallelism with respect to the system load. The basic mechanisms used are thread migration and machine global load estimation. 1 The ESPACE project The ESPACE project (Execution Support for Parallel Applications in high-performance Computing Environments) aims to provide a full environment for highly parallel application programming. One goal of the ESPACE project is to design an efficient low-level parallel environment with a computation model based on lightweight processes (threads). Using threads for supporting high computing parallel applications is a recent approach [1] which allows efficient management of a great number of activities inside applications. PM2 (Parallel Multithreaded Machine) [2] is a preemptive multithreaded run-time syste...
Competitive Execution in a Distributed Environment
, 1996
Cited by 3 (1 self)
Abstract of the Dissertation: Competitive Execution in a Distributed Environment, by Sung Hyun Cho, Doctor of Philosophy in Computer Science, University of California, Los Angeles, 1996. Professor David R. Jefferson, Chair. We propose an alternative to process migration, called competition, to speed up distributed programs in the background on a network of variable-speed processors. Competition protocols are transparent operating system facilities that involve creating multiple instances (called clones) p_1, p_2, etc. of a process P on different variable-speed processors, and making clones "compete", i.e., attempting to guarantee that the output of the clone that is farthest "ahead" is fed to the rest of the computation, and that the entire application's performance tracks that of the clone which is farthest ahead. One clone may be ahead of or behind others depending on the current foreground loads. If for any reason there is variation in the progress of the clones, so that one clone is ahead at ...
Just-in-time Transparent Resource Management in Distributed Systems
, 1998
Cited by 2 (0 self)
This paper presents the design and implementation of a resource management system for monitoring computing resources on a network and for dynamically allocating them to concurrently executing jobs. In particular, it is designed to support adaptive parallel computations, that is, computations that benefit from the addition of new machines and can tolerate the removal of machines while executing. The challenge for such a resource manager is to communicate the availability of resources to running programs even when the programs were not developed to work with external resource managers. Our main contribution is a novel mechanism addressing this issue, built on low-level features common to popular parallel programming systems. Existing resource management systems for adaptive computations either require tight integration with the operating system (DRMS), or require integration with a programming system that is aware of external resource managers (e.g. Condor/CARMI, MPVM, Piranha). Thus in each cas...

