Exploiting Process Lifetime Distributions for Dynamic Load Balancing
- ACM Transactions on Computer Systems, 1996
Cited by 290 (30 self)
We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies and with a non-preemptive load balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memory-transfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35 to 50%, compared to non-preemptive migration.
1 Introduction
Most systems that perform load balancing use remote execution (i.e., non-preemptive migration) based on a priori knowledge of process behavior, often in the form of a list of process names eligible for migration. Althoug...
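The abstract does not spell out the fitted functional form or the derived policy. As a minimal sketch, assume a Pareto-type tail in which a process that has already used `age` seconds survives past total lifetime `t` with probability `age / t` (for `t >= age`), and assume a hypothetical eligibility rule that migrates a process only when its age exceeds the migration cost divided by the load difference between source and target. Both the rule and the function names are illustrative assumptions, not the paper's exact criterion.

```python
def survival_given_age(age: float, t: float) -> float:
    """Pr{total lifetime > t | process has already run for `age` seconds},
    under the ASSUMED Pareto-type model Pr = age / t for t >= age."""
    if age <= 0:
        raise ValueError("age must be positive")
    if t <= age:
        return 1.0  # the process has already lived at least this long
    return age / t


def eligible_for_migration(age: float, migration_cost: float,
                           source_load: int, target_load: int) -> bool:
    """HYPOTHETICAL rule: migrate only if the process is old enough that its
    expected remaining life justifies the cost, scaled by the load gap."""
    load_gap = source_load - target_load
    if load_gap <= 0:
        return False  # no benefit in moving to an equally or more loaded host
    return age > migration_cost / load_gap


print(survival_given_age(2.0, 4.0))                # 0.5
print(eligible_for_migration(3.0, 2.0, 3, 1))      # True
print(eligible_for_migration(0.5, 2.0, 3, 1))      # False
```

Under this assumed model a process of age `a` has a 50% chance of reaching age `2a`, so older processes are expected to run longer; that is why age is a natural signal for deciding which processes are worth the cost of preemptive migration.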
Methodical Analysis of Adaptive Load Sharing Algorithms
- IEEE Transactions on Parallel and Distributed Systems, 1992
Cited by 67 (2 self)
This paper presents a method for qualitative and quantitative analysis of load sharing algorithms, using a number of well-known examples as illustration. Algorithm design choices are considered with respect to the main activities of information dissemination and allocation decision making. We argue that nodes must be capable of making local decisions, and for this, efficient state dissemination techniques are necessary. Activities related to remote execution should be bounded and restricted to a small proportion of the activity in the system. The quantitative analysis provides both performance and efficiency measures, including consideration of the load and delay characteristics of the environment. To assess stability, which is also a precondition for scalability, we introduce and measure the load sharing hit-ratio, the ratio of remote execution requests concluded successfully. Using our analysis method, we are able to suggest improvements to some published algorithms.
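The hit-ratio defined in this abstract is simply successful remote execution requests divided by all remote execution requests. A minimal sketch, assuming a hypothetical `RemoteRequest` record that notes the target node and whether the request concluded successfully, and assuming (as a convention, not from the paper) that the ratio is 0.0 when no requests were made:

```python
from dataclasses import dataclass


@dataclass
class RemoteRequest:
    """HYPOTHETICAL record of one remote execution request."""
    target: str
    accepted: bool  # True if the request concluded successfully


def hit_ratio(requests: list[RemoteRequest]) -> float:
    """Fraction of remote execution requests that concluded successfully.
    Returns 0.0 when there were no requests (an assumed convention)."""
    if not requests:
        return 0.0
    return sum(r.accepted for r in requests) / len(requests)


log = [RemoteRequest("n1", True), RemoteRequest("n2", False),
       RemoteRequest("n3", True), RemoteRequest("n4", True)]
print(hit_ratio(log))  # 0.75
```

A falling hit-ratio means more requests are being refused or failing at their targets, which is the kind of wasted remote-execution activity the abstract argues should stay bounded.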
On Runtime Parallel Scheduling for Processor Load Balancing
- IEEE Trans. Parallel and Distributed Systems, 1997
Cited by 22 (0 self)
Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality.
1. Introduction
Static scheduling balances the workload before runtime and can be applied to problems with a predictable structure, which are called static problems. Dynamic scheduling performs scheduling activities concurrently at runtime and applies to problems with an unpredictable structure, which are called dynamic problems. Static scheduling utilizes knowledge of problem characteristics to reach a well-balanced load [1, 2, 3, 4]. However, it is not able to balance the load for dynami...
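The abstract says the tree algorithm fully balances the load using global information but does not give its steps. As a minimal sketch of one generic way to do this (not necessarily the paper's algorithm), each non-root node can compute, from subtree totals, how many tasks must cross the edge to its parent. The function name, the integer-task model, and the rule that lower-indexed nodes absorb the remainder are all hypothetical.

```python
def tree_flows(parent: list[int], load: list[int]) -> dict[int, int]:
    """For a rooted tree (parent[root] == -1) with integer task counts,
    return for each non-root node v the number of tasks that must move from
    v's subtree up to parent[v] (negative: the parent sends tasks down),
    so every node ends with floor(total/n) or ceil(total/n) tasks."""
    n = len(load)
    base, extra = divmod(sum(load), n)
    # ASSUMED convention: the first `extra` nodes by index get one extra task.
    quota = [base + (1 if v < extra else 0) for v in range(n)]

    children: list[list[int]] = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p == -1:
            root = v
        else:
            children[p].append(v)

    # DFS pre-order lists every parent before its descendants.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Reverse pre-order: accumulate subtree sums bottom-up.
    sub_load, sub_quota = load[:], quota[:]
    for v in reversed(order):
        p = parent[v]
        if p != -1:
            sub_load[p] += sub_load[v]
            sub_quota[p] += sub_quota[v]

    return {v: sub_load[v] - sub_quota[v] for v in range(n) if parent[v] != -1}


print(tree_flows([-1, 0, 1], [0, 0, 9]))  # {1: 3, 2: 6}
```

In the chain example, node 2 sends 6 tasks to node 1, which keeps 3 and forwards 3 to the root, leaving every node with 3. Because flows are computed once from global totals, each task crosses only the edges it must, which is one way a global scheme can limit communication.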
Design And Evaluation Of Effective Load Sharing In Distributed Real-Time Systems
- IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 20 (2 self)
In a distributed real-time system, uneven task arrivals temporarily overload some nodes while leaving others idle or underloaded. Consequently, some tasks may miss their deadlines even if the overall system has the capacity to meet the deadlines of all tasks. An effective load sharing (LS) scheme is proposed as a solution to this problem. Upon arrival of a task at a node, the node determines whether it can complete the task in time under the minimum-laxity-first-served policy. If the task cannot be guaranteed, or if adding it to the existing schedule would violate the guarantees of other tasks, the node looks up its list of loss-minimizing decisions and determines the best node among a set of nodes in its physical proximity, called its buddy set, to which the task(s) may be transferred. This list of decisions is periodically updated using Bayesian decision analysis and prior/posterior state distributions. These probability distributions a...
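The local guarantee test described above can be sketched as follows: add the new task to the queue, order all tasks by laxity (deadline minus current time minus remaining execution time), run them back-to-back, and check every deadline. This is a minimal sketch under simplifying assumptions not stated in the abstract (one processor, no preemption, no further arrivals); the `Task` type and function names are hypothetical, and the buddy-set lookup that follows a failed test is not shown.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """HYPOTHETICAL task record."""
    name: str
    exec_time: float  # remaining execution time
    deadline: float   # absolute deadline


def laxity(task: Task, now: float) -> float:
    # Slack left if the task started right now and ran to completion.
    return task.deadline - now - task.exec_time


def can_guarantee(queue: list[Task], new_task: Task, now: float) -> bool:
    """True if every task, including new_task, meets its deadline when run
    back-to-back in minimum-laxity-first order (ASSUMED: single processor,
    no preemption, no further arrivals)."""
    t = now
    for task in sorted(queue + [new_task], key=lambda x: laxity(x, now)):
        t += task.exec_time
        if t > task.deadline:
            return False  # this task, or an already-guaranteed one, would be late
    return True


queue = [Task("a", 2.0, 5.0)]
print(can_guarantee(queue, Task("b", 3.0, 4.0), now=0.0))  # True
print(can_guarantee(queue, Task("c", 4.0, 4.0), now=0.0))  # False
```

In the second call the new task itself fits, but admitting it pushes task "a" past its deadline, which is exactly the case where the abstract says the node should consult its buddy set instead.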
Processor Sharing For Cooperative Multi-Task Applications
- 1991
Cited by 7 (0 self)
by Karen Marie Tracey
A processor sharing system allows busy users in a networked environment to take advantage of the processing power of idle machines. Experimental systems have demonstrated the usefulness of the concept, but processor sharing has yet to achieve widespread acceptance. Simplistic sharing policies and programming difficulty are two factors that combine to limit the use and acceptance of processor sharing. Group management, a new approach to processor sharing presented here, addresses these problems. Group management is designed to support processor sharing applications that consist of multiple cooperating tasks. This new approach provides a framework for the development of sophisticated sharing policies that can support a variety of applications with differing execution characteristics. In addition, it provides services to programmers that ease the task of writing programs for, and running programs in, a processor sharing environment. This dissertation introduces group...
Symmetrical Hopping: A Scalable Scheduling Algorithm for Irregular Problems
- Practice and Experience, 1995
Cited by 7 (0 self)
Runtime support is necessary for parallel computations with irregular and dynamic structures. One important component of the support system is the runtime scheduler, which balances the workload in the system. We present a new algorithm, Symmetrical Hopping, for dynamic scheduling of ultra-lightweight processes. It is a dynamic, distributed, adaptive, and scalable scheduling algorithm. This algorithm is described and compared to four other algorithms that have been proposed in this context, namely randomized allocation, sender-initiated scheduling, receiver-initiated scheduling, and the gradient model. The performance of these algorithms on the Intel Touchstone Delta is presented. The experimental results show that the Symmetrical Hopping algorithm achieves much better performance due to its adaptiveness.
1. Introduction
Large distributed memory parallel machines are becoming increasingly available. To efficiently use such large machines to solve an application problem, th...
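The gradient model, one of the baselines named above, can be sketched as follows (this illustrates the baseline, not Symmetrical Hopping). Each lightly loaded processor has proximity 0, every other processor's proximity is one more than its smallest neighbor's, capped at the network diameter plus one to mean "no light processor reachable", and an overloaded processor forwards work toward the neighbor with the smallest proximity. The ring topology, the boolean "light" flag, and the function names are assumptions made for illustration.

```python
def ring_proximities(light: list[bool]) -> list[int]:
    """Gradient-model proximity on an ASSUMED ring of processors: 0 for
    light nodes, else 1 + min(neighbor proximities), capped at diameter + 1."""
    n = len(light)
    cap = n // 2 + 1  # ring diameter + 1 means "no light node reachable"
    prox = [0 if is_light else cap for is_light in light]
    changed = True
    while changed:  # values only decrease, so this reaches a fixed point
        changed = False
        for i in range(n):
            if light[i]:
                continue
            new = min(min(prox[(i - 1) % n], prox[(i + 1) % n]) + 1, cap)
            if new != prox[i]:
                prox[i] = new
                changed = True
    return prox


def forward_direction(prox: list[int], i: int) -> int:
    """Neighbor that overloaded node i would send a task to (ties go left)."""
    n = len(prox)
    left, right = (i - 1) % n, (i + 1) % n
    return left if prox[left] <= prox[right] else right


p = ring_proximities([True, False, False, False, False, False])
print(p)                        # [0, 1, 2, 3, 2, 1]
print(forward_direction(p, 1))  # 0
```

Because proximities only describe distance to the nearest light node and not how light it is, tasks can drift toward a node that has since become busy; this lag is one reason adaptive schemes are compared against the gradient model.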
Scalable Load-Sharing for Distributed Systems
- In HICSS-26, 1993
Cited by 6 (1 self)
Adaptive algorithms for load-sharing usually comprise two basic functions: state information dissemination and decision making (control). This paper describes a flexible load-sharing algorithm, FLS, which includes a third function introduced for scalability purposes, that of partitioning into domains. The system partitioning function at a node is responsible for the selection of other nodes to be included in its domain. The state of other nodes in its domain is held locally, in a cache. Cached data is treated as hints for decision making. The FLS algorithm permits local decisions to be made, aims at minimising the number of incorrect decisions and does not allow erroneous decisions to proceed. The algorithm is analysed and shown to be stable and scalable. Its suitability for a CONIC/REX environment is demonstrated with a prototype implementation, providing an automatic software allocation service as part of configuration management.
1. Introduction
Distributed systems consist of multipl...
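The idea of treating cached state as hints, while never letting an erroneous decision proceed, can be sketched as follows: the node picks a domain member its cache marks as underloaded, the target re-checks its real state before accepting, and a refusal cancels the transfer and corrects the stale cache entry. The `Node` and `Domain` classes, the capacity model, and the method names are hypothetical and are not taken from the FLS paper.

```python
from typing import Optional


class Node:
    """HYPOTHETICAL node model: accepts at most `capacity` tasks."""

    def __init__(self, name: str, load: int, capacity: int):
        self.name = name
        self.load = load
        self.capacity = capacity

    def accept(self) -> bool:
        # Authoritative check at the target, made at transfer time.
        if self.load < self.capacity:
            self.load += 1
            return True
        return False


class Domain:
    """A node's domain: a few other nodes plus a locally cached state view."""

    def __init__(self, members: list[Node]):
        self.members = {m.name: m for m in members}
        # Cached hint: True if the member looked underloaded when last observed.
        self.hints = {m.name: m.load < m.capacity for m in members}

    def place_remote(self) -> Optional[str]:
        """Try members the cache marks as underloaded. A transfer the target
        refuses is not carried out, and the stale hint is corrected."""
        for name in list(self.hints):
            if not self.hints[name]:
                continue
            if self.members[name].accept():
                return name
            self.hints[name] = False  # the hint was wrong; fix the cache
        return None


a, b = Node("a", 1, 2), Node("b", 0, 2)
domain = Domain([a, b])
a.load = 2                    # "a" filled up after the cache was built
print(domain.place_remote())  # b
```

Keeping the authoritative check at the target is what makes stale hints safe: a wrong hint costs one refused request rather than a misplaced task.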
Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers
- Scientific Programming, 1994
Cited by 3 (0 self)
One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. Adaptive Contracting Within Neighborhood (ACWN) is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes, and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than randomized allocation, it achieves better performance in most cases due to its adapti...
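The abstract does not describe ACWN's exact rules. As a minimal sketch in the general spirit of contracting a newly created process within a neighborhood, the process can hop to the least-loaded neighbor while that neighbor is lighter by at least a threshold, up to a hop limit. The `threshold` and `max_hops` parameters, the load model, and the function name are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def contract(start: int, loads: list[int], neighbors: Sequence[Sequence[int]],
             threshold: int = 1, max_hops: int = 3) -> int:
    """Place a newly created process: starting at `start`, hop to the
    least-loaded neighbor while it is lighter by at least `threshold`,
    for at most `max_hops` hops (HYPOTHETICAL parameters). Updates `loads`
    and returns the chosen node."""
    node = start
    for _ in range(max_hops):
        best = min(neighbors[node], key=lambda v: loads[v])
        if loads[node] - loads[best] < threshold:
            break  # no neighbor is sufficiently lighter; stop here
        node = best
    loads[node] += 1
    return node


# A line of four processors: 0 - 1 - 2 - 3
nbrs = [[1], [0, 2], [1, 3], [2]]
loads = [5, 3, 1, 0]
print(contract(0, loads, nbrs))  # 3
print(loads)                     # [5, 3, 1, 1]
```

Unlike randomized allocation, which sends a process to a random processor regardless of load, a load-dependent rule like this spends a few extra load queries per placement in exchange for better balance, which mirrors the overhead-versus-performance trade-off the abstract reports.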
On Runtime Parallel Scheduling
- 1995
Cited by 2 (2 self)
Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Particular scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality at runtime. Communication costs are significantly reduced compared to other existing algorithms.
1. Introduction
One of the challenges in programming parallel machines is to schedule work to processors [18]. There are two types of application problem structures: problems with a predictable structure, also called static problems, and problems with an unpredictable structure, called dynamic problems. Solving dynamic problems is difficult since the number of tasks and the grain...
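For the hypercube case, one well-known way to reach a fully balanced load with purely pairwise communication is dimension exchange: in round k, every node pairs with the neighbor whose index differs in bit k, and both take the pair's average. After all d rounds every node holds the global average. This is shown as a generic technique, not necessarily the paper's algorithm, and it assumes divisible load (exact `Fraction` values); balancing whole tasks would need a rounding rule.

```python
from fractions import Fraction


def dimension_exchange(loads: list) -> list[Fraction]:
    """Balance ASSUMED-divisible load on a hypercube with 2**d nodes.
    Round k pairs node i with node i XOR 2**k; both take the pair's average."""
    n = len(loads)
    d = n.bit_length() - 1
    if n == 0 or n != 1 << d:
        raise ValueError("number of nodes must be a positive power of two")
    values = [Fraction(x) for x in loads]
    for k in range(d):
        bit = 1 << k
        for i in range(n):
            j = i ^ bit
            if i < j:  # handle each pair once
                avg = (values[i] + values[j]) / 2
                values[i] = values[j] = avg
    return values


print(dimension_exchange([8, 0, 0, 0]))  # [Fraction(2, 1), Fraction(2, 1), Fraction(2, 1), Fraction(2, 1)]
```

Each node exchanges load only with its d direct neighbors, one dimension per round, so the whole balance completes in d = log2(n) communication steps.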
Incorporation Of Optimal Timeouts Into Distributed Real-Time Load Sharing
- In Proceedings of Hawaii International Conference on System Sciences, 1993
Cited by 2 (1 self)
This paper addresses the problem of designing and incorporating a timeout mechanism into load sharing (LS) with state-region change broadcasts in the presence of node failures in a distributed real-time system. Failure of a node is diagnosed by the other nodes through communication timeouts, and the timeout period used to diagnose whether a node is faulty or not usually depends on the dynamic changes in system load, the task attributes at the node, and the state the node was initially in. We formulate the problem of determining the `best' timeout period $T_{\mathrm{out}}^{(i)}$ for node $i$ as a hypothesis testing problem, and maximize the probability of detecting node failures subject to a pre-specified probability of falsely diagnosing a healthy node as faulty. The parameters needed for the calculation of $T_{\mathrm{out}}^{(i)}$ are estimated on-line by node $i$ using the Bayesian technique and are piggy-backed in its region-change broadcasts. The broadcast information is then used to determine $T_{\mathrm{out}}^{(i)}$. If nod...
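To make the false-alarm constraint concrete, assume (purely for illustration; the paper's model may differ) that a healthy node's response delay is exponential with rate λ. Then Pr{delay > T | healthy} = exp(-λT), and the shortest timeout whose false-alarm probability does not exceed α is T = ln(1/α)/λ. The sketch also estimates λ with a standard conjugate Gamma-prior update as one example of an on-line Bayesian estimate; the prior values and both function names are hypothetical.

```python
import math


def timeout_for_false_alarm(alpha: float, response_rate: float) -> float:
    """Smallest T with Pr{healthy node's delay > T} <= alpha, ASSUMING
    exponentially distributed response delays with rate `response_rate`."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if response_rate <= 0:
        raise ValueError("response_rate must be positive")
    return math.log(1 / alpha) / response_rate


def posterior_rate(delays: list[float], a0: float = 1.0, b0: float = 1.0) -> float:
    """Posterior mean of the exponential rate under a Gamma(shape=a0, rate=b0)
    prior after observing `delays` (HYPOTHETICAL prior values)."""
    return (a0 + len(delays)) / (b0 + sum(delays))


rate = posterior_rate([0.5, 0.5], a0=2.0, b0=1.0)  # (2 + 2) / (1 + 1) = 2.0
T = timeout_for_false_alarm(0.05, rate)
print(round(T, 3))  # 1.498
```

A smaller α (fewer false diagnoses of healthy nodes) lengthens the timeout and therefore delays detection of real failures; re-estimating the rate on-line lets each node shorten its timeout when it is responding quickly and lengthen it when it is heavily loaded.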

