Exploiting Process Lifetime Distributions for Dynamic Load Balancing. ACM Transactions on Computer Systems, 1996.
Cited by 290 (30 self)
We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies and with a non-preemptive load-balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memory-transfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35–50% compared to non-preemptive migration.

1 Introduction

Most systems that perform load balancing use remote execution (i.e., non-preemptive migration) based on a priori knowledge of process behavior, often in the form of a list of process names eligible for migration. Althoug...
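The abstract does not state the fitted functional form. The minimal sketch below assumes that the probability a process runs longer than x seconds falls off as 1/x (an assumption here, not quoted from the abstract). Under that assumption a process that has already run for `age` seconds survives past t with probability age/t, so older processes are more likely to run long enough to repay the cost of migrating them. The function names and the eligibility rule in `eligible_for_migration` are hypothetical illustrations and should not be read as the paper's exact policy.

```python
def survival_given_age(age: float, t: float) -> float:
    """P(lifetime > t | lifetime > age) under an assumed 1/x survival law.

    If P(lifetime > x) is proportional to 1/x, conditioning on having
    survived to `age` gives P(lifetime > t | age) = age / t for t >= age.
    """
    if age <= 0:
        raise ValueError("age must be positive")
    if t <= age:
        return 1.0  # the process has already lived at least this long
    return age / t


def median_remaining_lifetime(age: float) -> float:
    """Solve age / t = 1/2 for t: t = 2 * age, so the median remaining life equals the age."""
    return age


def eligible_for_migration(age: float, migration_cost: float,
                           source_load: int, target_load: int) -> bool:
    """Hypothetical test: migrate only processes old enough to repay the
    migration cost, with the threshold scaled by the load difference."""
    load_difference = source_load - target_load
    if load_difference <= 0:
        return False  # moving to an equally or more loaded host never helps
    return age > migration_cost / load_difference
```

For example, with a migration cost of 2.0 seconds and loads of 3 and 1, the hypothetical threshold is 1.0 second: a process aged 1.5 seconds qualifies, while one aged 1.0 second does not.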
Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience, 2005.
Cited by 263 (6 self)
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course traveled by research ideas as they grow into production systems.
A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems. IEEE Transactions on Software Engineering, 1988.
Cited by 223 (0 self)
One measure of the usefulness of a general-purpose distributed computing system is the system’s ability to provide a level of performance commensurate with the degree of multiplicity of resources present in the system. Many different approaches and metrics of performance have been proposed in an attempt to achieve this goal in existing systems. In addition, analogous problem formulations exist in other fields such as control theory, operations research, and production management. However, due to the wide variety of approaches to this problem, it is difficult to meaningfully compare different systems, since there is no uniform means for qualitatively or quantitatively evaluating them. It is difficult to successfully build upon existing work or identify areas worthy of additional effort without some understanding of the relationships between past efforts. In this paper, a taxonomy of approaches to the resource management problem is presented in an attempt to provide a common terminology and classification mechanism necessary in addressing this problem. The taxonomy, while presented and discussed in terms of distributed scheduling, is also applicable to most types of resource management. As an illustration of the usefulness of the taxonomy, an annotated bibliography is given which classifies a large number of distributed scheduling approaches according to the taxonomy.

Index Terms: Distributed operating systems, distributed resource management, general-purpose distributed computing systems, scheduling, task allocation, taxonomy.
Job Scheduling in Multiprogrammed Parallel Systems. 1997.
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, involved with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as single-level or two-level. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two-level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
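Two-level scheduling separates "how many processors does each job get" from "which thread runs on which of those processors". The sketch below illustrates that split under simplifying assumptions: the first level uses a plain equipartition (processors dealt out round-robin among jobs), and the second level maps each job's threads round-robin onto its own pool. Both policies and the names `allocate_processors` and `schedule_threads` are hypothetical; real systems use many other allocation and thread-scheduling policies.

```python
from typing import Dict, List


def allocate_processors(jobs: List[str], num_procs: int) -> Dict[str, List[int]]:
    """Level 1 (operating system): divide processors among jobs.

    Assumption: a simple equipartition, dealing processors out round-robin.
    """
    alloc: Dict[str, List[int]] = {job: [] for job in jobs}
    if not jobs:
        return alloc
    for proc in range(num_procs):
        alloc[jobs[proc % len(jobs)]].append(proc)
    return alloc


def schedule_threads(threads: List[str], pool: List[int]) -> Dict[str, int]:
    """Level 2 (job): map the job's own threads onto its processor pool.

    Threads only ever run on processors in `pool`; with no processors
    allocated, no thread is runnable yet.
    """
    if not pool:
        return {}
    return {thread: pool[i % len(pool)] for i, thread in enumerate(threads)}


alloc = allocate_processors(["A", "B"], 5)
placement = schedule_threads(["t0", "t1", "t2", "t3"], alloc["A"])
print(alloc)       # {'A': [0, 2, 4], 'B': [1, 3]}
print(placement)   # {'t0': 0, 't1': 2, 't2': 4, 't3': 0}
```

Because the second level only sees its own pool, a job can change how it schedules its threads without affecting other jobs, which is the decoupling the abstract describes.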
Condor and the Grid
Cited by 143 (26 self)
Since 1984, the Condor project has helped ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must reflect the sociology of communities. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems.
Process migration. ACM Computing Surveys, 2000.
Cited by 62 (1 self)
A process is an operating system abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution. Several implementations...
A Distributed Data-balanced Dictionary Based on the B-link Tree. In Proceedings of the 6th International Parallel Processing Symposium, 1992.
Cited by 24 (5 self)
Many concurrent dictionary data structures have been proposed, but usually in the context of shared-memory multiprocessors. In this paper, we present an algorithm for a concurrent distributed B-tree that can be implemented on message-passing parallel computers. Our distributed B-tree (the dB-tree) replicates the interior nodes in order to improve parallelism and reduce message passing. We show how the dB-tree algorithm can be used to build an efficient, highly parallel, data-balanced distributed dictionary, the dE-tree.

Keywords: Concurrent dictionary data structures, Message passing multiprocessor systems, Balanced search trees, B-link trees, Replica coherency.

MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-530. © Massachusetts Institute of Technology 1992. Adrian Colbrook was supported in part by the National Science Foundation under grant CCR-8716884, by the Defense Advanced Research Projects Agency (DARPA) under Contract N00014-89-J-1988, by an equipment grant...
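A minimal sketch of why replicating interior nodes cuts message traffic, under heavy simplifying assumptions: a single interior (root) node, leaves partitioned across two simulated processors, and no concurrency control, B-link right-links, or replica-coherency protocol, all of which the paper itself addresses. Every name here (`Processor`, `lookup`, `build_example`) is hypothetical. Because each processor holds its own copy of the interior node, the descent through the tree is local and a lookup needs at most one message, sent to whichever processor owns the target leaf.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Processor:
    pid: int
    separators: List[int]   # this processor's replica of the interior (root) node
    leaf_owner: List[int]   # replicated map: leaf index -> owning processor id
    leaves: Dict[int, Dict[int, str]] = field(default_factory=dict)  # leaves stored here


def lookup(procs: List[Processor], origin: int, key: int) -> Tuple[Optional[str], int]:
    """Search for `key` starting at processor `origin`.

    Returns (value or None, number of messages sent). The interior search
    uses the local replica, so only the final leaf access can be remote.
    """
    here = procs[origin]
    leaf = bisect.bisect_right(here.separators, key)  # child index under the root
    owner = here.leaf_owner[leaf]
    messages = 0 if owner == origin else 1
    value = procs[owner].leaves[leaf].get(key)
    return value, messages


def build_example() -> List[Processor]:
    """Two processors; root separators [10, 20] give leaves <10, 10-19, >=20."""
    separators, owners = [10, 20], [0, 1, 1]
    # Each processor gets its own copy (replica) of the interior data.
    p0 = Processor(0, list(separators), list(owners), {0: {5: "a"}})
    p1 = Processor(1, list(separators), list(owners), {1: {15: "b"}, 2: {25: "c"}})
    return [p0, p1]


procs = build_example()
print(lookup(procs, 0, 15))  # ('b', 1): one message to processor 1
print(lookup(procs, 1, 25))  # ('c', 0): the leaf is local
```

Without replication, a lookup starting away from the processor that stores the root would first need a message to reach the root, and possibly more for each interior level; the replicated design removes those hops at the price of keeping the copies coherent.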
The Application of Microeconomics to the Design of Resource Allocation and Control Algorithms. 1989.
Cited by 19 (4 self)
In this thesis, we present a new methodology for resource sharing algorithms in distributed systems. We propose that a distributed computing system should be composed of a decentralized community of microeconomic agents. We show that this approach decreases complexity and can substantially improve performance. We compare the performance, generality and complexity of our algorithms with non-economic algorithms. To validate the usefulness of our approach, we present economies that solve three distinct resource management problems encountered in large, distributed systems. The first economy performs CPU load balancing and demonstrates how our approach limits complexity and effectively allocates resources when compared to non-economic algorithms. We show that the economy achieves better performance than a representative non-economic algorithm. The load balancing economy spa...
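The abstract does not describe how the load-balancing economy sets prices or places jobs. As a hypothetical illustration only, the sketch below lets each processor (a seller) quote a price that rises with its queue length, and lets each job (a buyer) independently pick the cheapest processor it can afford. No central scheduler exists, yet jobs spread across processors because busy sellers become expensive. The pricing rule, the budgets, and the names `Seller`, `Buyer`, and `run_market` are assumptions, not the thesis's design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Seller:
    """A processor agent selling CPU time."""
    name: str
    queue_length: int = 0
    base_price: float = 1.0

    def price(self) -> float:
        # Assumed pricing rule: a busier processor charges more per job.
        return self.base_price * (1 + self.queue_length)


@dataclass
class Buyer:
    """A job agent with a budget to spend on CPU time."""
    job_id: str
    budget: float

    def choose(self, sellers: List[Seller]) -> Optional[Seller]:
        # Each buyer decides locally, using only the quoted prices.
        affordable = [s for s in sellers if s.price() <= self.budget]
        if not affordable:
            return None
        return min(affordable, key=lambda s: s.price())  # ties go to the first seller


def run_market(buyers: List[Buyer], sellers: List[Seller]) -> Dict[str, Optional[str]]:
    """Let buyers arrive in order and buy from their chosen seller."""
    placement: Dict[str, Optional[str]] = {}
    for buyer in buyers:
        seller = buyer.choose(sellers)
        if seller is None:
            placement[buyer.job_id] = None  # priced out: the job waits
            continue
        buyer.budget -= seller.price()
        seller.queue_length += 1
        placement[buyer.job_id] = seller.name
    return placement


sellers = [Seller("A"), Seller("B")]
buyers = [Buyer(f"j{i}", budget=10.0) for i in range(1, 5)]
print(run_market(buyers, sellers))  # {'j1': 'A', 'j2': 'B', 'j3': 'A', 'j4': 'B'}
```

Each agent sees only prices, not global load, which is the kind of decentralization the abstract credits with limiting complexity.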
Automated Learning of Load-Balancing Strategies For A Distributed Computer System. 1992.
Cited by 17 (4 self)
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.

Figure 3. The load-balancing policy considered in this thesis:

    SENDER-SIDE RULES (s)
        Possible-destinations = { site : Load(site) - Reference(s) < d(s) }
        Destination = Random(Possible-destinations)
        IF Load(s) - Reference(s) > q1(s) THEN Send

    RECEIVER-SIDE RULES (r)
        IF Load(r) < q2(r) THEN Receive

The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
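The sender-side and receiver-side rules of Figure 3 translate fairly directly into code. The sketch below is one reading of them under stated assumptions: loads and the per-site parameters d, q1, and q2 are passed as dictionaries keyed by site name; the arrival site s is excluded from the candidate set because the text calls the destination remote; and a task refused by the receiver runs at its arrival site (the text is truncated before saying what happens in that case). The function names `reference_value` and `place_task` are hypothetical.

```python
import random
from typing import Dict


def reference_value(loads: Dict[str, float], use_min_load: bool) -> float:
    """Reference is either 0 or MinLoad, the least of all Load values."""
    return min(loads.values()) if use_min_load else 0.0


def place_task(s: str, loads: Dict[str, float], d: Dict[str, float],
               q1: Dict[str, float], q2: Dict[str, float],
               use_min_load: bool, rng: random.Random) -> str:
    """Return the site that executes a task arriving at site `s`."""
    ref = reference_value(loads, use_min_load)

    # Sender-side rule: send only if Load(s) - Reference(s) > q1(s).
    if loads[s] - ref <= q1[s]:
        return s

    # Possible-destinations = { site : Load(site) - Reference(s) < d(s) },
    # restricted (assumption) to sites other than s.
    candidates = [site for site, load in loads.items()
                  if site != s and load - ref < d[s]]
    if not candidates:
        return s  # empty set: execute locally
    r = rng.choice(candidates)  # Destination = Random(Possible-destinations)

    # Receiver-side rule: accept only if Load(r) < q2(r).
    if loads[r] < q2[r]:
        return r
    return s  # assumption: a refused task runs at its arrival site


loads = {"a": 5.0, "b": 1.0, "c": 4.0}
two = {site: 2.0 for site in loads}
three = {site: 3.0 for site in loads}
print(place_task("a", loads, two, two, three, True, random.Random(0)))  # b
```

In this example Reference is MinLoad = 1.0, so site "a" (excess load 4.0 > q1 = 2.0) tries to send; only "b" lies within d = 2.0 of Reference, and "b" accepts because its load 1.0 is below q2 = 3.0. Passing an explicit `random.Random` keeps the destination choice reproducible in simulations.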

