Information and Control in Gray-Box Systems
- SOSP'01, Banff, Canada, 2001
Cited by 98 (21 self)
In modern systems, developers are often unable to modify the underlying operating system. To build services in such an environment, we advocate the use of gray-box techniques. When treating
Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems
- 2001
Cited by 87 (7 self)
In this paper we address power conservation for clusters of workstations or PCs. Our approach is to develop systems that dynamically turn cluster nodes on -- to be able to handle the load imposed on the system efficiently -- and off -- to save power under lighter load. The key component of our systems is an algorithm that makes load balancing and unbalancing decisions by considering both the total load imposed on the cluster and the power and performance implications of turning nodes off. The algorithm is implemented in two different ways: (1) at the application level for a cluster-based, locality-conscious network server; and (2) at the operating system level for an operating system for clustered cycle servers. Our experimental results are very favorable, showing that our systems conserve both power and energy in comparison to traditional systems.
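The abstract does not give the decision algorithm itself, so the following is only a minimal sketch of the general idea: keep just enough nodes powered on to carry the current load. The function name, the per-node capacity parameter, and the simple ceiling rule are all assumptions made for illustration, not the authors' method.

```python
import math


def nodes_needed(total_load: float, per_node_capacity: float,
                 max_nodes: int, min_nodes: int = 1) -> int:
    """Return how many nodes to keep on for the given load (hypothetical policy).

    total_load and per_node_capacity use the same arbitrary unit,
    for example requests per second.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    # Smallest node count whose combined capacity covers the load.
    needed = math.ceil(total_load / per_node_capacity)
    # Never drop below min_nodes or exceed the cluster size.
    return max(min_nodes, min(max_nodes, needed))
```

A real policy would also have to weigh the time and energy cost of powering a node back on, which this sketch ignores.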
Scheduling with Implicit Information in Distributed Systems
- In Proceedings of the 1998 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 1998
Cited by 62 (4 self)
Implicit coscheduling is a distributed algorithm for time-sharing communicating processes in a cluster of workstations. By observing and reacting to implicit information, local schedulers in the system make independent decisions that dynamically coordinate the scheduling of communicating processes. The principal mechanism involved is two-phase spin-blocking: a process waiting for a message response spins for some amount of time, and then relinquishes the processor if the response does not arrive. In this paper, we describe our experience implementing implicit coscheduling on a cluster of 16 UltraSPARC I workstations; this has led to contributions in three main areas. First, we more rigorously analyze the two-phase spin-block algorithm and show that spin time should be increased when a process is receiving messages. Second, we present performance measurements for a wide range of synthetic benchmarks and for seven Split-C parallel applications. Finally, we show how implicit coscheduling ...
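The two-phase spin-block mechanism described in this abstract can be sketched in a few lines. This is an illustrative approximation using a Python threading.Event to stand in for the message response; the function name, the return labels, and the fixed spin time are assumptions, and the paper's rule for choosing the spin time is not reproduced here.

```python
import threading
import time
from typing import Optional


def two_phase_wait(response: threading.Event, spin_time: float,
                   block_timeout: Optional[float] = None) -> str:
    """Wait for a response by spinning first, then blocking.

    Returns "spin" if the response arrived while spinning, "block" if it
    arrived after the caller gave up the processor, "timeout" otherwise.
    """
    deadline = time.monotonic() + spin_time
    # Phase 1: busy-wait, keeping the processor in case the reply is imminent.
    while time.monotonic() < deadline:
        if response.is_set():
            return "spin"
    # Phase 2: relinquish the processor until the reply arrives.
    if response.wait(block_timeout):
        return "block"
    return "timeout"
```

A caller would set the event from whatever code receives the reply, for example a receiver thread.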
Characterizing and Evaluating Desktop Grids: An Empirical Study
- In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'04), 2004
Cited by 58 (12 self)
Desktop resources are attractive for running compute-intensive distributed applications. Several systems that aggregate these resources in desktop grids have been developed. While these systems have been successfully used for many high-throughput applications, there has been little insight into the detailed temporal structure of CPU availability of desktop grid resources. Yet, this structure is critical to characterize the utility of desktop grid platforms for both task parallel and even data parallel applications. We address the following questions: (i) What are the temporal characteristics of desktop CPU availability in an enterprise setting? (ii) How do these characteristics affect the utility of desktop grids? (iii) Based on these characteristics, can we construct a model of server "equivalents" for the desktop grids, which can be used to predict application performance? We present measurements of an enterprise desktop grid with over 220 hosts running the Entropia commercial desktop grid software. We utilize these measurements to characterize CPU availability and develop a performance model for desktop grid applications for various task granularities, showing that there is an optimal task size. We then use a cluster equivalence metric to quantify the utility of the desktop grid relative to that of a dedicated cluster.
Market-based Proportional Resource Sharing for Clusters
- 1999
Cited by 52 (3 self)
Enabling technologies in high speed communication and global process scheduling have pushed clusters of computers into the mainstream as general-purpose high-performance computing systems. More generality, however, implies more sharing and this raises new questions in the area of cluster resource management. In particular, in systems where the aggregate demand for computing resources can exceed the aggregate supply, how to allocate resources amongst competing applications is an important problem. Traditional solutions to this problem have focused mainly on global optimization with respect to system-centric performance metrics, metrics which ignore higher level user intent. In this paper, we propose an alternative market-based approach based on the notion of a computational economy which optimizes for user value. Starting with fundamental requirements, we describe an abstract architecture for market-based cluster resource management based on the idea of proportional resource sharing of...
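The abstract is truncated before it explains the mechanism, so the following is only a generic sketch of proportional sharing: each user receives a fraction of a resource equal to their bid divided by the sum of all bids. The function name, the bid dictionary, and the capacity parameter are hypothetical and are not taken from the paper.

```python
from typing import Dict


def proportional_shares(bids: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Split capacity among users in proportion to their bids (illustrative only)."""
    total = sum(bids.values())
    if total <= 0:
        # No positive bids: nobody is allocated anything.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}
```

For example, with bids of 3 and 1 on a resource of capacity 100, the first user would receive 75 units and the second 25.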
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM Transactions on Computer Systems, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
Dynamic Cluster Reconfiguration For Power And Performance
- 2002
Cited by 39 (8 self)
In this paper we address power conservation for clusters of workstations or PCs. Our approach is to develop systems that dynamically turn cluster nodes on -- to be able to handle the load imposed on the system efficiently -- and off -- to save power under lighter load. The key component of our systems is an algorithm that makes cluster reconfiguration decisions by considering the total load imposed on the system and the power and performance implications of changing the current configuration. The algorithm is implemented in two common cluster-based systems: a network server and an operating system for clustered cycle servers. Our experimental results are very favorable, showing that our systems conserve both power and energy in comparison to traditional systems.
VSched: Mixing batch and interactive virtual machines using periodic real-time scheduling
- In Proceedings of ACM/IEEE SC 2005 (Supercomputing), 2005
Cited by 37 (13 self)
We are developing Virtuoso, a system for distributed computing using virtual machines (VMs). Virtuoso must be able to mix batch and interactive VMs on the same physical hardware, while satisfying constraints on responsiveness and compute rates for each workload. VSched is the component of Virtuoso that provides this capability. VSched is an entirely user-level tool that interacts with the stock Linux kernel running below any type-II virtual machine monitor to schedule all VMs (indeed, any process) using a periodic real-time scheduling model. This abstraction allows compute rate and responsiveness constraints to be straightforwardly described using a period and a slice within the period, and it allows for fast and simple admission control. This paper makes the case for periodic real-time scheduling for VM-based computing environments, and then describes and evaluates VSched. It also applies VSched to scheduling parallel workloads, showing that it can help a BSP application maintain a fixed stable performance despite externally caused load imbalance.
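To illustrate why a (period, slice) description makes admission control simple, here is a minimal sketch. It assumes the classic utilization test for earliest-deadline-first scheduling on one processor, where a task set with deadlines equal to periods is schedulable when the slice/period ratios sum to at most 1. The abstract does not say which test VSched applies, so the test, the names, and the integer microsecond units below are assumptions.

```python
from fractions import Fraction
from typing import List, Tuple

# (period_us, slice_us): hypothetical integer microsecond values.
Task = Tuple[int, int]


def utilization(tasks: List[Task]) -> Fraction:
    """Sum of slice/period over all tasks, computed exactly."""
    return sum((Fraction(slice_us, period_us) for period_us, slice_us in tasks),
               Fraction(0))


def admit(existing: List[Task], new: Task, limit: Fraction = Fraction(1)) -> bool:
    """Accept the new task only if total utilization stays within the limit."""
    period_us, slice_us = new
    # Reject malformed requests: the slice must fit inside a positive period.
    if period_us <= 0 or not 0 < slice_us <= period_us:
        return False
    return utilization(existing + [new]) <= limit
```

In practice a limit below 1 could be passed to leave headroom for scheduler overhead; that choice is likewise an assumption.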
Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids
- 2004 ACM/IEEE Conference on Supercomputing, 2004
Cited by 33 (2 self)
Desktop grids are popular platforms for high throughput applications, but due to their inherent resource volatility it is difficult to exploit them for applications that require rapid turnaround. Efficient desktop grid execution of short-lived applications is an attractive proposition and we claim that it is achievable via intelligent resource selection. We propose three general techniques for resource selection: resource prioritization, resource exclusion, and task duplication. We use these techniques to instantiate several scheduling heuristics. We evaluate these heuristics through trace-driven simulations of four representative desktop grid configurations. We find that ranking desktop resources according to their clock rates, without taking into account their availability history, is surprisingly effective in practice. Our main result is that a heuristic that uses the appropriate combination of resource prioritization, resource exclusion, and task replication achieves performance within a factor of 1.7 of optimal.
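The three techniques named in this abstract can be combined in a small sketch. It ranks hosts by clock rate (prioritization), drops hosts below a fixed clock threshold (exclusion), and hands leftover hosts extra copies of tasks (duplication). The threshold rule, the round-robin assignment, and all names are assumptions chosen for illustration; the paper's actual heuristics are not described in the abstract.

```python
from typing import Dict, List


def select_hosts(clock_rates_mhz: Dict[str, float], min_clock_mhz: float) -> List[str]:
    """Exclude slow hosts, then rank the rest from fastest to slowest."""
    eligible = [h for h, rate in clock_rates_mhz.items() if rate >= min_clock_mhz]
    return sorted(eligible, key=lambda h: clock_rates_mhz[h], reverse=True)


def assign_tasks(tasks: List[str], hosts: List[str]) -> Dict[str, List[str]]:
    """Map each task to the hosts that run a copy of it.

    Hosts are consumed in ranked order. Once every task has one host,
    remaining hosts run duplicate copies. With fewer hosts than tasks,
    some tasks get an empty list and would have to wait.
    """
    assignment: Dict[str, List[str]] = {task: [] for task in tasks}
    if not tasks:
        return assignment
    for i, host in enumerate(hosts):
        assignment[tasks[i % len(tasks)]].append(host)
    return assignment
```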
A batch scheduler with high level components
- In Cluster Computing and Grid 2005 (CCGrid05), 2005
Cited by 30 (3 self)
In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emphasizes low software complexity by using high level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex system for resource management using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high level tools, our experiments show that our system's performance is close to that of other systems. Furthermore, OAR is currently used to manage 700 nodes (a metropolitan grid) and has shown good efficiency and robustness.

