Results 1 - 10 of 146
Mesos: A platform for fine-grained resource sharing in the data center
2010
Cited by 160 (23 self)
We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today’s frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. Our experimental results show that Mesos can achieve near-optimal locality when sharing the cluster among diverse frameworks, can scale up to 50,000 nodes, and is resilient to node failures.
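The two-level resource-offer mechanism the abstract describes (the master decides how much to offer each framework; each framework decides what to accept) can be sketched in a few lines. The class names and the greedy accept policy below are illustrative assumptions, not Mesos's actual API:

```python
# Minimal sketch of two-level scheduling via resource offers.
# Level 1: the master offers each node's free resources to frameworks.
# Level 2: each framework decides how much of an offer to accept.
# All names here are invented for illustration, not Mesos APIs.

class Framework:
    def __init__(self, name, cpus_per_task, mem_per_task):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task
        self.launched = []  # (node, tasks accepted) per offer

    def on_offer(self, offer):
        """Greedy policy: accept as many tasks as fit, decline the rest."""
        cpus, mem = offer["cpus"], offer["mem"]
        tasks = 0
        while cpus >= self.cpus_per_task and mem >= self.mem_per_task:
            cpus -= self.cpus_per_task
            mem -= self.mem_per_task
            tasks += 1
        self.launched.append((offer["node"], tasks))
        # Return unused resources so the master can offer them onward.
        return {"node": offer["node"], "cpus": cpus, "mem": mem}

def master_offer_round(nodes, frameworks):
    """The master offers each node's free resources to frameworks in
    turn; whatever one framework declines goes to the next."""
    for node, free in nodes.items():
        offer = {"node": node, **free}
        for fw in frameworks:
            offer = fw.on_offer(offer)

nodes = {"n1": {"cpus": 4, "mem": 8}, "n2": {"cpus": 2, "mem": 4}}
fws = [Framework("hadoop", 2, 4), Framework("mpi", 1, 1)]
master_offer_round(nodes, fws)
```

Note that the master never needs to understand framework scheduling logic; it only tracks what was accepted, which is the point of the design.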
Omega: flexible, scalable schedulers for large compute clusters
In Proc. EuroSys, 2013
Cited by 47 (2 self)
Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control. We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach – all driven by real-life Google production workloads.
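The shared-state, optimistic-concurrency idea can be illustrated with a toy transaction model: each scheduler works against a snapshot of the cell state and commits its claims only if nothing it claimed has changed since. The class and method names are invented for this sketch and do not reflect Omega's implementation:

```python
# Sketch of shared-state scheduling with optimistic concurrency
# control. Schedulers snapshot the cell state, make decisions in
# parallel, and commit transactionally; a commit fails if another
# scheduler already claimed one of the same machines.
# Illustrative only -- not the Omega implementation.

class CellState:
    def __init__(self, machines):
        self.owner = {m: None for m in machines}
        self.version = {m: 0 for m in machines}

    def snapshot(self):
        return dict(self.owner), dict(self.version)

    def commit(self, scheduler, claims, seen_version):
        # Validate that no claimed machine changed since the snapshot,
        # then apply all claims atomically.
        if any(self.version[m] != seen_version[m] for m in claims):
            return False  # conflict: caller re-snapshots and retries
        for m in claims:
            self.owner[m] = scheduler
            self.version[m] += 1
        return True

cell = CellState(["m1", "m2"])
owners, versions = cell.snapshot()
free = [m for m, o in owners.items() if o is None]
ok_a = cell.commit("batch", free[:1], versions)    # first commit wins
ok_b = cell.commit("service", free[:1], versions)  # stale snapshot, conflicts
```

The interference the abstract measures corresponds to how often the second kind of commit happens in practice.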
Multi-Resource Fair Queueing for Packet Processing
2012
Cited by 41 (5 self)
Middleboxes are ubiquitous in today’s networks and perform a variety of important functions, including IDS, VPN, firewalling, and WAN optimization. These functions differ vastly in their requirements for hardware resources (e.g., CPU cycles and memory bandwidth). Thus, depending on the functions they go through, different flows can consume different amounts of a middlebox’s resources. While there is much literature on weighted fair sharing of link bandwidth to isolate flows, it is unclear how to schedule multiple resources in a middlebox to achieve similar guarantees. In this paper, we analyze several natural packet scheduling algorithms for multiple resources and show that they have undesirable properties. We propose a new algorithm, Dominant Resource Fair Queuing (DRFQ), that retains the attractive properties that fair sharing provides for one resource. In doing so, we generalize the concept of virtual time in classical fair queuing to multi-resource settings. The resulting algorithm is also applicable in other contexts where several resources need to be multiplexed in the time domain.
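The fairness intuition behind dominant-resource scheduling can be illustrated with a toy scheduler that always serves the flow whose dominant share (its largest normalized consumption across resources) is smallest. This omits DRFQ's virtual-time machinery, memoryless start tags, and packet-arrival dynamics; all names and numbers below are made up:

```python
# Toy illustration of scheduling by dominant share, the fairness
# idea DRFQ generalizes to multiple middlebox resources.

def pick_next(flows, served, capacities):
    """Choose the flow whose dominant share is currently smallest.
    flows: {name: per-packet demand {resource: units}}."""
    def dominant_share(name):
        return max(served[name][r] / capacities[r] for r in capacities)
    return min(flows, key=dominant_share)

def serve(flows, capacities, steps):
    served = {f: {r: 0.0 for r in capacities} for f in flows}
    order = []
    for _ in range(steps):
        f = pick_next(flows, served, capacities)
        for r, d in flows[f].items():
            served[f][r] += d  # charge the packet's cost on each resource
        order.append(f)
    return order, served

# Flow A is CPU-heavy, flow B is memory-bandwidth-heavy.
capacities = {"cpu": 100.0, "membw": 100.0}
flows = {"A": {"cpu": 10.0, "membw": 1.0},
         "B": {"cpu": 1.0, "membw": 10.0}}
order, served = serve(flows, capacities, 4)
```

The two flows alternate because each packet of A raises A's dominant share (on CPU) by the same amount that each packet of B raises B's (on memory bandwidth), so their dominant shares stay equalized.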
Performance isolation and fairness for multi-tenant cloud storage
In OSDI, 2012
Cited by 40 (2 self)
Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Misbehaving or high-demand tenants can overload the shared service and disrupt other well-behaved tenants, leading to unpredictable performance and violating SLAs. This paper presents Pisces, a system for achieving datacenter-wide per-tenant performance isolation and fairness in shared key-value storage. Today’s approaches for multi-tenant resource allocation are based either on per-VM allocations or hard rate limits that assume uniform workloads to achieve high utilization. Pisces achieves per-tenant weighted fair shares (or minimal rates) of the aggregate resources of the shared service, even when different tenants’ partitions are co-located and when demand for different partitions is skewed, time-varying, or bottlenecked by different server resources. Pisces does so by decomposing the fair sharing problem into a combination of four complementary mechanisms—partition placement, weight allocation, replica selection, and weighted fair queuing—that operate on different time-scales and combine to provide system-wide max-min fairness. Our Pisces storage prototype achieves nearly ideal (0.99 Min-Max Ratio) weighted fair sharing, strong performance isolation, and robustness to skew and shifts in tenant demand. These properties are achieved with minimal overhead (<3%), even when running at high utilization (more than 400,000 requests/second/server for 10B requests).
Paragon: QoS-aware scheduling for heterogeneous datacenters
In Proceedings of the eighteenth international, 2013
Cited by 37 (7 self)
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods; instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources, by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers only provide similar guarantees for 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
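The classification step above can be illustrated with a simple nearest-neighbor variant of collaborative filtering (Paragon itself uses SVD-based techniques): score the incoming workload's similarity to previously seen applications on the platforms where both were measured, then borrow the best match's score on the unmeasured platform. The workloads, platforms, and numbers below are invented:

```python
# Sketch of classifying an incoming workload by similarity to
# previously scheduled applications. Nearest-neighbor variant for
# brevity; all data here is made up for illustration.

import math

def cosine(u, v):
    keys = set(u) & set(v)  # compare only co-measured platforms
    dot = sum(u[k] * v[k] for k in keys)
    nu = math.sqrt(sum(u[k] ** 2 for k in keys))
    nv = math.sqrt(sum(v[k] ** 2 for k in keys))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(new_scores, known_apps, platform):
    """Predict the incoming workload's score on `platform` from its
    most similar previously seen application."""
    best = max(known_apps, key=lambda a: cosine(new_scores, known_apps[a]))
    return known_apps[best][platform]

# Scores of known apps on three server platforms (higher = better fit).
known = {"app1": {"p1": 0.9, "p2": 0.2, "p3": 0.8},
         "app2": {"p1": 0.1, "p2": 0.9, "p3": 0.3}}
# The incoming workload was only briefly profiled, on p1 and p2.
incoming = {"p1": 0.8, "p2": 0.3}
guess = predict(incoming, known, "p3")
```

This is why only a short profiling run of the new workload is needed: the detailed knowledge comes from the system's history, not from the new application.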
Multi-resource allocation: Fairness-efficiency tradeoffs in a unifying framework
In Proc. IEEE INFOCOM, 2012
Cited by 37 (1 self)
Quantifying the notion of fairness is under-explored when there are multiple types of resources and users request different ratios of the different resources. A typical example is datacenters processing jobs with heterogeneous resource requirements on CPU, memory, network bandwidth, etc. This paper develops a unifying framework addressing the fairness-efficiency tradeoff in light of multiple types of resources. We develop two families of fairness functions that provide different tradeoffs, characterize the effect of user requests’ heterogeneity, and prove conditions under which these fairness measures satisfy the Pareto efficiency, sharing incentive, and envy-free properties. Intuitions behind the analysis are explained in two visualizations of multi-resource allocation. We also investigate people’s fairness perceptions through an online survey of allocation preferences and provide a brief overview of related work on fairness.
No justified complaints: On fair sharing of multiple resources.
In Proc. of 3rd ITCS, 2012
Cited by 34 (0 self)
Fair allocation has been studied intensively in both economics and computer science. This subject has aroused renewed interest with the advent of virtualization and cloud computing. Prior work has typically focused on mechanisms for fair sharing of a single resource. We consider a variant where each user is entitled to a certain fraction of the system's resources, and has a fixed usage profile describing how much he would want from each resource. We provide a new definition for the simultaneous fair allocation of multiple continuously-divisible resources that we call bottleneck-based fairness (BBF). Roughly speaking, an allocation of resources is considered fair if every user either gets all the resources he wishes for, or else gets at least his entitlement on some bottleneck resource, and therefore cannot complain about not receiving more. We show that BBF has several desirable properties such as providing an incentive for sharing, and also promotes high overall utilization of resources; we also compare BBF carefully to another notion of fairness proposed recently, dominant resource fairness. Our main technical result is that a fair allocation can be found for every combination of user requests and entitlements. The allocation profile of each user is proportionate to the user's profile of requests. The main problem is that the bottleneck resources are not known in advance, and indeed one can find instances that allow different solutions with different sets of bottlenecks. Therefore known techniques such as linear programming do not seem to work. Our proof uses tools from the theory of ordinary differential equations, showing the existence of a sequence of points that converge upon a solution. It is constructive and provides a practical method to compute the allocations numerically.
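While finding a BBF allocation is the paper's hard part, checking whether a given allocation satisfies the condition is straightforward, which makes the definition easy to state in code. The checker and its example data below are illustrative assumptions:

```python
# Checker for the bottleneck-based fairness (BBF) condition: each
# user either receives everything it asked for, or receives at
# least its entitlement on some bottleneck (fully utilized)
# resource, and so has no justified complaint.

def is_bbf(alloc, demand, entitlement, eps=1e-9):
    """alloc/demand: {user: {resource: fraction of capacity}};
    entitlement: {user: fraction of the whole system}."""
    resources = {r for u in alloc for r in alloc[u]}
    bottlenecks = {r for r in resources
                   if abs(sum(alloc[u].get(r, 0) for u in alloc) - 1.0) < eps}
    for u in alloc:
        satisfied = all(alloc[u].get(r, 0) >= demand[u].get(r, 0) - eps
                        for r in demand[u])
        justified = any(alloc[u].get(r, 0) >= entitlement[u] - eps
                        for r in bottlenecks)
        if not (satisfied or justified):
            return False  # u has a justified complaint
    return True

# Two users, two resources; "a" is unsatisfied but holds more than
# its 50% entitlement on the saturated resource r1, so this is fair.
alloc = {"a": {"r1": 0.6, "r2": 0.2}, "b": {"r1": 0.4, "r2": 0.4}}
demand = {"a": {"r1": 0.9, "r2": 0.3}, "b": {"r1": 0.4, "r2": 0.4}}
ent = {"a": 0.5, "b": 0.5}
ok = is_bbf(alloc, demand, ent)
```

The existence result in the paper is exactly the claim that inputs making this checker return False for every allocation do not exist.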
Fair allocation without trade
In Proceedings of the 11th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)
Cited by 34 (0 self)
We consider the age-old problem of allocating items among different agents in a way that is efficient and fair. Two papers, by Dolev et al. and Ghodsi et al., have recently studied this problem in the context of computer systems. Both papers had similar models for agent preferences, but advocated different notions of fairness. We formalize both fairness notions in economic terms, extending them to apply to a larger family of utilities. Noting that in settings with such utilities efficiency is easily achieved in multiple ways, we study notions of fairness as criteria for choosing between different efficient allocations. Our technical results are algorithms for finding fair allocations corresponding to two fairness notions: Regarding the notion suggested by Ghodsi et al., we present a polynomial-time algorithm that computes an allocation for a general class of fairness notions, in which their notion is included. For the other, suggested by Dolev et al., we show that a competitive market equilibrium achieves the desired notion of fairness, thereby obtaining a polynomial-time algorithm that computes such a fair allocation and solving the main open problem raised by Dolev et al.
Heterogeneity-Aware Resource Allocation and Scheduling
University of California, Berkeley
Cited by 33 (0 self)
Data analytics are key applications running in the cloud computing environment. To improve performance and cost-effectiveness of a data analytics cluster in the cloud, the data analytics system should account for heterogeneity of the environment and workloads. In addition, it also needs to provide fairness among jobs when multiple jobs share the cluster. In this paper, we rethink resource allocation and job scheduling on a data analytics system in the cloud to embrace the heterogeneity of the underlying platforms and workloads. To that end, we suggest an architecture to allocate resources to a data analytics cluster in the cloud, and propose a metric of share in a heterogeneous cluster to realize a scheduling scheme that achieves high performance and fairness.