Results 1 - 10 of 60
Bridging the tenant-provider gap in cloud services
In ACM Symposium on Cloud Computing, ACM, 2012
"... The disconnect between the resource-centric interface ex-posed by today’s cloud providers and tenant goals hurts both entities. Tenants are encumbered by having to translate their performance and cost goals into the corresponding resource requirements, while providers suffer revenue loss due to un-i ..."
Abstract
-
Cited by 25 (5 self)
- Add to MetaCart
(Show Context)
The disconnect between the resource-centric interface exposed by today’s cloud providers and tenant goals hurts both entities. Tenants are encumbered by having to translate their performance and cost goals into the corresponding resource requirements, while providers suffer revenue loss due to uninformed resource selection by tenants. Instead, we argue for a “job-centric” cloud whereby tenants only specify high-level goals regarding their jobs and applications. To illustrate our ideas, we present Bazaar, a cloud framework offering a job-centric interface for data analytics applications. Bazaar allows tenants to express high-level goals and predicts the resources needed to achieve them. Since multiple resource combinations may achieve the same goal, Bazaar chooses the combination most suitable for the provider. Using large-scale simulations and deployment on a Hadoop cluster, we demonstrate that Bazaar enables a symbiotic tenant-provider relationship. Tenants achieve their performance goals. At the same time, holistic resource selection benefits providers in the form of increased goodput.
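The job-centric selection step lends itself to a small illustration: enumerate resource combinations predicted to meet the tenant's goal, then let the provider pick among them. The following sketch assumes hypothetical predict_runtime, tenant_price, and provider_cost models; it is not Bazaar's actual predictor or pricing.

```python
# Sketch of job-centric resource selection, loosely following the Bazaar idea:
# the tenant states a deadline and budget; the provider enumerates resource
# combinations predicted to meet the goal and picks the one it prefers.
# predict_runtime(), tenant_price() and provider_cost() are hypothetical stand-ins.
from itertools import product

def predict_runtime(num_vms, vm_type, job):
    """Hypothetical performance model: runtime in hours for `job`."""
    speed = {"small": 1.0, "large": 2.2}[vm_type]
    return job["work"] / (num_vms * speed)

def provider_cost(num_vms, vm_type):
    """Hypothetical internal cost to the provider of running this combination."""
    return num_vms * {"small": 0.8, "large": 1.9}[vm_type]

def tenant_price(num_vms, vm_type, hours):
    """Hypothetical price charged to the tenant."""
    return num_vms * {"small": 1.0, "large": 2.5}[vm_type] * hours

def choose_combination(job, deadline_h, budget):
    feasible = []
    for num_vms, vm_type in product(range(1, 65), ("small", "large")):
        hours = predict_runtime(num_vms, vm_type, job)
        price = tenant_price(num_vms, vm_type, hours)
        if hours <= deadline_h and price <= budget:
            feasible.append((num_vms, vm_type, hours, price))
    if not feasible:
        return None
    # Among combinations meeting the tenant's goal, pick the one that is
    # cheapest for the provider (a proxy for "most suitable for the provider").
    return min(feasible, key=lambda c: provider_cost(c[0], c[1]))

print(choose_combination({"work": 100.0}, deadline_h=4.0, budget=120.0))
```

The point of the structure is that the tenant only supplies the goal (deadline and budget), while the concrete resource choice becomes an internal provider decision.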
AROMA: Automated resource allocation and configuration of MapReduce environment in the cloud
In Proceedings of the 9th International Conference on Autonomic Computing (ICAC '12), 2012
"... ABSTRACT Distributed data processing framework MapReduce is increasingly deployed in Clouds to leverage the pay-per-usage cloud computing model. Popular Hadoop MapReduce environment expects that end users determine the type and amount of Cloud resources for reservation as well as the configuration ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
(Show Context)
The distributed data processing framework MapReduce is increasingly deployed in clouds to leverage the pay-per-usage cloud computing model. The popular Hadoop MapReduce environment expects end users to determine the type and amount of cloud resources to reserve as well as the configuration of Hadoop parameters. However, such resource reservation and job provisioning decisions require in-depth knowledge of system internals and laborious but often ineffective parameter tuning. We propose and develop AROMA, a system that automates the allocation of heterogeneous cloud resources and the configuration of Hadoop parameters to achieve quality-of-service goals while minimizing the incurred cost. It addresses the significant challenge of provisioning ad-hoc jobs that have performance deadlines in clouds through a novel two-phase machine learning and optimization framework. Its technical core is a support vector machine based performance model that enables the integration of various aspects of resource provisioning and auto-configuration of Hadoop jobs. It adapts to ad-hoc jobs by robustly matching their resource utilization signatures with previously executed jobs and making provisioning decisions accordingly. We implement AROMA as an automated job provisioning system for Hadoop MapReduce hosted on virtualized HP ProLiant blade servers. Experimental results show AROMA's effectiveness in providing performance guarantees for diverse Hadoop benchmark jobs while minimizing the cost of cloud resource usage.
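As an illustration of the two-step flow the abstract describes (learn a performance model from past runs, then search configurations against a deadline and a cost model), here is a minimal sketch using support vector regression. The features, training data, candidate configurations, and pricing are invented; they do not reproduce AROMA's signature matching or its actual model.

```python
# Sketch of an SVM-based performance model used for deadline-aware provisioning,
# in the spirit of AROMA. Training data, features, and prices are made up.
import numpy as np
from sklearn.svm import SVR

# Each row: [num_vms, vm_cpu_cores, input_gb, cpu_util_signature, io_util_signature]
X_train = np.array([
    [4, 2, 50, 0.7, 0.3],
    [8, 2, 50, 0.7, 0.3],
    [4, 4, 100, 0.5, 0.5],
    [8, 4, 100, 0.5, 0.5],
    [16, 4, 200, 0.6, 0.4],
])
y_train = np.array([3600, 1900, 4200, 2300, 2500])  # observed runtimes (s)

model = SVR(kernel="rbf", C=100.0, epsilon=10.0).fit(X_train, y_train)

def cheapest_config(input_gb, signature, deadline_s):
    """Pick the cheapest candidate configuration predicted to meet the deadline."""
    best = None
    for num_vms in (2, 4, 8, 16, 32):
        for cores in (2, 4):
            x = np.array([[num_vms, cores, input_gb, *signature]])
            runtime = float(model.predict(x)[0])
            cost = num_vms * cores * 0.05 * runtime / 3600  # hypothetical $ per core-hour
            if runtime <= deadline_s and (best is None or cost < best[0]):
                best = (cost, num_vms, cores, runtime)
    return best

print(cheapest_config(input_gb=120, signature=(0.6, 0.4), deadline_s=3000))
```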
Automated Profiling and Resource Management of Pig Programs for Meeting Service Level Objectives
"... An increasing number of MapReduce applications associated with live business intelligence require completion time guarantees. In this paper, we consider the popular Pig framework that provides a high-level SQL-like abstraction on top of MapReduce engine for processing large data sets. Programs writt ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
(Show Context)
An increasing number of MapReduce applications associated with live business intelligence require completion time guarantees. In this paper, we consider the popular Pig framework, which provides a high-level SQL-like abstraction on top of the MapReduce engine for processing large data sets. Programs written in such frameworks are compiled into directed acyclic graphs (DAGs) of MapReduce jobs. There is a lack of performance models and analysis tools for automated performance management of such MapReduce jobs. We offer a performance modeling environment for Pig programs that automatically profiles jobs from past runs and aims to solve the following inter-related problems: (i) estimating the completion time of a Pig program as a function of allocated resources; (ii) estimating the amount of resources (a number of map and reduce slots) required for completing a Pig program with a given (soft) deadline. To solve these problems, we first optimize a Pig program's execution by enforcing the optimal schedule of its concurrent jobs. For DAGs with concurrent jobs, this optimization reduces the program completion time by 10%-27% in our experiments. Moreover, it eliminates possible non-determinism in the execution of concurrent jobs in the Pig program and therefore enables a more accurate performance model for Pig programs. We validate our approach using a 66-node Hadoop cluster and a diverse set of workloads: the PigMix benchmark, TPC-H queries, and customized queries mining a collection of HP Labs' web proxy logs. The proposed scheduling optimization leads to significant resource savings (20%-40% in our experiments) compared with the original, unoptimized solution, and the predicted program completion times are within 10% of the measured ones.
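The abstract's "optimal schedule of its concurrent jobs" is, for independent two-stage (map, then reduce) jobs, closely related to the classical two-machine flow-shop problem, for which Johnson's rule gives an optimal order. The sketch below implements Johnson's rule with made-up job timings; treating it as the paper's scheduling optimization is an assumption, since the abstract does not name the algorithm.

```python
# Johnson's rule for ordering independent two-stage jobs (map stage, then
# reduce stage) to minimize overall makespan on a two-machine flow shop.
# Job timings are hypothetical; the connection to the paper's optimizer is
# an assumption based only on the abstract.

def johnson_order(jobs):
    """jobs: dict name -> (map_time, reduce_time). Returns the job order."""
    head = sorted(
        (name for name, (m, r) in jobs.items() if m <= r),
        key=lambda n: jobs[n][0],                  # short map stages first
    )
    tail = sorted(
        (name for name, (m, r) in jobs.items() if m > r),
        key=lambda n: jobs[n][1], reverse=True,    # long reduce stages first
    )
    return head + tail

def makespan(order, jobs):
    """Completion time if map stages run back-to-back and each reduce stage
    starts when both its map stage and the previous reduce stage are done."""
    map_done = reduce_done = 0.0
    for name in order:
        m, r = jobs[name]
        map_done += m
        reduce_done = max(reduce_done, map_done) + r
    return reduce_done

concurrent_jobs = {"J1": (10, 4), "J2": (3, 8), "J3": (6, 6)}
order = johnson_order(concurrent_jobs)
print(order, makespan(order, concurrent_jobs))
```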
Brownout: building more robust cloud applications
"... Self-adaptation is a first class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations are simultaneously happening both at the cloud infrastructure level — for example hardware failures — and at the user workload level — flash crowds. However, robustl ..."
Abstract
-
Cited by 12 (7 self)
- Add to MetaCart
(Show Context)
Self-adaptation is a first-class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations happen simultaneously at the cloud infrastructure level (for example, hardware failures) and at the user workload level (flash crowds). However, robustly withstanding extreme variability requires costly hardware over-provisioning. In this paper, we introduce a self-adaptation programming paradigm called brownout. Using this paradigm, applications can be designed to robustly withstand unpredictable runtime variations without over-provisioning. The paradigm is based on optional code that can be dynamically deactivated through decisions based on control theory. We modified two popular web application prototypes, RUBiS and RUBBoS, with less than 170 lines of code to make them brownout-compliant. Experiments show that brownout self-adaptation dramatically improves the ability to withstand flash crowds and hardware failures.
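The mechanism sketched in the abstract (optional code gated by a control-theoretic decision) can be illustrated with a toy "dimmer": the probability of executing the optional part of a request, adjusted by a simple integral controller from measured latency. The gains, latency model, and setpoint below are invented; the paper's controller design is more careful than this sketch.

```python
import random

# Illustrative brownout-style adaptation: the optional part of a request (for
# example, a recommendation block) runs only with probability `dimmer`, and a
# simple integral controller adjusts the dimmer to keep the measured average
# latency near a setpoint. Gains and the latency model are invented.

class BrownoutController:
    def __init__(self, setpoint_ms, gain=0.0005):
        self.setpoint_ms = setpoint_ms
        self.gain = gain
        self.dimmer = 1.0              # start by serving full optional content

    def update(self, measured_latency_ms):
        # Integral action: shed optional work when latency exceeds the setpoint,
        # restore it when there is headroom.
        error = self.setpoint_ms - measured_latency_ms
        self.dimmer = min(1.0, max(0.0, self.dimmer + self.gain * error))
        return self.dimmer

def handle_request(dimmer):
    latency = 80.0                     # mandatory part of the request (hypothetical)
    if random.random() < dimmer:
        latency += 120.0               # optional part of the request (hypothetical)
    return latency

controller = BrownoutController(setpoint_ms=150.0)
for _ in range(50):                    # one control decision per measurement period
    latencies = [handle_request(controller.dimmer) for _ in range(200)]
    controller.update(sum(latencies) / len(latencies))
print(round(controller.dimmer, 2))     # drifts toward (150 - 80) / 120, roughly 0.58
```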
HybridMR: A hierarchical MapReduce scheduler for hybrid data centers
In ICDCS, IEEE, 2013
"... Abstract—Virtualized environments are attractive because they simplify cluster management, while facilitating cost-effective workload consolidation. As a result, virtual machines in public clouds or private data centers, have become the norm for running transactional applications like web services a ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
(Show Context)
Virtualized environments are attractive because they simplify cluster management while facilitating cost-effective workload consolidation. As a result, virtual machines in public clouds or private data centers have become the norm for running transactional applications like web services and virtual desktops. On the other hand, batch workloads like MapReduce are typically deployed in a native cluster to avoid the performance overheads of virtualization. While both virtual and native environments have their own strengths and weaknesses, we demonstrate in this work that it is feasible to provide the best of these two computing paradigms in a hybrid platform. In this paper, we make a case for a hybrid data center consisting of native and virtual environments, and propose a 2-phase hierarchical scheduler, called HybridMR, for the effective resource management of interactive and batch workloads. In the first ...
Natjam: Design and evaluation of eviction policies for supporting priorities and deadlines in MapReduce clusters
In SoCC, ACM, 2013
"... Abstract This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for Mapreduce clusters that are resource-constrained. Our contributions include: i) exploration and evaluation of smart eviction policies for jobs and for tasks, ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for MapReduce clusters that are resource-constrained. Our contributions include: (i) exploration and evaluation of smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and (ii) a work-conserving task preemption mechanism for MapReduce. We incorporated Natjam into the Hadoop YARN scheduler framework (in Hadoop 0.23). We present experiments from deployments on a test cluster, on Emulab, and on a Yahoo! Inc. commercial cluster, using both synthetic workloads and Hadoop cluster traces from Yahoo!. Our results reveal that Natjam incurs overheads as low as 7% and is preferable to existing approaches.
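To make the idea of job- and task-level eviction policies concrete, here is a toy victim-selection routine: it preempts from the lower-priority job with the most deadline slack and, within that job, suspends the task with the least remaining work. These particular rules and data structures are illustrative assumptions, not necessarily the policies Natjam evaluates.

```python
# Toy victim selection for preemptive MapReduce scheduling: when a higher
# priority job arrives and the cluster is full, pick a victim job and then a
# victim task to suspend. The concrete rules below (latest deadline first,
# then shortest-remaining-time task) are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    elapsed_s: float       # work already done (lost or saved on eviction)
    remaining_s: float

@dataclass
class Job:
    job_id: str
    priority: int          # higher value = more important
    deadline_s: float      # time remaining until deadline
    tasks: list = field(default_factory=list)

def pick_victim(jobs, arriving_priority):
    """Return (job, task) to preempt, or None if nothing should be evicted."""
    candidates = [j for j in jobs if j.priority < arriving_priority and j.tasks]
    if not candidates:
        return None
    # Job-level policy: evict from the job with the most slack (latest deadline).
    victim_job = max(candidates, key=lambda j: j.deadline_s)
    # Task-level policy: suspend the task with the least remaining work,
    # so it can finish quickly once resumed.
    victim_task = min(victim_job.tasks, key=lambda t: t.remaining_s)
    return victim_job, victim_task

jobs = [
    Job("research", priority=1, deadline_s=7200,
        tasks=[Task("r-m1", 120, 300), Task("r-m2", 40, 60)]),
    Job("hourly-report", priority=2, deadline_s=900,
        tasks=[Task("h-m1", 10, 500)]),
]
print(pick_victim(jobs, arriving_priority=3))
```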
Benchmarking Approach for Designing a MapReduce Performance Model
"... In MapReduce environments, many of the programs are reused for processing a regularly incoming new data. A typical user question is how to estimate the completion time of these programs as a function of a new dataset and the cluster resources. In this work 1, we offer a novel performance evaluation ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
In MapReduce environments, many programs are reused to process regularly arriving new data. A typical user question is how to estimate the completion time of these programs as a function of a new dataset and the cluster resources. In this work, we offer a novel performance evaluation framework for answering this question. We observe that the execution of each map (reduce) task consists of specific, well-defined data processing phases. Only the map and reduce functions are custom, and their executions are user-defined for different MapReduce jobs. The executions of the remaining phases are generic and depend on the amount of data processed by the phase and the performance of the underlying Hadoop cluster. First, we design a set of parameterizable microbenchmarks to measure the generic phases and to derive a platform performance model of a given Hadoop cluster. Then, using a job's past executions, we summarize the job's properties and the performance of its custom map/reduce functions in a compact job profile. Finally, by combining the knowledge of the job profile and the derived platform performance model, we offer a MapReduce performance model that estimates the program completion time for processing a new dataset. The evaluation study justifies our approach and the proposed framework: we are able to accurately predict the performance of a diverse set of twelve MapReduce applications. The predicted completion times for most experiments are within 10% of the measured ones (with a worst case of 17% error) on our 66-node Hadoop cluster.
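The decomposition into generic phases plus a profiled custom function suggests an additive model in which each generic phase's duration is a linear function of the data it processes, with coefficients measured once per cluster. The sketch below shows that model shape for the map stage only, with invented phase names and coefficients; the paper's phase list and estimation procedure are not reproduced here.

```python
# Sketch of a phase-based MapReduce task-time model: generic phases are linear
# in the bytes they process (coefficients measured once per cluster with
# microbenchmarks), while custom map-function time comes from the job profile.
# All phase names and coefficients below are invented for illustration.
import math

# Platform profile: phase -> (fixed_overhead_s, seconds_per_MB), per cluster.
PLATFORM = {
    "read":    (0.5, 0.010),
    "collect": (0.2, 0.008),
    "spill":   (0.3, 0.015),
    "merge":   (0.4, 0.012),
}

def phase_time(phase, mb):
    fixed, per_mb = PLATFORM[phase]
    return fixed + per_mb * mb

def map_task_time(input_mb, selectivity, map_s_per_mb):
    """Generic phases plus profiled custom map-function time for one map task."""
    out_mb = input_mb * selectivity
    generic = (phase_time("read", input_mb) + phase_time("collect", out_mb)
               + phase_time("spill", out_mb) + phase_time("merge", out_mb))
    return generic + map_s_per_mb * input_mb

def map_stage_time(input_gb, split_mb, map_slots, selectivity, map_s_per_mb):
    """Rough completion time of the map stage, executed in waves over the slots."""
    n_tasks = math.ceil(input_gb * 1024 / split_mb)
    waves = math.ceil(n_tasks / map_slots)
    return waves * map_task_time(split_mb, selectivity, map_s_per_mb)

print(round(map_stage_time(input_gb=50, split_mb=128, map_slots=40,
                           selectivity=0.3, map_s_per_mb=0.05), 1))
```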
Projecting Disk Usage Based on Historical Trends in a Cloud Environment
"... Provisioning scarce resources among competing users and jobs remains one of the primary challenges of operating large-scale, distributedcomputingenvironments. Distributed storagesystems, inparticular, typicallyrelyonhardoperatorset quotas to control disk allocation and enforce isolation for space an ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Provisioning scarce resources among competing users and jobs remains one of the primary challenges of operating large-scale, distributed computing environments. Distributed storage systems, in particular, typically rely on hard operator-set quotas to control disk allocation and enforce isolation for space and I/O bandwidth among disparate users. However, users and operators are very poor at predicting future requirements and, as a result, tend to over-provision grossly. For three years, we collected detailed usage information for data stored in distributed filesystems in a large private cloud spanning dozens of clusters on multiple continents. Specifically, we measured the disk space usage, I/O rate, and age of stored data for thousands of different engineering users and teams. We find that although the individual time series often have non-stable usage trends, regional aggregations, user classification, and ensemble forecasting methods can be combined to provide a more accurate prediction of future use for the majority of users. We applied this methodology to the storage users in one geographic region and back-tested these techniques over the past three years to compare our forecasts against actual usage. We find that by classifying a small subset of users with unforecastable trend changes due to known product launches, we can generate three-month-out forecasts with mean absolute errors of less than 12%. This compares favorably to the amount of allocated but unused quota that is generally wasted with manual operator-set quotas.
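A minimal back-testing sketch in the spirit of the approach: forecast a regional aggregate three months out with an ensemble of simple models and score it with a mean absolute percentage error. The component forecasters and the usage series are placeholders, not the paper's methodology.

```python
# Minimal ensemble forecast of aggregate disk usage with a back-tested error,
# loosely in the spirit of the paper. The component forecasters and the data
# are placeholders, not the paper's actual methodology.
def naive_last(series, horizon):
    return [series[-1]] * horizon

def linear_trend(series, horizon):
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + step * (i + 1) for i in range(horizon)]

def moving_average_growth(series, horizon, window=3):
    recent = series[-window:]
    step = (recent[-1] - recent[0]) / (window - 1)
    return [series[-1] + step * (i + 1) for i in range(horizon)]

def ensemble(series, horizon):
    members = [naive_last(series, horizon),
               linear_trend(series, horizon),
               moving_average_growth(series, horizon)]
    return [sum(vals) / len(vals) for vals in zip(*members)]

def mean_absolute_percentage_error(actual, predicted):
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly regional usage in PB; back-test a three-month-out forecast.
usage_pb = [10.0, 10.4, 10.9, 11.2, 11.8, 12.1, 12.7, 13.1, 13.8, 14.2, 14.9, 15.3]
history, actual = usage_pb[:-3], usage_pb[-3:]
forecast = ensemble(history, horizon=3)
print(forecast, round(mean_absolute_percentage_error(actual, forecast), 3))
```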
Revisiting size-based scheduling with estimated job sizes
In MASCOTS, IEEE, 2014
"... Abstract—We study size-based schedulers, and focus on the impact of inaccurate job size information on response time and fairness. Our intent is to revisit previous results, which allude to performance degradation for even small errors on job size estimates, thus limiting the applicability of size-b ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
We study size-based schedulers and focus on the impact of inaccurate job size information on response time and fairness. Our intent is to revisit previous results, which allude to performance degradation for even small errors in job size estimates, thus limiting the applicability of size-based schedulers. We show that scheduling performance is tightly connected to workload characteristics: in the absence of large skew in the job size distribution, even extremely imprecise estimates suffice to outperform size-oblivious disciplines. Instead, when job sizes are heavily skewed, known size-based disciplines suffer. In this context, we show, for the first time, the dichotomy of over-estimation versus under-estimation. The former is, in general, less problematic than the latter, as its effects are localized to individual jobs. Instead, under-estimation leads to severe problems that may affect a large number of jobs. We present an approach to mitigate these problems: our technique requires no complex modifications to the original scheduling policies and performs very well. To support our claim, we proceed with a simulation-based evaluation that covers an unprecedentedly large parameter space and takes into account a variety of synthetic and real workloads. As a consequence, we show that size-based scheduling is practical and outperforms alternatives in a wide array of use cases, even in the presence of inaccurate size information.
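The over- versus under-estimation dichotomy can be explored with a very small single-server simulation: jobs are scheduled by shortest estimated remaining size, and the estimation error factor is drawn from a range that can be shifted toward systematic over- or under-estimation. The workload, error model, and parameters below are arbitrary; the paper's simulator covers a far larger parameter space.

```python
import heapq
import random

# Single-server SRPT scheduler that orders jobs by *estimated* remaining size.
# Error model: estimate = true_size * 2**u, with u drawn uniformly from
# log_err_range; shifting that range models systematic over/under-estimation.
# Workload and parameters are arbitrary; this is a sketch, not the paper's simulator.

def simulate(n_jobs=2000, load=0.9, log_err_range=(-1.0, 1.0), seed=1):
    rng = random.Random(seed)
    mean_size = 1.0
    arrival_rate = load / mean_size
    t, next_arrival = 0.0, rng.expovariate(arrival_rate)
    queue = []          # (estimated_remaining, job_id, true_remaining, arrival_time)
    arrived, done, total_sojourn = 0, 0, 0.0
    while done < n_jobs:
        if arrived < n_jobs and (not queue or next_arrival < t + queue[0][2]):
            # Next event: an arrival (may preempt whatever is running).
            if queue:
                est, jid, rem, arr = heapq.heappop(queue)
                served = next_arrival - t
                heapq.heappush(queue, (est - served, jid, rem - served, arr))
            t = next_arrival
            size = rng.expovariate(1.0 / mean_size)
            err = 2 ** rng.uniform(*log_err_range)
            heapq.heappush(queue, (size * err, arrived, size, t))
            arrived += 1
            next_arrival = t + rng.expovariate(arrival_rate)
        else:
            # Next event: the job with the smallest estimated remaining size finishes.
            est, jid, rem, arr = heapq.heappop(queue)
            t += rem
            total_sojourn += t - arr
            done += 1
    return total_sojourn / n_jobs   # mean sojourn time

print("balanced errors :", round(simulate(log_err_range=(-1.0, 1.0)), 2))
print("under-estimation:", round(simulate(log_err_range=(-2.0, 0.0)), 2))
print("over-estimation :", round(simulate(log_err_range=(0.0, 2.0)), 2))
```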
SLO-driven right-sizing and resource provisioning of MapReduce jobs
In Workshop on Large Scale Distributed Systems and Middleware (LADIS), in conjunction with VLDB, 2011
"... There is an increasing number of MapReduce applications, e.g., personalized advertising, spam detection, real-time event log analysis, that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analy-sis t ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
There is an increasing number of MapReduce applications, e.g., personalized advertising, spam detection, and real-time event log analysis, that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analysis tools available to system administrators for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application runs or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger data set. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes the lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on a 66-node Hadoop cluster.
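The lower and upper bounds mentioned in the abstract are in the spirit of the classical makespan bounds for n independent tasks of mean duration mu and maximum duration m_max processed greedily on k slots; whether the paper uses exactly these expressions cannot be confirmed from the abstract alone.

```python
# Classical lower/upper bounds on the completion time of n independent tasks
# (mean duration mu, maximum duration m_max) processed greedily on k slots.
# The paper computes bounds of this flavor per map and reduce stage from the
# job profile; the exact expressions it uses are not given in the abstract.
def stage_bounds(n_tasks, mu, m_max, k_slots):
    lower = max(n_tasks * mu / k_slots, m_max)          # total work spread over k slots
    upper = (n_tasks - 1) * mu / k_slots + m_max        # greedy assignment worst case
    return lower, upper

# Example: 480 map tasks averaging 24 s (max 60 s) on 64 map slots.
print(stage_bounds(n_tasks=480, mu=24.0, m_max=60.0, k_slots=64))
```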