Towards Optimal Resource Provisioning for Running MapReduce Programs in Public Clouds
Cited by 5 (1 self)
Abstract—Running MapReduce programs in the public cloud raises an important problem: how should resources be provisioned to minimize the financial charge for a specific job? In this paper, we study the whole process of MapReduce processing and build a cost function that explicitly models the relationship between the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs with a small number of nodes. Based on this cost model, we can solve a number of decision problems, such as finding the amount of resources that minimizes the financial cost under a time deadline, or that minimizes the time under a given financial budget. Experimental results show that this cost model performs well on the tested MapReduce programs.
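The abstract does not give the form of the cost function, so the Python sketch below assumes a hypothetical one in which job time falls inversely with the number of Map and Reduce slots. `ProvisioningModel`, its `beta` coefficients, the per-node slot counts, and the per-node-hour price are illustrative assumptions, not the paper's model. It shows how the two decision problems mentioned in the abstract (cheapest cluster under a deadline, fastest cluster under a budget) become a search over cluster sizes once the parameters are known; fitting the parameters from small test runs is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProvisioningModel:
    """Hypothetical job-time model (an assumption for illustration, not the paper's formula).

    Time in hours on a cluster of n identical nodes:
        T(n) = beta0 + beta1 * D / (map_slots_per_node * n)
                     + beta2 * W / (reduce_slots_per_node * n)
    D is the input size; W stands in for the Reduce-side work (e.g. some g(D)).
    """
    beta0: float  # fixed overhead (job startup, scheduling), hours
    beta1: float  # Map-phase hours per unit of input per Map slot
    beta2: float  # Reduce-phase hours per unit of Reduce work per Reduce slot
    map_slots_per_node: int
    reduce_slots_per_node: int
    price_per_node_hour: float

    def job_time(self, data: float, reduce_work: float, nodes: int) -> float:
        map_slots = self.map_slots_per_node * nodes
        reduce_slots = self.reduce_slots_per_node * nodes
        return (self.beta0
                + self.beta1 * data / map_slots
                + self.beta2 * reduce_work / reduce_slots)

    def job_cost(self, data: float, reduce_work: float, nodes: int) -> float:
        # Assumes fractional-hour billing; real clouds may round up to whole hours.
        return nodes * self.price_per_node_hour * self.job_time(data, reduce_work, nodes)

    def cheapest_within_deadline(
        self, data: float, reduce_work: float, deadline: float, max_nodes: int
    ) -> Optional[Tuple[int, float, float]]:
        """Return (nodes, hours, cost) with the lowest cost whose time <= deadline."""
        best = None
        for n in range(1, max_nodes + 1):
            t = self.job_time(data, reduce_work, n)
            if t <= deadline:
                c = self.job_cost(data, reduce_work, n)
                if best is None or c < best[2]:
                    best = (n, t, c)
        return best

    def fastest_within_budget(
        self, data: float, reduce_work: float, budget: float, max_nodes: int
    ) -> Optional[Tuple[int, float, float]]:
        """Return (nodes, hours, cost) with the shortest time whose cost <= budget."""
        best = None
        for n in range(1, max_nodes + 1):
            c = self.job_cost(data, reduce_work, n)
            if c <= budget:
                t = self.job_time(data, reduce_work, n)
                if best is None or t < best[1]:
                    best = (n, t, c)
        return best


if __name__ == "__main__":
    # Made-up parameters, standing in for values fitted from small test runs.
    model = ProvisioningModel(beta0=0.1, beta1=0.5, beta2=0.2,
                              map_slots_per_node=2, reduce_slots_per_node=1,
                              price_per_node_hour=0.1)
    print(model.cheapest_within_deadline(100.0, 100.0, deadline=1.5, max_nodes=200))
    print(model.fastest_within_budget(100.0, 100.0, budget=4.955, max_nodes=200))
```

An exhaustive scan over node counts is used instead of a closed-form optimum so that the same code still works if the time model or the billing rule (for example, rounding up to whole hours) is swapped out.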
Predictable Time-Sharing for DryadLINQ Cluster
Cited by 2 (0 self)
This paper addresses the scheduling problem that popular data-parallel programming systems such as DryadLINQ and MapReduce face today. Designing a cluster system for a multi-user environment is challenging because cluster schedulers must satisfy multiple, possibly conflicting, enterprise goals and policies. Particularly for these new types of data-intensive applications, it remains a challenge to achieve both high throughput and predictable end-to-end performance for jobs (e.g., predictable start and end times) at the same time. The conventional approach to scheduling these types of jobs is to determine the best mapping between tasks and nodes before the job executes; the scheduling system then ceases to be involved once the job starts executing. Instead, as described in this paper, we define a reactive containment and control mechanism for scheduling and executing distributed tasks: we schedule the jobs, and then continually monitor and adjust resources as each job executes. More specifically, a DryadLINQ task in our system is contained in a virtual machine, and distributed controllers regulate the progress of the task at runtime. Using online, feedback-controlled VM CPU scheduling, our system gives a job the ability to speed up or slow down the progress of concurrent sub-tasks so that the job can make predictable progress while sharing system resources with other jobs. This capability allows an enterprise to enforce flexible scheduling policies such as fair sharing and job prioritization. Our evaluation results using five well-known DryadLINQ applications show that the implemented distributed controllers achieve high throughput as well as predictable end-to-end performance.
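The abstract does not describe the controller itself. As a rough illustration of online, feedback-controlled CPU scheduling, the Python sketch below simulates one task inside a VM whose CPU cap is adjusted every second by a textbook PI controller, so that progress tracks a linear schedule toward a deadline. The PI controller, its gains, the linear progress target, and the names `run_controlled_task` and `rate_at` are assumptions for illustration, not the system's actual design. `rate_at(t)` models how fast the task runs at a full cap (including slowdowns caused by other jobs) and is hidden from the controller, which only observes progress.

```python
from typing import Callable, List, Tuple


def run_controlled_task(
    deadline: float,
    rate_at: Callable[[float], float],
    kp: float = 20.0,
    ki: float = 4.0,
    initial_cap: float = 0.5,
    min_cap: float = 0.05,
    max_cap: float = 1.0,
    dt: float = 1.0,
) -> Tuple[float, List[float]]:
    """Simulate one VM-contained task whose CPU cap is driven by a PI controller.

    rate_at(t) is the fraction of the task completed per second at a 100% cap
    during the interval starting at t. Returns (finish_time, cap_history), where
    cap_history[k] is the cap chosen after k control intervals.
    """
    t, progress, cap, prev_error = 0.0, 0.0, initial_cap, 0.0
    caps = [cap]
    while progress < 1.0:
        progress += rate_at(t) * cap * dt   # plant: work done in this interval
        t += dt
        target = min(1.0, t / deadline)     # linear progress schedule toward the deadline
        error = target - progress           # > 0 means the task is behind schedule
        # Velocity-form PI update; clamping the cap itself avoids integral windup.
        cap += kp * (error - prev_error) + ki * error
        cap = max(min_cap, min(max_cap, cap))
        prev_error = error
        caps.append(cap)
        if t > 10 * deadline:
            raise RuntimeError("task made too little progress; check rate_at")
    return t, caps


if __name__ == "__main__":
    # Hypothetical task: 40 s of work at a full cap, slowed by 20% after t = 50 s.
    finish, caps = run_controlled_task(100.0, lambda t: 0.025 if t < 50 else 0.02)
    print(f"finished at t={finish:.0f}s, cap at t=90s: {caps[90]:.2f}")
```

In the disturbed run, the rate drops at t = 50 s (as if another job had started competing for the host), and the controller raises the cap toward the new level needed to stay on schedule; leaving CPU unused early, rather than running at a full cap, is what lets co-located jobs share the host predictably.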
Course 7001 Mini Project: Performance Evaluation of Hadoop on Virtual Machines
MapReduce [1] is a popular programming framework intended for the automatic parallelization of computation in the cloud. MapReduce deals with data-intensive applications: a huge amount of data is first loaded from a remote DFS, then
Preserving Privacy in Distributed Systems (Transactions on Parallel and Distributed Systems)
Abstract—We present sTile, a technique for distributing computation that requires trust onto insecure networks, while providing probabilistic guarantees that malicious agents that compromise parts of the network cannot learn private data. With sTile, we explore the fundamental cost of achieving privacy through data distribution and bound how much less efficient a privacy-preserving system is than a non-private one. While that cost is significant, we find that sTile-based systems execute orders of magnitude faster than homomorphic encryption systems, the other promising approach to preserving privacy. This paper focuses specifically on NP-complete problems and demonstrates how sTile-based systems can solve important real-world problems, such as protein folding, image recognition, and resource allocation. We present the algorithms involved in sTile and formally prove that sTile-based systems preserve privacy. We develop a reference sTile-based implementation and empirically evaluate it on several physical networks of varying sizes, including the globally distributed PlanetLab testbed. Our analysis demonstrates sTile's scalability and its ability to handle varying network delay, and verifies that problems requiring privacy preservation can be solved using sTile orders of magnitude faster than using today's state-of-the-art alternatives.
Sigiri: Uniform Abstraction for Large-Scale Compute Resource Interactions
Abstract—Scientists who conduct mid-range, computationally heavy modeling and analysis often scramble to find sufficient computational resources to test and run their codes. The science they carry out is not petascale or even terascale science, but its computational needs often exceed what their university can provide. With the maturation of Grid computing facilities and the recent explosion of cloud computing data centers, mid-scale computational science has more options for satisfying its computational needs. This paper focuses on a simple abstraction for interacting with heterogeneous resource managers spanning grid and cloud computing, and on features that make the tool useful for the mid-scale physical or natural scientist. A key aspect of the service is its support for multiple standard job specification languages and the ability of users to interact with the service directly, removing the delay that can come from layers of services.

