Results 1 - 8 of 8
Towards Optimal Resource Provisioning for Running MapReduce Programs in Public Clouds
Cited by 5 (1 self)
Abstract—Running MapReduce programs in the public cloud introduces an important problem: how should resources be provisioned to minimize the financial charge for a specific job? In this paper, we study the whole process of MapReduce processing and build a cost function that explicitly models the relationship between the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs with a small number of nodes. Based on this cost model, we can solve a number of decision problems, such as finding the amount of resources that minimizes the financial cost under a time deadline, or that minimizes the time under a given financial budget. Experimental results show that this cost model performs well on the tested MapReduce programs.
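The sketch below makes the deadline-constrained decision problem concrete with a brute-force search over slot counts. The time model `job_time`, the coefficients `beta`, and the per-slot-hour price are hypothetical placeholders and are not the paper's fitted cost function. The time model is a fixed overhead plus Map and Reduce terms that scale as input size over slot count. In the approach described above, such coefficients would be learned from small test runs.

```python
from itertools import product


def job_time(data_gb, map_slots, reduce_slots, beta):
    """Hypothetical job-time model in hours: overhead + Map work + Reduce work.

    beta = (b0, b1, b2) are placeholder coefficients that would be fitted
    from test runs; this is NOT the paper's exact cost function.
    """
    b0, b1, b2 = beta
    return b0 + b1 * data_gb / map_slots + b2 * data_gb / reduce_slots


def cheapest_under_deadline(data_gb, beta, deadline_h, price_per_slot_h, max_slots=64):
    """Brute-force search for the (map, reduce) slot pair with the lowest
    charge whose predicted time meets the deadline. Returns
    (cost, map_slots, reduce_slots, hours), or None if no pair is feasible."""
    best = None
    for m, r in product(range(1, max_slots + 1), repeat=2):
        hours = job_time(data_gb, m, r, beta)
        if hours > deadline_h:
            continue  # this configuration misses the deadline
        cost = price_per_slot_h * (m + r) * hours  # assumed: billed per slot-hour
        if best is None or cost < best[0]:
            best = (cost, m, r, hours)
    return best


if __name__ == "__main__":
    # Illustrative numbers only: 10 GB of input, 1-hour deadline, $0.10 per slot-hour.
    print(cheapest_under_deadline(10, (0.1, 0.5, 0.2), deadline_h=1.0, price_per_slot_h=0.10))
```

The search space is small, so brute force is adequate here. The dual problem (minimize time under a budget) can be handled the same way by constraining the cost and minimizing the predicted hours instead.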
A Survey on Cloud Computing
Cited by 2 (0 self)
Cloud computing provides customers the illusion of infinite computing resources that are available from anywhere, anytime, on demand. Computing at such an immense scale requires a framework that can support extremely large datasets housed on clusters of commodity hardware. Two examples of such frameworks are Google’s MapReduce and Microsoft’s Dryad. First, we discuss implementation details of these frameworks and drawbacks where future work is required. Next, we discuss the challenges of computing at such a large scale. In particular, we focus on the security issues that arise in the cloud: the confidentiality of data, the retrievability and availability of data, and issues surrounding the correctness and confidentiality of computation executing on third-party hardware.
A MapReduce-supported Data Center Networking Topology
Abstract—Several novel data center networking (DCN) topologies have been proposed to improve the topological properties of data centers. Unfortunately, whether these topologies suit the online applications and infrastructure services running on the corresponding data centers has been overlooked. In this paper, we propose a novel DCN topology, named Hyper-Fat-tree Network (HFN). HFN combines the good characteristics of the BCube and Fat-tree topologies, and hence naturally supports the distributed data processing application MapReduce. We then address several challenging issues that HFN faces in supporting MapReduce. Through analysis and simulations, we show that HFN possesses excellent properties and is a viable topology for MapReduce.
Microsoft Bing, 2010
Experience from an operational map-reduce cluster reveals that outliers significantly prolong job completion. The causes of outliers include (i) machine characteristics, both hardware reliability (e.g., disk failures) and run-time contention for processor, memory, and other resources; (ii) network characteristics, with varying bandwidths and congestion along paths; and (iii) imbalance in workload among tasks. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri’s strategies include smart restart of outliers, network-aware placement of tasks, and protecting the outputs of valuable tasks. Mantri’s principled strategy for dealing with outliers is a significant advancement over prior work, which concentrates only on duplicating tasks. Using real-time progress reports, Mantri detects outliers early in their lifetime and takes appropriate action based on their causes. Early action frees up resources that can be used by subsequent tasks and expedites the job overall. Deployment in Bing’s production cluster and extensive trace-driven simulation indicate that Mantri is 3.1x more effective than the existing state of the art in improving job completion times.
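The following sketch illustrates outlier detection from progress reports in simplified form. For each running task, it extrapolates the remaining time linearly from the task's progress fraction. A task is flagged for restart when that estimate exceeds `slack` times the median duration of completed tasks. Several pieces are assumptions made for illustration:

- the `TaskProgress` record,
- the `slack` factor,
- the use of the median completed duration as the expected runtime of a fresh copy.

Mantri's actual policy is cause-aware and more refined than this.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskProgress:
    """Hypothetical progress report for one running task."""
    task_id: str
    elapsed_s: float   # seconds since the task started
    progress: float    # fraction of the task's input processed, in [0, 1]


def estimated_remaining_s(task: TaskProgress) -> float:
    """Linear extrapolation: assume the task keeps its current processing rate."""
    if task.progress <= 0.0:
        return float("inf")  # no progress yet: treat remaining time as unbounded
    return task.elapsed_s * (1.0 - task.progress) / task.progress


def pick_restarts(running, completed_durations_s, slack=2.0):
    """Return IDs of tasks whose estimated remaining time exceeds `slack` times
    the median duration of completed tasks (a stand-in for the expected
    runtime of a fresh copy)."""
    t_new = median(completed_durations_s)
    return [t.task_id for t in running if estimated_remaining_s(t) > slack * t_new]


if __name__ == "__main__":
    running = [
        TaskProgress("a", elapsed_s=10.0, progress=0.1),  # about 90 s left: outlier
        TaskProgress("b", elapsed_s=10.0, progress=0.5),  # about 10 s left: fine
        TaskProgress("c", elapsed_s=10.0, progress=0.0),  # no progress: outlier
    ]
    print(pick_restarts(running, completed_durations_s=[10.0, 12.0, 11.0]))  # ['a', 'c']
```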
Enhancement of Xen’s Scheduler for MapReduce
As trends move toward data outsourcing and cloud computing, the efficiency of distributed data centers grows in importance. Cloud-based services such as Amazon’s EC2 rely on virtual machines (VMs) to host MapReduce clusters in order to process large amounts of data with off-the-shelf systems. However, current VM scheduling does not adequately support MapReduce workloads, which degrades overall performance. For example, when multiple MapReduce clusters run on a single physical machine, the existing VMM scheduler does not guarantee fairness across clusters. In this work, we present the MapReduce Group Scheduler (MRG). The MRG scheduler implements three mechanisms to improve the efficiency and fairness of the existing VMM scheduler. First, the characteristics of MapReduce workloads facilitate batching of I/O ...
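The sketch below illustrates the cross-cluster fairness gap mentioned above by comparing two ways of dividing one host's CPU:

- **Per-VM fair sharing:** a cluster with more VMs on the host receives more CPU.
- **Group-level split:** the host is first divided equally among clusters, then among each cluster's VMs.

The VM and cluster names are hypothetical. This only illustrates group fairness and is not MRG's actual scheduling mechanism.

```python
from collections import defaultdict


def per_vm_shares(vm_to_cluster):
    """Baseline: every VM gets an equal share, regardless of its cluster."""
    n = len(vm_to_cluster)
    return {vm: 1.0 / n for vm in vm_to_cluster}


def group_fair_shares(vm_to_cluster):
    """Group fairness: split the host equally across clusters, then equally
    among the VMs of each cluster."""
    clusters = defaultdict(list)
    for vm, cluster in vm_to_cluster.items():
        clusters[cluster].append(vm)
    per_cluster = 1.0 / len(clusters)
    return {vm: per_cluster / len(vms) for vms in clusters.values() for vm in vms}


def cluster_totals(shares, vm_to_cluster):
    """Sum VM shares per cluster so the two policies can be compared."""
    totals = defaultdict(float)
    for vm, share in shares.items():
        totals[vm_to_cluster[vm]] += share
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical host: cluster A runs three VMs, cluster B runs one.
    placement = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
    print(cluster_totals(per_vm_shares(placement), placement))      # A gets 3/4, B gets 1/4
    print(cluster_totals(group_fair_shares(placement), placement))  # A and B each get about 1/2
```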
Scaling MapReduce Applications across Hybrid Clouds to Meet Soft Deadlines
Abstract—Cloud platforms make available a virtually infinite amount of computing resources, which are managed by third parties and accessed by users on demand in a pay-per-use manner, with Quality of Service guarantees. This enables computing infrastructures to be scaled up and down according to the amount of data to be processed. MapReduce is among the most popular models for developing Cloud applications. As the use of this programming model spreads across multiple application domains, the need for timely execution of these applications arises. Existing approaches focus on meeting deadlines via admission control or preemption of lower-priority applications. In contrast, we propose a policy for dynamic provisioning of Cloud resources that speeds up execution of deadline-constrained MapReduce applications by enabling concurrent execution of tasks, in order to meet a deadline for completion of the Map phase of the application. We describe the proposed algorithm and an actual implementation of it in the Aneka Cloud Platform. Experiments on this prototype implementation show that our approach can effectively meet soft deadlines while minimizing the budget for application execution.
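The core arithmetic of deadline-driven provisioning can be sketched as follows. Given the remaining Map tasks, an estimated per-task duration, and the time left before the Map-phase deadline, the sketch computes how many workers are needed and how many must be added. It rests on illustrative simplifications: one task per worker at a time, uniform task durations, and instant provisioning. The function name `extra_workers_needed` is hypothetical, and this is not the Aneka policy described in the paper.

```python
import math


def extra_workers_needed(remaining_tasks, avg_task_s, current_workers, time_left_s):
    """Return how many workers to add so that `remaining_tasks` Map tasks
    finish within `time_left_s`, or None if even unlimited workers cannot
    make it (a single task takes longer than the time left).

    Assumes each worker runs one task at a time, every task takes
    `avg_task_s` seconds, and new workers are available immediately.
    """
    waves = math.floor(time_left_s / avg_task_s)  # tasks one worker can finish in time
    if waves == 0:
        return None
    needed = math.ceil(remaining_tasks / waves)
    return max(0, needed - current_workers)


if __name__ == "__main__":
    # 100 tasks of about 60 s, 10 workers, 300 s left: 5 waves -> 20 workers -> add 10.
    print(extra_workers_needed(100, 60.0, 10, 300.0))  # 10
```

Because the deadline is soft, a fuller policy would plausibly re-estimate `avg_task_s` as tasks complete and weigh the extra charge against the risk of overrunning; those refinements are omitted here.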
Engineering
The primary challenge for cloud service providers is finding ways to maintain a high degree of Quality of Service (QoS) in a cost-effective manner, to ensure either profitability (for business-based cloud service providers) or cost avoidance (for government cloud service providers). The traditional approach to improving system performance is to upgrade the servers and/or network backbone, an expensive undertaking. The authors used OPNET Modeler to represent a distributed system architecture supporting a variety of application services, and defined a framework for measuring QoS from the end user’s perspective. They found that there is no direct relationship between server/network upgrades and overall QoS in distributed systems. Cloud service providers can use this framework as a decision support tool to optimize the QoS of their systems by choosing upgrade strategies that provide the greatest “bang for the buck.”

