@MISC{_cast:tiering, author = {}, title = {CAST: Tiering Storage for Data Analytics in the}, year = {} }
Abstract
Enterprises are increasingly moving their big data analytics to the cloud with the goal of reducing costs without sacrificing application performance. Cloud service providers offer their tenants a myriad of storage options, which, while flexible, make the choice of storage deployment non-trivial. Crafting deployment scenarios that leverage these choices cost-effectively, under the distinctive pricing models and multi-tenancy dynamics of the cloud environment, presents unique challenges in designing cloud-based data analytics frameworks. In this paper, we propose Cast, a Cloud Analytics Storage Tiering solution that cloud tenants can use to reduce the monetary cost and improve the performance of analytics workloads. The approach takes the first step towards providing storage tiering support for data analytics in the cloud. Cast performs offline workload profiling to construct job performance prediction models on different cloud storage services, and combines these models with workload specifications and high-level tenant goals to generate a cost-effective data placement and storage provisioning plan. Furthermore, we build Cast++ to enhance Cast's optimization model by incorporating data reuse patterns and cross-job interdependencies common in realistic analytics workloads. Tests with production workload traces from Facebook on a 400-core Hadoop cluster running on Google Cloud demonstrate that Cast++ achieves 1.21× the performance of a local storage configuration while reducing deployment costs by 51.4%.
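To make the planning step concrete, the snippet below is a minimal sketch of the general idea: per-tier runtime predictions are combined with a tenant goal (here, a completion deadline) to choose a storage tier for each job. Everything in it is an assumption for illustration. The tier names, prices, and throughputs are hypothetical. The linear `predict_runtime` model only stands in for the profiled prediction models. The exhaustive search in the hypothetical `plan_placement` is a simple swapped-in solver and is not presented as Cast's actual optimization method. Cast++-style data reuse and cross-job dependencies are not modeled.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    """A cloud storage service; all prices and speeds here are hypothetical."""
    name: str
    price_per_gb: float            # cost of holding one GB for the planning window
    throughput_gb_per_hour: float  # stand-in for an offline-profiled read rate


@dataclass(frozen=True)
class Job:
    name: str
    input_gb: float


COMPUTE_PRICE_PER_HOUR = 1.0  # hypothetical cluster price per hour


def predict_runtime(job: Job, tier: Tier) -> float:
    """Hypothetical I/O-bound runtime model (hours), standing in for a profiled predictor."""
    return job.input_gb / tier.throughput_gb_per_hour


def job_cost(job: Job, tier: Tier) -> float:
    """Storage cost for the job's input plus compute cost for its predicted runtime."""
    return job.input_gb * tier.price_per_gb + COMPUTE_PRICE_PER_HOUR * predict_runtime(job, tier)


def plan_placement(
    jobs: List[Job], tiers: List[Tier], deadline_hours: float
) -> Optional[Tuple[Dict[str, str], float]]:
    """Try every job-to-tier assignment and keep the cheapest one whose jobs,
    run back to back, finish within the deadline. Returns None if none fits."""
    best: Optional[Tuple[Dict[str, str], float]] = None
    for choice in product(tiers, repeat=len(jobs)):
        runtime = sum(predict_runtime(j, t) for j, t in zip(jobs, choice))
        if runtime > deadline_hours:
            continue  # violates the tenant's completion-time goal
        cost = sum(job_cost(j, t) for j, t in zip(jobs, choice))
        if best is None or cost < best[1]:
            best = ({j.name: t.name for j, t in zip(jobs, choice)}, cost)
    return best


if __name__ == "__main__":
    tiers = [
        Tier("ssd", 0.17, 400.0),
        Tier("hdd", 0.04, 100.0),
        Tier("object", 0.026, 50.0),
    ]
    jobs = [Job("etl", 200.0), Job("report", 50.0)]
    print(plan_placement(jobs, tiers, deadline_hours=3.5))
```

With these made-up numbers, a loose deadline places both jobs on the cheapest, slowest tier. A 3.5-hour deadline moves the larger `etl` job to `hdd` and keeps `report` on `object`. A very tight deadline forces both onto `ssd`. Exhaustive search grows exponentially with the number of jobs, so a realistic planner would need a heuristic or approximate solver instead.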