Results 1 - 10 of 12
AGILE: elastic distributed resource scaling for Infrastructure-as-a-Service
Abstract - Cited by 13 (2 self)
Dynamically adjusting the number of virtual machines (VMs) assigned to a cloud application to keep up with load changes and interference from other users typically requires detailed application knowledge and an ability to know the future, neither of which is readily available to infrastructure service providers or application owners. The result is that systems either need to be over-provisioned (costly) or risk missing their performance Service Level Objectives (SLOs) and having to pay penalties (also costly). AGILE deals with both issues: it uses wavelets to provide a medium-term resource demand prediction with enough lead time to start up new application server instances before performance falls short, and it uses dynamic VM cloning to reduce application startup times. Tests using RUBiS and Google cluster traces show that AGILE can predict varying resource demands over the medium term with up to 3.42× the true positive rate and 0.34× the false positive rate of existing schemes. Given a target SLO violation rate, AGILE can efficiently handle dynamic application workloads, reducing both penalties and user dissatisfaction.
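The wavelet idea in the abstract, predicting medium-term demand from the smooth, low-frequency part of the load signal, can be sketched as a toy. This is not AGILE's actual predictor; the Haar decomposition, the linear extrapolation of the approximation band, and the demand trace are all illustrative assumptions:

```python
# Toy sketch of wavelet-based medium-term demand prediction
# (hypothetical; AGILE's real predictor is more sophisticated).

def haar_decompose(signal):
    """One level of the Haar wavelet transform: pairwise averages + differences."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def predict_demand(history, horizon=2):
    """Forecast future demand by extrapolating the smooth (approximation)
    band linearly and assuming the high-frequency detail averages out."""
    approx, _ = haar_decompose(history)
    slope = approx[-1] - approx[-2]        # trend in the smooth band
    forecast, level = [], approx[-1]
    for _ in range(horizon):
        level += slope
        forecast.append(level)
    return forecast

cpu_demand = [10, 12, 14, 18, 22, 28, 34, 42]   # synthetic load trace
print(predict_demand(cpu_demand))               # → [51.0, 64.0]
```

The point of predicting on the approximation band is the lead time the abstract mentions: the smooth trend is stable enough to act on several steps ahead, while the detail band is mostly noise.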
Lowering the Barriers to Large-Scale Mobile Crowdsensing
Abstract - Cited by 7 (0 self)
Mobile crowdsensing is becoming a vital technique for environment monitoring, infrastructure management, and social computing. However, deploying mobile crowdsensing applications in large-scale environments is not a trivial task: it places a tremendous burden on application developers as well as mobile users. In this paper we identify the barriers hampering the scale-up of mobile crowdsensing applications, and offer our initial thoughts on potential solutions for lowering them.
Optimizing VM Checkpointing for Restore Performance in VMware ESXi
in Proceedings of the 2013 USENIX Annual Technical Conference (USENIX ATC '13), 2013
Abstract - Cited by 6 (0 self)
Cloud providers are increasingly looking to use virtual machine checkpointing for new applications beyond fault tolerance. Existing checkpointing systems designed for fault tolerance only optimize for saving checkpointed state, so they cannot support these new applications, which require better restore performance. Improving restore performance requires a predictive technique to reduce the number of disk accesses needed to bring in the VM's memory on restore. However, complex VM workloads can diverge at any time due to external inputs, background processes, and timing variation, so predicting which pages the VM will access on restore to reduce faults to disk is impossible. Instead, we focus on predicting which pages the VM will access together on restore to improve the efficiency of disk accesses. To reduce the number of faults to disk on restore, we group memory pages likely to be accessed together into locality blocks. On each fault, we can load a block of pages that are likely to be accessed with the faulting page, eliminating future faults and increasing disk efficiency. We implement support for locality blocks, along with several other optimizations, in a new checkpointing system for VMware ESXi Server called Halite. Our experiments show that Halite reduces restore overhead by up to 94% for a range of workloads.
Overview
The ability to checkpoint and restore the state of a running virtual machine has been crucial for fault tolerance of virtualized workloads. Recently, cloud providers have been exploring new applications for VM checkpointing. For example, they want to use checkpointing to save and power off idle VMs to conserve energy. Restoring a checkpointed "template" VM could be used to clone new VMs on demand, which would enable fast, dynamic allocation of VMs for stateless workloads. Unlike traditional fault tolerance applications, these new applications depend on efficient restore of checkpointed VMs.
For example, using checkpointing for dynamic allocation of VMs depends on the ability to quickly start up a VM on demand.* Checkpointing systems designed to support fault tolerance only restore on failures, so they optimize for checkpoint save performance instead. As a result, previous work rarely addresses restore beyond basic support, so existing systems would offer poor performance for these new applications. Virtual machine checkpointing takes a snapshot of the state of a VM at a single point in time. The hypervisor writes any temporary VM state, like VM memory, to persistent storage and then reads it back into memory when restoring the checkpoint. Since memory images can be large, VMware ESXi uses a technique called lazy restore that loads the memory image from disk while the VM runs. While the VM's memory is partially on disk, any access to an on-disk page causes a fault that requires a synchronous disk access before the VM's execution can resume. Pauses in execution for faults to disk can quickly degrade the usability of the VM. Improving lazy restore performance requires a predictive technique that reduces the number of faults to pages on disk. However, it is impossible to predict which pages the VM will access on restore; the VM's execution might diverge at any time due to timing differences or external inputs, particularly with complex workloads that have many background tasks and user applications. Rather than reducing the number of faults to disk by predicting the pages that the VM will access on restore, we instead predict the pages that the VM will access together on restore. On each fault to disk during lazy restore, we prefetch a few pages that are likely to be accessed with the faulting page, rather than prefetching before the VM's execution begins.
This technique is more resilient to divergence since the prefetching decision is based directly on pages that the VM has accessed after the restore. There is a smaller penalty for incorrect predictions because only a few pages are prefetched at a time. To allow for efficient prefetching on restore, we sort pages likely to be accessed together into locality blocks in the VM's checkpointed memory image. On restore,
* Work done while all authors were at VMware.
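The locality-block mechanism described above can be modeled in a few lines. This is a deliberately simplified sketch, not Halite's implementation: the fixed block size, the grouping-by-first-access policy, and the class names are all assumptions:

```python
# Hypothetical model of locality-block prefetching during lazy restore:
# pages accessed together are grouped into blocks, and a fault to any page
# loads its whole block, so nearby future faults are avoided.

BLOCK_SIZE = 4  # pages per locality block (illustrative)

def build_locality_blocks(access_trace):
    """Group pages into fixed-size blocks in first-access order."""
    seen, ordered = set(), []
    for page in access_trace:
        if page not in seen:
            seen.add(page)
            ordered.append(page)
    return [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]

class LazyRestore:
    def __init__(self, blocks):
        self.block_of = {p: b for b in map(tuple, blocks) for p in b}
        self.in_memory = set()
        self.disk_reads = 0

    def access(self, page):
        if page not in self.in_memory:                   # fault to disk
            self.disk_reads += 1
            self.in_memory.update(self.block_of[page])   # load the whole block

blocks = build_locality_blocks([1, 2, 3, 4, 9, 10, 11, 12])
vm = LazyRestore(blocks)
for p in [1, 2, 3, 4, 9, 10]:
    vm.access(p)
print(vm.disk_reads)  # → 2 (one read per block, not one per page)
```

Without grouping, the six accesses would cost six disk reads; with blocks, the faulting page drags in its likely companions, which is exactly the efficiency argument in the abstract.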
vTube: Efficient Streaming of Virtual Appliances Over Last-Mile Networks
Abstract - Cited by 4 (2 self)
Cloud-sourced virtual appliances (VAs) have been touted as powerful solutions for many software maintenance, mobility, backward compatibility, and security challenges. In this paper, we ask whether it is possible to create a VA cloud service that supports fluid, interactive user experience even over mobile networks. More specifically, we wish to support a YouTube-like streaming service for executable content, such as games, interactive books, research artifacts, etc. Users should be able to post, browse through, and interact with executable content swiftly and without long interruptions. Intuitively, this seems impossible; the bandwidths, latencies, and costs of last-mile networks would be prohibitive given the sheer sizes of virtual machines! Yet, we show that a set of carefully crafted, novel prefetching and streaming techniques can bring this goal surprisingly close to reality. We show that vTube, a VA streaming system that incorporates our techniques, supports fluid interaction even in challenging network conditions, such as 4G LTE.
Inception: Towards a Nested Cloud Architecture
Abstract - Cited by 3 (1 self)
Despite the increasing popularity of Infrastructure-as-a-Service (IaaS) clouds, providers have been very slow in adopting a large number of innovative technologies, such as live VM migration, dynamic resource management, and VM replication. In this paper, we argue that the reasons are not only technical but also fundamental, owing to a lack of transparency and a conflict of interest between providers and customers. We present our vision of Inception, a nested IaaS cloud architecture that overcomes this impasse. Inception clouds are built entirely on top of resources acquired from today’s clouds, and provide nested VMs to end users. We discuss the benefits, use cases, and challenges of inception clouds, and present our network design and prototype implementation.
FlurryDB: A Dynamically Scalable Relational Database with Virtual Machine Cloning
2011
"... database with virtual machine cloning ..."
SOFTScale: Stealing Opportunistically for Transient Scaling
Abstract
Dynamic capacity provisioning is a well-studied approach to handling gradual changes in data center load. However, abrupt spikes in load are still problematic: work in the system rises very quickly during the setup time needed to turn on additional capacity, and performance can be severely affected even if it takes only 5 seconds to bring additional capacity online. In this paper, we propose SOFTScale, an approach to handling load spikes in multi-tier data centers without having to over-provision resources. SOFTScale works by opportunistically stealing resources from other tiers to alleviate the bottleneck tier, even when the tiers are carefully provisioned at capacity. SOFTScale is especially useful during the transient overload periods when additional capacity is being brought online. Via implementation on a 28-server multi-tier testbed, we investigate a range of possible load spikes, including an artificial doubling or tripling of load, as well as large spikes in real traces. We find that SOFTScale can meet our stringent 95th-percentile response time Service Level Agreement goal of 500 ms without using any additional resources, even under some extreme load spikes that would normally cause the system (without SOFTScale) to exhibit response times as high as 96 seconds.
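The transient-scaling effect SOFTScale exploits can be illustrated with a tiny backlog simulation. All numbers here, the spike size, the 0.8 units of "stolen" capacity, and the timing, are made up for illustration and are not the paper's model:

```python
# Illustrative simulation: load doubles at t=10, new servers need 5 s of
# setup time. Borrowing capacity from another tier during that window
# (SOFTScale's idea) keeps the backlog from exploding.

def simulate(borrow, spike=2.0, base=1.0, setup=5, horizon=20):
    """Track queued work when load jumps to `spike` at t=10 and the newly
    provisioned servers come online only after `setup` seconds."""
    backlog = 0.0
    for t in range(horizon):
        load = spike if t >= 10 else base
        cap = base
        if 10 <= t < 10 + setup and borrow:
            cap += 0.8          # capacity opportunistically stolen from another tier
        elif t >= 10 + setup:
            cap = spike         # scaled-out capacity is now online
        backlog = max(0.0, backlog + load - cap)
    return backlog

print(simulate(borrow=False))   # backlog piles up during the setup window
print(simulate(borrow=True))    # borrowed capacity absorbs most of the spike
```

The residual backlog in the no-borrow case is exactly the work that arrived during the setup window, which is why even a 5-second setup time can translate into response times orders of magnitude above the SLA.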
Just-in-Time Provisioning for Cyber Foraging
Abstract
Cloud offload is an important technique in mobile computing. VM-based cloudlets have been proposed as offload sites for the resource-intensive and latency-sensitive computations typically associated with mobile multimedia applications. Since cloud offload relies on precisely configured back-end software, it is difficult to support at global scale across cloudlets in multiple domains. To address this problem, we describe just-in-time (JIT) provisioning of cloudlets under the control of an associated mobile device. Using a suite of five representative mobile applications, we demonstrate a prototype system that is capable of provisioning a cloudlet with a non-trivial VM image in 10 seconds. This speed is achieved through dynamic VM synthesis and a series of optimizations that aggressively reduce transfer costs and startup latency.
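The dynamic VM synthesis mentioned above rests on a simple delta idea: ship only the difference between a widely cached base image and the app-specific launch image. The sketch below is a hypothetical byte-level toy (function names, sizes, and the diff format are assumptions, not the paper's mechanism):

```python
# Toy sketch of dynamic VM synthesis: the mobile device carries a small
# "overlay" (delta between a cached base image and the launch image); the
# cloudlet reconstitutes the launch VM by applying the overlay to the base.

def make_overlay(base: bytes, launch: bytes) -> dict:
    """Record only the byte positions where the launch image differs."""
    return {i: b for i, (a, b) in enumerate(zip(base, launch)) if a != b}

def synthesize(base: bytes, overlay: dict) -> bytes:
    """Rebuild the launch image on the cloudlet from base + overlay."""
    img = bytearray(base)
    for i, b in overlay.items():
        img[i] = b
    return bytes(img)

base = bytes(1000)                       # cached base VM image (all zeros here)
launch = bytearray(base)
launch[10:13] = b"app"                   # app-specific customization
overlay = make_overlay(base, bytes(launch))
print(len(overlay), "overlay bytes shipped instead of", len(launch))
```

Because the cloudlet already holds the base, only the (typically small) overlay crosses the last-mile network, which is what makes 10-second provisioning of a non-trivial VM image plausible.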
Prebaked µVMs: Scalable, Instant VM Startup for IaaS Clouds
Abstract
IaaS clouds promise instantaneously available resources to elastic applications. In practice, however, virtual machine (VM) startup times are on the order of several minutes, or at best several tens of seconds, negatively impacting the elasticity of applications like Web servers that need to scale out to handle dynamically increasing load. VM startup time is strongly influenced by booting the VM's operating system. In this work, we propose using so-called prebaked µVMs to speed up VM startup. µVMs are snapshots of minimal VMs that can be quickly resumed and then configured to application needs by hot-plugging resources. To serve µVMs, we extend our VM boot cache service, Squirrel, allowing it to store µVMs for large numbers of VM images on the hosts of a data center. Our experiments show that µVMs can start up in less than one second on a standard file system. Using 1000+ VM images from a production cloud, we show that the respective µVMs can be stored in a compressed and deduplicated file system within 50 GB of storage per host, while starting up within 2-3 seconds on average.
Revenue Driven Resource Allocation for Virtualized Data Centers
Abstract
The increasing VM density in cloud hosting services makes careful management of physical resources such as CPU, memory, and I/O bandwidth within individual virtualized servers a priority. To maximize cost-efficiency, resource management needs to be coupled with the revenue-generating mechanisms of cloud hosting: the service level agreements (SLAs) of hosted client applications. In this paper, we develop a server resource management framework that substantially reduces data center resource management complexity. Our solution implements revenue-driven dynamic resource allocation, which continuously steers the resource distribution across the VMs hosted within a server so as to maximize the SLA-generated revenue from the server. Our experimental evaluation on a VMware ESX hypervisor highlights the importance of both resource isolation and resource sharing across VMs. The empirical data shows a 7%-54% increase in total revenue generated for a mix of 10-25 VMs hosting either similar or diverse workloads, compared to using the currently available resource distribution mechanisms in ESX.
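Revenue-driven steering of the kind described above can be sketched as a greedy marginal-revenue allocator. The revenue curves and names below are invented for illustration; the paper's actual mechanism and SLA models differ:

```python
# Hypothetical greedy sketch of revenue-driven allocation: each unit of a
# resource (e.g. a CPU share) goes to the VM whose SLA revenue curve gains
# the most from one more unit.
import math

def allocate(revenue_fns, total_units):
    """Greedily hand out resource units by marginal SLA revenue."""
    alloc = [0] * len(revenue_fns)
    for _ in range(total_units):
        gains = [f(a + 1) - f(a) for f, a in zip(revenue_fns, alloc)]
        alloc[gains.index(max(gains))] += 1
    return alloc

def vm_a(units):                 # latency-sensitive app: steep, concave curve
    return 10 * math.log1p(units)

def vm_b(units):                 # batch workload: flat, linear curve
    return 2 * units

print(allocate([vm_a, vm_b], 8))  # → [4, 4]
```

With concave revenue curves this greedy rule equalizes marginal revenue across VMs, which captures the intuition of continuously steering resources toward whichever hosted application's SLA currently pays the most for them.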