I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance
Cited by 8 (1 self)
Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that utilizes content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each of these techniques is motivated by our observations of I/O workload traces obtained from actively used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads showed an overall improvement in disk I/O performance of 28% to 47% across these workloads. Further breakdown also showed that each of the three techniques contributed significantly to the overall performance improvement.
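As an illustration of the content-based caching idea named in this abstract, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical `ContentAddressedCache` class that indexes cached data by a SHA-256 digest of the block content, so sectors holding identical content share one cache entry. The class name, the `read_from_disk` and `write_to_disk` callbacks, and the LRU eviction are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict


class ContentAddressedCache:
    """LRU cache keyed by content digest (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity             # max number of distinct contents held
        self.sector_to_digest = {}           # sector number -> digest of its content
        self.digest_to_data = OrderedDict()  # digest -> bytes, oldest entry first

    def _insert(self, digest, data):
        # Store or refresh one content entry, evicting the least recently used.
        self.digest_to_data[digest] = data
        self.digest_to_data.move_to_end(digest)
        if len(self.digest_to_data) > self.capacity:
            self.digest_to_data.popitem(last=False)

    def write(self, sector, data, write_to_disk):
        # Write through to disk and remember which content the sector holds.
        write_to_disk(sector, data)
        digest = hashlib.sha256(data).hexdigest()
        self.sector_to_digest[sector] = digest
        self._insert(digest, data)

    def read(self, sector, read_from_disk):
        # Returns (data, hit). A hit needs only the content to be cached,
        # not this particular sector to have been read before.
        digest = self.sector_to_digest.get(sector)
        if digest is not None and digest in self.digest_to_data:
            self.digest_to_data.move_to_end(digest)
            return self.digest_to_data[digest], True
        data = read_from_disk(sector)
        digest = hashlib.sha256(data).hexdigest()
        self.sector_to_digest[sector] = digest
        self._insert(digest, data)
        return data, False
```

In this sketch, two sectors written with the same bytes occupy a single cache slot, which is the effect a content-addressed cache aims for. The sector-to-digest map grows without bound here; a real system would need to bound or persist it.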
SSD Bufferpool Extensions for Database Systems
Cited by 6 (1 self)
High-end solid state disks (SSDs) provide much faster access to data compared to conventional hard disk drives. We present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced. For random I/O, the potential performance gains are particularly significant. Our system continuously monitors the disk access patterns to identify hot regions of the disk. Temperature statistics are maintained at the granularity of an extent, i.e., 32 pages, and are kept current through an aging mechanism. Unlike prior caching methods, once the SSD is populated with pages from warm regions, cold pages are not admitted into the cache, leading to low levels of cache pollution. Simulations based on DB2 I/O traces, and a prototype implementation within DB2, both show substantial performance improvements.
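The temperature-based admission described in this abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the DB2 prototype: the extent size of 32 pages comes from the abstract, while the class name `TemperatureAdmission`, the multiplicative decay factor, and the rule of replacing the coldest cached page are assumptions chosen for the example.

```python
PAGES_PER_EXTENT = 32  # extent granularity stated in the abstract


class TemperatureAdmission:
    """Per-extent temperature tracking with aging (hypothetical sketch)."""

    def __init__(self, ssd_capacity_pages, decay=0.5):
        self.capacity = ssd_capacity_pages
        self.decay = decay      # assumed aging factor, applied periodically
        self.temperature = {}   # extent number -> accumulated heat
        self.cached = set()     # page numbers currently held on the SSD

    def record_access(self, page, cost=1.0):
        # Heat is accumulated for the whole extent that contains the page.
        extent = page // PAGES_PER_EXTENT
        self.temperature[extent] = self.temperature.get(extent, 0.0) + cost

    def age(self):
        # Aging: scale every temperature down so old activity fades.
        for extent in self.temperature:
            self.temperature[extent] *= self.decay

    def _temp(self, page):
        return self.temperature.get(page // PAGES_PER_EXTENT, 0.0)

    def admit(self, page):
        # Returns True if the page is (now) on the SSD, False if rejected.
        if page in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(page)
            return True
        coldest = min(self.cached, key=self._temp)
        if self._temp(page) > self._temp(coldest):
            self.cached.remove(coldest)
            self.cached.add(page)
            return True
        return False  # colder than everything cached: not admitted
```

Once the cache is full, a page from a colder extent is rejected outright rather than displacing a warmer page, which is how this sketch avoids cache pollution.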
AWOL: An Adaptive Write Optimizations Layer
Cited by 3 (1 self)
Operating system memory managers fail to consider the population of read versus write pages in the buffer pool or outstanding I/O requests when writing dirty pages to disk or network file systems. This leads to bursty I/O patterns, which stall processes reading data and reduce the efficiency of storage. We address these limitations by adaptively allocating memory between write buffering and read caching and by writing dirty pages to disk opportunistically before the operating system submits them for write-back. We implement and evaluate our methods within the Linux operating system and show performance gains of more than 30% for mixed read/write workloads.
Dynamic Partitioning of the Cache Hierarchy in Shared Data Centers
Cited by 2 (0 self)
Due to the imperative need to reduce the management costs of large data centers, operators multiplex several concurrent database applications on a server farm connected to shared network attached storage. Determining and enforcing per-application resource quotas in the resulting cache hierarchy, on the fly, poses a complex resource allocation problem spanning the database server and the storage server tiers. This problem is further complicated by the need to provide strict Quality of Service (QoS) guarantees to hosted applications. In this paper, we design and implement a novel coordinated partitioning technique of the database buffer pool and storage cache between applications for any given cache replacement policy and per-application access pattern. We use statistical regression to dynamically determine the mapping between cache quota settings and the resulting per-application QoS. A resource controller embedded within the database engine actuates the partitioning of the two-level cache, converging towards the configuration with maximum application utility, expressed as the service provider revenue in that configuration, based on a set of latency sample points. Our experimental evaluation, using the MySQL database engine, a server farm with consolidated storage, and two e-commerce benchmarks, shows the effectiveness of our technique in enforcing application QoS, as well as maximizing the revenue of the service provider in shared server farms.
Practical Techniques for Purging Deleted Data Using Liveness Information
Cited by 1 (0 self)
The layered design of the Linux operating system hides the liveness of file system data from the underlying block layers. This lack of liveness information prevents the storage system from discarding blocks deleted by the file system, often resulting in poor utilization, security problems, inefficient caching, and migration overheads. In this paper, we define a generic “purge” operation that can be used by a file system to pass liveness information to the block layer with minimal changes in the layer interfaces, allowing the storage system to discard deleted data. We present three approaches for implementing such a purge operation: direct call, zero blocks, and flagged writes, which differ in architectural complexity and potential performance overhead. We evaluate the feasibility of these techniques through a reference implementation of a dynamically resizable copy-on-write (COW) data store in User Mode Linux (UML). Performance results obtained from this reference implementation show that all these techniques can achieve significant storage savings with a reasonable execution time overhead. At the same time, our results indicate that while the direct call approach has the best performance, the zero block approach offers the best compromise between performance overhead and semantic and architectural simplicity. Overall, our results demonstrate that passing liveness information across the file system-block layer interface with minimal changes is not only feasible but practical.
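One plausible reading of the zero-block approach named in this abstract is that the file system overwrites deleted blocks with zeros and the block layer treats an all-zero write as a purge. The sketch below illustrates that reading only; it is not the paper's UML implementation. The `SparseStore` class, the 4096-byte block size, and the dictionary-backed storage are assumptions for the example.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class SparseStore:
    """Block store that frees a block when it is overwritten with zeros."""

    def __init__(self):
        self.blocks = {}  # block number -> bytes, only for allocated blocks

    def write(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if not any(data):
            # All-zero write: interpreted as a purge, so release the block.
            self.blocks.pop(block_no, None)
        else:
            self.blocks[block_no] = bytes(data)

    def read(self, block_no):
        # Unallocated blocks read back as zeros, so a purge stays
        # indistinguishable from a real zero write at the interface.
        return self.blocks.get(block_no, bytes(BLOCK_SIZE))

    def allocated(self):
        return len(self.blocks)
```

The appeal of this variant is that the read and write interface is unchanged: the only cost in the sketch is scanning each written block for non-zero bytes.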
Practical Techniques for Eliminating Storage of Deleted Data
The layered design of modern file systems hides the liveness of data from the underlying storage systems. In this paper, we define a generic “purge” operation that can be used by a file system to pass liveness information to the storage system with minimal changes in the layer interfaces. We present three approaches for implementing such a purge operation: direct call, zero pages, and flagged writes. We demonstrate the feasibility of these techniques through a reference implementation in User-mode Linux to dynamically manage a copy-on-write (COW) data store. Performance results obtained from this reference implementation show that these techniques can achieve significant storage savings with a reasonable execution time overhead. Our results demonstrate that passing liveness information across the file system-block layer interface with minimal changes is not only feasible but practical.
Context-Aware Prefetching at the Storage Server
In many of today’s applications, access to storage constitutes the major cost of processing a user request. Data prefetching has been used to alleviate the storage access latency. Under current prefetching techniques, the storage system prefetches a batch of blocks upon detecting an access pattern. However, the high level of concurrency in today’s applications typically leads to interleaved block accesses, which makes detecting an access pattern a very challenging problem. To address this, we propose and evaluate QuickMine, a novel, lightweight and minimally intrusive method for context-aware prefetching. Under QuickMine, we capture application contexts, such as a transaction or query, and leverage them for context-aware prediction and improved prefetching effectiveness in the storage cache. We implement a prototype of our context-aware prefetching algorithm in a storage-area network (SAN) built using Network Block Device (NBD). Our prototype shows that context-aware prefetching clearly outperforms existing context-oblivious prefetching algorithms, improving application latency by up to a factor of 2 for two e-commerce workloads with repeatable access patterns, TPC-W and RUBiS.
CA-NFS: A Congestion-Aware Network File System
We develop a holistic framework for adaptively scheduling asynchronous requests in distributed file systems. The system is holistic in that it manages all resources, including network bandwidth, server I/O, server CPU, and client and server memory utilization. It accelerates, defers, or cancels asynchronous requests in order to improve application-perceived performance directly. We employ congestion pricing via online auctions to coordinate the use of system resources by the file system clients so that they can detect shortages and adapt their resource usage. We implement our modifications in the Congestion-Aware Network File System (CA-NFS), an extension to the ubiquitous network file system (NFS). Our experimental results show that CA-NFS yields a 20% improvement in execution times when compared with NFS for a variety of workloads.
Quantifying Temporal and Spatial Localities in Storage Workloads and Transformations by Data Path Components
Temporal and spatial localities are basic concepts in operating systems, and storage systems rely on localities to perform well. Surprisingly, it is difficult to quantify the localities present in workloads and how localities are transformed by storage data path components in metrics that can be compared under diverse settings. In this paper, we introduce stack- and block-affinity metrics to quantify temporal and spatial localities. We demonstrate that our metrics (1) behave well under extreme and normal loads, (2) can be used to validate synthetic loads at each stage of storage optimization, (3) can capture localities in ways that are resilient to generations of hardware, and (4) correlate meaningfully with performance. Our experience also unveiled hidden semantics of localities and identified future research directions.
SOPA: Selecting the Optimal Policy Adaptively for a cache system
With the development of storage techniques, new caching policies are continuously being introduced, which makes it increasingly important for cache systems to select the optimal caching policy dynamically under varying workloads. This paper proposes SOPA, a mechanism to adaptively select the optimal policy and perform policy switching. First, SOPA encapsulates the functions of a caching policy into a module, and enables online policy switching by policy reconstruction. Second, SOPA selects the optimal policy dynamically by collecting and analyzing access traces, and reduces the decision-making cost by asynchronous decision. The simulation evaluation showed that no single caching policy could perform well under all of the different workloads, but SOPA could select the appropriate policy for each workload. The real-system evaluation showed that SOPA could reduce the average response time by up to 20.3% compared with LRU and up to 11.9% compared with ARC.
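The trace-driven selection step described in this abstract can be illustrated by replaying a collected trace against several candidate policies and keeping the one with the most hits. The sketch below is a simplified stand-in, not SOPA itself: the function names `simulate` and `select_policy` are hypothetical, and LRU and MRU are used as the two candidates purely for brevity (the abstract's own comparison is against LRU and ARC).

```python
from collections import OrderedDict


def simulate(trace, capacity, evict_mru):
    """Replay a block trace and count hits for LRU or MRU eviction."""
    cache = OrderedDict()  # block -> True, least recently used first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                # last=True evicts the most recent entry, last=False the oldest.
                cache.popitem(last=evict_mru)
            cache[block] = True
    return hits


def select_policy(trace, capacity):
    """Return (best policy name, hit counts per policy) for this trace."""
    scores = {
        "LRU": simulate(trace, capacity, evict_mru=False),
        "MRU": simulate(trace, capacity, evict_mru=True),
    }
    best = max(scores, key=scores.get)  # ties resolve to the first entry, LRU
    return best, scores
```

A cyclic scan slightly larger than the cache defeats LRU in this sketch, while a trace with short-term reuse favors it, which mirrors the abstract's observation that no single policy performs well under all workloads.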

