Results 1 - 10
of
61
Energy Conservation Techniques for Disk Array-Based Servers
, 2004
"... In this paper, we study energy conservation techniques for disk array-based network servers. First, we introduce a new conservation technique, called Popular Data Concentration (PDC), that migrates frequently accessed data to a subset of the disks. The goal is to skew the load towards a few of the d ..."
Abstract
-
Cited by 88 (7 self)
- Add to MetaCart
In this paper, we study energy conservation techniques for disk array-based network servers. First, we introduce a new conservation technique, called Popular Data Concentration (PDC), that migrates frequently accessed data to a subset of the disks. The goal is to skew the load towards a few of the disks, so that others can be transitioned to low-power modes. Next, we introduce a user-level file server that takes advantage of PDC. In the context of this server, we compare PDC to the Massive Array of Idle Disks (MAID). Using a validated simulator, we evaluate these techniques for conventional and two-speed disks and a wide range of parameters. Our results for conventional disks show that PDC and MAID can only conserve energy when the load on the server is extremely low. When two-speed disks are used, both PDC and MAID can conserve significant energy with only a small fraction of delayed requests. Overall, we find that PDC achieves more consistent and robust energy savings than MAID.
My cache or yours? Making storage more exclusive
- In Proceedings of the 2002 USENIX Annual Technical Conference
, 2002
"... Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as ..."
Abstract
-
Cited by 88 (0 self)
- Add to MetaCart
Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive. We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful -- sometimes substantial -- speedups. During our investigations, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.
Semantically-Smart Disk Systems
, 2003
"... We propose and evaluate the concept of a semantically-smart disk system (SDS). As opposed to a traditional "smart" disk, an SDS has detailed knowledge of how the file system above is using the disk system, including information about the on-disk data structures of the file system. An SDS exploits th ..."
Abstract
-
Cited by 64 (14 self)
- Add to MetaCart
We propose and evaluate the concept of a semantically-smart disk system (SDS). As opposed to a traditional "smart" disk, an SDS has detailed knowledge of how the file system above is using the disk system, including information about the on-disk data structures of the file system. An SDS exploits this knowledge to transparently improve performance or enhance functionality beneath a standard block read/write interface. To automatically acquire this knowledge, we introduce a tool (EOF) that can discover file-system structure for certain types of file systems, and then show how an SDS can exploit this knowledge on-line to understand file-system behavior. We quantify the space and time overheads that are common in an SDS, showing that they are not excessive. We then study the issues surrounding SDS construction by designing and implementing a number of prototypes as case studies; each case study exploits knowledge of some aspect of the file system to implement powerful functionality beneath the standard SCSI interface. Overall, we find that a surprising amount of functionality can be embedded within an SDS, hinting at a future where disk manufacturers can compete on enhanced functionality and not simply cost-per-byte and performance.
Object-based Storage
- In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST 11), SanJose,CA,Feb 15-17 2011. The USENIX Association
"... We propose an I/O classification architecture to close the widening semantic gap between computer systems and storage systems. By classifying I/O, a computer system can request that different classes of data be handled with different storage system policies. Specifically, when a storage system is fi ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
We propose an I/O classification architecture to close the widening semantic gap between computer systems and storage systems. By classifying I/O, a computer system can request that different classes of data be handled with different storage system policies. Specifically, when a storage system is first initialized, we assign performance policies to predefined classes, such as the filesystem journal. Then, online, we include a classifier with each I/O command (e.g., SCSI), thereby allowing the storage system to enforce the associated policy for each I/O that it receives. Our immediate application is caching. We present filesystem prototypes and a database proof-of-concept that classify all disk I/O — with very little modification to the filesystem, database, and operating system. We associate caching policies with various classes (e.g., large files shall be evicted before metadata and small files), and we show that end-to-end file system performance can be improved by over a factor of two, relative to conventional caches like LRU. And caching is simply one of many possible applications. As part of our ongoing work, we are exploring other classes, policies and storage system mechanisms that can be used to improve end-to-end performance, reliability and security.
Predicting Whole-Program Locality Through Reuse Distance Analysis
, 2003
"... Profiling can accurately analyze program behavior for select data inputs. We show that profiling can also predict program locality for inputs other than profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
Profiling can accurately analyze program behavior for select data inputs. We show that profiling can also predict program locality for inputs other than profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in short-distance reuses or local control flow. However, the analysis must meet two requirements to be useful. The first is efficiency. It needs to analyze all accesses to all data elements in full-size benchmarks and to measure distance of any length and in any required precision. The second is predication. Based on a few training runs, it needs to classify patterns as regular and irregular and, for regular ones, it should predict their (changing) behavior for other inputs. In this paper, we show that these goals are attainable through three techniques: approximate analysis of reuse distance (originally called LRU stack distance), pattern recognition, and distance-based sampling. When tested on 15 integer and floating-point programs from SPEC and other benchmark suites, our techniques predict with on average 94% accuracy for data inputs up to hundreds times larger than the training inputs. Based on these results, the paper discusses possible uses of this analysis.
Experiences with VI Communication for Database Storage
- In Proceedings of the 29th annual international symposium on Computer architecture
, 2002
"... This paper examines how VI–based interconnects can be used to improve I/O path performance between a database server and the storage subsystem. We design and implement a software layer, DSA, that is layered between the application and VI. DSA takes advantage of specific VI features and deals with ma ..."
Abstract
-
Cited by 33 (9 self)
- Add to MetaCart
This paper examines how VI–based interconnects can be used to improve I/O path performance between a database server and the storage subsystem. We design and implement a software layer, DSA, that is layered between the application and VI. DSA takes advantage of specific VI features and deals with many of its shortcomings. We provide and evaluate one kernel–level and two user–level implementations of DSA. These implementations trade transparency and generality for performance at different degrees, and unlike research prototypes are designed to be suitable for real– world deployment. We present detailed measurements using a commercial database management system with both micro-benchmarks and industrial database workloads on a mid–size, 4 CPU, and a large, 32 CPU, database server. Our results show that VI–based interconnects and user– level communication can improve all aspects of the I/O path between the database system and the storage back-end. We also find that to make effective use of VI in I/O intensive environments we need to provide substantial additional functionality than what is currently provided by VI. Finally, new storage APIs that help minimize kernel involvement in the I/O path are needed to fully exploit the benefits of VI–based communication. 1
CAR: Clock with Adaptive Replacement
- IN PROCEEDINGS OF THE USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST
, 2004
"... CLOCK is a classical cache replacement policy dating back to 1968 that was proposed as a low-complexity approximation to LRU. On every cache hit, the policy LRU needs to move the accessed item to the most recently used position, at which point, to ensure consistency and correctness, it serializes c ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
CLOCK is a classical cache replacement policy dating back to 1968 that was proposed as a low-complexity approximation to LRU. On every cache hit, the policy LRU needs to move the accessed item to the most recently used position, at which point, to ensure consistency and correctness, it serializes cache hits behind a single global lock. CLOCK eliminates this lock contention, and, hence, can support high concurrency and high throughput environments such as virtual memory (for example, Multics, UNIX, BSD, AIX) and databases (for example, DB2). Unfortunately, CLOCK is still plagued by disadvantages of LRU such as disregard for "frequency", susceptibility to scans, and low performance.
As our main contribution, we propose a simple and elegant new algorithm, namely, CLOCK with Adaptive Replacement (CAR), that has several advantages over CLOCK: (i) it is scan-resistant; (ii) it is self-tuning and it adaptively and dynamically captures the "recency" and "frequency" features of a workload; (iii) it uses essentially the same primitives as CLOCK, and, hence, is low-complexity and amenable to a high-concurrency implementation; and (iv) it outperforms CLOCK across a wide-range of cache sizes and workloads. The algorithm CAR is inspired by the Adaptive Replacement Cache (ARC) algorithm, and inherits virtually all advantages of ARC including its high performance, but does not serialize cache hits behind a single global lock. As our second contribution, we introduce another novel algorithm, namely, CAR with Temporal filtering (CART), that has all the advantages of CAR, but, in addition, uses a certain temporal filter to distill pages with long-term utility from those with only short-term utility.
C-Miner: Mining Block Correlations in Storage Systems
- In Proceedings of the 3rd USENIX Symposium on File and Storage Technologies (FAST ’04
, 2004
"... systems. These correlations can be exploited for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level. Previous approaches for discovering file correlations in fi ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
systems. These correlations can be exploited for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough to be used for discovering block correlations in storage systems.
Performance of Compressed Inverted List Caching in Search Engines
- In WWW
, 2008
"... Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy th ..."
Abstract
-
Cited by 25 (11 self)
- Add to MetaCart
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial rule in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameter such as disk speed and main memory cache size.
PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for Conserving Disk Energy
- In Proceedings of the 18th International Conference on Supercomputing
, 2004
"... Energy consumption is an important concern at data centers, where storage systems consume a significant fraction of the total energy. A recent study proposed power-aware storage cache management to provide more opportunities for the underlying disk power management scheme to save energy. However, th ..."
Abstract
-
Cited by 22 (2 self)
- Add to MetaCart
Energy consumption is an important concern at data centers, where storage systems consume a significant fraction of the total energy. A recent study proposed power-aware storage cache management to provide more opportunities for the underlying disk power management scheme to save energy. However, the on-line algorithm proposed in that study requires cumbersome parameter tuning for each workload and is therefore difficult to apply to real systems.

