Semantically-Smart Disk Systems
, 2003
Cited by 64 (14 self)
We propose and evaluate the concept of a semantically-smart disk system (SDS). As opposed to a traditional "smart" disk, an SDS has detailed knowledge of how the file system above is using the disk system, including information about the on-disk data structures of the file system. An SDS exploits this knowledge to transparently improve performance or enhance functionality beneath a standard block read/write interface. To automatically acquire this knowledge, we introduce a tool (EOF) that can discover file-system structure for certain types of file systems, and then show how an SDS can exploit this knowledge on-line to understand file-system behavior. We quantify the space and time overheads that are common in an SDS, showing that they are not excessive. We then study the issues surrounding SDS construction by designing and implementing a number of prototypes as case studies; each case study exploits knowledge of some aspect of the file system to implement powerful functionality beneath the standard SCSI interface. Overall, we find that a surprising amount of functionality can be embedded within an SDS, hinting at a future where disk manufacturers can compete on enhanced functionality and not simply cost-per-byte and performance.
ACME: Adaptive Caching Using Multiple Experts
- In Proceedings in Informatics
, 2002
Cited by 41 (19 self)
The gap between CPU speeds and the speed of the technologies providing the data is increasing. As a result, latency and bandwidth to needed data are limited by the performance of the storage devices and the networks that connect them to the CPU. Distributed caching techniques are often used to reduce the penalties associated with such caching; however, such techniques need further development to be truly integrated into the network. This paper describes the preliminary design of an adaptive caching scheme using multiple experts, called ACME. ACME is used to manage the replacement policies within distributed caches to further improve the hit rates over static caching techniques. We propose the use of machine learning algorithms to rate and select the current best policies or mixtures of policies via weight updates based on their recent success, allowing each adaptive cache node to tune itself based on the workload it observes. Since no cache databases or synchronization messages are exchanged for adaptivity, the clusters composed of these nodes will be scalable and manageable. We show that static techniques are suboptimal when combined in networks of caches, providing potential for adaptivity to improve performance.
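The abstract does not say which learning algorithm ACME uses, so the following is only a minimal sketch of the general idea of rating replacement policies by weight updates. It assumes a generic multiplicative-weights rule, two illustrative experts (LRU and MRU), and hypothetical names (`VirtualCache`, `rate_experts`, `beta`) that do not come from the paper.

```python
from collections import OrderedDict


class VirtualCache:
    """Fixed-size cache simulating one replacement policy (one "expert")."""

    def __init__(self, capacity, evict_most_recent=False):
        self.capacity = capacity
        # False selects LRU eviction, True selects MRU eviction.
        self.evict_most_recent = evict_most_recent
        # Keys are ordered from least to most recently used.
        self.blocks = OrderedDict()

    def access(self, block):
        """Record an access; return True on a hit and False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        if len(self.blocks) >= self.capacity:
            # popitem(last=True) drops the most recently used block,
            # popitem(last=False) drops the least recently used one.
            self.blocks.popitem(last=self.evict_most_recent)
        self.blocks[block] = None
        return False


def rate_experts(trace, experts, beta=0.9):
    """Return normalized weights; an expert's weight shrinks on each miss.

    beta (assumed, 0 < beta < 1) is the penalty factor applied per miss.
    """
    weights = {name: 1.0 for name in experts}
    for block in trace:
        for name, cache in experts.items():
            if not cache.access(block):
                weights[name] *= beta
        # Renormalize so the weights stay a distribution and do not underflow.
        total = sum(weights.values())
        weights = {name: w / total for name, w in weights.items()}
    return weights


if __name__ == "__main__":
    experts = {
        "LRU": VirtualCache(3),
        "MRU": VirtualCache(3, evict_most_recent=True),
    }
    # A cyclic scan over 4 blocks with room for 3 defeats LRU entirely.
    weights = rate_experts([0, 1, 2, 3] * 50, experts)
    print(max(weights, key=weights.get), weights)
```

On the looping trace in the usage example, LRU misses on every access while MRU keeps part of the loop resident, so the MRU expert ends up with the larger weight. A node could then follow the highest-weighted expert, or mix experts in proportion to their weights.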
C-Miner: Mining Block Correlations in Storage Systems
- In Proceedings of the 3rd USENIX Symposium on File and Storage Technologies (FAST ’04)
, 2004
Cited by 30 (3 self)
systems. These correlations can be exploited for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough to be used for discovering block correlations in storage systems.
Measurement and analysis of large-scale network file system workloads
- In Proceedings of the 2008 USENIX Annual Technical Conference
, 2008
Cited by 26 (8 self)
In this paper we present the analysis of two large-scale network file system workloads. We measured CIFS traffic for two enterprise-class file servers deployed in the NetApp data center for a three-month period. One file server was used by marketing, sales, and finance departments and the other by the engineering department. Together these systems represent over 22 TB of storage used by over 1500 employees, making this the first ever large-scale study of the CIFS protocol. We analyzed how our network file system workloads compared to those of previous file system trace studies and took an in-depth look at access, usage, and sharing patterns. We found that our workloads were quite different from those previously studied; for example, our analysis found increased read-write file access patterns, decreased read-write ratios, more random file access, and longer file lifetimes. In addition, we found a number of interesting properties regarding file sharing, file re-use, and the access patterns of file types and users, showing that modern file system workloads have changed in the past 5–10 years. This change in workload characteristics has implications for the future design of network file systems, which we describe in the paper.
Issues and Challenges in the Performance Analysis of Real Disk Arrays
- IEEE Transactions on Parallel and Distributed Systems
, 2004
Cited by 24 (1 self)
The performance modeling and analysis of disk arrays is challenging due to the presence of multiple disks, large array caches, and sophisticated array controllers. Moreover, storage manufacturers may not reveal the internal algorithms implemented in their devices, so real disk arrays are effectively black boxes. We use standard performance techniques to develop an integrated performance model that incorporates some of the complexities of real disk arrays. We show how measurement data and baseline performance models can be used to extract information about the various features implemented in a disk array. In this process, we identify areas for future research in the performance analysis of real disk arrays.
PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for Conserving Disk Energy
- In Proceedings of the 18th International Conference on Supercomputing
, 2004
Cited by 22 (2 self)
Energy consumption is an important concern at data centers, where storage systems consume a significant fraction of the total energy. A recent study proposed power-aware storage cache management to provide more opportunities for the underlying disk power management scheme to save energy. However, the on-line algorithm proposed in that study requires cumbersome parameter tuning for each workload and is therefore difficult to apply to real systems.
Empirical Evaluation of Multi-level Buffer Cache Collaboration for Storage Systems
- In Proceedings of the 2005 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’05)
, 2005
Cited by 18 (2 self)
To bridge the increasing processor-disk performance gap, buffer caches are used in both storage clients (e.g., database systems) and storage servers to reduce the number of slow disk accesses. These buffer caches need to be managed effectively to deliver performance commensurate with the aggregate buffer cache size. To address this problem, two paradigms have been proposed recently to collaboratively manage these buffer caches together: hierarchy-aware caching maintains the same I/O interface and is fully transparent to the storage client software, while aggressively-collaborative caching trades off transparency for performance and requires changes to both the interface and the storage client software. Before the storage industry starts to implement collaborative caching in real systems, it is crucial to find out whether sacrificing transparency is really worthwhile, i.e., how much can we gain by using
Second-Level Buffer Cache Management
- IEEE Transactions on Parallel and Distributed Systems
, 2004
Cited by 18 (1 self)
Buffer caches are commonly used in servers to reduce the number of slow disk accesses or network messages. These buffer caches form a multilevel buffer cache hierarchy. In such a hierarchy, second-level buffer caches have different access patterns from first-level buffer caches because accesses to a second-level cache are actually misses from a first-level cache. Therefore, commonly used cache management algorithms such as the Least Recently Used (LRU) replacement algorithm that work well for single-level buffer caches may not work well for second-level caches. This paper investigates multiple approaches to effectively manage second-level buffer caches. In particular, it reports our research results in 1) second-level buffer cache access pattern characterization, 2) a new local algorithm called Multi-Queue (MQ) that performs better than nine tested alternative algorithms for second-level buffer caches, 3) a set of global algorithms that manage a multilevel buffer cache hierarchy globally and significantly improve second-level buffer cache hit ratios over corresponding local algorithms, and 4) implementation and evaluation of these algorithms in a real storage system connected with commercial database servers (Microsoft SQL Server and Oracle) running industrial-strength online transaction processing benchmarks. Index Terms: Cache memories, storage hierarchy, storage management.
Database-Aware Semantically-Smart Storage
- In Proceedings of the 4th USENIX Conference on File and Storage Technologies. USENIX Association
, 2005
Cited by 14 (1 self)
Recent research has demonstrated the potential benefits of building storage arrays that understand the file systems above them. Such “semantically-smart” disk systems use knowledge of file system structures and operations to improve performance, availability, and even security in ways that are precluded in a traditional storage system architecture. In this paper, we study the applicability of semantically smart disk technology underneath database management systems. For three case studies, we analyze the differences when building database-aware storage. We find that semantically-smart disk systems can be successfully applied underneath a database, but that new techniques, such as log snooping and explicit access statistics, are needed.
Power-aware storage cache management
- IEEE Transactions on Computers
, 2005
Cited by 14 (0 self)
Reducing energy consumption is an important issue for data centers. Among the various components of a data center, storage is one of the biggest energy consumers. Previous studies have shown that the average idle period for a server disk in a data center is very small compared to the time taken to spin down and spin up. This significantly limits the effectiveness of disk power management schemes. This article proposes several power-aware storage cache management algorithms that provide more opportunities for the underlying disk power management schemes to save energy. More specifically, we present an off-line energy-optimal cache replacement algorithm using dynamic programming which minimizes the disk energy consumption. We also present an off-line power-aware greedy algorithm that is more energy-efficient than Belady’s off-line algorithm (which minimizes cache misses only). We also propose two online power-aware algorithms, PA-LRU and PB-LRU. Simulation results with both real system and synthetic workloads show that, compared to LRU, our online algorithms can save up to 22% more disk energy and provide up to 64% better average response time. We have also investigated the effects of four storage cache write policies on disk energy consumption.
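For readers unfamiliar with the two baselines the abstract compares against, the sketch below counts misses for LRU and for Belady's off-line algorithm, which evicts the block whose next use lies farthest in the future. It illustrates only the miss-minimizing baselines; it does not model disk energy, and it is not the paper's dynamic-programming, greedy, PA-LRU, or PB-LRU algorithm. The function names and the example trace are assumptions made for illustration.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for LRU replacement on a sequence of block accesses."""
    cache = OrderedDict()  # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = None
    return misses


def belady_misses(trace, capacity):
    """Count misses for Belady's off-line algorithm (needs the full trace)."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:

            def next_use(candidate):
                # Index of the next access to candidate, or infinity if none.
                for j in range(i + 1, len(trace)):
                    if trace[j] == candidate:
                        return j
                return float("inf")

            # Evict the block that will not be needed for the longest time.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


if __name__ == "__main__":
    trace = [0, 1, 2, 3] * 3
    print("LRU misses:", lru_misses(trace, 3))
    print("Belady misses:", belady_misses(trace, 3))
```

On this 12-access cyclic trace with a 3-block cache, LRU misses all 12 accesses while Belady's algorithm misses 6. As the abstract notes, fewer misses does not necessarily mean less disk energy, which is the gap the power-aware algorithms target.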

