File Access Prediction with Adjustable Accuracy
, 2002
Cited by 35 (8 self)
We describe a novel on-line file access predictor, Recent Popularity, capable of rapid adaptation to workload changes while simultaneously predicting more events with greater accuracy than prior efforts. We distinguish the goal of predicting the most events accurately from the goal of offering the most accurate predictions (when declining to offer a prediction is acceptable). For this purpose we present two distinct measures of accuracy, general and specific accuracy, corresponding to these goals. We describe how our new predictor and an earlier effort, Noah, can trade the number of events predicted for prediction accuracy by modifying simple parameters. When prediction accuracy is strictly more important than the number of predictions offered, trace-based evaluation demonstrates error rates as low as 2%, while offering predictions for more than 60% of all file access events.
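The distinction between the two accuracy goals can be made concrete with a small sketch. The abstract does not give formulas, so the definitions below are an assumption: general accuracy is taken as correct predictions over all events, and specific accuracy as correct predictions over only those events for which a prediction was offered. The function name `accuracy_measures` and the outcome encoding are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def accuracy_measures(
    outcomes: Sequence[Optional[bool]],
) -> Tuple[float, float, float]:
    """Return (general, specific, coverage) for a run of a predictor.

    outcomes[i] is True for a correct prediction, False for a wrong one,
    and None when the predictor declined to predict event i.
    These definitions are assumed, not taken from the paper.
    """
    total = len(outcomes)
    offered = sum(1 for o in outcomes if o is not None)
    correct = sum(1 for o in outcomes if o is True)
    general = correct / total if total else 0.0       # correct / all events
    specific = correct / offered if offered else 0.0  # correct / predictions offered
    coverage = offered / total if total else 0.0      # share of events predicted
    return general, specific, coverage


# Ten events: six correct, one wrong, three declined.
outcomes = [True, True, None, False, None, True, True, None, True, True]
general, specific, coverage = accuracy_measures(outcomes)
print(f"general={general:.3f} specific={specific:.3f} coverage={coverage:.3f}")
```

Under these assumed definitions, a predictor that declines more often lowers coverage but can raise specific accuracy, which is the trade-off the abstract describes.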
DULO: An effective buffer cache management scheme to exploit both temporal and spatial localities
- In USENIX Conference on File and Storage Technologies (FAST
, 2005
Cited by 23 (9 self)
Sequentiality of requested blocks on disks, or their spatial locality, is critical to the performance of disks, where the throughput of accesses to sequentially placed disk blocks can be an order of magnitude higher than that of accesses to randomly placed blocks. Unfortunately, spatial locality of cached blocks is largely ignored and only temporal locality is considered in system buffer cache management. Thus, disk performance for workloads without dominant sequential accesses can be seriously degraded. To address this problem, we propose a scheme called DULO (DUal LOcality), which exploits both temporal and spatial locality in buffer cache management. Leveraging the filtering effect of the buffer cache, DULO can influence the I/O request stream by making the requests passed to disk more sequential, significantly increasing the effectiveness of I/O scheduling and prefetching for disk performance improvements. DULO has been extensively evaluated by both trace-driven simulations and a prototype implementation in Linux 2.6.11. In the simulations and system measurements, various application workloads have been tested, including Web Server, TPC benchmarks, and scientific programs. Our experiments show that DULO can significantly increase system throughput and reduce program execution times.
A nine year study of file system and storage benchmarking
- ACM Transactions on Storage
, 2008
Cited by 20 (4 self)
Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2–5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
DiskSeen: Exploiting Disk Layout and Access History to Enhance I/O Prefetch
- In Proceedings of USENIX Annual Technical Conference 2007
, 2007
Cited by 16 (7 self)
Current disk prefetch policies in major operating systems track access patterns at the level of the file abstraction. While this is useful for exploiting application-level access patterns, file-level prefetching cannot realize the full performance improvements achievable by prefetching. There are two reasons for this. First, certain prefetch opportunities can only be detected by knowing the data layout on disk, such as the contiguous layout of file metadata or data from multiple files. Second, non-sequential access of disk data (requiring disk head movement) is much slower than sequential access, and the penalty for mis-prefetching a ‘random’ block, relative to that of a sequential block, is correspondingly more costly. To overcome the inherent limitations of prefetching at the logical file level, we propose to perform prefetching directly at the level of disk layout, and in a portable way. Our technique, called DiskSeen, is intended to be supplementary to, and to work synergistically with, file-level prefetch policies, if present. DiskSeen tracks the locations and access times of disk blocks, and based on analysis of their temporal and spatial relationships, seeks to improve the sequentiality of disk accesses and overall prefetching performance. Our implementation of the DiskSeen scheme in the Linux 2.6 kernel shows that it can significantly improve the effectiveness of prefetching, reducing execution times by 20%-53% for micro-benchmarks and real applications such as grep, CVS, and TPC-H.
SARC: Sequential prefetching in adaptive replacement cache
- In Proc. of USENIX 2005 Annual Technical Conference (2005
, 2005
Cited by 11 (2 self)
Sequentiality of reference is a ubiquitous access pattern dating back at least to Multics. Sequential workloads lend themselves to highly accurate prediction and prefetching. In spite of the simplicity of the workload, design and analysis of a good sequential prefetching algorithm and associated cache replacement policy turns out to be surprisingly intricate. As a first contribution, we uncover and remedy an anomaly (akin to the famous Belady’s anomaly) that plagues sequential prefetching when integrated with caching. Typical workloads contain a mix of sequential and random streams. As a second contribution, we design a self-tuning, low-overhead, simple-to-implement, locally adaptive, novel cache management policy, SARC, that dynamically and adaptively partitions the cache space amongst sequential and random streams so as to reduce read misses. As a third contribution, we implemented SARC along with two popular state-of-the-art LRU variants on hardware for IBM’s flagship storage controller Shark. On Shark hardware with 8 GB cache and 16 RAID-5 arrays that is serving a workload akin to the Storage Performance Council’s widely adopted SPC-1 benchmark, SARC consistently and dramatically outperforms the two LRU variants, shifting the throughput-response time curve to the right and thus fundamentally increasing the capacity of the system. As anecdotal evidence, at peak throughput, SARC has an average response time of 5.18 ms, compared to 33.35 ms and 8.92 ms for the two LRU variants.
Using Program and User Information to Improve File Prediction Performance
, 2001
Cited by 10 (1 self)
Correct prediction of file accesses can improve system performance by mitigating the relative speed difference between CPU and disks. This paper discusses Program-based Last Successor (PLS) and presents Program- and User-based Last Successor (PULS), file prediction algorithms that utilize information about the program and user that access the files. Our simulation results show that PLS makes 21% fewer incorrect predictions and PULS makes 24% fewer incorrect predictions than last-successor, with roughly the same number of correct predictions that last-successor makes. The cache space wasted on incorrect predictions can be reduced accordingly. We also show that a cache using the Least Recently Used (LRU) caching algorithm can perform better when PULS is applied. In some cases, a cache using LRU and either PLS or PULS performs better than a cache up to 40 times larger using LRU alone.
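A minimal sketch of the idea follows. It assumes that last-successor predicts the file that followed the current file the last time it was accessed, and that the program-based variant keeps that history separately per program; the abstract states only that PLS and PULS use program and user information, so the details here are an assumption. The class name `LastSuccessor`, the `per_program` flag, and the sample trace are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class LastSuccessor:
    """Last-successor predictor, optionally keyed by the accessing program."""

    def __init__(self, per_program: bool = False) -> None:
        self.per_program = per_program
        # (context, file) -> file that followed it last time in that context
        self.successor: Dict[Tuple[Optional[str], str], str] = {}
        # context -> most recent file seen in that context
        self.previous: Dict[Optional[str], str] = {}

    def access(self, program: str, file: str) -> Optional[str]:
        """Record an access and return the predicted next file, if any."""
        context = program if self.per_program else None
        prev = self.previous.get(context)
        if prev is not None:
            self.successor[(context, prev)] = file  # learn prev -> file
        self.previous[context] = file
        return self.successor.get((context, file))


# Two programs whose accesses interleave in the global stream.
trace: List[Tuple[str, str]] = [
    ("make", "Makefile"), ("vi", "notes"), ("make", "a.c"),
    ("vi", "todo"), ("make", "Makefile"), ("vi", "notes"),
]
ls = LastSuccessor(per_program=False)
pls = LastSuccessor(per_program=True)
ls_predictions = [ls.access(p, f) for p, f in trace]
pls_predictions = [pls.access(p, f) for p, f in trace]
print(ls_predictions)
print(pls_predictions)
```

In this toy trace the per-program variant predicts each program's own next file, while the global variant predicts whatever came next in the interleaved stream. A user-aware variant could extend the context key to a (user, program) pair under the same assumptions.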
Using Multiple Predictors to Improve the Accuracy of File Access Predictions
- In Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies
, 2003
Cited by 10 (4 self)
Existing file access predictors keep track of previous file access patterns and rely on a single heuristic to predict which of the previous successors to the file being currently accessed is the most likely to be accessed next. We present here a novel composite predictor that applies multiple heuristics to this selection problem. As a result, it can make use of specialized heuristics that can make very accurate predictions when access patterns are observed to meet their particular criteria. Simulation results involving a total of seven file access traces indicate that our predictor delivers more correct predictions and fewer inaccurate guesses than predictors relying on a single heuristic for selecting a successor.
Expecting the unexpected: adaptation for predictive energy conservation
- In StorageSS ’05: Proceedings of the 2005 ACM workshop on Storage security and survivability
, 2005
Cited by 5 (0 self)
The use of access predictors to improve storage device performance has been investigated both for improving access times and as a means of reducing the energy consumed by the disk. Such predictors also offer us an opportunity to demonstrate the benefits of an adaptive approach to handling unexpected workloads, whether they are the result of natural variation or deliberate attempts to generate a problematic workload. Such workloads can pose a threat to system availability if they result in the excessive consumption of potentially limited resources such as energy. We propose that actively reshaping a disk access workload, using a dynamically self-adjusting access predictor, allows for consistently good performance in the face of varying workloads. Specifically, we describe how our Best Shifting prefetching policy, by adapting to the needs of the currently observed workload, can use 15% to 35% less energy than traditional disk spin-down strategies and 5% to 10% less energy than the use of a fixed prefetching policy.
A buffer cache management scheme exploiting both temporal and spatial localities
- Trans. Storage
Cited by 3 (0 self)
On-disk sequentiality of requested blocks, or their spatial locality, is critical to real disk performance, where the throughput of access to sequentially placed disk blocks can be an order of magnitude higher than that of access to randomly placed blocks. Unfortunately, spatial locality of cached blocks is largely ignored, and only temporal locality is considered in current system buffer cache management. Thus, disk performance for workloads without dominant sequential accesses can be seriously degraded. To address this problem, we propose a scheme called DULO (DUal LOcality), which exploits both temporal and spatial localities in buffer cache management. Leveraging the filtering effect of the buffer cache, DULO can influence the I/O request stream by making the requests passed to the disk more sequential, thus significantly increasing the effectiveness of I/O scheduling and prefetching for disk performance improvements. We have implemented a prototype of DULO in Linux 2.6.11. The implementation shows that DULO can significantly increase disk I/O throughput for real-world applications such as a Web server, TPC benchmark, file system benchmark, and scientific programs. It reduces their execution times by as much as 53%.
A Stochastic Approach to File Access Prediction
Cited by 3 (0 self)
Most existing studies of file access prediction are experimental in nature and rely on trace-driven simulation to predict the performance of the schemes being investigated. We present a first-order Markov analysis of file access prediction, discuss its limitations, and show how it can be used to estimate the performance of file access predictors such as First Successor, Last Successor, Stable Successor, and Best-k-out-of-n. We compare these analytical results with experimental measurements performed on several file traces and find that specific workloads, and indeed individual files, can exhibit very different levels of nonstationarity.
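As a rough illustration of what a first-order Markov view of a trace looks like, the sketch below estimates successor counts from a trace and computes the hit rate a predictor would get if it always guessed each file's most frequent successor. This is not the paper's analysis, only an assumed simplification that treats the trace as stationary; the function names and the toy trace are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Sequence


def transition_counts(trace: Sequence[str]) -> DefaultDict[str, Counter]:
    """Count how often each file is immediately followed by each other file."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        counts[current][following] += 1
    return counts


def most_likely_successor_hit_rate(trace: Sequence[str]) -> float:
    """Fraction of transitions matching the most frequent successor.

    Measured on the same trace used for counting, so it is an optimistic
    estimate that ignores any nonstationarity in the workload.
    """
    counts = transition_counts(trace)
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    hits = sum(max(c.values()) for c in counts.values())
    return hits / total


trace = list("ABABACABAB")  # toy trace of single-letter file names
counts = transition_counts(trace)
rate = most_likely_successor_hit_rate(trace)
print(dict(counts["A"]), f"{rate:.3f}")
```

A gap between such a stationary estimate and the hit rate actually measured for an online predictor is one informal way to see the nonstationarity the abstract mentions.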

