Results 1 - 8 of 8
A nine year study of file system and storage benchmarking
- ACM Transactions on Storage, 2008
Abstract - Cited by 20 (4 self)
Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2–5 % wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
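The ext2 example in the abstract above can be read through a simple serial-time model. The sketch below is an illustration only, not the authors' methodology: it assumes that read time does not overlap with any other work and that only the read portion of the run is slowed, so the overall slowdown equals f * (k - 1) for a read fraction f and a read slowdown factor k. The function name `implied_read_fraction` is hypothetical.

```python
def implied_read_fraction(overall_slowdown: float, read_slowdown_factor: float) -> float:
    """Estimate the share of baseline wall-clock time spent in reads.

    Assumed model (not taken from the article):
        T_new = T * (1 - f) + T * f * k
    which gives overall_slowdown = T_new / T - 1 = f * (k - 1).
    """
    if read_slowdown_factor <= 1:
        raise ValueError("read_slowdown_factor must be greater than 1")
    return overall_slowdown / (read_slowdown_factor - 1)


if __name__ == "__main__":
    # Figures quoted in the abstract: reads slowed 32x, 2 to 5% overall slowdown.
    for slowdown in (0.02, 0.05):
        fraction = implied_read_fraction(slowdown, 32)
        print(f"{slowdown:.0%} overall slowdown -> reads are about "
              f"{fraction:.3%} of baseline time")
```

Under these assumptions, reads would account for only a small fraction of one percent of the compile benchmark's wall-clock time, which is consistent with the abstract's point that such a benchmark can conceal file system overheads.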
Why can't I find my files? New methods for automating attribute assignment
- Proceedings of the Ninth Workshop on Hot Topics in Operating Systems, 2003
Abstract - Cited by 16 (2 self)
Attribute-based naming enables powerful search and organization tools for ever-increasing user data sets. However, such tools are only useful in combination with accurate attribute assignment. Existing systems rely on user input and content analysis, but they have enjoyed minimal success. This paper discusses new approaches to automatically assigning attributes to files, including several forms of context analysis, which has been highly successful in the Google web search engine. With extensions like application hints (e.g., web links for downloaded files) and inter-file relationships, it should be possible to infer useful attributes for many files, making attribute-based search tools more effective.
Toward Automatic Context-Based Attribute Assignment for Semantic File Systems
- 2004
Abstract - Cited by 2 (0 self)
Semantic file systems enable users to search for files based on attributes rather than just pre-assigned names. This paper develops and evaluates several new approaches to automatically generating file attributes based on context, complementing existing approaches based on content analysis. Context captures broader system state that can be used to provide new attributes for files, and to propagate attributes among related files; context is also how humans often remember previous items [2], and so should fit the primary role of semantic file systems well. Based on our study of ten systems over four months, the addition of context-based mechanisms, on average, reduces the number of files with zero attributes by 73%. This increases the total number of classifiable files by over 25% in most cases, as is shown in Figure 1. Also, on average, 71% of the content-analyzable files also gain additional valuable attributes.
Dealing with Massive Data: From Parallel I/O to Grid I/O
- 2003
Abstract - Cited by 1 (0 self)
Acknowledgements. Many people have helped us find our way during the development of this thesis. Erich Schikuta, our supervisor, provided a motivating, enthusiastic, and critical atmosphere during our discussions. It was a great pleasure for us to conduct this thesis under his supervision. We also acknowledge Heinz and Kurt Stockinger who provided constructive comments. We would also like to thank everybody for providing us with feedback.
Performance Evaluation of Parallel I/O in Cluster Environments
Abstract
Clusters have been increasingly widely used for scientific and commercial applications. In a cluster environment, scientific applications distribute their data across multiple computation nodes. In order to improve the performance of clusters, many issues in parallel I/O have to be judiciously investigated. These issues include parallel file systems, access patterns, low-level I/O interfaces, scientific data libraries, and data management. In this paper, we address the bottlenecks and performance factors of parallel I/O in a cluster environment. Our experiment shows that the network is one of the potential bottlenecks in cluster-based parallel I/O. Furthermore, the performance of a distributed RAID5, built on the network block device (NBD) installed on the clusters in our department, is evaluated and compared with single-disk I/O. The experimental results confirm that, in most situations, the performance of distributed RAID is noticeably better than that of a single-disk system. Lastly, the experimental results indicate that file size and block size have a significant impact on the performance of both the single-disk system and distributed RAID on clusters.
nd USENIX Conference on File and Storage Technologies (FAST03). San Francisco, CA, March 31-Apr 2, 2003.
Abstract
Attribute-based naming enables powerful search and organization tools for ever-increasing user data sets. However, such tools are only useful in combination with accurate attribute assignment. Existing systems rely on user input and content analysis, but they have enjoyed minimal success. This paper discusses new approaches to automatically assigning attributes to files, including several forms of context analysis, which has been highly successful in the Google web search engine. With extensions like application hints (e.g., web links for downloaded files) and inter-file relationships, it should be possible to infer useful attributes for many files, making attribute-based search tools more effective.
Why can't I find my files?
Abstract
Attribute-based naming enables powerful search and organization tools for ever-increasing user data sets. However, such tools are only useful in combination with accurate attribute assignment. Existing systems rely on user input and content analysis, but they have enjoyed minimal success. This paper discusses new approaches to automatically assigning attributes to files, including several forms of context analysis, which has been highly successful in the Google web search engine. With extensions like application hints (e.g., web links for downloaded files) and inter-file relationships, it should be possible to infer useful attributes for many files, making attribute-based search tools more effective.
Source Level Transformations to Improve I/O Data Partitioning
- In Proceedings of the International Workshop on Storage Network Architecture and Parallel I/Os, 2003
Abstract
parallelism by providing multiple, independent data channels between processors and disks. To realize this goal, I/O streams need to be parallelized and partitioned at multiple system layers. Contention at any level can dramatically decrease performance and limit scalability. To address this disk contention bottleneck, it is important to carefully study disk access patterns.

