Write Off-Loading: Practical Power Management for Enterprise Storage
Cited by 47 (6 self)
In enterprise data centers power usage is a problem impacting server density and the total cost of ownership. Storage uses a significant fraction of the power budget and there are no widely deployed power-saving solutions for enterprise storage systems. The traditional view is that enterprise workloads make spinning disks down ineffective because idle periods are too short. We analyzed block-level traces from 36 volumes in an enterprise data center for one week and concluded that significant idle periods exist, and that they can be further increased by modifying the read/write patterns using write off-loading. Write off-loading allows write requests on spun-down disks to be temporarily redirected to persistent storage elsewhere in the data center. The key challenge is doing this transparently and efficiently at the block level, without sacrificing consistency or failure resilience. We describe our write off-loading design and implementation that achieves these goals. We evaluate it by replaying portions of our traces on a rack-based testbed. Results show that just spinning disks down when idle saves 28–36% of energy, and write off-loading further increases the savings to 45–60%.
A nine year study of file system and storage benchmarking
ACM Transactions on Storage, 2008
Cited by 20 (4 self)
Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2–5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
RAIF: Redundant Array of Independent Filesystems
Cited by 6 (0 self)
Storage virtualization and data management are well-known problems for individual users as well as large organizations. Existing storage-virtualization systems either do not support a complete set of possible storage types, do not provide flexible data-placement policies, or do not support per-file conversion (e.g., encryption). This results in suboptimal utilization of resources, inconvenience, low reliability, and poor performance. We have designed a stackable file system called Redundant Array of Independent Filesystems (RAIF). It combines the data survivability and performance benefits of traditional RAID with the flexibility of composition and ease of development of stackable file systems. RAIF can be mounted on top of directories and thus on top of any combination of network, distributed, disk-based, and memory-based file systems. Individual files can be replicated, striped, or stored with erasure-correction coding on any subset of the underlying file systems. RAIF has similar performance to RAID. In configurations with parity, RAIF’s write performance is better than the performance of driver-level and even entry-level hardware RAID systems. This is because RAIF has better control over the data and parity caching.
Towards an I/O tracing framework taxonomy
In Proceedings of the 2nd International Workshop on Petascale Data Storage, Nov 2007
Cited by 4 (0 self)
There is high demand for I/O tracing in High Performance Computing (HPC). It enables in-depth analysis of distributed applications and file system performance tuning. It also aids distributed application debugging. Finally, it facilitates collaboration within and between government, industrial, and academic institutions by enabling the generation of replayable I/O traces, which can be easily distributed and anonymized as necessary to protect confidential or sensitive information. As a response to this demand for tracing tools, various means of I/O trace generation exist. We first survey the I/O Tracing Framework landscape, exploring three popular frameworks: LANL-Trace [3], Tracefs [1], and //TRACE [2]. We next develop an I/O Tracing Framework taxonomy. The purpose of this taxonomy is to assist I/O Tracing Framework users in formalizing their tracing requirements, and to provide the developers of I/O Tracing Frameworks a language to categorize their functionality and performance. The taxonomy categorizes I/O Tracing Framework features such as the type of data captured, trace replayability, and anonymization. The taxonomy also considers elapsed-time overhead and performance overhead. Finally, we provide a case study in the use of our new taxonomy, revisiting all three I/O Tracing Frameworks explored in our survey, to formally classify the features of each.
Unit Testing Non-functional Concerns of Component-based Distributed Systems
Cited by 2 (2 self)
Unit testing component-based distributed systems traditionally involved testing functional concerns of the application logic throughout the development lifecycle. In contrast, testing non-functional distributed system concerns (e.g., end-to-end response time, security, and reliability) typically has not occurred until system integration because it requires both a complete system to perform such tests and sophisticated techniques to identify and analyze performance metrics that constitute non-functional concerns. Moreover, in an agile development environment, unit testing non-functional concerns is even harder due to the disconnect between high-level system specification and low-level performance metrics. This paper provides three contributions to research on testing techniques for component-based distributed systems, which are manifested in a technique called Understanding Non-functional Intentions via Testing and Experimentation (UNITE). First, we show how UNITE allows developers to extract arbitrary metrics from log messages using high-level constructs, such as human-readable expressions that identify variable data. Second, we show how UNITE preserves data integrity and system traces without requiring a globally unique identifier for context identification. Third, we show how developers can formulate equations that represent unit tests of non-functional concerns and then use UNITE to evaluate the equations using metrics extracted from the log messages. The results from applying UNITE to a representative project show that we can unit test non-functional properties of a component-based distributed system during the early stages of system development.
Detailed Analysis of I/O traces for large scale applications
Cited by 1 (0 self)
In this paper, we present a tool to extract I/O traces from very large applications running at full scale during their production runs. We analyze these traces to gain information about the application. We analyze the traces of three applications. The analysis showed that the I/O traces reveal much information about the application even without access to the source code. In particular, these I/O traces provide multiple indications of the algorithmic nature of the application through the changes in data amount and I/O request distribution at the checkpoints. Adaptive Mesh Refinement (AMR) is one kind of algorithm that can exhibit such I/O behavior. This is the first study of I/O characteristics of unbalanced AMR-supported applications at scale. The key observations that we made in the traces were (1) variation in aggregate data sizes across checkpoints for AMR and non-AMR applications, (2) variation in the number of I/O calls by a client depending on the nature of the application, (3) use of temporary files by applications and possible erroneous calls to I/O functions, (4) variation in average data transfer size according to whether the application has AMR support or not, (5) aggregation of I/O for processes executing on a single physical node through MPI-IO calls, and (6) updates to specific data structures in the checkpoint file. Keywords: Large scale I/O tracing, I/O trace analysis, adaptive mesh refinement
Towards a Secure and Efficient System for End-to-End Provenance
Appears in the Proceedings of the Second USENIX Workshop on Theory and Practice of Provenance (TaPP 2010), 2010
Cited by 1 (0 self)
Work on the End-to-End Provenance System (EEPS) began in the late summer of 2009. The EEPS effort seeks to explore the three central questions in provenance systems: (1) “Where and how do I design secure host-level provenance collecting instruments (called provenance monitors)?”; (2) “How do I extend completeness and accuracy guarantees to distributed systems and computations?”; and (3) “What are the costs associated with provenance collection?” This position paper discusses our initial exploration into these issues and posits several challenges to the realization of the EEPS vision.
Using Dataflow Models to Evaluate Enterprise Distributed Real-time and Embedded System Quality-of-Service
The effort required to evaluate enterprise distributed real-time and embedded (DRE) system quality-of-service (QoS) attributes (such as response time, latency, and scalability) depends heavily on system complexity and size. As these systems increase in complexity and size, DRE system developers and testers therefore need improved methods and tools that facilitate QoS evaluation. This article describes a method and tool called Understanding Non-functional Intentions via Testing and Experimentation (UNITE) that evaluates enterprise DRE system QoS attributes using dataflow models to capture how data move through an enterprise DRE system. Empirical results show that although UNITE’s evaluation times depend on the size of the dataflow model, they depend even more on the size of the dataset processed by the dataflow model. Copyright © 2009 John Wiley & Sons, Ltd. Received X August 2009. KEY WORDS: enterprise DRE systems; dataflow models; quality-of-service evaluation; early system integration testing; system execution traces; relational database theory
Live Migration of User Environments Across Wide Area Networks
A complex challenge in mobile computing is to allow the user to migrate her highly customised environment while moving to a different location and to continue work without interruption. I motivate why this is a highly desirable capability and conduct a survey of the current approaches towards this goal and explain their limitations. I then propose a new architecture to support user mobility by live migration of a user’s operating system instance over the network. Previous work includes the Collective and Internet Suspend/Resume projects that have addressed migration of a user’s environment by suspending the running state and resuming it at a later time. In contrast to previous work, this work addresses live migration of a user’s operating system instance across wide area links. Live migration is done by performing most of the migration while the operating system is still running, achieving very little downtime and preserving all network connectivity. I developed an initial proof of concept of this solution. It relies on migrating whole operating systems using the Xen virtual machine and provides a way to perform live migration of persistent storage as well as the network connections across subnets. These
A Secure and Efficient End-to-End Provenance System (EEPS)
Work on the End-to-End Provenance System (EEPS) began in the late summer of 2009. The EEPS effort seeks to explore the three central questions in provenance systems: (a) “Where and how do I design secure host-level provenance collecting instruments (called provenance monitors)?”, (b) “How do I extend completeness and accuracy guarantees to distributed systems and computations?”, and (c) “What are the costs associated with provenance collection?” This paper discusses our initial exploration into these issues and posits several challenges to the realization of the EEPS vision.

