The Rio File Cache: Surviving Operating System Crashes
- In Proc. 7th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
, 1996
Cited by 105 (13 self)
One of the fundamental limits to high-performance, high-reliability file systems is memory’s vulnerability to system crashes. Because memory is viewed as unsafe, systems periodically write data back to disk. The extra disk traffic lowers performance, and the delay period before data is safe lowers reliability. The goal of the Rio (RAM I/O) file cache is to make ordinary main memory safe for persistent storage by enabling memory to survive operating system crashes. Reliable memory enables a system to achieve the best of both worlds: reliability equivalent to a write-through file cache, where every write is instantly safe, and performance equivalent to a pure write-back cache, with no reliability-induced writes to disk. To achieve reliability, we protect memory during a crash and restore it during a reboot (a “warm” reboot). Extensive crash tests show that even without protection, warm reboot enables memory to achieve reliability close to that of a write-through file system while performing 20 times faster. Rio makes all writes immediately permanent, yet performs faster than systems that lose 30 seconds of data on a crash: 35% faster than a standard delayed-write file system and 8% faster than a system that delays both data and metadata. For applications that demand even higher levels of reliability, Rio’s optional protection mechanism makes memory even safer than a write-through file system while lowering performance 20% compared to a pure write-back system.
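The trade-off between write-through and write-back caching that this abstract refers to can be illustrated with a toy model. The sketch below is not Rio's implementation; the `FileCache` class, its methods, and the workload are hypothetical names chosen for illustration. It only counts disk writes and tracks which blocks exist solely in RAM.

```python
class FileCache:
    """Toy block cache that counts disk writes under two policies (hypothetical)."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.memory = {}       # block number -> data held in RAM
        self.dirty = set()     # blocks modified in RAM but not yet on disk
        self.disk = {}         # block number -> data on disk
        self.disk_writes = 0

    def write(self, block, data):
        self.memory[block] = data
        if self.write_through:
            self._to_disk(block)       # every write goes to disk immediately
        else:
            self.dirty.add(block)      # write-back: defer the disk write

    def flush(self):
        """Write all dirty blocks to disk (what a periodic sync would do)."""
        for block in sorted(self.dirty):
            self._to_disk(block)
        self.dirty.clear()

    def _to_disk(self, block):
        self.disk[block] = self.memory[block]
        self.disk_writes += 1

    def lost_on_crash(self):
        """Blocks whose latest contents would vanish if RAM were lost."""
        return set(self.dirty)


def run(write_through):
    cache = FileCache(write_through)
    for i in range(100):               # 100 writes spread over 4 blocks
        cache.write(i % 4, f"version {i}")
    return cache


wt = run(write_through=True)
wb = run(write_through=False)
print("write-through disk writes:", wt.disk_writes, "at risk:", wt.lost_on_crash())
print("write-back disk writes:   ", wb.disk_writes, "at risk:", wb.lost_on_crash())
```

In this toy run the write-through cache pays one disk write per update, while the write-back cache pays none until `flush()` but leaves its dirty blocks exposed to a crash. The abstract's claim is that memory which survives crashes removes that exposure without adding the disk writes.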
IRON file systems
- In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP ’05)
, 2005
Cited by 74 (24 self)
IRON File Systems. Vijayan Prabhakaran. Disk drives are widely used as a primary medium for storing information. While commodity file systems trust disks to either work or fail completely, modern disks exhibit complex failure modes such as latent sector faults and block corruptions, where only portions of a disk fail.
The Exception Handling Effectiveness Of Posix Operating Systems
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 2000
Packet Vaccine: Black-box Exploit Detection and Signature Generation
- In Proceedings of the 13th ACM CCS
, 2006
Cited by 22 (8 self)
In biology, a vaccine is a weakened strain of a virus or bacterium that is intentionally injected into the body for the purpose of stimulating antibody production. Inspired by this idea, we propose a packet vaccine mechanism that randomizes address-like strings in packet payloads to carry out fast exploit detection, vulnerability diagnosis and signature generation. An exploit with a randomized jump address behaves like a vaccine: it will likely cause an exception in a vulnerable program’s process when attempting to hijack the control flow, and thereby expose itself. Taking that exploit as a template, our signature generator creates a set of new vaccines to probe the program, in an attempt to uncover the necessary conditions for the exploit to happen. A signature is built upon these conditions to shield the underlying vulnerability from further attacks. In this way, packet vaccine detects and filters exploits in a black-box fashion, i.e., avoiding the expense of tracking the program’s execution flow. We present the design of the packet vaccine mechanism and an example of its application. We also describe our proof-of-concept implementation and the evaluation of our technique using real exploits.
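The idea of randomizing address-like strings in a payload can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the address ranges in `ADDRESS_RANGES`, the byte-by-byte scan, and the function names are all hypothetical, and the choice to scramble only the top byte of each matching word is a simplification made for this sketch.

```python
import random
import struct

# Assumed "address-like" ranges for a 32-bit process; illustrative values only.
ADDRESS_RANGES = [
    (0xBF000000, 0xBFFFFFFF),  # assumed stack region
    (0x08000000, 0x08FFFFFF),  # assumed code/heap region
]


def looks_like_address(word):
    """True if a 32-bit value falls in one of the assumed address ranges."""
    return any(low <= word <= high for low, high in ADDRESS_RANGES)


def make_vaccine(payload, rng):
    """Return a copy of payload with address-like 4-byte words scrambled."""
    out = bytearray(payload)
    i = 0
    while i + 4 <= len(out):
        (word,) = struct.unpack_from("<I", out, i)   # little-endian 32-bit word
        if looks_like_address(word):
            scrambled = word
            # Randomize the top byte until the value leaves every assumed range.
            while looks_like_address(scrambled):
                scrambled = (rng.randrange(256) << 24) | (word & 0x00FFFFFF)
            struct.pack_into("<I", out, i, scrambled)
            i += 4                                   # skip past the rewritten word
        else:
            i += 1                                   # slide the window one byte
    return bytes(out)


# Hypothetical exploit-shaped payload: filler followed by a repeated jump address.
exploit = b"GET /" + b"A" * 8 + struct.pack("<I", 0xBFFFF6C0) * 2 + b"\r\n"
vaccine = make_vaccine(exploit, random.Random(0))
print(exploit.hex())
print(vaccine.hex())
```

If a vulnerable program tried to jump to one of the scrambled values, it would most likely fault rather than transfer control to the attacker's code, which is the exception the abstract relies on for detection.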
The Systematic Improvement of Fault Tolerance in the Rio File Cache
- In Proceedings of the 1999 Symposium on Fault-Tolerant Computing
, 1999
Cited by 20 (0 self)
Fault injection is typically used to characterize failures and to validate and compare fault-tolerant mechanisms. However, fault injection is rarely used for all these purposes to guide the design and implementation of a fault-tolerant system. We present a systematic and quantitative approach for using software-implemented fault injection to guide the design and implementation of a fault-tolerant system. Our system design goal is to build a write-back file cache on Intel PCs that is as reliable as a write-through file cache. We follow an iterative approach to improve robustness in the presence of operating system errors. In each iteration, we measure the reliability of the system, analyze the fault symptoms that lead to data corruption, and apply fault-tolerant mechanisms that address the fault symptoms. Our initial system is 13 times less reliable than a write-through file cache. The result of several iterations is a design that is both more reliable (1.9% vs. 3.1% corruption rate) a...
Measuring Software Dependability by Robustness Benchmarking
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 1994
Cited by 19 (0 self)
Inability to identify weaknesses or to quantify advancements in software system robustness frequently hinders the development of robust software systems. Efforts have been made to develop benchmarks of software robustness to address this problem, but they all suffer from significant shortcomings. This paper presents the various features that are desirable in a benchmark of system robustness, and evaluates some existing benchmarks according to these features. A new hierarchically structured approach to building robustness benchmarks, which overcomes many deficiencies of past efforts, is also presented. This approach has been applied to building a hierarchically structured benchmark that tests part of the Unix file and virtual memory systems. The resultant benchmark has successfully been used to identify new response class structures that were not detected in a similar situation by other less organized techniques.
Issues in Testing Distributed Component-Based Systems
- In First International ICSE Workshop on Testing Distributed Component-Based Systems
, 1999
Cited by 16 (0 self)
Issues in testing distributed component-based systems are discussed. Differences in testing such systems and other systems are identified. Several limitations and shortcomings of the existing test methodologies are also identified and a new methodology proposed.
Keywords: CORBA, DCOM, component-based distributed systems, fault-tolerance, Java RMI, test adequacy, test methodology.
1 Introduction
Testing software systems is a complex problem in itself. With the increasing trend in using distributed software, the task of testing becomes even more complicated. The scalability of testing methodologies and development of testing tools need to keep up with new technologies such as CORBA, DCOM and Java RMI. The process of testing is further complicated by the use of COTS components in the systems. Testers need to test the behavior of such components in systems even if the components have been tested before. Sometimes the components that are reused may not have been designed for systems ...
Comparison of Physical and Software-Implemented Fault Injection Techniques
- IEEE Trans. Comput
, 2003
Cited by 11 (3 self)
This paper addresses the issue of characterizing the respective impact of fault injection techniques. Three physical techniques and one software-implemented technique that have been used to assess the fault tolerance features of the MARS fault-tolerant distributed real-time system are compared and analyzed. After a short summary of the fault tolerance features of the MARS architecture and especially of the error detection mechanisms that were used to compare the erroneous behaviors induced by the fault injection techniques considered, we describe the common distributed testbed and test scenario implemented to perform a coherent set of fault injection campaigns. The main features of the four fault injection techniques considered are then briefly described and the results obtained are finally presented and discussed. Emphasis is put on the analysis of the specific impact and merit of each injection technique.
UMLinux -- A Tool for Testing a Linux System's Fault Tolerance
- IN LINUXTAG 2002
, 2002
Cited by 10 (3 self)
When setting up servers, it would often be useful to know how these systems will react to hardware failures such as a defective hard disk, random access memory, or network interface, or a simple power failure. Will data be lost or corrupted, or will the system simply not be accessible to clients for some time? The silent corruption of data without any error messages, for example, is a worst-case scenario for database systems. It would be useful to be able to test whether a system designed to continue delivering services even in the presence of faults will indeed do so. We have implemented UMLinux, a User Mode Linux which can be used for realistic fault injection experiments to help answer the above questions. In order to be as close to reality as possible, our UMLinux implements kernel memory protection and runs the complete virtual machine (including operating system and all processes) as a single process on the (real) host. Of course, UMLinux is binary compatible with the host, so all binaries which run on the host also run on UMLinux (without recompilation). The system we want to examine is set up using virtual UMLinux machines. For most Linux distributions we can use the out-of-the-box installation routine to install the virtual machine directly from CD-ROM. The virtual hardware, such as random access memory size; hard disk, floppy, and CD-ROM drives; and network interfaces, can be configured freely within the limits imposed by the resources available on the host. When the virtual server system is up and running, the fault injector can be configured to inject faults into the virtual hardware. We can currently inject bit flips into CPU registers and main memory, defective bytes on any kind of block device, and network send and receive failures. The whole setup is currently controlled via a ...
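The bit-flip fault model mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not UMLinux's injector or interface: the `inject_bitflip` function and the sample buffer are hypothetical, and a CRC32 checksum stands in for whatever integrity check the system under test might use.

```python
import random
import zlib


def inject_bitflip(memory, rng):
    """Flip one randomly chosen bit of a bytearray in place.

    Returns the (byte index, bit index) that was flipped.
    """
    byte_index = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    memory[byte_index] ^= 1 << bit_index
    return byte_index, bit_index


# Hypothetical "main memory" contents of the system under test.
memory = bytearray(b"important database page " * 4)
original = bytes(memory)
checksum_before = zlib.crc32(memory)

byte_index, bit_index = inject_bitflip(memory, random.Random(42))

# A checksum taken before the fault exposes the corruption afterwards;
# without such a check the flipped bit would go unnoticed.
corrupted = zlib.crc32(memory) != checksum_before
print(f"flipped bit {bit_index} of byte {byte_index}; detected: {corrupted}")
```

A fault-injection campaign would repeat this many times against a running workload and record how often the fault is detected, masked, or turns into silent data corruption.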
Toward a Scalable Method for Quantifying Aspects of Fault Tolerance, Software Assurance, and Computer Security
- Computer Security, Dependability, and Assurance: From Needs to Solutions (CSDA'98)
, 1998
Cited by 9 (1 self)
Quantitative assessment tools are urgently needed in the areas of fault tolerance, software assurance, and computer security. Assessment methods typically employed in various combinations are fault injection, formal verification, and testing. However, these methods are expensive because they are labor-intensive, with costs scaling at least linearly with the number of software modules tested. Additionally, they are subject to human lapses and oversights because they require two different representations for each system, and then base results on a direct or an indirect representation comparison.

