Results 1 - 10
of
54
The Rio File Cache: Surviving Operating System Crashes
- In Proc. 7th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS
, 1996
"... Abstract: One of the fundamental limits to high-performance, high-reliability file systems is memory’s vulnerability to system crashes. Because memory is viewed as unsafe, systems periodically write data back to disk. The extra disk traffic lowers performance, and the delay period before data is saf ..."
Abstract
-
Cited by 105 (13 self)
- Add to MetaCart
Abstract: One of the fundamental limits to high-performance, high-reliability file systems is memory’s vulnerability to system crashes. Because memory is viewed as unsafe, systems periodically write data back to disk. The extra disk traffic lowers performance, and the delay period before data is safe lowers reliability. The goal of the Rio (RAM I/O) file cache is to make ordinary main memory safe for persistent storage by enabling memory to survive operating system crashes. Reliable memory enables a system to achieve the best of both worlds: reliability equivalent to a write-through file cache, where every write is instantly safe, and performance equivalent to a pure write-back cache, with no reliability-induced writes to disk. To achieve reliability, we protect memory during a crash and restore it during a reboot (a “warm ” reboot). Extensive crash tests show that even without protection, warm reboot enables memory to achieve reliability close to that of a write-through file system while performing 20 times faster. Rio makes all writes immediately permanent, yet performs faster than systems that lose 30 seconds of data on a crash: 35% faster than a standard delayed-write file system and 8 % faster than a system that delays both data and metadata. For applications that demand even higher levels of reliability, Rio’s optional protection mechanism makes memory even safer than a write-through file system while while lowering performance 20 % compared to a pure write-back system. 1
IRON file systems
- In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP ’05
, 2005
"... IRON FILE SYSTEMSVijayan Prabhakaran Disk drives are widely used as a primary medium for storing information.While commodity file systems trust disks to either work or fail completely, modern disks exhibit complex failure modes such as latent sector faults and block corrup-tions, where only portions ..."
Abstract
-
Cited by 74 (24 self)
- Add to MetaCart
IRON FILE SYSTEMSVijayan Prabhakaran Disk drives are widely used as a primary medium for storing information.While commodity file systems trust disks to either work or fail completely, modern disks exhibit complex failure modes such as latent sector faults and block corrup-tions, where only portions of a disk fail.
Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers
- IEEE Transactions on Software Engineering
, 1998
"... An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose, however with the rapid increase in processor complexity, traditional techniques are also increasingly more difficult to apply. This ..."
Abstract
-
Cited by 71 (3 self)
- Add to MetaCart
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose, however with the rapid increase in processor complexity, traditional techniques are also increasingly more difficult to apply. This paper presents a new software implemented fault injection and monitoring environment, called Xception, which is targeted for the modern and complex processors. Xception uses the advanced debugging and performance monitoring features existing in most of the modern processors to inject quite realistic faults by software, and to monitor the activation of the faults and their impact on the target system behavior in detail. Faults are injected with minimum interference with the target application. The target application is not modified, no software traps are inserted, and it is not necessary to execute the target application in special trace mode (the application is executed at full speed). Xception provides a comprehensive set of fault triggers, including spatial and temporal fault triggers, and triggers related to the manipulation of data in memory. Faults injected by Xception can affect any process running on the target system (including the kernel), and it is possible to inject faults in applications for which the source code is not available. Experimental results are presented to demonstrate the accuracy and potential of Xception in the evaluation of the dependability properties of the complex computer systems available nowadays.
Xception: Software Fault Injection and Monitoring in Processor Functional Units
, 1995
"... This paper presents Xception, a software fault injection and monitoring environment. Xception uses the advanced debugging and performance monitoring features existing in most of the modern processors to inject more realistic faults by software, and to monitor the activation of the faults and their i ..."
Abstract
-
Cited by 65 (6 self)
- Add to MetaCart
This paper presents Xception, a software fault injection and monitoring environment. Xception uses the advanced debugging and performance monitoring features existing in most of the modern processors to inject more realistic faults by software, and to monitor the activation of the faults and their impact on the target system behaviour in detail. Faults are injected with minimum interference with the target application. The target application is not modified, no software traps are inserted, and it is not necessary to execute it in special trace mode (the application is executed at full speed). Xception provides a comprehensive set of fault triggers, including spatial and temporal fault triggers, and triggers related to the manipulation of data in memory. Faults injected by Xception can affect any process running on the target system including the operating system. Sets of faults can be defined by the user according to several criteria, including the emulation of faults in specific target processor functional units. Presently, Xception has been implemented on a parallel machine build around the PowerPC 601 processor running the PARIX operating system. Experiment results are presented showing the impact of faults on several parallel applications running on a commercial parallel system. It is shown that up to 73% of the faults, depending on the processor functional unit affected, can cause the application to produce wrong results. The results show that the impact of faults heavily depends on the application and the specific processor functional unit affected by the fault.
Dependability of COTS Microkernel-Based Systems
- IEEE Transactions on Computers
, 2002
"... AbstractÐThe commercial offer concerning microkernel technology constitutes an attractive alternative for developing operating systems to suit a wide range of application domains. However, the integration of COTS microkernels into critical embedded computer systems is a problem for system developers ..."
Abstract
-
Cited by 33 (8 self)
- Add to MetaCart
AbstractÐThe commercial offer concerning microkernel technology constitutes an attractive alternative for developing operating systems to suit a wide range of application domains. However, the integration of COTS microkernels into critical embedded computer systems is a problem for system developers, in particular due to the lack of objective data concerning their behavior in the presence of faults. This paper addresses this issue by describing a prototype environment �MAFALDA: Microkernel Assessment by Fault injection AnaLysis and Design Aid) that is aimed at providing objective failure data on a candidate microkernel and also improving its error detection capabilities. The paper first presents the overall architecture of MAFALDA. Then, a case study carried out on an instance of the Chorus microkernel is used to illustrate the benefits that can be obtained with MAFALDA both from the dependability assessment and design-aid viewpoints. Implementation issues are also addressed that account for the specific API of the target microkernel. Some overall insights and lessons learned, gained during the various studies conducted on both Chorus and another target microkernel �LynxOS), are then depicted and discussed. Finally, we conclude the paper by summarizing the main features of the work presented and by identifying future research. Index TermsÐCOTS microkernels, dependability characterization, fault injection, error confinement, wrapping. 1
Assessment of COTS Microkernels by Fault Injection
- Seventh Internatinal Working Conference on Dependable Computing
, 1999
"... Abstract: This paper addresses the problem of using COTS microkernels in safety critical systems. As the behavior in the presence of faults of such basic components is seldom established, it is questionable whether they can be used to develop operating systems for critical applications. The approach ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
Abstract: This paper addresses the problem of using COTS microkernels in safety critical systems. As the behavior in the presence of faults of such basic components is seldom established, it is questionable whether they can be used to develop operating systems for critical applications. The approach proposed for the assessment of a COTS microkernel relies on fault injection as a means to obtain objective insights for the provision of upper layer services. A specific tool (MAFALDA) has been developed to implement this approach. We present and discuss the results obtained when applying the tool to the Chorus ClassiX r3 microkernel. Finally, some lessons learnt from these experiments and plans for future work are described. 1
The Systematic Improvement of Fault Tolerance in the Rio File Cache
- In Proceedings of the 1999 Symposium on Fault-Tolerant Computing
, 1999
"... : Fault injection is typically used to characterize failures and to validate and compare fault-tolerant mechanisms. However, fault injection is rarely used for all these purposes to guide the design and implementation of a fault-tolerant system. We present a systematic and quantitative approach for ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
: Fault injection is typically used to characterize failures and to validate and compare fault-tolerant mechanisms. However, fault injection is rarely used for all these purposes to guide the design and implementation of a fault-tolerant system. We present a systematic and quantitative approach for using software-implemented fault injection to guide the design and implementation of a fault-tolerant system. Our system design goal is to build a write-back file cache on Intel PCs that is as reliable as a write-through file cache. We follow an iterative approach to improve robustness in the presence of operating system errors. In each iteration, we measure the reliability of the system, analyze the fault symptoms that lead to data corruption, and apply fault-tolerant mechanisms that address the fault symptoms. Our initial system is 13 times less reliable than a writethrough file cache. The result of several iterations is a design that is both more reliable (1.9% vs. 3.1% corruption rate) a...
Measuring Software Dependability by Robustness Benchmarking
- IEEE TRANSACTIONS OF SOFTWARE ENGINEERING
, 1994
"... Inability to identify weaknesses or to quantify advancements in software system robustness frequently hinders the development of robust software systems. Efforts have been made to develop benchmarks of software robustness to address this problem, but they all suffer from significant shortcomings. Th ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Inability to identify weaknesses or to quantify advancements in software system robustness frequently hinders the development of robust software systems. Efforts have been made to develop benchmarks of software robustness to address this problem, but they all suffer from significant shortcomings. This paper presents the various features that are desirable in a benchmark of system robustness, and evaluates some existing benchmarks according to these features. A new hierarchically structured approach to building robustness benchmarks, which overcomes many deficiencies of past efforts, is also presented. This approach has been applied to building a hierarchically structured benchmark that tests part of the Unix file and virtual memory systems. The resultant benchmark has successfully been used to identify new response class stuctures that were not detected in a similar situation by other less organized techniques.
Software Fault Tolerance: A Tutorial
, 2000
"... Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role. The NASA STI Program Office is operated by Langley Research Center, the l ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role. The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types: TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA counterpart of peer-reviewed formal professional papers, but having less stringent limitations on manuscript length and extent of graphic presentations. TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.
Testing for Software Vulnerability Using Environment Perturbation
- in Proceeding of the International Conference on Dependable Systems and Networks (DSN 2000), Workshop On Dependability Versus Malicious Faults
, 2000
"... We describe an methodology for testing a software system for possible security flaws. Based on the observation that most security flaws are caused by the program's inappropriate interactions with the environment, and triggered by user's malicious perturbation on the environment (which we call an env ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
We describe an methodology for testing a software system for possible security flaws. Based on the observation that most security flaws are caused by the program's inappropriate interactions with the environment, and triggered by user's malicious perturbation on the environment (which we call an environment fault), we view the security testing problem as the problem of testing for the fault-tolerance properties of a software system. We consider each environment perturbation as a fault and the resulting security compromise a failure in the toleration of such faults. Our approach is based on the well known technique of fault-injection. Environment faults are injected into the system under test and system behavior observed. The failure to tolerate faults is an indicator of a potential security flaw in the system. An Environment-Application Interaction (EAI) fault model is proposed which guides us to decide what faults to inject. Based on EAI, we have developed a security testing methodology, and apply it to several applications. We successfully identified a number of vulnerabilities include vulnerabilities in Windows NT operating system.

