Improving the reliability of commodity operating systems
, 2003
Cited by 192 (14 self)
drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This article describes Nooks, a reliability subsystem that seeks to greatly enhance operating system (OS) reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to the existing driver and system code. Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver’s use of kernel resources to facilitate automatic cleanup during recovery. To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. Under a wide range and number of fault conditions, we show that Nooks recovers automatically from 99% of the faults that otherwise cause Linux to crash.
Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers
- IEEE Transactions on Software Engineering
, 1998
Cited by 71 (3 self)
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose; however, with the rapid increase in processor complexity, traditional techniques are increasingly difficult to apply. This paper presents a new software-implemented fault injection and monitoring environment, called Xception, which targets modern, complex processors. Xception uses the advanced debugging and performance monitoring features found in most modern processors to inject realistic faults by software, and to monitor in detail the activation of the faults and their impact on the target system's behavior. Faults are injected with minimum interference with the target application. The target application is not modified, no software traps are inserted, and it is not necessary to execute the target application in a special trace mode (the application is executed at full speed). Xception provides a comprehensive set of fault triggers, including spatial and temporal fault triggers, and triggers related to the manipulation of data in memory. Faults injected by Xception can affect any process running on the target system (including the kernel), and it is possible to inject faults in applications for which the source code is not available. Experimental results are presented to demonstrate the accuracy and potential of Xception in evaluating the dependability properties of today's complex computer systems.
SafeDrive: Safe and recoverable extensions using language-based techniques
- In OSDI’06
, 2006
Cited by 58 (4 self)
We present SafeDrive, a system for detecting and recovering from type safety violations in software extensions. SafeDrive has low overhead and requires minimal changes to existing source code. To achieve this result, SafeDrive uses a novel type system that provides fine-grained isolation for existing extensions written in C. In addition, SafeDrive tracks invariants using simple wrappers for the host system API and restores them when recovering from a violation. This approach achieves fine-grained memory error detection and recovery with few code changes and at a significantly lower performance cost than existing solutions based on hardware-enforced domains, such as Nooks [33], L4 [21], and Xen [13], or software-enforced domains, such as SFI [35]. The principles used in SafeDrive can be applied to any large system with loadable, error-prone extension modules. In this paper we describe our experience using SafeDrive for protection and recovery of a variety of Linux device drivers. In order to apply SafeDrive to these device drivers, we had to change less than 4% of the source code. SafeDrive recovered from all 44 crashes due to injected faults in a network card driver. In experiments with 6 different drivers, we observed increases in kernel CPU utilization of 4–23% with no noticeable degradation in end-to-end performance.
On the Emulation of Software Faults by Software Fault Injection
- In Proceedings of the International Conference on Dependable Systems and Networks
, 2000
Cited by 23 (4 self)
This paper presents an experimental study on the emulation of software faults by fault injection. In a first experiment, a set of real software faults was compared with faults injected by a SWIFI tool (Xception) to evaluate the accuracy of the injected faults. Results revealed the limitations of Xception (and other SWIFI tools) in the emulation of different classes of software faults (about 44% of the software faults cannot be emulated). The use of field data about real faults was discussed, and software metrics were suggested as an alternative to guide the injection process when field data is not available. In a second experiment, a set of rules for the injection of errors meant to emulate classes of software faults was evaluated. The fault triggers used seem to be the cause of the observed strong impact of the faults on the target system and on the program results. The results also show the influence on fault emulation of aspects such as code size, complexity of data structures, and recursive versus sequential execution.
Fast Byte-Granularity Software Fault Isolation
Cited by 17 (1 self)
Bugs in kernel extensions remain one of the main causes of poor operating system reliability despite proposed techniques that isolate extensions in separate protection domains to contain faults. We believe that previous fault isolation techniques are not widely used because they cannot isolate existing kernel extensions with low overhead on standard hardware. This is a hard problem because these extensions communicate with the kernel using a complex interface and they communicate frequently. We present BGI (Byte-Granularity Isolation), a new software fault isolation technique that addresses this problem. BGI uses efficient byte-granularity memory protection to isolate kernel extensions in separate protection domains that share the same address space. BGI ensures type safety for kernel objects and it can detect common types of errors inside domains. Our results show that BGI is practical: it can isolate Windows drivers without requiring changes to the source code and it introduces a CPU overhead between 0 and 16%. BGI can also find bugs during driver testing. We found 28 new bugs in widely used Windows drivers.
A Dependability Benchmark for OLTP Application Environments
- In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB)
, 2003
Cited by 13 (2 self)
The ascendance of networked information in our economy and daily lives has increased the awareness of the importance of dependability features. OLTP (On-Line Transaction Processing) systems constitute the kernel of the information systems used today to support the daily operations of most businesses. Although these systems comprise the best examples of complex business-critical systems, no practical way has been proposed so far to characterize the impact of faults in such systems or to compare alternative solutions concerning dependability features. This paper proposes a dependability benchmark for OLTP systems. The benchmark uses the workload of the TPC-C performance benchmark and specifies the measures and all the steps required to evaluate both the performance and key dependability features of OLTP systems, with emphasis on availability. The benchmark is presented through a concrete example of benchmarking the performance and dependability of several different transactional system configurations.
Using attack injection to discover new vulnerabilities
- In Proceedings of the International Conference on Dependable Systems and Networks
, 2006
Cited by 12 (7 self)
Due to our increasing reliance on computer systems, security incidents and their causes are important problems that need to be addressed. To contribute to this objective, the paper describes a new tool for the discovery of security vulnerabilities on network-connected servers. The AJECT tool uses a specification of the server’s communication protocol to automatically generate a large number of attacks according to some predefined test classes. Then, while it performs these attacks through the network, it monitors the behavior of the server both from a client perspective and inside the target machine. The observation of incorrect behavior indicates a successful attack and the potential existence of a vulnerability. To demonstrate the usefulness of this approach, a considerable number of experiments were carried out with several IMAP servers. The results show that AJECT can discover several kinds of vulnerabilities, including a previously unknown vulnerability.
Failure Resilience for Device Drivers
Cited by 11 (5 self)
Studies have shown that device drivers and extensions contain 3–7 times more bugs than other operating system code and thus are more likely to fail. Therefore, we present a failure-resilient operating system design that can recover from dead drivers and other critical components—primarily through monitoring and replacing malfunctioning components on the fly—transparent to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the working of our defect detection mechanism, the policy-driven recovery procedure, and post-restart reintegration of the components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our design using performance measurements, software fault-injection experiments, and an analysis of the reengineering effort.
Fault Isolation for Device Drivers
Cited by 8 (3 self)
This work explores the principles and practice of isolating low-level device drivers in order to improve OS dependability. In particular, we explore the operations drivers can perform and how fault propagation in the event a bug is triggered can be prevented. We have prototyped our ideas in an open-source multiserver OS (MINIX 3) that isolates drivers by strictly enforcing least authority and iteratively refined our isolation techniques using a pragmatic approach based on extensive software-implemented fault-injection (SWIFI) testing. In the end, out of 3,400,000 common faults injected randomly into 4 different Ethernet drivers using both programmed I/O and DMA, no fault was able to break our protection mechanisms and crash the OS. In total, we experienced only one hang, but this appears to be caused by buggy hardware.
On the Selection of Error Model(s) for OS Robustness Evaluation
- In Proc. of DSN
, 2007
Cited by 4 (2 self)
The choice of error model used for robustness evaluation of Operating Systems (OSs) influences the evaluation run time, implementation complexity, as well as the evaluation precision. In order to find an “effective” error model for OS evaluation, this paper systematically compares the relative effectiveness of three prominent error models, namely bit-flips, data type errors, and fuzzing errors, using fault injection at the interface between device drivers and the OS. Bit-flips come with higher costs (time) than the other models, but allow for more detailed results. Fuzzing is cheaper to implement but is found to be less precise. A composite error model is presented where the low cost of fuzzing is combined with the higher level of detail of bit-flips, resulting in high precision with moderate setup and execution costs.

