On the Emulation of Software Faults by Software Fault Injection
In Proceedings of the International Conference on Dependable Systems and Networks, 2000
Cited by 23 (4 self)
Abstract
This paper presents an experimental study on the emulation of software faults by fault injection. In a first experiment, a set of real software faults was compared with faults injected by a SWIFI tool (Xception) to evaluate the accuracy of the injected faults. The results revealed the limitations of Xception (and other SWIFI tools) in emulating different classes of software faults: about 44% of the software faults cannot be emulated. The use of field data about real faults was discussed, and software metrics were suggested as an alternative way to guide the injection process when field data is not available. In a second experiment, a set of rules for injecting errors meant to emulate classes of software faults was evaluated. The fault triggers used appear to be the cause of the strong impact the faults had on the target system and on the program results. The results also show how fault emulation is influenced by aspects such as code size, complexity of data structures, and recursive versus sequential execution.
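The second experiment evaluates rules that inject errors to emulate classes of software faults, with triggers deciding when an error fires. The Python sketch below illustrates that idea at source level only, under stated assumptions: the target routine (checksum), the three error rules in ERROR_RULES, the call-count trigger, and the outcome labels are all hypothetical, and this is not how Xception or the paper's rules are implemented. Each faulty run is compared against a golden (fault-free) run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical error rules: each perturbs a value the way one class of
# software fault might. Names are illustrative, not taken from the paper.
ERROR_RULES = {
    "off_by_one_bound": lambda bound: bound - 1,   # loop stops one element early
    "overrun_bound": lambda bound: bound + 1,      # loop runs one element too far
    "wrong_initial_value": lambda init: init + 1,  # accumulator initialised wrongly
}


@dataclass
class Injector:
    rule: Optional[str] = None  # armed error rule; None means a golden run
    trigger_call: int = 1       # hypothetical trigger: fire only on this call number
    calls: int = 0

    def apply(self, kind: str, value: int) -> int:
        """Return value, perturbed only if `kind` is armed and the trigger fires."""
        if self.rule == kind and self.calls == self.trigger_call:
            return ERROR_RULES[kind](value)
        return value


def checksum(data: List[int], inj: Injector) -> int:
    """Target routine with injection points on the initial value and loop bound."""
    inj.calls += 1
    total = inj.apply("wrong_initial_value", 0)
    bound = inj.apply("off_by_one_bound", inj.apply("overrun_bound", len(data)))
    for i in range(bound):
        total += data[i]
    return total


def run_campaign(data: List[int], calls: int = 3, trigger_call: int = 2) -> Dict[str, str]:
    """Arm each rule in turn and classify the outcome against a golden run."""
    golden_inj = Injector()
    golden = [checksum(data, golden_inj) for _ in range(calls)]
    outcomes: Dict[str, str] = {}
    for rule in ERROR_RULES:
        inj = Injector(rule=rule, trigger_call=trigger_call)
        try:
            results = [checksum(data, inj) for _ in range(calls)]
            outcomes[rule] = "correct" if results == golden else "wrong result"
        except IndexError:
            outcomes[rule] = "crash"  # the corrupted bound walked off the list
    return outcomes


print(run_campaign([1, 2, 3]))
# {'off_by_one_bound': 'wrong result', 'overrun_bound': 'crash', 'wrong_initial_value': 'wrong result'}
```

With the input [1, 2, 0], the off-by-one rule only drops a trailing zero and the run is still classified as correct, a small illustration of why the observed impact of an emulated fault depends on the input and on whether the trigger actually activates corrupted state.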
FITS - A Fault Injection Architecture for Time-Triggered Systems
Cited by 5 (0 self)
Abstract
Time-triggered systems require a very high degree of temporal accuracy at critical stages during run time. While many software fault injection environments exist today, none of these make provisions to meet the timing requirements of such systems. This paper introduces a fault injection environment for time-triggered systems. We describe the architecture of FITS and how it addresses the requirements of temporal accuracy in the time-triggered paradigm. An implementation of FITS...
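The abstract does not detail how FITS works, so the following is only a hypothetical simulation of the general idea behind temporally accurate injection in a time-triggered system: the fault is specified in the coordinates of the global schedule (round and slot) instead of wall-clock time, so it lands at an exactly reproducible point. The slot length SLOT_US, the node names, and the fault kinds "omit" and "corrupt" are assumptions for illustration, not part of FITS.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SLOT_US = 250                 # hypothetical slot length in microseconds
NODES = ["A", "B", "C", "D"]  # hypothetical cluster: one sending slot per node per round


@dataclass
class FaultSpec:
    round_no: int  # round of the static schedule in which the fault fires
    slot: int      # slot index within that round
    kind: str      # "omit" (message lost) or "corrupt" (payload altered)


@dataclass
class ScheduleSim:
    fault: Optional[FaultSpec] = None
    log: List[Tuple[int, str, str]] = field(default_factory=list)  # (t_us, sender, payload)

    def run(self, rounds: int) -> None:
        for r in range(rounds):
            for s, node in enumerate(NODES):
                t_us = (r * len(NODES) + s) * SLOT_US  # slot start on the global time base
                payload = f"{node}:{r}"
                hit = self.fault is not None and (r, s) == (self.fault.round_no, self.fault.slot)
                if hit and self.fault.kind == "omit":
                    continue                           # the message is never sent
                if hit and self.fault.kind == "corrupt":
                    payload += "!"                     # emulate a corrupted payload
                self.log.append((t_us, node, payload))


sim = ScheduleSim(FaultSpec(round_no=1, slot=2, kind="omit"))
sim.run(rounds=2)
print(len(sim.log))  # 7 of the 8 scheduled messages are delivered
```

Expressing the trigger as a (round, slot) pair makes every injection deterministic and repeatable across runs, which is one way to obtain the precision that timing-sensitive systems need.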
Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters
2006
Cited by 5 (1 self)
Abstract
Recent shifts in high-performance computing have increased the use of clusters built around cheap commodity processors. A typical cluster consists of individual nodes, each containing one or several processors, connected by a high-bandwidth, low-latency interconnect. Using clusters for computation has many benefits but also some drawbacks, including a tendency toward a low Mean Time To Failure (MTTF) due to the sheer number of components involved. Recently, a number of fault-tolerance techniques have been proposed and developed to mitigate the inherent unreliability of clusters. These techniques, however, fail to address the detection of non-obvious faults, particularly Byzantine faults. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module for detecting Byzantine hardware errors at run time as part of the Charm++ parallel programming framework.
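The abstract does not describe how ByzwATCh detects errors. As a point of comparison only, here is a sketch of one generic technique, replicated execution with majority voting on the outputs; it is not ByzwATCh's method. It assumes the computation is deterministic, that each replica runs on a different node, and that the voter itself is trusted. The work unit compute_chunk and its single-bit flip are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def compute_chunk(chunk: List[int], flip_bit: bool = False) -> int:
    """Hypothetical deterministic work unit; flip_bit emulates a silent hardware error."""
    total = sum(x * x for x in chunk)
    return total ^ 1 if flip_bit else total  # flip the lowest bit of the result


def detect_suspects(results: Dict[str, Hashable]) -> List[str]:
    """Return the nodes whose result disagrees with the strict majority.

    With 2f + 1 replicas and at most f faulty ones, the f + 1 correct replicas
    always form a strict majority, so any replica whose output deviates is exposed.
    """
    if not results:
        raise ValueError("no results to compare")
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("no strict majority; cannot attribute the fault")
    return sorted(node for node, r in results.items() if r != value)


# Three replicas of the same work unit; node "n1" suffers a bit flip.
chunk = [1, 2, 3]
results = {
    "n0": compute_chunk(chunk),
    "n1": compute_chunk(chunk, flip_bit=True),
    "n2": compute_chunk(chunk),
}
print(detect_suspects(results))  # ['n1']
```

Voting on outputs needs only 2f + 1 replicas because a trusted voter collects the results; if the nodes had to agree among themselves without such a voter, classic Byzantine agreement requires more replicas (3f + 1).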
Cluster Survivability with ByzwATCh: A Byzantine Hardware Fault Detector for Parallel Machines with Charm++
2006
Cited by 1 (1 self)
Abstract
Modern high-performance computing relies heavily on commodity processors arranged in clusters. These clusters consist of individual nodes (typically off-the-shelf single- or dual-processor machines) connected by a high-speed interconnect. Cluster computation has many benefits, but it is also failure-prone due to the sheer number of components involved. Many effective solutions have been proposed to aid failure recovery in clusters; however, they depend on these failures being detectable. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module for detecting Byzantine hardware errors at run time as part of the Charm++ parallel programming framework.
Duration: 36m
2001
Abstract
Sponsor: Microsoft (UK). Project funded by the European Community under the “Information Society Technology” Programme (1998-2002). Table of Contents: Abstract; 1 Introduction

