Results 1 - 10 of 19
The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software
IEEE Transactions on Software Engineering, 1993
Cited by 103 (1 self)
This paper affirms that the quantification of life-critical software reliability is infeasible using statistical methods, whether applied to standard software or to fault-tolerant software. The classical methods of estimating reliability are shown to lead to exorbitant amounts of testing when applied to life-critical software. Reliability growth models are examined and also shown to be incapable of overcoming the need for excessive amounts of testing. The key assumption of software fault tolerance, that separately programmed versions fail independently, is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region, and subjective arguments in its favor are not strong enough to justify it as an axiom. The implications of recent multiversion software experiments also support this conclusion. Index Terms: Life-Critical, Validation, Software Reliability, Design Error, Ultrareliability, Software Fault-Tolerance
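The scale of the testing problem can be sketched with a standard zero-failure argument. Assuming a constant failure rate (exponentially distributed times to failure) and a test campaign with no observed failures, the one-sided upper confidence bound on the rate after T hours is -ln(1 - C)/T at confidence C. This is a textbook bound, not necessarily the exact derivation used in the paper; the 10^-9 per hour target and the helper name `required_test_hours` are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 24 * 365.25

def required_test_hours(target_rate: float, confidence: float) -> float:
    """Failure-free test time needed before the upper confidence bound on
    the failure rate drops to target_rate.

    Assumes exponentially distributed failure times (constant rate), so
    P(no failure in T hours | rate) = exp(-rate * T). Rates above
    target_rate are rejected at level (1 - confidence) once
    exp(-target_rate * T) <= 1 - confidence.
    """
    if not (0.0 < confidence < 1.0) or target_rate <= 0.0:
        raise ValueError("need 0 < confidence < 1 and target_rate > 0")
    return -math.log(1.0 - confidence) / target_rate

if __name__ == "__main__":
    for rate in (1e-4, 1e-6, 1e-9):
        hours = required_test_hours(rate, 0.99)
        print(f"rate {rate:.0e}/h -> {hours:.3g} h ({hours / HOURS_PER_YEAR:.3g} years)")
```

At 99% confidence the 10^-9 per hour target requires roughly 4.6 × 10^9 failure-free hours, on the order of half a million years of testing, which illustrates why direct statistical demonstration is out of reach.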
The Infeasibility of Experimental Quantification of Life-Critical Software Reliability
IEEE Transactions on Software Engineering, 1991
Cited by 56 (2 self)
This paper affirms that quantification of life-critical software reliability is infeasible using statistical methods, whether applied to standard software or to fault-tolerant software. The key assumption of software fault tolerance, that separately programmed versions fail independently, is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region, and subjective arguments in its favor are not strong enough to justify it as an axiom. The implications of recent multiversion software experiments also support this conclusion. Index Terms: Life-Critical, Validation, Software Reliability, Design Error, Ultrareliability, Software Fault-Tolerance. 1 Introduction: The potential of enhanced flexibility and functionality has led to an ever-increasing use of digital computer systems in control applications. At first, the digital systems were designed to perform the same functions as their analog counterparts. However, the availability of en...
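Why the independence assumption matters can be seen from the usual majority-voting calculation. If each of n versions fails on a demand with probability p and failures are independent, the system fails when a majority of versions fail; for n = 3 this is 3p^2 - 2p^3. The sketch below computes exactly this, under the very assumption the paper questions; the function name is hypothetical.

```python
from math import comb

def nvp_failure_prob_independent(p: float, n: int = 3) -> float:
    """Probability that a majority of n versions fail on one demand,
    ASSUMING each version fails independently with probability p.
    (This independence is the assumption the paper argues cannot be validated.)"""
    k_min = n // 2 + 1  # smallest number of failing versions that defeats the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

p = 1e-4
print(nvp_failure_prob_independent(p))  # binomial sum
print(3 * p**2 - 2 * p**3)              # closed form for n = 3, same value
```

With p = 10^-4 this gives about 3 × 10^-8, an improvement of more than three orders of magnitude that rests entirely on the independence assumption.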
Monitoring, Testing, and Debugging of Distributed Real-Time Systems
2000
Cited by 44 (1 self)
Testing is an important part of any software development project and can typically account for more than half of the development cost. For safety-critical computer-based systems, testing is even more important due to stringent reliability and safety requirements. However, most safety-critical computer-based systems are real-time systems, and the majority of current testing and debugging techniques have been developed for sequential (non-real-time) programs. These techniques are not directly applicable to real-time systems, since they disregard issues of timing and concurrency. This means that existing techniques for reproducible testing and debugging cannot be used. Reproducibility is essential for regression testing and cyclic debugging, where the same test cases are run repeatedly with the intention of verifying modified program code or tracking down errors. The current trend in consumer and industrial applications is a move from single microcontrollers to sets of distributed microcontrollers, which are even more challenging than real-time systems per se, since multiple loci of observation and control must additionally be considered. In this thesis we try to remedy these problems by presenting an integrated approach to monitoring, testing, and debugging of distributed real-time systems. For monitoring...
Modelling the Effects of Combining Diverse Software Fault Detection Techniques
2000
Cited by 21 (7 self)
The software engineering literature contains many studies of the efficacy of fault-finding techniques. Few of these, however, consider what happens when several different techniques are used together. We show that the effectiveness of such multi-technique approaches depends upon a quite subtle interplay between their individual efficacies and the dependence between them. The modelling tool we use to study this problem is closely related to earlier work on software design diversity. The earliest of these results showed that, under quite plausible assumptions, it would be unreasonable even to expect software versions that were developed "truly independently" to fail independently of one another. The key idea here was a "difficulty function" over the input space. Later work extended these ideas to introduce a notion of "forced" diversity, in which it became possible to obtain system failure behaviour even better than could be expected if the versions failed independently. In this paper we show that many of these results for design diversity have counterparts in diverse fault detection in a single software version.
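The "difficulty function" argument can be illustrated numerically. In the model commonly associated with Eckhardt and Lee, theta(x) is the probability that a version drawn at random from the development process fails on input x; two independently developed versions then both fail with probability E[theta(X)^2], which is at least (E[theta(X)])^2 and equals it only when theta is constant. The four-input space and difficulty values below are made up purely for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def failure_probabilities(profile, theta):
    """profile[x]: probability that input x is selected in operation.
    theta[x]: difficulty, i.e. probability that a randomly developed version fails on x.
    Returns (P(one version fails), P(two independently developed versions both fail))."""
    assert sum(profile) == 1
    p_single = sum(q * t for q, t in zip(profile, theta))
    # Development is independent, so failures are independent *given* x:
    # P(both fail | x) = theta(x) ** 2, averaged over the operational profile.
    p_both = sum(q * t * t for q, t in zip(profile, theta))
    return p_single, p_both

# Hypothetical input space: three "easy" inputs and one "hard" input.
profile = [Fraction(3, 10), Fraction(3, 10), Fraction(3, 10), Fraction(1, 10)]
theta = [Fraction(1, 100), Fraction(1, 100), Fraction(1, 100), Fraction(1, 2)]

p1, p2 = failure_probabilities(profile, theta)
print("P(single fails) =", float(p1))
print("P(both fail)    =", float(p2), "vs independence:", float(p1 * p1))
```

Here a single version fails with probability 0.059, so independence would predict 0.003481 for both failing, but the model gives 0.02509, about seven times higher, because both versions tend to fail on the same hard input.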
Software Fault Tolerance: A Tutorial
2000
Cited by 19 (0 self)
Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role. The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types: TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA counterpart of peer-reviewed formal professional papers, but having less stringent limitations on manuscript length and extent of graphic presentations. TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.
System Support for Software Fault Tolerance in Highly Available Database Management Systems
1992
Cited by 11 (0 self)
Today, software errors are the leading cause of outages in fault-tolerant systems. System availability can be improved despite software errors by fast error detection and recovery techniques that minimize total downtime after an outage. This dissertation analyzes software errors in three commercial systems and describes the implementation and evaluation of several techniques for early error detection and fast recovery in a database management system (DBMS). The software error study examines errors reported by customers in three IBM systems programs: the MVS operating system and the IMS and DB2 DBMSs. The study classifies errors by the type of coding mistake and by the circumstances in the customer's environment that caused the error to arise. It finds that addressing errors, such as uninitialized pointers, have a higher availability impact than software errors as a whole. It also details the frequencies and types of addressing errors and characterizes the damage they do. The error detec...
A Discussion of Practices for Enhancing Diversity in Software Designs
2000
Cited by 5 (1 self)
This report discusses the practices that have been used or recommended for increasing the degree of diversity between redundant implementations of software or software-based systems. Its purpose is to give useful guidance to designers, project managers and safety/reliability assessors in deciding how great an advantage should be expected from the use of these practices, in absolute and in comparative terms. Existing knowledge does not allow one to state any strong general recommendations, but it is possible to improve on the intuitive justifications usually given for these various practices. This report clarifies the ways the various practices are conjectured to aid system reliability, the factors that should affect their efficacy, and thus, for a practitioner, the aspects of a specific project situation that need to be considered to inform decisions. Thus this report...
Design for Deterministic Monitoring of Distributed Real-Time Systems
2000
Cited by 5 (3 self)
In order to test or debug a system, we must observe its run-time behavior and judge how well the observations comply with the system requirements. There are two significant differences between debugging and testing software for desktop computers and for embedded real-time systems: (1) it is more difficult to observe embedded computer systems, simply because they are embedded and thus have very few interfaces to the outside world, and (2) the very act of observing a real-time system or a distributed real-time system can change its behavior. Monitoring of sequential software is straightforward, but for distributed real-time systems it is more complicated, since race conditions with respect to the order of access to shared resources occur naturally. Any intrusive observation, or probing, of the distributed real-time system affects the timing and consequently the outcome of the races. In this paper we present a method for deterministic observations of single-tasking, multi-taskin...
Performability Modeling of N Version Programming Technique
1995
Cited by 3 (2 self)
This paper presents a detailed but efficiently solvable model of N-version programming (NVP) for evaluating reliability and performability over a mission period. Employing a hierarchical decomposition, we reduce the model complexity and provide a modeling framework for evaluating NVP failure and execution-time behavior as well as the operational environment. The failure and execution rates are treated as random variables, and the operational profile is analyzed at the microstructure level, looking at the probabilities of occurrence and at the failure and execution rates for each partition of the input space. The reliability submodel, which represents the per-run behavior of NVP, includes both functional failures and timing failures, thus yielding a system reliability that accounts for performance requirements. Successive runs are modeled by the performance submodel, which represents the iterative nature of software execution. Combining the results of both submodels, we assess performability over a mission period, which represents the collective effect of multiple system attributes on NVP effectiveness.
An Empirical Evaluation of Consensus Voting and Consensus Recovery Block Reliability in the Presence of Failure Correlation
Journal of Computer and Software Engineering, 1993
Cited by 3 (3 self)
The reliability of fault-tolerant software system implementations, based on Consensus Voting and Consensus Recovery Block strategies, is evaluated using a set of independently developed functionally equivalent versions of an avionics application. The strategies are studied under conditions of high inter-version failure correlation, and with program versions of medium-to-high reliability. Comparisons are made with classical N-Version Programming that uses Majority Voting, and with Recovery Block strategies. The empirical behavior of the three schemes is found to be in good agreement with theoretical analyses and expectations. In this study Consensus Voting and Consensus Recovery Block based systems were found to perform better, and more uniformly, than corresponding traditional strategies, that is, Recovery Block and N-Version Programming that use Majority Voting. This is the first experimental evaluation of the system reliability provided by Consensus Voting, and the first experimental...
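The difference between the two voting rules compared in this study can be sketched as follows. As these schemes are usually described, majority voting accepts an output only if more than half of the N versions agree, while consensus voting accepts the output with the largest agreement set. This sketch assumes outputs can be compared for exact equality (real avionics outputs would typically need a tolerance) and, as a simplification, returns no decision on a tie for the largest set rather than breaking it randomly; the function names are hypothetical, and Consensus Recovery Block is not shown.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by more than half of the versions, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

def consensus_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value with the largest agreement set (plurality).
    Simplifying assumption: a tie for the largest set yields None
    instead of a random choice among the tied values."""
    ranked = Counter(outputs).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Five versions with no majority: two agree on 42, three give distinct answers.
outputs = [42, 42, 17, 99, 5]
print(majority_vote(outputs))   # None
print(consensus_vote(outputs))  # 42
```

With five versions of which only two agree, majority voting reaches no decision while consensus voting still returns the agreed value. If the two agreeing versions were both wrong (e.g. `[17, 17, 42, 99, 5]` with 42 correct), consensus voting would return the wrong value, which is why inter-version failure correlation matters for both schemes.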

