Modelling Software Design Diversity: A Review
ACM Computing Surveys, 1999
Cited by 27 (4 self)
Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. Whilst there is clear evidence that the approach can be expected to deliver some increase in reliability compared with a single version, there is no agreement about how large this increase is. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. This difficulty arises because the assumption that different versions fail independently has been shown to be untenable: the actual level of dependence must therefore be assessed, and this is hard. In this tutorial we survey the modelling issues involved, with an emphasis on their impact on the problem of assessing the reliability of fault-tolerant systems. The intended audience comprises designers, assessors and project managers with only a basic knowledge of probability, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity.
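To make the independence point concrete, the following minimal Python sketch estimates, from paired per-demand test outcomes of two versions, how often both fail, and compares that with the product of the single-version failure probabilities that an independence assumption would predict. The function name `failure_stats` and the test data are hypothetical illustrations, not taken from the survey.

```python
from typing import Sequence


def failure_stats(fail_a: Sequence[bool], fail_b: Sequence[bool]) -> dict:
    """Compare observed coincident failures of two versions with the
    independence prediction. fail_x[i] is True if version x failed on demand i."""
    if len(fail_a) != len(fail_b) or len(fail_a) == 0:
        raise ValueError("need two non-empty outcome sequences of equal length")
    n = len(fail_a)
    p_a = sum(fail_a) / n
    p_b = sum(fail_b) / n
    # Fraction of demands on which both versions failed: an estimate of the
    # probability of failure on demand (pfd) of a 1-out-of-2 system of A and B.
    p_both = sum(1 for a, b in zip(fail_a, fail_b) if a and b) / n
    p_indep = p_a * p_b  # what an independence assumption would predict
    return {
        "p_a": p_a,
        "p_b": p_b,
        "p_both": p_both,
        "p_both_if_independent": p_indep,
        # > 1 means failures coincide more often than independence predicts;
        # undefined (nan) if either version never failed
        "ratio": p_both / p_indep if p_indep > 0 else float("nan"),
    }


# Hypothetical test campaign: 1000 demands, overlapping failure sets.
outcomes_a = [i < 10 for i in range(1000)]        # A fails on demands 0-9
outcomes_b = [5 <= i < 15 for i in range(1000)]   # B fails on demands 5-14
print(failure_stats(outcomes_a, outcomes_b))
```

Here each version has an estimated pfd of 0.01, so independence would predict 1e-4 for the pair, but the observed coincident-failure rate is 0.005, fifty times higher.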
On Performability Modeling and Evaluation of Software Fault Tolerance Structures
in Proc. EDCC-1, 1994
Cited by 13 (13 self)
An adaptive scheme for software fault tolerance is evaluated from the point of view of performability and compared with previously published analyses of the more popular schemes, recovery blocks and multiple-version programming. In the case considered, this adaptive scheme, "Self-Configuring Optimistic Programming" (SCOP), is equivalent to N-version programming in terms of the probability of delivering correct results, but achieves better performance by delaying the execution of some of the variants until an error makes it necessary. A discussion follows highlighting the limits to the realism of these analyses, due to the assumptions made to obtain mathematically tractable models, to the lack of experimental data, and to the need to also consider resource consumption in the definition of the models. We consider ways of improving the usability of the results of comparative evaluation for guiding design decisions.
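The claim that an adaptive scheme can match N-version programming in correctness while executing fewer variants can be checked with a back-of-the-envelope model. The sketch below is a simplified three-variant version of the idea, not the SCOP specification from the paper: it assumes that variants fail independently with the same probability `p` (an assumption other works in this list question) and that two wrong results never agree. The function names `nvp3` and `adaptive3` are hypothetical.

```python
def nvp3(p: float) -> tuple[float, float]:
    """3-version programming: all three variants always run; majority vote.
    Returns (P(correct result), expected number of variant executions)."""
    q = 1.0 - p
    p_correct = q**3 + 3 * p * q**2     # at least two of three variants correct
    return p_correct, 3.0


def adaptive3(p: float) -> tuple[float, float]:
    """Simplified SCOP-like scheme: run two variants first; run the third only
    if the first two disagree, then take the majority."""
    q = 1.0 - p
    p_agree = q**2                       # both correct (wrong results never match)
    # Correct if both first variants are correct, or exactly one is wrong
    # and the third (executed only in that case) is correct.
    p_correct = q**2 + 2 * p * q * q
    expected_runs = 2.0 + (1.0 - p_agree)
    return p_correct, expected_runs


for p in (0.001, 0.01, 0.1):
    print(p, nvp3(p), adaptive3(p))
```

Under these assumptions both schemes deliver a correct result with probability (1-p)^2(1+2p), while the adaptive one needs on average 3 - (1-p)^2 executions instead of 3, for example 2.19 at p = 0.1.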
A Comparative Analysis of Hardware and Software Fault Tolerance: Impact on Software Reliability Engineering
1999
Cited by 9 (2 self)
In this paper, we focus on methods of fault tolerance and investigate the differences between hardware fault tolerance and software fault tolerance.
A Discussion of Practices for Enhancing Diversity in Software Designs
2000
Cited by 5 (1 self)
This report discusses the practices that have been used or recommended for increasing the degree of diversity between redundant implementations of software or software-based systems. Its purpose is to give useful indications to designers, project managers and safety/reliability assessors in deciding how great an advantage should be expected from the use of these practices, in absolute and in comparative terms. Existing knowledge does not allow one to state any strong general recommendations, but it is possible to improve on the intuitive justifications usually given for these various practices. This report clarifies the ways in which the various practices are conjectured to aid system reliability, the factors that should affect their efficacy, and thus, for a practitioner, the aspects of a specific project situation that need to be considered to inform decisions. Thus this report ...
Dependability Models for Iterative Software Considering Correlation between Successive Inputs
in Proc. IEEE Int. Conference on Performance and Dependability, 1995
Cited by 5 (4 self)
We consider the dependability of programs of an iterative nature. The dependability of software structures is usually analysed using models that are strongly limited in their realism by the assumptions made to obtain mathematically tractable models and by the lack of experimental data. The assumption of independence between the outcomes of successive executions, which is often false, may lead to significant deviations from the real behaviour of the program under analysis. In this work we present a model in which dependencies among input values of successive iterations are taken into account in studying the dependability of iterative software. We also consider the possibility that repeated, non-fatal failures may together cause mission failure. We evaluate the effects of these different hypotheses on 1) the probability of completing a fixed-duration mission, and 2) a performability measure.
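One way to see how correlated successive iterations and accumulated non-fatal failures affect mission completion is a small Markov chain over the length of the current run of consecutive failures. The sketch below is a hypothetical illustration, not the paper's model: it assumes an iteration fails with probability `p_ok` after a success and `p_fail` after a failure, and that `k` consecutive non-fatal failures cause mission failure.

```python
def mission_success_prob(n_iter: int, p_ok: float, p_fail: float, k: int) -> float:
    """Probability that n_iter iterations complete without k consecutive failures.

    p_ok   -- P(iteration fails | previous iteration succeeded, or first iteration)
    p_fail -- P(iteration fails | previous iteration failed); p_fail > p_ok
              models positive correlation between successive inputs
    k      -- number of consecutive non-fatal failures that causes mission failure
    """
    # alive[j]: probability the mission is still running and the most recent
    # j iterations all failed (j = 0 means the last iteration succeeded).
    alive = [0.0] * k
    alive[0] = 1.0
    for _ in range(n_iter):
        nxt = [0.0] * k
        for j, mass in enumerate(alive):
            pf = p_ok if j == 0 else p_fail
            nxt[0] += mass * (1.0 - pf)      # a success resets the run
            if j + 1 < k:
                nxt[j + 1] += mass * pf      # one more non-fatal failure
            # otherwise the k-th consecutive failure ends the mission
        alive = nxt
    return sum(alive)


# Same failure probability after a success; compare independent vs correlated.
print(mission_success_prob(1000, 0.01, 0.01, 3))   # independent outcomes
print(mission_success_prob(1000, 0.01, 0.5, 3))    # failures tend to cluster
```

With `k = 1` every failure is fatal and the result reduces to (1 - p_ok)^n_iter; with larger `k`, raising `p_fail` above `p_ok` lowers the mission-completion probability even though the failure probability after a success is unchanged.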
On Reducing the Sensitivity of Software Reliability to Variations in the Operational Profile
1996
Cited by 4 (4 self)
In the statistical sampling method, as in any other statistical approach to measuring software reliability, the inputs to the program are chosen according to the estimated probability with which they occur in field use, which forms the operational profile. However, in practice it is very difficult to assess the operational distribution of input points accurately. Furthermore, a variety of factors can cause the operational distribution to change during field use, making the estimation even more difficult. Musa has suggested that reducing the size of the input domain simplifies the task of determining operational profiles. In this paper, we present a class of techniques that reduce the dimensionality of input domains and describe their application. These techniques do not limit the functionality or change the input-output behavior of the program. An additional benefit of these techniques is the insensitivity of the reliability estimate to variations in the operational profile of variables ...
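The sensitivity problem arises because a reliability estimate weights per-sub-domain failure probabilities by the operational profile. The sketch below only illustrates that weighting; it is not the paper's dimensionality-reduction technique. With hypothetical sub-domains over two input variables `x` and `y`, and failure probabilities that depend only on `x`, shifting the profile of `y` leaves the estimate unchanged, while shifting the profile of `x` does not.

```python
Cell = tuple[str, str]  # hypothetical input sub-domain: (range of x, range of y)


def pfd(profile: dict[Cell, float], theta: dict[Cell, float]) -> float:
    """Probability of failure per demand: sum over sub-domains of
    P(demand falls in sub-domain) * P(failure | demand in sub-domain)."""
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("operational profile must sum to 1")
    return sum(p * theta[cell] for cell, p in profile.items())


# Hypothetical failure probabilities that depend on x only.
THETA = {
    ("x_low", "y_low"): 1e-4, ("x_low", "y_high"): 1e-4,
    ("x_high", "y_low"): 1e-3, ("x_high", "y_high"): 1e-3,
}
profile_a = {("x_low", "y_low"): 0.4, ("x_low", "y_high"): 0.4,
             ("x_high", "y_low"): 0.1, ("x_high", "y_high"): 0.1}
# Same x-marginal as profile_a (0.8 / 0.2), very different y-marginal.
profile_b = {("x_low", "y_low"): 0.7, ("x_low", "y_high"): 0.1,
             ("x_high", "y_low"): 0.05, ("x_high", "y_high"): 0.15}
# Different x-marginal (0.5 / 0.5).
profile_c = {cell: 0.25 for cell in THETA}

print(pfd(profile_a, THETA), pfd(profile_b, THETA), pfd(profile_c, THETA))
```

Profiles `a` and `b` both give 2.8e-4, while `c` gives 5.5e-4: only the profile of variables that influence failure behaviour matters to the estimate.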
Conceptual Models for the Reliability of Diverse Systems - New Results
in Proc. 28th International Symposium on Fault-Tolerant Computing (FTCS-28), 1998
Cited by 4 (3 self)
We address problems in modelling the reliability of multiple-version software, and present models intended to improve the understanding of the various ways failure dependence between versions can arise. The previous models, by Eckhardt and Lee and by Littlewood and Miller, described what behaviour could be expected "on average" from a randomly chosen pair of "independently generated" versions. Instead, we address the problem of predicting the reliability of a specific pair of versions. The concept of "variation of difficulty" between situations to which software may be subject is central to the previous models cited. We show that it has even more far-reaching implications than previously found. In particular, we consider the practical implications of two phenomena: varying probabilities of failure over input sub-domains or operating regimes; and positive correlation between successive executions of control software. Our analysis provides some practical advice for regulators, and useful...
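The role of "variation of difficulty" in the earlier Eckhardt and Lee model can be shown with a few lines of arithmetic. In that model, theta(x) is the probability that a randomly developed version fails on demand x; two independently developed versions then both fail on a random demand with probability E[theta^2], which exceeds the naive E[theta]^2 whenever theta varies across demands. The sketch below (function name and numbers hypothetical) computes both quantities for a small discrete demand space. It illustrates the "on average" model the paper builds on, not the paper's new results for a specific pair of versions.

```python
def eckhardt_lee(usage: list[float], theta: list[float]) -> tuple[float, float, float]:
    """usage[i]: probability that demand i occurs in operation.
    theta[i]: 'difficulty' of demand i, i.e. P(a randomly developed version fails on it).
    Returns (single-version pfd E[theta], pair pfd E[theta^2], naive E[theta]^2)."""
    single = sum(u * t for u, t in zip(usage, theta))
    pair = sum(u * t * t for u, t in zip(usage, theta))
    return single, pair, single ** 2


uniform = [0.25] * 4
print(eckhardt_lee(uniform, [0.01] * 4))             # constant difficulty
print(eckhardt_lee(uniform, [0.0, 0.0, 0.0, 0.04]))  # one hard demand
```

With the same average difficulty of 0.01, constant difficulty gives a pair pfd of 1e-4 (exactly the independence figure), while concentrating the difficulty on one demand gives 4e-4, four times that figure.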
The Reliability of Diverse Systems: a Contribution using Modelling of the Fault Creation Process
in DSN'01, The International Conference on Dependable Systems and Networks, July 1-4, 2001, Göteborg
1999
Cited by 3 (0 self)
Design diversity is a protection against design faults causing common-mode failure in redundant systems. Although we know that it is effective, we badly lack knowledge about how much reliability it will buy in practice, and thus about its cost-effectiveness, the cases in which it is an appropriate solution, and how it should be taken into account by safety assessors and regulators. Both current practice and the scientific debate about design diversity depend largely on intuition about how the little hard empirical knowledge available should be extrapolated. We show a way of making this activity more scientific by substituting a detailed probabilistic model for broad-brush intuition. Simple assumptions about the process of fault creation in two separately developed versions yield interesting conclusions about two commonly debated questions: what degree of reliability improvement in a redundant system an assessor can reliably expect from diversity; and whether this reliability improvement increases or decreases with higher-quality development processes. For instance, we show how software reliability assessments based on current practice for single-version software should be consistently extended to assessing a 1-out-of-2, two-version system.
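The flavour of a fault-creation model can be conveyed by a deliberately simple sketch, under assumptions chosen here for illustration and not necessarily those of the paper: each potential fault is introduced into each of the two separately developed versions independently with probability `p_intro`, and it causes failure on a set of demands of total probability `q_region`, with the sets of different faults disjoint. A 1-out-of-2 system then fails on a demand only if both versions contain the fault covering that demand. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialFault:
    p_intro: float   # P(this fault is introduced into one version)
    q_region: float  # P(a demand falls in this fault's failure region)


def expected_pfds(faults: list[PotentialFault]) -> tuple[float, float]:
    """Expected pfd of a single version and of a 1-out-of-2 pair, assuming
    disjoint failure regions and independent fault introduction per version."""
    single = sum(f.p_intro * f.q_region for f in faults)
    # The pair fails only where both versions contain the same fault.
    pair = sum(f.p_intro ** 2 * f.q_region for f in faults)
    return single, pair


faults = [PotentialFault(0.5, 1e-3), PotentialFault(0.1, 1e-2)]
print(expected_pfds(faults))
```

Here a single version has expected pfd 1.5e-3 and the pair 3.5e-4. Because p_intro^2 <= p_intro, the pair is never worse than one version in this toy model, but the size of the gain depends on how the fault-introduction probabilities and failure-region sizes are distributed.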
Modelling Correlation Among Successive Inputs in Software Dependability Analyses
1994
Cited by 2 (2 self)
We consider the dependability of programs of an iterative nature. The dependability of software structures is usually analysed using models that are strongly limited in their realism by the assumptions made to obtain mathematically tractable models and by the lack of experimental data. Among these assumptions, independence between the outcomes of successive executions, which is often false, may make the results deviate significantly from the real behaviour of the program under analysis. Experiments and theoretical arguments show the existence of contiguous failure regions in the program input space and that, for many applications, successive inputs often follow a trajectory of contiguous points in the input space. In this work we present a model in which dependencies among input values of successive iterations are taken into account in studying the dependability of iterative software. We also consider the possibility that repeated, non-fatal failures may together cause mission failure. We evaluate the effects of these different hypotheses on 1) the probability of completing a fixed-duration mission, and 2) a performability measure.
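The mechanism described here, contiguous failure regions combined with inputs that move through the input space in small steps, can be illustrated by simulation. In the hypothetical sketch below, the input space is the circle [0, 1), the failure region is the interval [0.40, 0.45), and successive inputs either follow a random walk with small steps or are drawn independently; the probability of failure given that the previous input failed is compared with the overall failure rate. This illustrates the phenomenon only, not the paper's model.

```python
import random


def failure_burstiness(xs: list[float], lo: float, hi: float) -> tuple[float, float]:
    """Return (P(failure), P(failure | previous input failed)) for an input
    sequence, where an input causes failure iff it lies in [lo, hi)."""
    fails = [lo <= x < hi for x in xs]
    p_fail = sum(fails) / len(fails)
    after_fail = [nxt for prev, nxt in zip(fails, fails[1:]) if prev]
    p_cond = sum(after_fail) / len(after_fail) if after_fail else float("nan")
    return p_fail, p_cond


def random_walk(n: int, step: float, x0: float, rng: random.Random) -> list[float]:
    """Successive inputs that move by at most `step` on the circle [0, 1)."""
    xs, x = [], x0
    for _ in range(n):
        xs.append(x)
        x = (x + rng.uniform(-step, step)) % 1.0
    return xs


rng = random.Random(1)
walk = random_walk(200_000, 0.02, 0.42, rng)
iid = [rng.random() for _ in range(200_000)]
print(failure_burstiness(walk, 0.40, 0.45))  # failures cluster
print(failure_burstiness(iid, 0.40, 0.45))   # roughly equal probabilities
```

With the random-walk trajectory a failure is usually followed by another failure, whereas with independently drawn inputs the conditional and unconditional failure probabilities are about the same; a model that assumes independence would miss the clustering in the first case.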
Measurement and Analysis of Operating System Fault-tolerance
IEEE Transactions on Reliability, 1993
Cited by 1 (0 self)
This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems over substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated. Results show that I/O management and program flow control are the major sources of software problems in the measured IBM/MVS and VAX/VMS operating systems, while memory management is the major source of software problems in the Tandem/GUARDIAN operating system. Software errors tend to occur in bursts on both IBM and VAX machines. This phenomenon is less pronounced in the Tandem system, which can be attributed to its fault-tolerant design. The fault tolerance in the Tandem system reduces the loss of service due to software ...
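Reward analysis of this kind attaches a reward rate (here, the fraction of service delivered) to each state of a stochastic model and computes its long-run average. The sketch below uses a hypothetical three-state discrete-time Markov chain (normal, error, recovery) with made-up transition probabilities and rewards; the paper's models are built from measured data and are not reproduced here.

```python
def stationary(P: list[list[float]], max_iter: int = 10_000, tol: float = 1e-13) -> list[float]:
    """Stationary distribution of an ergodic, aperiodic DTMC by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical states: 0 = normal, 1 = software error, 2 = recovery in progress.
P = [
    [0.999, 0.001, 0.0],  # normal -> error with small probability per step
    [0.0, 0.5, 0.5],      # error persists or recovery starts
    [0.9, 0.0, 0.1],      # recovery usually completes
]
REWARD = [1.0, 0.5, 0.0]  # hypothetical fraction of service delivered per state

pi = stationary(P)
loss_of_service = 1.0 - sum(p * r for p, r in zip(pi, REWARD))
print(pi, loss_of_service)
```

For these made-up numbers the balance equations give a long-run loss of service of 1.9/902.8, about 0.21%; comparing such figures with and without a recovery mechanism is one way to quantify the effect of fault-tolerance techniques.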

