Results 1 - 10
of
222
Basic concepts and taxonomy of dependable and secure computing
- IEEE TDSC
, 2004
"... Abstract—This paper gives the main definitions relating to dependability, a generic concept including as special case such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Bas ..."
Abstract
-
Cited by 315 (5 self)
- Add to MetaCart
Abstract—This paper gives the main definitions relating to dependability, a generic concept including as special case such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.
Building Secure and Reliable Network Applications
, 1996
"... ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to th ..."
Abstract
-
Cited by 209 (16 self)
- Add to MetaCart
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
The Timed Asynchronous Distributed System Model
, 1999
"... We propose a formal definition for the timed asynchronous distributed system model. We present extensive measurements of actual message and process scheduling delays and hardware clock drifts. These measurements confirm that this model adequately describes current distributed systems such as a netwo ..."
Abstract
-
Cited by 153 (18 self)
- Add to MetaCart
We propose a formal definition for the timed asynchronous distributed system model. We present extensive measurements of actual message and process scheduling delays and hardware clock drifts. These measurements confirm that this model adequately describes current distributed systems such as a network of workstations. We also give an explanation of why practically needed services, such as consensus or leader election, which are not implementable in the time-free model, are implementable in the timed asynchronous system model.
Reaching Agreement on Processor Group Membership in Synchronous Distributed Systems
- Distributed Computing
, 1991
"... Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subj ..."
Abstract
-
Cited by 125 (14 self)
- Add to MetaCart
Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays. Key words: Communication network -- Distributed system -- Failure detection -- Fault tolerance -- Real time system -- Replicated data 1 Introduction When designing a computing service that must remain available despite component failures, a key idea is to replicate service state information at several servers running on distinct processors. The service state typically consists of the ser...
Handling Obstacles in Goal-Oriented Requirements Engineering
, 2000
"... Requirements engineering is concerned with the elicitation of high-level goals to be achieved by the envisioned system, the refinement of such goals and their operationalization into specifications of services and constraints, and the assignment of responsibilities for the resulting requirements ..."
Abstract
-
Cited by 112 (22 self)
- Add to MetaCart
Requirements engineering is concerned with the elicitation of high-level goals to be achieved by the envisioned system, the refinement of such goals and their operationalization into specifications of services and constraints, and the assignment of responsibilities for the resulting requirements to agents such as humans, devices, and software. Requirements engineering processes often result in goals, requirements and assumptions about agent behavior that are too ideal; some of them are likely to be not satisfied from time to time in the running system due to unexpected agent behavior. The lack of anticipation of exceptional behaviors results in unrealistic, unachievable and/or incomplete requirements. As a consequence, the software developed from those requirements will not be robust enough and will inevitably result in poor performance or failures, sometimes with critical consequences on the environment. The paper presents formal techniques for reasoning about obstacl...
Closure and Convergence: A Foundation of Fault-Tolerant Computing
- IEEE Transactions on Software Engineering
, 1993
"... We give a formal definition of what it means for a system to "tolerate" a class of "faults". The definition consists of two conditions: One, if a fault occurs when the system state is within a set of "legal" states, the resulting state is within some larger set and, if faults continue occurring, the ..."
Abstract
-
Cited by 103 (28 self)
- Add to MetaCart
We give a formal definition of what it means for a system to "tolerate" a class of "faults". The definition consists of two conditions: One, if a fault occurs when the system state is within a set of "legal" states, the resulting state is within some larger set and, if faults continue occurring, the system state remains within that larger set (Closure). And two, if faults stop occurring, the system eventually reaches a state within the legal set (Convergence). We demonstrate the applicability of our definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems. Further, using the definition, we obtain a simple classification of fault-tolerant systems and discuss methods for their systematic design. as traditionally been studied in the context of specifi...
Chameleon: A Software Infrastructure For Adaptive Fault Tolerance In Distributed Systems
, 1998
"... The project has benefited through some demonstrations and presentations given to people from the computer industry, and the discussions they generated. Of them, mention must be made of Pankaj Mehra of Tandem Labs, Roger Lee, Robert Ferraro and Jagdish Patel of the Jet Propulsion Laboratory, Larry Ja ..."
Abstract
-
Cited by 80 (11 self)
- Add to MetaCart
The project has benefited through some demonstrations and presentations given to people from the computer industry, and the discussions they generated. Of them, mention must be made of Pankaj Mehra of Tandem Labs, Roger Lee, Robert Ferraro and Jagdish Patel of the Jet Propulsion Laboratory, Larry Jack and Chet Markiewicz of Honeywell Inc., and Haim Levendel of Motorola. iii TABLE OF CONTENTS CHAPTER PAGE 1 INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 2 RELATED WORK : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 3 CHAMELEON OVERVIEW : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 14 3.1 Behavioral Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 3.1.1 Initialization of the Chameleon Environment : : : : : : : : : : : : : : : : 15 3.1.2 Interpreting User-Specified Depen
On Transactional Workflows
- IEEE Data Engineering Bulletin
"... this paper may be "long running" or not. Other related terms used in the database literature are task flow, multitransaction activities [7], multi-system applications [1], application multiactivities, and networked applications [4]. Some related issues are also addressed in various relaxed transacti ..."
Abstract
-
Cited by 73 (9 self)
- Add to MetaCart
this paper may be "long running" or not. Other related terms used in the database literature are task flow, multitransaction activities [7], multi-system applications [1], application multiactivities, and networked applications [4]. Some related issues are also addressed in various relaxed transaction models.
Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous Environments
- ACM Computing Surveys
, 1999
"... Fault tolerance in distributed computing is a wide area with a significant body of literature that is vastly diverse in methodology and terminology. This paper aims at structuring the area and thus guiding readers into this interesting field. We use a formal approach to define important terms like f ..."
Abstract
-
Cited by 57 (9 self)
- Add to MetaCart
Fault tolerance in distributed computing is a wide area with a significant body of literature that is vastly diverse in methodology and terminology. This paper aims at structuring the area and thus guiding readers into this interesting field. We use a formal approach to define important terms like fault, fault tolerance, and redundancy. This leads to four distinct forms of fault tolerance and to two main phases in achieving them: detection and correction. We show that this can help to reveal inherently fundamental structures that contribute to understanding and unifying methods and terminology. By doing this, we survey many existing methodologies and discuss their relations. The underlying system model is the close-to-reality asynchronous message-passing model of distributed computing.

