Results 1 - 10
of
75,925
Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
- ACM COMPUTING SURVEYS
, 1990
"... The state machine approach is a general method for implementing fault-tolerant services in distributed systems. This paper reviews the approach and describes protocols for two different failure models--Byzantine and fail-stop. System reconfiguration techniques for removing faulty components and i ..."
Abstract
-
Cited by 972 (10 self)
- Add to MetaCart
The state machine approach is a general method for implementing fault-tolerant services in distributed systems. This paper reviews the approach and describes protocols for two different failure models--Byzantine and fail-stop. System reconfiguration techniques for removing faulty components
Bayeux: An architecture for scalable and fault-tolerant wide-area data dissemination
, 2001
"... The demand for streaming multimedia applications is growing at an incredible rate. In this paper, we propose Bayeux, an efficient application-level multicast system that scales to arbitrarily large receiver groups while tolerating failures in routers and network links. Bayeux also includes specific ..."
Abstract
-
Cited by 466 (12 self)
- Add to MetaCart
mechanisms for load-balancing across replicate root nodes and more efficient bandwidth consumption. Our simulation results indicate that Bayeux maintains these properties while keeping transmission overhead low. To achieve these properties, Bayeux leverages the architecture of Tapestry, a fault-tolerant
Understanding Fault-Tolerant Distributed Systems
- COMMUNICATIONS OF THE ACM
, 1993
"... We propose a small number of basic concepts that can be used to explain the architecture of fault-tolerant distributed systems and we discuss a list of architectural issues that we find useful to consider when designing or examining such systems. For each issue we present known solutions and design ..."
Abstract
-
Cited by 374 (23 self)
- Add to MetaCart
We propose a small number of basic concepts that can be used to explain the architecture of fault-tolerant distributed systems and we discuss a list of architectural issues that we find useful to consider when designing or examining such systems. For each issue we present known solutions and design
Formal verification for fault-tolerant architectures: Prolegomena to the design of PVS
- IEEE Transactions on Software Engineering
, 1995
"... Abstract-- PVS is the most recent in a series of verification systems developed at SRI. Its design was strongly influenced, and later refined, by our experiences in developing formal specifications and mechanically checked verifications for the fault-tolerant architecture, algorithms, and implementa ..."
Abstract
-
Cited by 333 (47 self)
- Add to MetaCart
Abstract-- PVS is the most recent in a series of verification systems developed at SRI. Its design was strongly influenced, and later refined, by our experiences in developing formal specifications and mechanically checked verifications for the fault-tolerant architecture, algorithms
Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems
, 1983
"... This paper was originally submitted to ACM Transactions on Programming Languages and Systems. The responsible editor was Susan L. Graham. The authors and editor kindly agreed to transfer the paper to the ACM Transactions on Computer Systems ..."
Abstract
-
Cited by 352 (18 self)
- Add to MetaCart
This paper was originally submitted to ACM Transactions on Programming Languages and Systems. The responsible editor was Susan L. Graham. The authors and editor kindly agreed to transfer the paper to the ACM Transactions on Computer Systems
Practical Byzantine fault tolerance and proactive recovery
- ACM Transactions on Computer Systems
, 2002
"... Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, B ..."
Abstract
-
Cited by 418 (9 self)
- Add to MetaCart
, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet
Reliable Communication in the Presence of Failures
- ACM Transactions on Computer Systems
, 1987
"... The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols at ..."
Abstract
-
Cited by 556 (20 self)
- Add to MetaCart
The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols
Basic concepts and taxonomy of dependable and secure computing
- IEEE TDSC
, 2004
"... This paper gives the main definitions relating to dependability, a generic concept including as special case such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic defin ..."
Abstract
-
Cited by 758 (6 self)
- Add to MetaCart
definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting
Design and Evaluation of a Wide-Area Event Notification Service
- ACM Transactions on Computer Systems
"... This paper presents SIENA, an event notification service that we have designed and implemented to exhibit both expressiveness and scalability. We describe the service's interface to applications, the algorithms used by networks of servers to select and deliver event notifications, and the strat ..."
Abstract
-
Cited by 789 (32 self)
- Add to MetaCart
, and the strategies used Effort sponsored by the Defense Advanced Research Projects Agency, and Air Force Research Laboratory, Air Force Materiel Command,USAF, under agreement numbers F30602-94-C-0253, F3060297 -2-0021, F30602-98-2-0163, F30602-99-C-0174, F30602-00-2-0608, and N66001-00-8945; by the Air Force Office
Results 1 - 10
of
75,925