Unreliable Failure Detectors for Reliable Distributed Systems
Journal of the ACM, 1996
Cited by 807 (17 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
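The two properties can be made concrete with a timeout-based detector. The sketch below is an illustration under stated assumptions, not a construction from the paper, which specifies failure detectors abstractly by their properties. It assumes heartbeats stamped with the receiver's local time and a system in which message delays are eventually bounded (partial synchrony).

- **Completeness:** a crashed process stops sending heartbeats, so it is eventually suspected forever.
- **Accuracy:** each false suspicion enlarges that process's timeout, so once delays are bounded the mistakes eventually stop.

In a purely asynchronous system no timeout value can guarantee the accuracy part. The class name and its parameters are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (not from the paper).

    A process is suspected once no heartbeat has arrived within its timeout.
    A heartbeat from a suspected process shows the suspicion was a mistake,
    so the process is trusted again and its timeout is increased.
    """
    processes: list
    initial_timeout: float = 1.0
    backoff: float = 1.0  # added to a process's timeout after each false suspicion
    last_heard: dict = field(default_factory=dict)
    timeout: dict = field(default_factory=dict)
    suspects: set = field(default_factory=set)

    def __post_init__(self):
        for p in self.processes:
            self.last_heard[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, p, now):
        self.last_heard[p] = now
        if p in self.suspects:
            # False suspicion: stop suspecting p and be more patient next time.
            self.suspects.discard(p)
            self.timeout[p] += self.backoff

    def check(self, now):
        # Suspect every process whose heartbeat is overdue; return a copy.
        for p in self.processes:
            if now - self.last_heard[p] > self.timeout[p]:
                self.suspects.add(p)
        return set(self.suspects)
```

For example, if both processes go silent after time 0.5, both are suspected at time 2.0. A late heartbeat from "a" then clears its suspicion and doubles its timeout, while "b" stays suspected.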
Crowds: Anonymity for Web Transactions
ACM Transactions on Information and System Security, 1997
Cited by 565 (12 self)
In this paper we introduce a system called Crowds for protecting users' anonymity on the World Wide Web. Crowds, named for the notion of "blending into a crowd", operates by grouping users into a large and geographically diverse group (crowd) that collectively issues requests on behalf of its members. Web servers are unable to learn the true source of a request because it is equally likely to have originated from any member of the crowd, and even collaborating crowd members cannot distinguish the originator of a request from a member who is merely forwarding the request on behalf of another. We describe the design, implementation, security, performance, and scalability of our system. Our security analysis introduces degrees of anonymity as an important tool for describing and proving anonymity properties.
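The forwarding behaviour behind this indistinguishability can be simulated in a few lines. This is a hedged sketch of the scheme as we understand it from the full Crowds paper; the abstract itself does not spell it out.

- The initiator always hands its request to a randomly chosen member, possibly itself.
- Each member that receives the request forwards it to another random member with probability p_forward.
- Otherwise that member submits the request to the web server.

The function name and the p_forward parameter are our assumptions for illustration. Encryption and path reuse are not modelled.

```python
import random


def crowds_path(members, initiator, p_forward, rng):
    """Return the list of members a request visits, starting at the initiator.

    The initiator always forwards to a uniformly random member (possibly
    itself). Every member that then holds the request flips a biased coin:
    with probability p_forward it forwards to another random member,
    otherwise it submits the request to the end server.
    """
    if not 0 <= p_forward < 1:
        raise ValueError("p_forward must be in [0, 1)")
    path = [initiator]
    current = rng.choice(members)  # first hop is unconditional
    path.append(current)
    while rng.random() < p_forward:
        current = rng.choice(members)
        path.append(current)
    return path  # path[-1] is the member that contacts the web server


rng = random.Random(7)
print(crowds_path(["j1", "j2", "j3", "j4"], "j1", 0.5, rng))
```

With p_forward = 0.5 a path has three entries on average: the initiator, the unconditional first hop, and a geometric number of extra hops with mean 1. Raising p_forward lengthens paths and so adds latency.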
Lightweight causal and atomic group multicast
ACM Transactions on Computer Systems, 1991
Cited by 542 (44 self)
(DoD) under DARPA/NASA subcontract NAG2-593 administered by the NASA
The process group approach to reliable distributed computing
Communications of the ACM, 1993
Cited by 501 (16 self)
The difficulty of developing reliable distributed software is an impediment to applying distributed computing technology in many settings. Experience with the Isis system suggests that a structured approach based on virtually synchronous process groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of research on Isis, describing the model, its implementation challenges, and the types of applications to which Isis has been applied.
1 Introduction
One might expect the reliability of a distributed system to follow directly from the reliability of its constituents, but this is not always the case. The mechanisms used to structure a distributed system and to implement cooperation between components play a vital role in determining how reliable the system will be. Many contemporary distributed operating systems have placed emphasis on communication performance, overlooking the need for tools to integrate components into a reliable whole. The communication primitives supported give generally reliable behavior, but exhibit problematic semantics when transient failures or system configuration changes occur. The resulting building blocks are, therefore, unsuitable for facilitating the construction of systems where reliability is important. This paper reviews six years of research on Isis, a system that provides tools to support the construction of reliable distributed software. The thesis underlying Isis is that development of reliable distributed software can be simplified using process groups and group programming tools. This paper motivates the approach taken, surveys the system, and discusses our experience with real applications.
Serverless Network File Systems
ACM Transactions on Computer Systems, 1995
Cited by 403 (26 self)
In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the system can store, cache, or control any block of data. Our approach uses this location independence, in combination with fast local area networks, to provide better performance and scalability than traditional file systems. Further, because any machine in the system can assume the responsibilities of a failed component, our serverless design also provides high availability via redundant data storage. To demonstrate our approach, we have implemented a prototype serverless network file system called xFS. Preliminary performance measurements suggest that our architecture achieves its goal of scalability. For instance, in a 32-node xFS system with 32 active clients, each client receives nearly as much read or write throughput as it would see if it were the only active client.
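Location independence and fail-over can be illustrated with a table that maps blocks to the machines managing them. This is a simplified, hypothetical sketch in the spirit of xFS, not its actual data structures. It assumes every machine holds an identical copy of a fixed-size table indexed by block number. When a machine fails, its table entries are handed round-robin to the remaining machines. Recovering the failed manager's state and the redundant data storage mentioned in the abstract are not modelled.

```python
class ManagerMap:
    """Hypothetical block-to-manager table, assumed replicated on every machine.

    Any machine can find a block's manager locally by indexing the table with
    the block number, so no central server is consulted.
    """

    def __init__(self, machines, entries=16):
        if not machines:
            raise ValueError("need at least one machine")
        # Spread table entries round-robin over the machines.
        self.table = [machines[i % len(machines)] for i in range(entries)]

    def manager_for(self, block_number):
        return self.table[block_number % len(self.table)]

    def reassign_failed(self, failed):
        """Give each entry owned by `failed` to a machine still in the table."""
        survivors = sorted(set(self.table) - {failed})
        if not survivors:
            raise RuntimeError("no surviving machines")
        k = 0
        for i, owner in enumerate(self.table):
            if owner == failed:
                self.table[i] = survivors[k % len(survivors)]
                k += 1
```

Keeping the map small and identical everywhere means a lookup is a local array access. Only the entries of the failed machine change on reassignment, so most blocks keep their manager.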
The Weakest Failure Detector for Solving Consensus
1996
Cited by 374 (19 self)
We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In [CT91], it is shown that ◇W, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as ◇W. Thus, ◇W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
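Comparisons such as "at least as much information as ◇W" are established by reductions between failure detectors. A small reduction of this kind, from Chandra and Toueg's work on unreliable failure detectors, strengthens completeness. If some correct process eventually suspects every crashed process, gossiping suspect lists makes every correct process do so.

Each process repeatedly sends its local suspicions to all processes. On receiving (q, suspects_q), a process sets output to (output ∪ suspects_q) minus {q}, dropping q because q has just shown it is alive.

The sketch runs one synchronous round among the processes that have not crashed. The round structure and the function names are simplifying assumptions, since the original runs forever over asynchronous messages.

```python
def deliver(output, sender, sender_suspects):
    """Receiver update on message (sender, sender_suspects)."""
    return (output | sender_suspects) - {sender}


def gossip_round(outputs, local_suspects, alive):
    """One round: each alive process sends its local suspicions to every alive
    process (itself included); receivers apply `deliver` in sorted sender order.

    outputs: dict process -> set currently output as suspected
    local_suspects: dict process -> suspicions from its local (weak) detector
    """
    new_outputs = {p: set(outputs[p]) for p in alive}
    for q in sorted(alive):
        for p in alive:
            new_outputs[p] = deliver(new_outputs[p], q, local_suspects[q])
    return new_outputs
```

For example, with processes a, b, c where c has crashed and only a's local detector suspects c, a single round leaves both a and b suspecting c.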
Group Communication Specifications: A Comprehensive Study
ACM Computing Surveys, 1999
Cited by 284 (12 self)
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
Building Secure and Reliable Network Applications
1996
Cited by 209 (16 self)
Abstractly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC (local procedure call) using message passing. LPC has a number of "properties": a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
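Once messages can be lost, a client must retransmit, and a naive server would then execute the procedure body twice, breaking the "exactly one execution" property. One common ingredient of RPC protocols is server-side duplicate suppression. Each call carries a (client_id, seq) tag, and a repeated tag returns the cached reply instead of re-executing. This is a hedged sketch, not a protocol from the text; the class and tag names are hypothetical. It gives at-most-once execution only while the server stays up. The reply table grows without bound and is lost if the server crashes, which is exactly the failure case the passage identifies as the real challenge.

```python
class AtMostOnceServer:
    """Hypothetical RPC server that suppresses duplicate requests."""

    def __init__(self, procedures):
        self.procedures = procedures  # procedure name -> callable
        self.replies = {}             # (client_id, seq) -> cached reply

    def handle(self, client_id, seq, name, *args):
        key = (client_id, seq)
        if key in self.replies:
            return self.replies[key]  # retransmission: do not re-execute
        try:
            reply = ("ok", self.procedures[name](*args))
        except Exception as exc:  # report the error to the caller as a reply
            reply = ("error", repr(exc))
        self.replies[key] = reply
        return reply
```

A client that times out simply resends the same (client_id, seq). A new call uses the next sequence number.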
Newtop: A Fault-Tolerant Group Communication Protocol
1995
Cited by 146 (21 self)
A general-purpose group communication protocol suite called Newtop is described. It is assumed that processes can simultaneously belong to many groups, group size could be large, and processes could be communicating over the Internet. An asynchronous communication environment is therefore assumed, in which message transmission times cannot be accurately estimated and the underlying network may well become partitioned, preventing functioning processes from communicating with each other. Newtop can provide causality-preserving total order delivery to members of a group, ensuring that total order delivery is preserved for multi-group processes. Both symmetric and asymmetric order protocols are supported, permitting a process to use, say, the symmetric version in one group and the asymmetric version in another.
Key words: group communication, group membership, fault tolerance, network protocol, multicast protocol, causal order, total order.
1. Introduction
Many fault-tolerant distributed applications can ...
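Symmetric total-order protocols are commonly built on logical timestamps, with every member deciding delivery order by the same deterministic rule. The sketch below shows that generic idea at a single member; it is not Newtop's protocol. The real protocol also handles multiple groups, membership changes, and partitions.

The sketch makes these assumptions:

- reliable FIFO channels;
- Lamport-clock timestamps, so each sender's timestamps strictly increase;
- a fixed membership in which every member keeps sending (for example, null messages).

A message is delivered once its timestamp is at or below the latest timestamp seen from every member. Ties are broken by sender name. The class name is hypothetical.

```python
import heapq


class SymmetricTotalOrder:
    """Hypothetical timestamp-ordered delivery at one group member."""

    def __init__(self, members):
        self.latest = {m: 0 for m in members}  # highest timestamp seen per sender
        self.pending = []                      # heap of (timestamp, sender, payload)
        self.delivered = []

    def receive(self, timestamp, sender, payload):
        self.latest[sender] = max(self.latest[sender], timestamp)
        heapq.heappush(self.pending, (timestamp, sender, payload))
        self._try_deliver()

    def _try_deliver(self):
        # With FIFO channels and increasing timestamps, every future message
        # has a timestamp above this bound, so pending messages at or below
        # it can be delivered in (timestamp, sender) order.
        bound = min(self.latest.values())
        while self.pending and self.pending[0][0] <= bound:
            self.delivered.append(heapq.heappop(self.pending))
```

Two members that receive the same messages in different interleavings (each sender's messages still in FIFO order) end up delivering them in the same order. This shared order is the property total order delivery requires.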

