Results 1 - 10
of
360
Practical Byzantine Fault Tolerance
"... This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbi ..."
Abstract
-
Cited by 476 (20 self)
- Add to MetaCart
This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3 % slower than a standard unreplicated NFS.
The dangers of replication and a solution
- In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data
, 1996
"... Abstract: Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traflc gives a thousand fold increase in deadlocks or reconciliations. Master copy replica-tion (primary copyj schemes reduce this problem. A simple an ..."
Abstract
-
Cited by 445 (3 self)
- Add to MetaCart
Abstract: Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traflc gives a thousand fold increase in deadlocks or reconciliations. Master copy replica-tion (primary copyj schemes reduce this problem. A simple analytic model demonstrates these results. A new two-tier replication algorithm is proposed that allows mobile (disconnected) applications to propose tentative update trans-actions that are later applied to a master copy. Commutative update transactions avoid the instability of other replication schemes. 1.
Nested Transactions: An Approach to Reliable Distributed Computing
, 1981
"... Distributed computing systems are being built and used more and more frequently. This distributod computing revolution makes the reliability of distributed systems an important concern. It is fairly well-understood how to connect hardware so that most components can continue to work when others are ..."
Abstract
-
Cited by 435 (1 self)
- Add to MetaCart
Distributed computing systems are being built and used more and more frequently. This distributod computing revolution makes the reliability of distributed systems an important concern. It is fairly well-understood how to connect hardware so that most components can continue to work when others are broken, and thus increase the reliability of a system as a whole. This report addressos the issue of providing software for reliable distributed systems. In particular, we examine how to program a system so that the software continues to work in tho face of a variety of failures of parts of the system. The design presented
Small Byzantine Quorum Systems
- DISTRIBUTED COMPUTING
, 2001
"... In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in ..."
Abstract
-
Cited by 366 (48 self)
- Add to MetaCart
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
Exploiting virtual synchrony in distributed systems
, 1987
"... Abstract: We describe applications of a virtually synchro-nous environment for distributed programming, which underlies a collection of distributed programming tools in the 1SIS2 system. A virtually synchronous environment allows processes to be structured into process groups, and makes events like ..."
Abstract
-
Cited by 298 (26 self)
- Add to MetaCart
Abstract: We describe applications of a virtually synchro-nous environment for distributed programming, which underlies a collection of distributed programming tools in the 1SIS2 system. A virtually synchronous environment allows processes to be structured into process groups, and makes events like broadcasts to the group as an entity, group membership changes, and even migration of an activity from one place to another appear to occur instan-taneously-- in other words, synchronously. A major advantage to this approach is that many aspects of a dis-tributed application can be treated independently without compromising correctness. Moreover, user code that is designed as if the system were synchronous can often be executed concurrently. We argue that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches. 1. A toolkit for distributed systems Consider the design of a distributed system for factory automation, say for VLSI chip fabrication. Such a system would need to group control 'processes into services responsible for different aspects of the fabrication procedure. One service might accept batches of chips needing photographic emulsions, another oversee transport of chips from station to sta-tion, etc. Within a service, algorithms would be needed for scheduling work, replicating data, coordi-
Practical Byzantine fault tolerance and proactive recovery
- ACM Transactions on Computer Systems
, 2002
"... Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, B ..."
Abstract
-
Cited by 248 (7 self)
- Add to MetaCart
Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2 % faster to 24 % slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the
Building Secure and Reliable Network Applications
, 1996
"... ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to th ..."
Abstract
-
Cited by 209 (16 self)
- Add to MetaCart
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
IP-based Protocols for Mobile Internetworking
, 1991
"... We consider the problem of providing network access to hosts whose physical location changes with time. Such hosts cannot depend on traditional forms of network connectivity and routing because their location, and hence the route to reach them, cannot be deduced from their network address. In this p ..."
Abstract
-
Cited by 191 (4 self)
- Add to MetaCart
We consider the problem of providing network access to hosts whose physical location changes with time. Such hosts cannot depend on traditional forms of network connectivity and routing because their location, and hence the route to reach them, cannot be deduced from their network address. In this paper, we explore the concept of providing continuous network access to mobile computers, and present a set of IP-based protocols that achieve that goal. They are primarily targeted at supporting a campus environment with mobile computers, but also extend gracefully to accommodate hosts moving between different networks. The key feature is the dependence on ancillary machines, the Mobile Support Stations (MSSs), to track the location of the Mobile Hosts. Using a combination of caching, forwarding pointers, and timeouts, a minimal amount of state is kept in each MSS. The state information is kept in a distributed fashion; the system scales well, reacts quickly to changing topologies, and does ...
The Bayou Architecture: Support for Data Sharing among Mobile Users
, 1994
"... The Bayou System is a platform of replicated, highly-available, variable-consistency, mobile databases on which to build collaborative applications. This paper presents the preliminary system architecture along with the design goals that influenced it. We take a fresh, bottom-up and critical look at ..."
Abstract
-
Cited by 183 (2 self)
- Add to MetaCart
The Bayou System is a platform of replicated, highly-available, variable-consistency, mobile databases on which to build collaborative applications. This paper presents the preliminary system architecture along with the design goals that influenced it. We take a fresh, bottom-up and critical look at the requirements of mobile computing applications and carefully pull together both new and existing techniques into an overall architecture that meets these requirements. Our emphasis is on supporting application-specific conflict detection and resolution and on providing application-controlled inconsistency.
Time Synchronization in Wireless Sensor Networks
, 2003
"... OF THE DISSERTATION University of California, Los Angeles, 2003 Professor Deborah L. Estrin, Chair active research in large-scale networks of small, wireless, low-power sensors and actuators. Time synchronization is a critical piece of infrastructure in any dis- tributed system, but wirel ..."
Abstract
-
Cited by 174 (12 self)
- Add to MetaCart
OF THE DISSERTATION University of California, Los Angeles, 2003 Professor Deborah L. Estrin, Chair active research in large-scale networks of small, wireless, low-power sensors and actuators. Time synchronization is a critical piece of infrastructure in any dis- tributed system, but wireless sensor networks make particularly extensive use physical world. However, while the clock accuracy and precision requirements are often stricter in sensor networks than in traditional distributed systems, energy and channel constraints limit the resources available to meet these goals.

