Results 1 - 10
of
71
Attested append-only memory: Making adversaries stick to their word
- In Proc. of SOSP
, 2007
"... Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limits to how much can be done with entirely untrusted components; for example, replicated state machines cannot tolerate more ..."
Abstract
-
Cited by 45 (7 self)
- Add to MetaCart
Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limits to how much can be done with entirely untrusted components; for example, replicated state machines cannot tolerate more than a third of their replica population being Byzantine. In this paper, we investigate how minimal trusted abstractions can push through these hard limits in practical ways. We propose Attested Append-Only Memory (A2M), a trusted system facility that is small, easy to implement and easy to verify formally. A2M provides the programming abstraction of a trusted log, which leads to protocol designs immune to equivocation – the ability of a faulty host to lie in different ways to different clients or servers – which is a common source of Byzantine headaches. Using A2M, we improve upon the state of the art in Byzantine-fault tolerant replicated state machines, producing A2M-enabled protocols (variants of Castro and Liskov’s PBFT) that remain correct (linearizable) and keep making progress (live) even when half the replicas are faulty, in contrast to the previous upper bound. We also present an A2M-enabled single-server shared storage protocol that guarantees linearizability despite server faults. We implement A2M and our protocols, evaluate them experimentally through micro- and macro-benchmarks, and argue that the improved fault tolerance is cost-effective for a broad range of uses, opening up new avenues for practical, more reliable services.
ZooKeeper: Wait-free Coordination for Internet-scale Systems
- In USENIX Annual Technical Conference
"... In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by Zoo-Keeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications. 1
Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults
"... This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems effectively unusable for many applications by reducing their throughput by two orders of magnitude or more, from thousands of requests per second to fewer than 10 requests per second. The problem comes not because these systems fail to meet the guarantees they promise, but because the guarantees they promise are insufficient for the high assurance systems for which BFT techniques are likely to be of most interest. In this paper, we describe Aardvark, a new BFT replication protocol that guarantees good performance during uncivil periods, when the network is reliable but when up to f servers and any number of clients are faulty. Aardvark gives up some performance compared to protocols that focus on optimizing for the best case, but Aardvark’s peak throughput of 40527 requests per second seems sufficient for many applications. Because Aardvark is less aggressively tuned for the fault free case, it is guaranteed to remain within a constant factor of 40527 when faults occur. We observe throughputs of between 11706 and 40527 for a broad range of injected faults.
TrInc: Small Trusted Hardware for Large Distributed Systems
"... A simple yet remarkably powerful tool of selfish and malicious participants in a distributed system is “equivocation”: making conflicting statements to others. We present TrInc, a small, trusted component that combats equivocation in large, distributed systems. Consisting fundamentally of only a non ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
A simple yet remarkably powerful tool of selfish and malicious participants in a distributed system is “equivocation”: making conflicting statements to others. We present TrInc, a small, trusted component that combats equivocation in large, distributed systems. Consisting fundamentally of only a non-decreasing counter and a key, TrInc provides a new primitive: unique, once-in-alifetime attestations. We show that TrInc is practical, versatile, and easily applicable to a wide range of distributed systems. Its deployment is viable because it is simple and because its fundamental components—a trusted counter and a key—are already deployed in many new personal computers today. We demonstrate TrInc’s versatility with three detailed case studies: attested append-only memory (A2M), PeerReview, and BitTorrent. We have implemented TrInc and our three case studies using real, currently available trusted hardware. Our evaluation shows that TrInc eliminates most of the trusted storage needed to implement A2M, significantly reduces communication overhead in PeerReview, and solves an open incentives issue in BitTorrent. Microbenchmarks of our TrInc implementation indicate directions for the design of future trusted hardware. 1
Byzantine replication under attack
- In Proceedings of the 38th IEEE/IFIP International Conference on Depe ndable Systems and Networks (DSN ’08
, 2008
"... Existing Byzantine-resilient replication protocols satisfy two standard correctness criteria, safety and liveness, in the presence of Byzantine faults. In practice, however, faulty processors can, in some protocols, significantly degrade performance by causing the system to make progress at an extre ..."
Abstract
-
Cited by 16 (5 self)
- Add to MetaCart
Existing Byzantine-resilient replication protocols satisfy two standard correctness criteria, safety and liveness, in the presence of Byzantine faults. In practice, however, faulty processors can, in some protocols, significantly degrade performance by causing the system to make progress at an extremely slow rate. While “correct ” in the traditional sense, systems vulnerable to such performance degradation are of limited practical use in adversarial environments. This paper argues that techniques for mitigating such performance attacks are needed to bridge this “practicality gap ” for intrusion-tolerant replication systems. We propose a new performance-oriented correctness criterion, and we show how failure to meet this criterion can lead to performance degradation. We present a new Byzantine replication protocol that achieves the criterion and evaluate its performance in fault-free configurations and when under attack.
BFT Protocols Under Fire
"... Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), with relatively little evaluation under typical, practical conditions (WAN delays, packet loss, transient disconnection, ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), with relatively little evaluation under typical, practical conditions (WAN delays, packet loss, transient disconnection, shared resources). This makes it difficult for system designers to choose the appropriate protocol for a real target deployment. Moreover, most protocol implementations differ in their choice of runtime environment, crypto library, and transport, hindering direct protocol comparisons even under similar conditions. We present a simulation environment for such protocols that combines a declarative networking system with a robust network simulator. Protocols can be rapidly implemented from pseudocode in the high-level declarative language of the former, while network conditions and (measured) costs of communication packages and crypto primitives can be plugged into the latter. We show that the resulting simulator faithfully predicts the performance of native protocol implementations, both as published and as measured in our local network. We use the simulator to compare representative protocols under identical conditions and rapidly explore the effects of changes in the costs of crypto operations, workloads, network conditions and faults. For example, we show that Zyzzyva outperforms protocols like PBFT and Q/U under most but not all conditions, indicating that one-size-fits-all protocols may be hard if not impossible to design in practice. 1
UpRight Cluster Services
- In Proc. of 22nd SOSP, Big Sky
, 2009
"... The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices ..."
Abstract
-
Cited by 16 (7 self)
- Add to MetaCart
The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, our BFT Zookeeper and BFT HDFS implementations have performance comparable with the originals while providing additional robustness.
Evita Raced: Metacompilation for Declarative Networks ABSTRACT
"... Declarative languages have recently been proposed for many new applications outside of traditional data management. Since these are relatively early research efforts, it is important that the architectures of these declarative systems be extensible, in order to accommodate unforeseen needs in these ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Declarative languages have recently been proposed for many new applications outside of traditional data management. Since these are relatively early research efforts, it is important that the architectures of these declarative systems be extensible, in order to accommodate unforeseen needs in these new domains. In this paper, we apply the lessons of declarative systems to the internals of a declarative engine. Specifically, we describe our design and implementation of Evita Raced, an extensible compiler for the OverLog language used in our declarative networking system, P2. Evita Raced is a metacompiler: an OverLog compiler written in OverLog. We describe the minimalist architecture of Evita Raced, including its extensibility interfaces and its reuse of P2’s data model and runtime engine. We demonstrate that a declarative language like OverLog is well-suited to expressing traditional and novel query optimizations as well as other query manipulations, in a compact and natural fashion. Finally, we present initial results of Evita Raced extended with various optimization programs, running on both Internet overlay networks and wireless sensor networks. 1.
Diverse replication for single-machine Byzantine-fault tolerance
- In Submission
, 2008
"... New single-machine environments are emerging from abundant computation available through multiple cores and secure virtualization. In this paper, we describe the research challenges and opportunities around diversified replication as a method to increase the Byzantine-fault tolerance (BFT) of single ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
New single-machine environments are emerging from abundant computation available through multiple cores and secure virtualization. In this paper, we describe the research challenges and opportunities around diversified replication as a method to increase the Byzantine-fault tolerance (BFT) of single-machine servers to software attacks or errors. We then discuss the design space of BFT protocols enabled by these new environments. 1
Refined quorum systems
- In Proceedings of the 26th annual ACM symposium on Principles of distributed computing
, 2007
"... Abstract. It is considered good distributed computing practice to devise object implementations that tolerate contention, periods of asynchrony and a large number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention. This paper initiates the first ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
Abstract. It is considered good distributed computing practice to devise object implementations that tolerate contention, periods of asynchrony and a large number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention. This paper initiates the first study of quorum systems that help design such implementations by encompassing, at the same time, optimal resilience, as well as optimal best-case complexity. We introduce the notion of a refined quorum system (RQS) of some set S as a set of three classes of subsets (quorums) of S: first class quorums are also second class quorums, themselves being also third class quorums. First class quorums have large intersections with all other quorums, second class quorums typically have smaller intersections with those of the third class, the latter simply correspond to traditional quorums. Intuitively, under uncontended and synchronous conditions, a distributed object implementation would expedite an operation if a quorum of the first class is accessed, then degrade gracefully depending on whether a quorum of the second or the third class is accessed. Our notion of refined quorum system is devised assuming a general adversary structure, and this basically allows algorithms relying on refined quorum systems to relax the assumption of independent process failures, often questioned in practice.

