Results 1 - 10
of
17
FAB: Building Distributed Enterprise Disk Arrays from Commodity Components
, 2004
"... This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances con ..."
Abstract
-
Cited by 92 (7 self)
- Add to MetaCart
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. FAB deploys a new majority-votingbased algorithm to replicate or erasure-code logical blocks across bricks and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. This prototype sustains 85MB/second of throughput for a database workload, and 270MB/second for a bulk-read workload. In addition, it can outperform traditional masterslave replication through performance decoupling and can handle brick failures and recoveries smoothly without disturbing client requests.
Fault-scalable Byzantine fault-tolerant services
- In Proceedings of the 20th ACM Symposium on Operating Systems Principles
, 2005
"... A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine faulttolerant services. The optimistic quorum-based nature of the Q/U ..."
Abstract
-
Cited by 92 (6 self)
- Add to MetaCart
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine faulttolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36 % as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.
Decentralized Replicated-Object Protocols
, 1999
"... We describe a new replicated-object protocol designed for use in mobile and weakly-connected environments. The protocol differs from previous protocols in combining epidemic information propagation with voting, and in using fixed per-object currencies for voting. The advantage of epidemic protocols ..."
Abstract
-
Cited by 66 (9 self)
- Add to MetaCart
We describe a new replicated-object protocol designed for use in mobile and weakly-connected environments. The protocol differs from previous protocols in combining epidemic information propagation with voting, and in using fixed per-object currencies for voting. The advantage of epidemic protocols is that data movement only requires pairwise communication. Hence, there is no need for a majority quorum to be available and simultaneously connected at any single time. The protocols increase availability by using voting, rather than primary copy or primary commit schemes. Finally, the use of per-object voting currencies allows votes to take place in an entirely decentralized fashion, without any server having complete knowledge of group membership. We show that currency allocation can be used to implement diverse policies. For example, uniform currency distributions emulate traditional dynamic voting schemes, while allocating all currency to a single server emulates a primary-copy scheme...
Middle-R: Consistent Database Replication at the Middleware Level
- ACM Trans. Comput. Syst
, 2005
"... The widespread use of clusters and web farms has increased the importance of data replication. In this paper, we show how to implement consistent and scalable data replication at the middleware level. We do this by combining transactional concurrency control with group communication primitives. The ..."
Abstract
-
Cited by 59 (7 self)
- Add to MetaCart
The widespread use of clusters and web farms has increased the importance of data replication. In this paper, we show how to implement consistent and scalable data replication at the middleware level. We do this by combining transactional concurrency control with group communication primitives. The paper presents different replication protocols, argues their correctness, describes their implementation as part of a generic middleware tool, and proves their feasibility with an extensive performance evaluation. The solution proposed is well suited for a variety of applications including web farms and distributed object platforms.
Processing Transactions over Optimistic Atomic Broadcast Protocols
, 1999
"... Atomic broadcast primitives allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new appro ..."
Abstract
-
Cited by 52 (20 self)
- Add to MetaCart
Atomic broadcast primitives allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery. In this paper, we develop this idea further and present a replicated database architecture that employs the new atomic broadcast primitive in such a way that the coordination phase of the atomic broadcast is fully overlapped with the execution of transactions, providing high performance without relaxing transaction correctness. 1. Introduction and Motivation Atomic Broadcast [6, 5] primitives are a well known mechanism to increase fault tolerance and to provide a semantically rich framework to develop distributed systems. Unfortunately, it is also recog...
An Architecture for Survivable Coordination in Large Distributed Systems
- IEEE Transactions on Knowledge and Data Engineering
, 2000
"... Coordination among processes in a distributed system can be rendered very complex in a large-scale system where messages may be delayed or lost, and when processes may participate only transiently or behave arbitrarily, e.g., after suffering a security breach. In this paper, we propose a scalable ar ..."
Abstract
-
Cited by 49 (15 self)
- Add to MetaCart
Coordination among processes in a distributed system can be rendered very complex in a large-scale system where messages may be delayed or lost, and when processes may participate only transiently or behave arbitrarily, e.g., after suffering a security breach. In this paper, we propose a scalable architecture to support coordination in such extreme conditions. Our architecture consists of a collection of persistent data servers that implement simple shared data abstractions for clients, without trusting the clients or even the servers themselves. We show that by interacting with these untrusted servers, clients can solve distributed consensus, a powerful and fundamental coordination primitive. Our architecture is very practical, and we describe the implementation of its main components in a system called Phalanx. 1 Introduction In this paper we propose a system architecture that supports efficient and scalable coordination among distributed clients. Our architecture strives to support...
Are Quorums an Alternative for Data Replication
- ACM TRANSACTIONS ON DATABASE SYSTEMS
, 2003
"... ... this article, we analyze several quorum types in order to better understand their behavior in practice. The results obtained challenge many of the assumptions behind quorum based replication. Our evaluation indicates that the conventional read-one/write-all-available approach is the best choice ..."
Abstract
-
Cited by 32 (10 self)
- Add to MetaCart
... this article, we analyze several quorum types in order to better understand their behavior in practice. The results obtained challenge many of the assumptions behind quorum based replication. Our evaluation indicates that the conventional read-one/write-all-available approach is the best choice for a large range of applications requiring data replication. We believe this is an important result for anybody developing code for computing clusters as the read-one/write-all-available strategy is much simpler to implement and more flexible than quorum-based approaches. In this article, we show that, in addition, it is also the best choice using a number of other selection criteria
FAB: enterprise storage systems on a shoestring
- IN OPERATING SYSTEMS (LIHUE, HI, 18–21 MAY 2003
, 2003
"... A Federated Array of Bricks (FAB) is a logical disk system that provides the reliability and performance of enterprise-class disk arrays, at a fraction of the cost and with better scalability. The unit of deployment in FAB is a brick, a small rack-mounted storage appliance built from commodity compo ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
A Federated Array of Bricks (FAB) is a logical disk system that provides the reliability and performance of enterprise-class disk arrays, at a fraction of the cost and with better scalability. The unit of deployment in FAB is a brick, a small rack-mounted storage appliance built from commodity components including disks, a CPU, NVRAM, and network cards. Bricks federate themselves in a completely decentralized manner to provide users with a set of logical volumes. This paper motivates FAB and introduces our data replication algorithm based on majority-voting. We argue that majority voting is practical for ultra-reliable, high-throughput storage systems like FAB, and present several techniques that improve both the performance and space overhead of our protocol.
How to be an efficient snoop, or the probe complexity of quorum systems
- SIAM Journal on Discrete Mathematics
, 1996
"... Abstract. A quorum system is a collection of sets (quorums) every two of which intersect. Quorum systems have been used for many applications in the area of distributed systems, including mutual exclusion, data replication, and dissemination of information. When the elements may fail, a user of a di ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
Abstract. A quorum system is a collection of sets (quorums) every two of which intersect. Quorum systems have been used for many applications in the area of distributed systems, including mutual exclusion, data replication, and dissemination of information. When the elements may fail, a user of a distributed protocol needs to quickly find a quorum all of whose elements are alive or evidence that no such quorum exists. This is done by probing the system elements, one at a time, to determine if they are alive or dead. This paper studies the probe complexity PC(S) of a quorum system S, defined as the worst case number of probes required to find a live quorum or to show its nonexistence in S, using the best probing strategy. We show that for large classes of quorum systems, all n elements must be probed in the worst case. Suchsystems are called evasive. However, not all quorum systems are evasive; we demonstrate a system where O(log n) probes always suffice. Then we prove two lower bounds on the probe complexity in terms of the minimal quorum cardinality c(S) and the number of minimal quorums m(S). Finally, we show a universal probe strategy which never makes more than c(S) 2 − c(S) +1 probes; thus any system with c(S) ≤ √ n is nonevasive.
Support for Speculative Update Propagation and Mobility in Deno
"... This paper presents the replication framework of Deno, an object replication system specifically designed for mobile and weakly-connected environments. Deno uses weighted voting for availability and pair-wise, epidemic information flow for flexibility. This combination allows the protocols to operat ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
This paper presents the replication framework of Deno, an object replication system specifically designed for mobile and weakly-connected environments. Deno uses weighted voting for availability and pair-wise, epidemic information flow for flexibility. This combination allows the protocols to operate with less than full connectivity, to easily adapt to changes in group membership, and to make few assumptions about the underlying network topology. Deno has been implemented and runs on top of Linux and Win32 platforms. We use the Deno prototype to characterize the performance of two versions of Deno's protocol. The first version enables globally serializable execution of update transactions. The second supports a weaker consistency level that still guarantees transactionally-consistent access to replicated data. We demonstrate that the incremental cost of providing global serializability is low, and that speculative dissemination of updates can significantly improve commit performance.

