Results 1 - 10 of 17
FAB: Building Distributed Enterprise Disk Arrays from Commodity Components
- 2004
Cited by 92 (7 self)
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. FAB deploys a new majority-voting-based algorithm to replicate or erasure-code logical blocks across bricks and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. This prototype sustains 85 MB/s of throughput for a database workload and 270 MB/s for a bulk-read workload. In addition, it can outperform traditional master-slave replication through performance decoupling and can handle brick failures and recoveries smoothly without disturbing client requests.
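As a rough illustration of majority voting, the following is a minimal Python sketch of a single-writer register replicated across bricks with majority quorums. It is not FAB's algorithm: the class names, the single-writer timestamps, and the in-memory "bricks" are assumptions, and erasure coding and reconfiguration are left out. It shows why majorities work: any two majorities overlap in at least one brick, so a read that consults a majority sees the newest completed write.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One brick's copy of a logical block, tagged with a write timestamp."""
    ts: int = 0
    value: bytes = b""
    up: bool = True

    def store(self, ts: int, value: bytes) -> bool:
        # A crashed brick cannot acknowledge; a live one keeps only the newest write.
        if not self.up:
            return False
        if ts > self.ts:
            self.ts, self.value = ts, value
        return True

    def fetch(self):
        return (self.ts, self.value) if self.up else None


class MajorityRegister:
    """Hypothetical single-writer register replicated on n bricks with majority quorums."""

    def __init__(self, n: int):
        self.replicas = [Replica() for _ in range(n)]
        self.quorum = n // 2 + 1

    def _collect(self):
        replies = [r.fetch() for r in self.replicas]
        replies = [x for x in replies if x is not None]
        if len(replies) < self.quorum:
            raise RuntimeError("no majority of bricks reachable")
        return replies

    def _store_majority(self, ts: int, value: bytes) -> None:
        acks = sum(r.store(ts, value) for r in self.replicas)
        if acks < self.quorum:
            raise RuntimeError("update did not reach a majority")

    def write(self, value: bytes) -> int:
        # Pick a timestamp newer than any a majority has seen, then store it.
        ts = max(t for t, _ in self._collect()) + 1
        self._store_majority(ts, value)
        return ts

    def read(self) -> bytes:
        # The newest timestamp in any majority includes the last completed
        # write; write it back so a majority holds it before returning.
        ts, value = max(self._collect(), key=lambda x: x[0])
        self._store_majority(ts, value)
        return value


reg = MajorityRegister(3)
reg.write(b"v1")
reg.replicas[0].up = False   # brick 0 misses the next write
reg.write(b"v2")
reg.replicas[0].up = True
reg.replicas[1].up = False   # a different brick is now down
print(reg.read())
```

The read still returns the second value because the surviving majority (bricks 0 and 2) overlaps the majority that accepted that write in brick 2.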
Efficient Byzantine-Tolerant Erasure-Coded Storage
- Proceedings of the International Conference on Dependable Systems and Networks, June 2004
Cited by 73 (12 self)
This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or write failures. Measurements of a storage system prototype using this protocol show that it scales well with the number of failures tolerated, and its performance compares favorably with an efficient implementation of Byzantine-tolerant state machine replication.
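A minimal sketch of how versioning storage-nodes help follows, under loud assumptions: nodes are in-memory objects, fragments are opaque values, `threshold` (hypothetical) stands for the number of matching fragments a client needs, and neither Byzantine checks nor the slower fallback path are modeled. It only shows how keeping old versions lets a single-round-trip read detect a partial or concurrent write and still find an earlier complete version.

```python
class VersioningNode:
    """Hypothetical storage-node that never overwrites: every write adds a version."""

    def __init__(self):
        self.versions = {}  # logical timestamp -> stored fragment

    def write(self, ts, fragment):
        self.versions[ts] = fragment

    def latest(self):
        if not self.versions:
            return 0, None
        ts = max(self.versions)
        return ts, self.versions[ts]

    def read_at(self, ts):
        # Older versions stay available, e.g. to fall back past a partial write.
        return self.versions.get(ts)


def optimistic_read(nodes, threshold):
    """One round trip: succeed only if enough nodes agree on the newest version."""
    replies = [node.latest() for node in nodes]
    newest = max(ts for ts, _ in replies)
    fragments = [frag for ts, frag in replies if ts == newest]
    if len(fragments) >= threshold:
        return newest, fragments
    return None  # concurrency or a failed write observed; a slower path is needed
```

When `optimistic_read` returns `None`, a client in this sketch could still use `read_at` to fetch an older version that every node holds, which is only possible because no node overwrote it.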
Optimizing Cauchy Reed-Solomon codes for fault-tolerant network storage applications
- In NCA-06: 5th IEEE International Symposium on Network Computing Applications, 2006
Cited by 20 (9 self)
NOTE: NCA’s page limit is rather severe: 8 pages. As a result, the final paper is pretty much a hatchet job of the original submission. I would recommend reading the technical report version of this paper, because it presents the material with some accompanying tutorial material, and is easier to read. The technical report is available at:
Reliability for networked storage nodes
- Research Report RJ-10358, IBM Almaden Research, 2006
Cited by 14 (2 self)
High-end enterprise storage has traditionally consisted of monolithic systems with customized hardware, multiple redundant components and paths, and no single point of failure. Distributed storage systems realized through networked storage nodes offer several advantages over monolithic systems, such as lower cost and increased scalability. To achieve the reliability goals associated with enterprise-class storage systems, redundancy will have to be distributed across the collection of nodes to tolerate both node and drive failures. In this paper, we present alternatives for distributing this redundancy, and models to determine the reliability of such systems. We specify a reliability target and determine the configurations that meet this target. Further, we perform sensitivity analyses in which selected parameters are varied to observe their effect on reliability.
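For a feel of the kind of model involved, the classic textbook approximation of mean time to data loss (MTTDL) for a redundancy group can be computed directly. This is a sketch, not the report's model: it assumes independent, exponentially distributed failures, repair times much shorter than MTTF, and a single device type, whereas the report distinguishes node and drive failures. The device count and MTTF/MTTR figures in the usage lines are hypothetical.

```python
from math import prod


def mttdl_hours(n: int, t: int, mttf_h: float, mttr_h: float) -> float:
    """Classic approximate MTTDL of n devices whose redundancy tolerates t failures.

    Assumes independent exponential failures and repairs, with mttr_h << mttf_h:
        MTTDL ~ MTTF^(t+1) / (n (n-1) ... (n-t) * MTTR^t)
    """
    if not 0 <= t < n:
        raise ValueError("need 0 <= t < n")
    falling = prod(n - i for i in range(t + 1))  # n (n-1) ... (n-t)
    return mttf_h ** (t + 1) / (falling * mttr_h ** t)


HOURS_PER_YEAR = 24 * 365
for t in (1, 2):
    years = mttdl_hours(n=12, t=t, mttf_h=1_000_000, mttr_h=24) / HOURS_PER_YEAR
    print(f"tolerates {t} failure(s): MTTDL ~ {years:.3g} years")
```

Comparing such estimates against a fixed target is one simple way to see which redundancy levels qualify, and varying `mttr_h` or `n` gives a crude sensitivity analysis.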
Proportional-share scheduling for distributed storage systems
- In ProACM Transactions on …, 2007
Cited by 12 (2 self)
Fully distributed storage systems have gained popularity in the past few years because of their ability to use cheap commodity hardware and their high scalability. While there are a number of algorithms for providing differentiated quality of service to clients of a centralized storage system, the problem has not been solved for distributed storage systems. Providing performance guarantees in distributed storage systems is more complex because clients may have different data layouts and access their data through different coordinators (access nodes), yet the performance guarantees required are global. This paper presents a distributed scheduling framework. It is an adaptation of fair queuing algorithms for distributed servers. Specifically, upon scheduling each request, it enforces an extra delay (possibly zero) that corresponds to the amount of service the client gets on other servers. Different performance goals, e.g., per storage node proportional sharing, total service proportional sharing or mixed, can be met by different delay functions. The delay functions can be calculated at coordinators locally so excess communication is avoided. The analysis and experimental results show that the framework can enforce performance goals under different data layouts and workloads.
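The delay idea can be sketched on top of start-time fair queuing (SFQ) at a single node. This is an illustrative Python sketch, not the paper's framework: `DelayedSFQ` and its parameters are hypothetical, and the `delay` argument is assumed to be the service (in cost units) the client received on other nodes since its previous request here, supplied by the coordinator. Adding `delay / weight` to the start tag pushes back a client that is already being served elsewhere, so shares are balanced across the system rather than per node.

```python
import heapq
from collections import defaultdict


class DelayedSFQ:
    """Start-time fair queuing at one storage node, with an extra per-request delay."""

    def __init__(self, weights):
        self.weights = weights                 # client -> share weight
        self.vtime = 0.0                       # start tag of the last dispatched request
        self.last_finish = defaultdict(float)  # client -> finish tag of its last request
        self.queue = []
        self.seq = 0                           # tie-breaker: FIFO order on equal tags

    def enqueue(self, client, cost, delay=0.0):
        # delay: service the client received on other nodes since its previous
        # request here (assumed to be reported by the coordinator).
        w = self.weights[client]
        start = max(self.vtime, self.last_finish[client] + delay / w)
        self.last_finish[client] = start + cost / w
        heapq.heappush(self.queue, (start, self.seq, client))
        self.seq += 1

    def dispatch(self):
        start, _, client = heapq.heappop(self.queue)
        self.vtime = start
        return client


sched = DelayedSFQ({"a": 1, "b": 1})
for _ in range(4):
    sched.enqueue("a", cost=1)
for _ in range(4):
    sched.enqueue("b", cost=1, delay=1)  # b also gets 1 unit elsewhere per request
print([sched.dispatch() for _ in range(8)])
```

Here client `b` receives one unit of service elsewhere for each request, so this node favors `a`; the loop prints `['a', 'a', 'b', 'a', 'a', 'b', 'b', 'b']`. With `delay=0` for both clients, the two would alternate.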
Using Erasure Codes Efficiently for Storage in a Distributed System
- In Proc. of DSN’05, 2005
Cited by 9 (1 self)
Erasure codes provide space-optimal data redundancy to protect against data loss. A common use is to reliably store data in a distributed system, where erasure-coded data are kept in different nodes to tolerate node failures without losing data. In this paper, we propose a new approach to maintain erasure-encoded data in a distributed system. The approach allows the use of space efficient -small. Concurrent updates and accesses to data are highly optimized: in common cases, they require no locks, no two-phase commits, and no logs of old versions of data. We evaluate our approach using an implementation and simulations for larger systems.
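The basic property that erasure-coded storage relies on can be shown with the simplest erasure code, a single XOR parity fragment: data split into k fragments plus one parity survives the loss of any one fragment at a space overhead of 1/k. This Python sketch is only that building block; it does not model the paper's approach to lock-free concurrent updates, and `encode`/`decode` are hypothetical helpers.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)                 # ceil(len(data) / k)
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def decode(frags: list, k: int, length: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 fragments is None (lost)."""
    lost = [i for i, f in enumerate(frags) if f is None]
    if len(lost) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    if lost:
        # All k + 1 fragments XOR to zero, so the lost one is the XOR of the rest.
        frags = list(frags)
        frags[lost[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return b"".join(frags[:k])[:length]
```

Each fragment would live on a different node; codes such as Reed-Solomon generalize this to tolerate more than one loss.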
Olive: Distributed point-in-time branching storage for real systems
- In Proc. Third NSDI, 2006
Cited by 7 (0 self)
This paper describes Olive, the first distributed block storage system to provide consistent point-in-time branching. Point-in-time branching allows users to recursively and quickly snapshot or clone the storage state. It has a wide range of applications including testing new deployments or upgrades without disrupting a running system, quickly provisioning large homogeneous systems, and preserving old versions of data. Olive provides block-level access and strong consistency for broad applicability, allowing it to branch file systems, database systems, and every other storage application that ultimately stores data on block storage. Olive is distributed and replicated to provide fault tolerance and availability. Providing strong consistency for branching in a replicated distributed system is a technical challenge that we address in this work. We evaluate Olive and show that branching typically takes a few tens of milliseconds, and so it has little impact on I/Os.
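Point-in-time branching can be made cheap with copy-on-write layers, where creating a branch copies no data. The sketch below is a single-node, in-memory illustration under that assumption; it is not Olive's design and ignores the replication and strong-consistency problems the paper addresses. `BranchingStore`, `_Layer`, and the branch name `"main"` are hypothetical.

```python
class _Layer:
    """Blocks written in one layer; once frozen, a layer is never modified again."""

    def __init__(self, parent=None):
        self.blocks = {}
        self.parent = parent


class BranchingStore:
    def __init__(self):
        self.heads = {"main": _Layer()}

    def write(self, branch, addr, data):
        # Copy-on-write: only the branch's own head layer changes.
        self.heads[branch].blocks[addr] = data

    def read(self, branch, addr):
        layer = self.heads[branch]
        while layer is not None:               # the newest layer wins
            if addr in layer.blocks:
                return layer.blocks[addr]
            layer = layer.parent
        return None                            # never-written block

    def branch(self, source, new):
        # O(1): freeze the source head and give both branches fresh empty heads.
        frozen = self.heads[source]
        self.heads[source] = _Layer(frozen)
        self.heads[new] = _Layer(frozen)
```

Branching only swaps two pointers, so its cost does not depend on how much data the branch holds; reads pay instead by walking the chain of frozen layers.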
Agile store: Experience with quorum-based data replication techniques for adaptive Byzantine fault tolerance
- In IEEE Symposium on Reliable Distributed Systems, 2005
Cited by 6 (1 self)
Quorum protocols offer several benefits when used to maintain replicated data, but techniques for reducing the overheads associated with them have not been explored in detail. It is desirable that a system be able to adapt its operation so that fault-tolerance-related overheads are only incurred when the protocol execution actually encounters faults. There are a number of issues that need to be carefully examined to achieve such agility in quorum-based systems. We make use of a file system prototype, developed in our Agile Store project, to experimentally evaluate several techniques that are important for efficient implementation of Byzantine fault-tolerant quorum protocols. We present an optimistic quorum collection scheme and a probabilistic hashing scheme for determining the response to a quorum request, and show that they lead to significant performance improvements. The Agile Store also makes use of reconfigurable quorum techniques to allow system size and fault threshold to be dynamically varied when, for example, faulty servers are removed, new servers are added, or the threat level is changed. We quantify the performance gains made possible by such reconfiguration of quorum parameters. We also show how performance scales with different system parameters and how it is affected by design choices such as whether to use proxies. We believe that the results in the paper provide important insights into how to implement quorum protocols to provide good performance while achieving Byzantine fault tolerance.
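To make the hashing idea concrete, here is a minimal sketch of picking a quorum response by comparing digests of replies. It is a generic Byzantine-quorum technique, not necessarily Agile Store's scheme: it assumes at most `f` faulty servers and accepts a reply once `f + 1` servers return byte-identical content, as judged by SHA-256 digests. In a real system, servers could send only the digest to save bandwidth; here full replies are hashed locally for simplicity, and `pick_response` is a hypothetical name.

```python
import hashlib
from collections import Counter


def pick_response(replies, f):
    """Return a reply vouched for by at least f + 1 servers, else None.

    With at most f Byzantine servers, f + 1 matching replies include at least
    one correct server. Replies are compared by SHA-256 digest; a collision is
    possible in principle but negligible in practice.
    """
    digests = [hashlib.sha256(r).digest() for r in replies]
    votes = Counter(digests)
    for reply, d in zip(replies, digests):
        if votes[d] >= f + 1:
            return reply
    return None  # not enough agreement yet: wait for more replies
```

Comparing fixed-size digests instead of full payloads keeps the vote cheap even when replies are large blocks.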
A protocol family approach to survivable storage infrastructures
- FuDiCo II: S.O.S. (Survivability: Obstacles and Solutions), 2nd Bertinoro Workshop on Future Directions in Distributed Computing, 2004
Cited by 3 (1 self)
A protocol family supports a variety of fault models with a single client-server protocol and a single server implementation. Protocol families shift the decision of which types of faults to tolerate from system design time to data creation time. With a protocol family based on a common survivable storage infrastructure, each data-item can be protected from different types and numbers of faults. Thus, a single implementation can be deployed in different environments. Moreover, a single deployment can satisfy the specific survivability requirements of different data for costs commensurate with their requirements.
GRID codes: Strip-based erasure codes with high fault tolerance for storage systems
- ACM Transactions on Storage, 2009
Cited by 2 (0 self)
As storage systems grow in size and complexity, they are increasingly confronted with concurrent disk failures together with multiple unrecoverable sector errors. To ensure high data reliability and availability, erasure codes with high fault tolerance are required. In this article, we present a new family of erasure codes with high fault tolerance, named GRID codes. They are so named because they are a family of strip-based codes whose strips are arranged into multi-dimensional grids. In the construction of GRID codes, we first introduce a concept of matched codes and then discuss how to use matched codes to construct GRID codes. In addition, we propose an iterative reconstruction algorithm for GRID codes. We also discuss some important features of GRID codes. Finally, we compare GRID codes with several categories of existing codes. Our comparisons show that for large-scale storage systems, our GRID codes have attractive advantages over many existing erasure codes: (a) they are completely XOR-based and have very regular structures, ensuring easy implementation; (b) they can provide up to 15 and even higher fault tolerance; and (c) their storage efficiency can reach up to 80% and even higher. All the advantages make GRID codes more suitable
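The strip-in-a-grid idea and iterative reconstruction can be illustrated with the simplest two-dimensional XOR layout: a grid of data strips with one parity per row and per column. This is a deliberately simplified stand-in, not a GRID code (no matched codes, far lower fault tolerance), and the strips are small integers instead of disk blocks; `encode_grid` and `decode_grid` are hypothetical names. Decoding repeatedly fixes any row or column that has exactly one missing strip, which is an iterative (peeling) style of reconstruction.

```python
from functools import reduce
from operator import xor


def encode_grid(data):
    """data: R rows of C ints. Adds a parity column and a parity row so that
    every row and every column of the result XORs to zero."""
    rows = [list(row) + [reduce(xor, row, 0)] for row in data]
    parity_row = [reduce(xor, col, 0) for col in zip(*rows)]
    return rows + [parity_row]


def decode_grid(grid):
    """Recover erased strips (None) by peeling, then return the data part."""
    g = [list(row) for row in grid]
    n_rows, n_cols = len(g), len(g[0])
    lines = [[(i, j) for j in range(n_cols)] for i in range(n_rows)]
    lines += [[(i, j) for i in range(n_rows)] for j in range(n_cols)]
    progress = True
    while progress:
        progress = False
        for cells in lines:
            missing = [(i, j) for i, j in cells if g[i][j] is None]
            if len(missing) == 1:
                i, j = missing[0]
                # The line XORs to zero, so the lost strip is the XOR of the rest.
                g[i][j] = reduce(xor, (g[r][c] for r, c in cells if g[r][c] is not None), 0)
                progress = True
    if any(v is None for row in g for v in row):
        raise ValueError("erasure pattern not recoverable by peeling")
    return [row[:-1] for row in g[:-1]]
```

Erasure patterns in which every affected row and column has at least two missing strips, such as a 2-by-2 square, stall the peeling loop; the GRID codes described above are designed to reach much higher fault tolerance than this toy layout.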

