Network Coding for Distributed Storage Systems
In Proc. of IEEE INFOCOM, 2007
Cited by 35 (3 self)
Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a node failure is for a new node to download subsets of data stored at a number of surviving nodes, reconstruct a lost coded block using the downloaded data, and store it at the new node. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to download functions of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff.
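The two endpoints of the storage/repair-bandwidth tradeoff have closed forms in the regenerating-code literature. The sketch below evaluates them for a file of size M that any k of n nodes can reconstruct, where each node stores alpha and a newcomer downloads gamma in total from d helper nodes (d = n - 1 means every surviving node helps). The function names are illustrative, and the general-d parameterization is an assumption rather than a transcription of this paper's notation.

```python
from fractions import Fraction

def naive_repair_bandwidth(file_size):
    """Classic MDS repair: fetch k fragments of size M/k, i.e. the whole file M."""
    return Fraction(file_size)

def msr_point(file_size, k, d):
    """Minimum-storage regenerating point (alpha, gamma)."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    alpha = Fraction(file_size, k)                       # same storage as an MDS code
    gamma = Fraction(file_size * d, k * (d - k + 1))     # total repair download
    return alpha, gamma

def mbr_point(file_size, k, d):
    """Minimum-bandwidth regenerating point: storage equals repair download."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    value = Fraction(2 * file_size * d, k * (2 * d - k + 1))
    return value, value
```

For M = 1, k = 2, d = 3, the MSR point is (1/2, 3/4) and the MBR point is (3/5, 3/5), both repairing with less traffic than the naive download of the full file.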
Rosebud: A Scalable Byzantine-Fault-Tolerant Storage Architecture
2003
Cited by 34 (6 self)
This paper presents Rosebud, a new Byzantine-fault-tolerant storage architecture designed to be highly scalable and deployable in the wide area. To support massive amounts of data, we need to partition the data among the nodes. To support long-lived operation, we need to allow the set of nodes in the system to change. To our knowledge, we are the first to present a complete design and a running implementation of Byzantine-fault-tolerant storage algorithms for large-scale systems with dynamic membership. We deployed Rosebud in a wide-area testbed and ran experiments to evaluate its performance; our experiments show that it performs well. We show that our storage algorithms perform equivalently to highly optimized replication algorithms in the wide area. We also show that performance degradation is minor when the system reconfigures.
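The abstract does not state Rosebud's replica or quorum sizes. As generic background only, the sketch below shows the classic Byzantine quorum arithmetic: with n = 3f + 1 replicas and quorums of 2f + 1, any two quorums overlap in at least f + 1 replicas, so at least one correct replica sees both operations. The function names are hypothetical.

```python
from itertools import combinations

def bft_quorum(f):
    """Replica count n and quorum size q for the classic n = 3f + 1 setting."""
    return 3 * f + 1, 2 * f + 1

def guaranteed_correct_overlap(f):
    n, q = bft_quorum(f)
    overlap = 2 * q - n   # two quorums of size q drawn from n replicas share >= 2q - n
    return overlap - f    # up to f of the shared replicas may be Byzantine

def smallest_pairwise_overlap(f):
    """Brute-force check of the overlap bound for small f."""
    n, q = bft_quorum(f)
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)
```

The brute-force helper is only practical for small f, but it confirms that the closed-form bound of f + 1 shared replicas is tight.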
Proactive replication for data durability
In Proceedings of the 5th Int’l Workshop on Peer-to-Peer Systems (IPTPS), 2006
Cited by 28 (6 self)
Many wide-area storage systems replicate data for durability. A common way of maintaining the replicas is to detect node failures and respond by creating additional copies of objects that were stored on failed nodes and hence suffered a loss of redundancy. Reactive techniques can minimize total bytes sent since they only create replicas as needed; however, they can create spikes in network use after a failure. These spikes may overwhelm application traffic and can make it difficult to provision bandwidth. This paper explores a proactive approach that creates additional copies not in response to failures, but periodically at a fixed low rate. We introduce Tempo, a distributed hash table that allows each user to specify a maximum maintenance bandwidth and uses it to perform proactive replication. Results from a simulation study suggest that Tempo can deliver high durability while using only several kilobytes per second of bandwidth, comparable to state-of-the-art reactive systems.
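One way to enforce a user-specified maximum maintenance bandwidth is a token bucket. The sketch below assumes that mechanism and a simple round-robin order over objects; neither is claimed to be Tempo's actual design, and the class and function names are hypothetical.

```python
class MaintenanceBudget:
    """Token bucket capping proactive replication at rate_bytes_per_sec."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes
        self.tokens = 0.0

    def tick(self, seconds):
        # Accrue budget for the elapsed time, never exceeding the burst cap.
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def try_spend(self, nbytes):
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

def run_proactive(budget, object_sizes, ticks, tick_seconds=1.0):
    """Each tick, copy objects in round-robin order while budget remains.

    Returns the list of object indices copied, in order.
    """
    copies = []
    i = 0
    for _ in range(ticks):
        budget.tick(tick_seconds)
        while object_sizes and budget.try_spend(object_sizes[i % len(object_sizes)]):
            copies.append(i % len(object_sizes))
            i += 1
    return copies
```

Because copies are created whether or not a failure has occurred, traffic stays flat at or below the configured rate instead of spiking after failures.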
Freeloader: Scavenging desktop storage resources for scientific data
In Proceedings of Supercomputing, 2005
Cited by 23 (11 self)
High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. End-user workstations—despite more processing power than ever before—are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused. We present the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space, for hosting large, immutable datasets and exploiting data access locality. Our experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets, by delivering higher data access rates than traditional storage facilities. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation’s network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled.
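The abstract does not describe FreeLoader's striping techniques. As a generic illustration of why striping aggregates bandwidth, the sketch below places fixed-size chunks round-robin across donor workstations so that a client can read consecutive chunks from different donors in parallel. The donor names, chunk size, and functions are hypothetical.

```python
def stripe(dataset_size, chunk_size, donors):
    """Assign chunk c to donors[c % len(donors)]; return {donor: [chunk ids]}."""
    nchunks = -(-dataset_size // chunk_size)  # ceiling division
    layout = {d: [] for d in donors}
    for c in range(nchunks):
        layout[donors[c % len(donors)]].append(c)
    return layout

def locate(offset, chunk_size, donors):
    """Map a byte offset to (donor, chunk id, offset within chunk)."""
    chunk = offset // chunk_size
    return donors[chunk % len(donors)], chunk, offset % chunk_size
```

For example, a 10-byte dataset with 3-byte chunks over donors "a" and "b" yields four chunks, with "a" holding chunks 0 and 2 and "b" holding chunks 1 and 3.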
Delay aware querying with Seaweed
In VLDB, 2006
Cited by 22 (1 self)
Large, highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems, and their data, will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale, these methods do not scale. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting online aggregation and completeness prediction. Seaweed is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. It exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and present an evaluation of the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
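Completeness prediction combines per-endsystem data summaries with availability models. A deliberately simplified sketch: if each summary is just a row count and each availability model reduces to a probability that the endsystem comes online before a given deadline, the expected fraction of rows answered is a weighted average. Both the summary format and the function are assumptions for illustration, not Seaweed's actual model.

```python
def predicted_completeness(row_counts, online_prob):
    """Expected fraction of rows answered by a deadline.

    row_counts:  {endsystem: rows it holds} (a compact data summary)
    online_prob: {endsystem: P(endsystem is available before the deadline)}
    Endsystems missing from online_prob are treated as never available.
    """
    total = sum(row_counts.values())
    if total == 0:
        return 1.0
    expected = sum(rows * online_prob.get(es, 0.0) for es, rows in row_counts.items())
    return expected / total
```

Evaluating this for several deadlines gives the user a delay/completeness curve before the query finishes.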
Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables
2004
Cited by 19 (0 self)
Distributed Hash Tables are scalable, robust, and self-organizing peer-to-peer systems that support exact match lookups. This paper describes the design and implementation of a Prefix Hash Tree - a distributed data structure that enables more sophisticated queries over a DHT. The Prefix Hash Tree uses the lookup interface of a DHT to construct a trie-based structure that is both efficient (updates are doubly logarithmic in the size of the domain being indexed), and resilient (the failure of any given node in the Prefix Hash Tree does not affect the availability of data stored at other nodes).
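One way to see the doubly logarithmic bound: keys are D-bit strings (D is the logarithm of the domain size), each trie node is stored in the DHT under its prefix label, and the leaf covering a key can be found by binary search over prefix lengths, costing about log D DHT gets. In the sketch below a plain dict stands in for the DHT, and the node encoding is an assumption.

```python
def pht_lookup(dht, key_bits):
    """Find the trie leaf whose label is a prefix of key_bits.

    dht maps prefix label -> "leaf" or "internal" (a dict stands in for DHT get).
    Returns (leaf_label, number_of_dht_gets).
    """
    lo, hi = 0, len(key_bits)
    gets = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        prefix = key_bits[:mid]
        gets += 1
        node = dht.get(prefix)
        if node is None:
            hi = mid - 1          # no such node: the leaf has a shorter label
        elif node == "internal":
            lo = mid + 1          # internal node: the leaf has a longer label
        else:
            return prefix, gets   # found the leaf covering key_bits
    raise KeyError("no leaf covers this key; trie is inconsistent")
```

With a trie whose leaves are "0", "10", and "11", looking up "0110" probes "01" (absent), "" (internal), then "0" (leaf): three gets for a 4-bit key.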
Erasure Code Replication Revisited
In PTP04: 4th International Conference on Peer-to-Peer Computing. IEEE, 2004
Cited by 18 (0 self)
Erasure coding is a technique for achieving high availability and reliability in storage and communication systems. In this paper, we revisit the analysis of erasure code replication and point out some situations in which whole-file replication is preferred. The switchover point (from preferring whole-file replication to erasure code replication) is studied and characterized using asymptotic analysis. We also discuss additional considerations in building erasure code replication systems.
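A switchover point can be seen in a small availability calculation. Assuming each node is independently available with probability p, r-way replication is available if any copy is reachable, while an (m, n) erasure code needs any m of its n fragments. At equal storage overhead (n / m = r), erasure coding wins only when p is high enough. This independent-failure model is a textbook simplification, not necessarily the paper's analysis.

```python
from math import comb

def replication_availability(p, r):
    """P(at least one of r copies is available)."""
    return 1 - (1 - p) ** r

def erasure_availability(p, m, n):
    """P(at least m of n fragments are available)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))

def switchover(m, n, steps=1000):
    """Smallest grid value of p at which erasure coding beats replication.

    Assumes n is a multiple of m so both schemes use the same storage.
    """
    r = n // m
    for s in range(1, steps):
        p = s / steps
        if erasure_availability(p, m, n) > replication_availability(p, r):
            return p
    return None
```

For m = 2, n = 4 against r = 2, erasure coding is ahead at p = 0.9 (about 0.996 versus 0.99) but behind at p = 0.3 (about 0.348 versus 0.51); solving the inequality shows the crossover at p = 2/3.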
On object maintenance in peer-to-peer systems
In Proc. of the 5th International Workshop on Peer-to-Peer Systems, 2006
Cited by 15 (0 self)
Storage is often a fundamental service provided by peer-to-peer systems, where the system stores data objects on behalf of higher-level services, applications, and users. A primary challenge in peer-to-peer storage systems is to efficiently ...
A Distributed Hash Table
2005
Cited by 15 (3 self)
DHash is a new system that harnesses the storage and network resources of computers distributed across the Internet by providing a wide-area storage service. DHash frees applications from re-implementing mechanisms common to any system that stores data on a collection of machines: it maintains a mapping of objects to servers, replicates data for durability, and balances load across participating servers. Applications access data stored in DHash through a familiar hash-table interface: put stores data in the system under a key; get retrieves the data. DHash has proven useful to a number of application builders and has been used to build a content-distribution system [34], a Usenet replacement [118], and new Internet naming architectures [133, 132]. These applications demand low-latency, high-throughput access ...
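The put/get interface with replication for durability can be illustrated with a toy in-memory model: servers are placed on a hash ring, each key is stored on its first r successors, and get falls back to the next replica if a server is down. This is a generic consistent-hashing sketch; the class, method names, replica count, and server names are hypothetical and are not DHash's API.

```python
import hashlib
from bisect import bisect_left

def ring_id(name):
    """Position on a 160-bit ring via SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class ToyDHash:
    def __init__(self, servers, replicas=3):
        self.ring = sorted((ring_id(s), s) for s in servers)
        self.replicas = min(replicas, len(servers))
        self.store = {s: {} for s in servers}
        self.down = set()  # servers currently unreachable

    def successors(self, key):
        """The first `replicas` servers clockwise from the key's ring position."""
        ids = [i for i, _ in self.ring]
        start = bisect_left(ids, ring_id(key)) % len(self.ring)
        return [self.ring[(start + j) % len(self.ring)][1] for j in range(self.replicas)]

    def put(self, key, value):
        for s in self.successors(key):
            self.store[s][key] = value

    def get(self, key):
        for s in self.successors(key):
            if s not in self.down and key in self.store[s]:
                return self.store[s][key]
        raise KeyError(key)
```

Storing on successors means that when a server fails, the next server on the ring already holds a copy, so reads continue without reconfiguration.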
Practical load balancing for content requests in peer-to-peer networks
Cited by 14 (0 self)
This paper studies the problem of balancing the demand for content in a peer-to-peer network across heterogeneous peer nodes that hold replicas of the content. Previous decentralized load balancing techniques in distributed systems base their decisions on periodic updates containing information about load or available capacity observed at the serving entities. We show that these techniques do not work well in the peer-to-peer context; either they do not address peer node heterogeneity, or they suffer from significant load oscillations which result in unutilized capacity. We propose a new decentralized algorithm, Max-Cap, based on the maximum inherent capacities of the replica nodes. We show that unlike previous algorithms, it is not tied to the timeliness or frequency of updates, and consequently requires significantly less update overhead. Yet, Max-Cap can handle the heterogeneity of a peer-to-peer environment without suffering from load oscillations.
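A simplified reading of the Max-Cap idea: direct each request to a replica holder chosen in proportion to that node's maximum inherent capacity. Because the weights are static capacities rather than periodically reported load, the decision does not depend on how fresh updates are. Whether Max-Cap uses exactly weighted random selection is an assumption here, and the node names and capacities are hypothetical.

```python
import random

def max_cap_shares(capacities):
    """Fraction of requests each replica node should receive."""
    total = sum(capacities.values())
    return {node: cap / total for node, cap in capacities.items()}

def pick_replica(capacities, rng):
    """Choose one replica node with probability proportional to its max capacity."""
    nodes = list(capacities)
    weights = [capacities[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

With capacities of 10 and 30 for nodes "a" and "b", about a quarter of requests go to "a" and three quarters to "b", regardless of how often load reports arrive.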

