A Distributed Hash Table (2005)

by Frank Dabek
Results 1 - 10 of 11

Efficient replica maintenance for distributed storage systems

by Byung-gon Chun, Frank Dabek, Andreas Haeberlen, Emil Sit, Hakim Weatherspoon, M. Frans Kaashoek, John Kubiatowicz, Robert Morris - In Proc. of NSDI, 2006
Abstract - Cited by 79 (17 self)
This paper considers replication strategies for storage systems that aggregate the disks of many nodes spread over the Internet. Maintaining replication in such systems can be prohibitively expensive, since every transient network or host failure could potentially lead to copying a server’s worth of data over the Internet to maintain replication levels. The following insights in designing an efficient replication algorithm emerge from the paper’s analysis. First, durability can be provided separately from availability; the former is less expensive to ensure and a more useful goal for many wide-area applications. Second, the focus of a durability algorithm must be to create new copies of data objects faster than permanent disk failures destroy the objects; careful choice of policies for what nodes should hold what data can decrease repair time. Third, increasing the number of replicas of each data object does not help a system tolerate a higher disk failure probability, but does help tolerate bursts of failures. Finally, ensuring that the system makes use of replicas that recover after temporary failure is critical to efficiency. Based on these insights, the paper proposes the Carbonite replication algorithm for keeping data durable at a low cost. A simulation of Carbonite storing 1 TB of data over a 365 day trace of PlanetLab activity shows that Carbonite is able to keep all data durable and uses 44% more network traffic than a hypothetical system that only responds to permanent failures. In comparison, Total Recall and DHash require almost a factor of two more network traffic than this hypothetical system.
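The final insight above (reusing replicas that come back after a transient failure) can be illustrated with a toy repair loop. This is a minimal sketch and not the Carbonite algorithm itself: the function name `maintain`, the threshold `r_low`, the event format, and the replica naming are all hypothetical choices made for illustration.

```python
def maintain(events, r_low, forget_on_failure):
    """Replay availability events for one object and count repairs.

    events: list of ("down", node) or ("up", node) tuples (assumed format).
    r_low: a new replica is created whenever fewer than r_low are reachable.
    forget_on_failure: if True, a replica on a failed node is discarded for
        good; if False, it is remembered and counted again when the node
        returns.
    Returns the number of new replicas created.
    """
    reachable = {f"n{i}" for i in range(r_low)}  # initial replica holders
    offline = set()   # remembered replicas on currently unreachable nodes
    created = 0
    for kind, node in events:
        if kind == "down" and node in reachable:
            reachable.remove(node)
            if not forget_on_failure:
                offline.add(node)
        elif kind == "up" and node in offline:
            offline.remove(node)
            reachable.add(node)
        # Repair only while too few replicas are reachable.
        while len(reachable) < r_low:
            reachable.add(f"new{created}")
            created += 1
    return created


# Two transient outages, one after the other.
transient = [("down", "n0"), ("up", "n0"), ("down", "n1"), ("up", "n1")]
print(maintain(transient, r_low=2, forget_on_failure=True))   # 2
print(maintain(transient, r_low=2, forget_on_failure=False))  # 1
```

In this toy run, remembering the returning replica leaves three copies reachable after the first outage, so the second outage needs no repair traffic.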

Proactive replication for data durability

by Emil Sit, Andreas Haeberlen, Frank Dabek, Byung-gon Chun, Hakim Weatherspoon, Robert Morris, M. Frans Kaashoek, John Kubiatowicz - In Proceedings of the 5th Int’l Workshop on Peer-to-Peer Systems (IPTPS), 2006
Abstract - Cited by 28 (6 self)
Many wide-area storage systems replicate data for durability. A common way of maintaining the replicas is to detect node failures and respond by creating additional copies of objects that were stored on failed nodes and hence suffered a loss of redundancy. Reactive techniques can minimize total bytes sent since they only create replicas as needed; however, they can create spikes in network use after a failure. These spikes may overwhelm application traffic and can make it difficult to provision bandwidth. This paper explores a proactive approach that creates additional copies not in response to failures, but periodically at a fixed low rate. We introduce Tempo, a distributed hash table that allows each user to specify a maximum maintenance bandwidth and uses it to perform proactive replication. Results from a simulation study suggest that Tempo can deliver high durability despite only using several kilobytes per second of bandwidth, comparable to state-of-the-art reactive systems.
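The idea of replicating at a fixed low rate, independent of failures, can be sketched as a per-tick byte budget. This is an illustrative sketch only and not Tempo's actual mechanism: the function `proactive_step`, its parameters, and the "fewest replicas first" ordering are assumptions introduced here.

```python
def proactive_step(replica_counts, object_size, budget_bytes):
    """Spend at most budget_bytes in one tick on creating new replicas.

    replica_counts: dict of object id -> current replica count (mutated).
    object_size: dict of object id -> size in bytes.
    Objects with the fewest replicas are considered first (assumed policy).
    Returns the ids of the objects that received a new replica.
    """
    copied = []
    remaining = budget_bytes
    for obj in sorted(replica_counts, key=lambda o: (replica_counts[o], o)):
        if object_size[obj] <= remaining:
            remaining -= object_size[obj]
            replica_counts[obj] += 1
            copied.append(obj)
    return copied


counts = {"a": 3, "b": 1, "c": 2}
sizes = {"a": 400, "b": 600, "c": 300}
print(proactive_step(counts, sizes, budget_bytes=1000))  # ['b', 'c']
```

Because the budget caps the bytes sent per tick, maintenance traffic in this sketch stays flat no matter how many nodes fail at once, which is the property the abstract contrasts with reactive spikes.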

Design and Evaluation of Distributed Wide-Area On-line Archival Storage Systems

by Hakim Weatherspoon, 2006
Abstract - Cited by 5 (2 self)
Abstract not found

Predicting durability in DHTs using Markov chains

by Fabio Picconi, Bruno Baynat
Abstract - Cited by 4 (0 self)
We consider the problem of data durability in low-bandwidth, large-scale distributed storage systems. Given the limited bandwidth between replicas, these systems suffer from long repair times after a hard disk crash, making them vulnerable to data loss when several replicas fail within a short period of time. Recent work has suggested that the probability of data loss can be predicted by modeling the number of live replicas using a Markov chain. This, in turn, can then be used to determine the number of replicas necessary to keep the loss probability under a given desired value. Previous authors have suggested that the model parameters can be estimated using an expression that is constant or linear in the number of replicas. Our simulations, however, show that neither is correct, as these parameter values grow sublinearly with the number of replicas. Moreover, we show that using a linear expression will result in the probability of data loss being underestimated, while the constant expression will produce a significant overestimation. Finally, we provide an empirical expression that yields a good approximation of the sublinear parameter values. Our work can be viewed as a first step towards finding more accurate models to predict the durability of this type of system.
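The kind of Markov chain model described above can be sketched as a birth-death chain whose state is the number of live replicas, with state 0 (all replicas lost) absorbing. The code below computes the expected time to data loss for such a chain. It is a generic sketch, not the authors' model: the function name, the rate callables, and the specific constant and linear repair-rate forms in the usage lines are assumptions.

```python
def mean_time_to_loss(n, failure_rate, repair_rate):
    """Expected time until no replica is left, starting from n live replicas.

    State i is the number of live replicas; state 0 is absorbing.
    failure_rate(i): rate of the transition i -> i-1.
    repair_rate(i):  rate of the transition i -> i+1 (never used at i == n).

    With T_i the expected time to loss from state i and D_i = T_i - T_{i-1},
    the balance equations give D_n = 1 / f_n and
    D_i = (1 + r_i * D_{i+1}) / f_i, so T_n is the sum of all D_i.
    """
    d = 1.0 / failure_rate(n)  # D_n
    total = d
    for i in range(n - 1, 0, -1):
        d = (1.0 + repair_rate(i) * d) / failure_rate(i)  # D_i
        total += d
    return total


lam, mu, n = 1.0, 1.0, 3  # illustrative rates, not measured values
constant = mean_time_to_loss(n, lambda i: i * lam, lambda i: mu)
linear = mean_time_to_loss(n, lambda i: i * lam, lambda i: (n - i) * mu)
print(constant, linear)
```

With these toy rates the linear repair form yields a longer expected time to loss than the constant form, which matches the direction of the bias the abstract reports for the two expressions.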

Routing Tradeoffs in Dynamic Peer-to-peer Networks

by Jinyang Li, Robert Morris, 2005
Abstract - Cited by 1 (0 self)
Distributed Hash Tables (DHTs) are useful tools for building large-scale distributed systems. DHTs provide a hash-table-like interface to applications by routing a key to its responsible node among the current set of participating nodes. DHT deployments are characterized by churn, a continuous process of nodes joining and leaving the network. Lookup latency is important to applications that use DHTs to locate data. To achieve low-latency lookups, each node needs to consume bandwidth to keep its routing tables up to date under churn. A robust DHT should use bandwidth sparingly and avoid overloading the network when the deployment scenario deviates from design assumptions. Ultimately, DHT designers are interested in obtaining the best lookup latency using a bounded amount of bandwidth across a wide range of operating environments. This thesis presents a new DHT protocol, Accordion, that achieves this goal. Accordion bounds its overhead traffic according to a user-specified bandwidth
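The notion of a key's "responsible node" can be sketched with a successor rule on a hashed identifier ring. The abstract does not say which assignment rule is used, so the Chord-style successor rule, the SHA-1 identifiers, and the function names below are assumptions for illustration; this is not Accordion's routing protocol, and it looks up the owner with global knowledge rather than by multi-hop routing.

```python
import bisect
import hashlib


def ring_id(name):
    """Map a string to a 160-bit integer identifier using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def responsible_node(key, nodes):
    """Return the node whose identifier is the first at or after the key's.

    The search wraps around to the smallest identifier when the key hashes
    past the largest node identifier.
    """
    ring = sorted((ring_id(n), n) for n in nodes)
    ids = [node_id for node_id, _ in ring]
    index = bisect.bisect_left(ids, ring_id(key)) % len(ring)
    return ring[index][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]
owner = responsible_node("some-key", nodes)
print(owner)
```

Under this rule, a node joining or leaving only changes the owner of keys adjacent to it on the ring, which is why churn affects routing state rather than the whole key assignment.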

Proactive Replication for Data Durability

by Emil Sit, Andreas Haeberlen, Frank Dabek, Byung-gon Chun, Hakim Weatherspoon, Robert Morris, M. Frans Kaashoek, John Kubiatowicz - In Proceedings of the 5th Int’l Workshop on Peer-to-Peer Systems (IPTPS), 2006
Abstract - Cited by 1 (0 self)
Many wide-area storage systems replicate data for durability. A common way of maintaining the replicas is to detect node failures and respond by creating additional copies of objects that were stored on failed nodes and hence suffered a loss of redundancy. Reactive techniques can minimize total bytes sent since they only create replicas as needed; however, they can create spikes in network use after a failure. These spikes may overwhelm application traffic and can make it difficult to provision bandwidth.

F2F: reliable storage in open networks

by Jinyang Li, Frank Dabek, 2006
Abstract - Cited by 1 (0 self)
A major hurdle to deploying a distributed storage infrastructure in peer-to-peer systems is storing data reliably using nodes that have little incentive to remain in the system. We argue that a node should choose its neighbors (the nodes with which it shares resources) based on existing social relationships instead of randomly. This approach provides incentives for nodes to cooperate and results in a more stable system which, in turn, reduces the cost of maintaining data. The cost of this approach is decreased flexibility and storage utilization. We describe our approach and sketch two applications for which this approach is viable: a cooperative backup system and a Usenet replacement.

An analytical estimation of durability in DHTs

by Fabio Picconi, Bruno Baynat, Pierre Sens
Abstract - Cited by 1 (0 self)
Recent work has shown that the durability of large-scale storage systems such as DHTs can be predicted using a Markov chain model. However, accurate predictions are only possible if the model parameters are also estimated accurately. We show that the Markov chain rates proposed by other authors do not consider several aspects of the system’s behavior, and produce unrealistic predictions. We present a new analytical expression for the chain rates that is considerably more fine-grained than previous estimations. Our experiments show that the loss rate predicted by our model is much more accurate than previous estimations.

Storing and Managing Data in a Distributed Hash Table

by Emil Sit, 2008
Abstract
Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction of large-scale, wide-area applications. For example, UsenetDHT is a new design for Usenet News developed in this thesis that uses a DHT to cooperatively deliver Usenet articles: the DHT allows a set of N hosts to share storage of Usenet articles, reducing their combined storage requirements by a factor of O(N). Usenet generates a continuous stream of writes that exceeds 1 Tbyte/day in volume, comprising over ten million writes. Supporting this and the associated read workload requires a DHT engineered for durability and efficiency. Recovering from network and machine failures efficiently poses a challenge for DHT replication maintenance algorithms that provide durability. To avoid losing the last replica, replica maintenance must create additional replicas when failures are detected. However,

GOSSIP: Gossip Over Storage Systems Is Practical

by Hakim Weatherspoon (Cornell Univ), Hugo Miranda (Lisboa Univ), Konrad Iwanicki (Vrije Univ), Ali Ghodsi, Yann Busnel - Author manuscript, published in ACM SIGOPS Operating Systems Review (2007), 2008
Abstract
Gossip-based mechanisms are touted for their simplicity, limited resource usage, robustness to failures, and tunable system behavior. These qualities make gossiping an ideal mechanism for storage systems that are responsible for maintaining and updating data in the midst of failures and limited resources (e.g., intermittent network connectivity, limited bandwidth, constrained communication range, or limited battery power). We focus on persistent storage systems that, unlike mere caches, are responsible for the durability and consistency of data. Examples of such systems may be encountered in many different environments, in particular: wide-area networks (limited bandwidth), wireless sensor networks (limited resources), and mobile ad hoc networks (intermittent connectivity). In this paper, we demonstrate the qualities of gossiping in these three respective environments.