Proactive Replication in Distributed Storage Systems Using Machine Availability Estimation
Cited by 4 (0 self)
Distributed storage systems provide data availability by means of redundancy. To assure a given level of availability in case of node failures, new redundant fragments need to be introduced. Since node failures can be either transient or permanent, deciding when to generate new fragments is non-trivial. An additional difficulty is that the failure behavior, in terms of the rates of permanent and transient failures, may vary over time. To adapt to changes in the failure behavior, many systems adopt a reactive approach, in which new fragments are created as soon as a failure is detected. However, reactive approaches tend to produce spikes in bandwidth consumption. Proactive approaches create new fragments at a fixed rate that depends on knowledge of the failure behavior or is set by the system administrator. However, existing proactive systems are not able to adapt to changing failure behavior, which is common in the real world. We propose a new technique based on an ongoing estimation of the failure behavior, obtained using a model that consists of a network of queues. This scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems, generalizing the two previous approaches. The choice between reactive and proactive thus becomes a specific case of a wider approach that is tunable with respect to the dynamics of the failure behavior.
Analysis of failure correlation impact on peer-to-peer storage systems
- In IEEE Int. Conf. on Peer-to-Peer Comp. (P2P ’09), 2009
Cited by 2 (2 self)
Peer-to-peer storage systems aim to provide reliable long-term storage at low cost. In such systems, peers fail continuously; hence the need for self-repairing mechanisms to achieve high durability. In this paper, we propose and study analytical models that assess the bandwidth consumption and the probability of losing data in storage systems that use erasure-coded redundancy. We show by simulations that the classical stochastic approach found in the literature, which models each block independently, gives a correct approximation of the system's average behavior but fails to capture its variations over time. These variations are caused by the simultaneous loss of multiple data blocks that results from a peer failing (or leaving the system). We then propose a new stochastic model based on a fluid approximation that better captures the system behavior. In addition to its expectation, it gives a correct estimation of its standard deviation. This new model is validated by simulations.
Availability and Redundancy in Harmony: Measuring Retrieval Times in P2P Storage Systems
Peer-to-peer (P2P) storage systems are strongly affected by churn, that is, temporary and permanent peer failures. Because of this churn, the main requirement of such systems is to guarantee that stored objects can always be retrieved. This requirement is especially important in two situations: when users want to access the stored objects and when data maintenance processes have to repair lost information. To meet this requirement, existing P2P storage systems introduce large amounts of redundancy that keep data availability close to 100%. Unfortunately, these large amounts of redundancy increase the storage costs, either by reducing the overall net capacity or by increasing the communication required for data maintenance. In order to minimize storage costs, P2P storage systems can reduce data redundancy. However, less redundancy means lower data availability, which leads to longer object retrieval times. Longer retrieval times could in turn compromise data maintenance processes and penalize users' retrieval times. It is therefore crucial for P2P storage systems to predict the effects of a redundancy reduction. To provide this information, we present a novel analytical framework to measure object retrieval times under different redundancy and churn conditions. Our framework can be directly used by backup applications aiming to maintain durability at lower cost, or by data sharing applications that seek to reduce costs by penalizing user retrieval times. We validate our framework by simulation using real P2P traces (Skype and eMule’s KAD).
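The trade-off between redundancy and availability described in this abstract can be illustrated with a textbook k-of-n calculation. The sketch below is not the paper's analytical framework; it assumes (hypothetically) that an object is split into n fragments, that any k of them suffice to rebuild it, and that each fragment's peer is online independently with probability a. The function name and parameters are illustrative only.

```python
from math import comb

def object_availability(n, k, a):
    """Probability that at least k of n fragments sit on online peers.

    Assumes independent peers, each online with probability a
    (an illustrative simplification, not the paper's churn model).
    """
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    # Reducing n while keeping k fixed lowers the availability of the object.
    for n in (12, 10, 8, 6):
        print(n, object_availability(n, k=4, a=0.7))
```

Under these assumptions, lowering n cuts storage cost but also lowers the chance that an object can be retrieved immediately, which is the effect the framework in the abstract sets out to quantify in terms of retrieval time.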
Protector: A Probabilistic Failure Detector for Cost-Effective Peer-to-Peer Storage
Maintaining a given level of data redundancy is a fundamental requirement of peer-to-peer (P2P) storage systems: to ensure the desired data availability, additional replicas must be created when peers fail. Since the majority of failures in P2P networks are transient (i.e., peers return with data intact), an intelligent system can significantly reduce replication costs by not replicating data following transient failures. Reliably distinguishing permanent from transient failures, however, is a challenging task, because peers are unresponsive to probes in both cases. In this paper, we propose Protector, an algorithm that enables efficient replication policies by estimating the number of “remaining replicas” for each object, including those temporarily unavailable due to transient failures. Protector dramatically improves detection accuracy by exploiting two opportunities. First, it leverages failure patterns to predict the likelihood that a peer (and the data it hosts) has permanently failed given its current downtime. Second, it detects the replication level across groups of replicas (or fragments), thereby balancing false positives for some peers against false negatives for others. Extensive simulations based on both synthetic and real traces show that Protector closely approximates the performance of a perfect “oracle” failure detector, and significantly outperforms time-out-based detectors using a wide range of parameters. Finally, we design, implement and deploy an efficient P2P storage system called AmazingStore by combining Protector with structured P2P overlays. Our experience proves that Protector enables efficient long-term data maintenance in P2P storage systems. Index Terms: Failure detector, P2P storage, availability, replication management.
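The idea of predicting permanent failure from the current downtime, and summing those predictions over a group of replicas, can be sketched with Bayes' rule. This is not Protector's actual algorithm, which the abstract does not specify. The sketch assumes (hypothetically) that transient downtimes are exponentially distributed with a known mean and that a known fraction of failures is permanent; all names and default values are illustrative.

```python
import math

def prob_permanent(downtime_h, prior_permanent=0.1, mean_transient_h=4.0):
    """Posterior probability that a peer down for downtime_h hours is gone for good.

    Assumptions (illustrative, not from the paper): a fraction
    prior_permanent of failures is permanent, and transient downtimes
    are exponential with mean mean_transient_h hours.
    """
    if downtime_h <= 0:
        return 0.0  # peer currently responds to probes
    # Chance that a transient failure would still be ongoing after downtime_h.
    still_down_if_transient = math.exp(-downtime_h / mean_transient_h)
    evidence = prior_permanent + (1.0 - prior_permanent) * still_down_if_transient
    return prior_permanent / evidence

def expected_remaining_replicas(downtimes_h, **params):
    """Expected number of replicas that still exist, counted across the whole group."""
    return sum(1.0 - prob_permanent(d, **params) for d in downtimes_h)

if __name__ == "__main__":
    # Three replicas online, one down for 2 hours, one down for 48 hours.
    print(expected_remaining_replicas([0, 0, 0, 2, 48]))
```

A repair policy built on this sketch would create new replicas only when the expected count drops below a target, rather than after a fixed per-peer time-out.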
A Characterization of Node Uptime Distributions in the PlanetLab Test Bed
In this paper, we study nodes from the PlanetLab test bed to form a model of their uptime behavior. By applying clustering techniques to over a year’s worth of availability data for the nodes, we identify six uptime distributions, each exhibiting unique characteristics shared by the nodes within it. The behavioral patterns exhibited by these groups, combined with the behaviors exhibited by the aggregate across the system, provide useful information for researchers designing applications that are run or tested on PlanetLab. Keywords: distributed system; classification; availability; modeling

