A longitudinal survey of Internet host reliability
, 1995
Cited by 53 (0 self)
Introduction. Accurate analyses of fault-tolerance and replication mechanisms depend on an accurate model of the reliability of the systems that make them up. The overall reliability of a replication protocol, for example, depends on the probability that some fraction of the replica sites are functioning when data must be read or written. Several important measures are used to quantify system reliability, including time-to-failure (TTF), time-to-repair (TTR), availability, and reliability. Throughout this study, "failure" is defined in a distributed-environment sense; that is, as an inability to access a host. The term encompasses both hardware and software faults attributable to the host, and can include power failures and scheduled downtime. It can also be caused by offsite communications failures, ranging from temporary routing failures to problems with the physical commun
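As an illustration of how the measures named in this abstract relate, the following minimal sketch computes steady-state availability from observed TTF and TTR samples using the standard ratio MTTF / (MTTF + MTTR), and reliability under an exponential failure assumption. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def availability(ttf_samples, ttr_samples):
    """Steady-state availability estimated as MTTF / (MTTF + MTTR)."""
    mttf = sum(ttf_samples) / len(ttf_samples)  # mean time to failure
    mttr = sum(ttr_samples) / len(ttr_samples)  # mean time to repair
    return mttf / (mttf + mttr)


def reliability(t, mttf):
    """Probability of running for time t without failure.

    Assumes exponentially distributed times to failure, which is a
    modeling assumption and not a measured property of any host.
    """
    return math.exp(-t / mttf)


if __name__ == "__main__":
    # Hypothetical observations, in hours.
    ttf = [90.0, 110.0]
    ttr = [10.0, 10.0]
    print(availability(ttf, ttr))
    print(reliability(24.0, 100.0))
```

Both functions take time values in any consistent unit; only the ratio matters for availability.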
A Study of the Reliability of Internet Sites
, 1991
Cited by 45 (6 self)
Modeling the reliability of distributed systems requires a good understanding of the reliability of the components. Careful modeling allows highly fault-tolerant distributed data applications to be constructed at the least cost. Failure and repair
Voting with Regenerable Volatile Witnesses
, 1991
Cited by 13 (3 self)
Voting protocols ensure the consistency of replicated objects by requiring all read and write requests to collect an appropriate quorum of replicas. We propose to replace some of these replicas by volatile witnesses that have no data and require no stable storage, and to regenerate them instead of waiting for recovery. The small size of volatile witnesses allows them to be regenerated much more easily than full replicas. Regeneration attempts are also much more likely to succeed, since volatile witnesses can be stored on diskless sites. We show that, under standard Markovian assumptions, two full replicas and one regenerable volatile witness managed by a two-tier dynamic voting protocol provide higher data availability than three full replicas managed by majority consensus voting or optimistic dynamic voting, provided site failures can be detected significantly faster than they can be repaired. Keywords: distributed file systems, replicated data, voting, witnesses. 1. INTRODUCTION Fault-tol...
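For context on the baseline this abstract compares against, the sketch below computes the availability of plain majority consensus voting over n full replicas. It assumes independent sites, each modeled as a two-state Markov process with failure rate `lam` and repair rate `mu`, and it ignores network partitions. It does not model witnesses, regeneration, or dynamic voting, so it is not the paper's analysis; the function names and rates are hypothetical.

```python
import math


def site_availability(lam, mu):
    """Steady-state availability of one site with failure rate lam
    and repair rate mu (two-state Markov model)."""
    return mu / (lam + mu)


def majority_availability(n, a):
    """Probability that a strict majority of n independent replicas,
    each available with probability a, is up at the same time."""
    quorum = n // 2 + 1  # smallest strict majority
    return sum(
        math.comb(n, k) * a**k * (1.0 - a) ** (n - k)
        for k in range(quorum, n + 1)
    )


if __name__ == "__main__":
    a = site_availability(lam=1.0, mu=9.0)  # hypothetical rates
    print(a)
    print(majority_availability(3, a))
```

For three replicas the sum reduces to 3a^2(1 - a) + a^3, the usual closed form for a two-out-of-three quorum.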
Regeneration with Virtual Copies for Replicated Databases
- In Proceedings of the 11th IEEE International Conference on Distributed Computing Systems
, 1991
Cited by 5 (3 self)
We consider the consistency control problem for replicated data in a distributed computing system (DCS) and propose a new algorithm to dynamically regenerate copies of data objects in response to node failures and network partitioning in the system. The DCS is assumed to have strict consistency constraints for data object copies. The new algorithm combines the advantages of voting-based algorithms and regeneration mechanisms to maintain mutual consistency of replicated data objects in the case of node failures and network partitioning. Our algorithm extends the feasibility of regeneration to DCS on wide area networks, and is able to satisfy user queries as long as there is one current partition in the system. 1 Introduction. In a distributed computing environment, two types of failures may occur: the processor at a given site may fail (referred to as site failure), and communication between two sites may fail (referred to as communication link failure). When a site fails, processing at...
Regeneration with Virtual Copies for Distributed Computing Systems
- IEEE Trans. Softw. Eng
, 1994
Cited by 3 (1 self)
We consider the consistency control problem for replicated data in a distributed computing system (DCS) and propose a new algorithm to dynamically regenerate copies of data objects in response to node failures and network partitioning in the system. The DCS is assumed to have strict consistency constraints for data object copies. The new algorithm combines the advantages of voting-based algorithms and regeneration mechanisms to maintain mutual consistency of replicated data objects in the case of node failures and network partitioning. Our algorithm extends the feasibility of regeneration to DCS on wide area networks, and is able to satisfy user queries as long as there is one current partition in the system. A stochastic availability analysis of our algorithm shows that it provides improved availability compared to previously proposed dynamic voting algorithms. 1 Introduction. In a distributed computing environment, two types of failures may occur: the processor at a given site may...
LDFS: A Fault-Tolerant Local Disk-Based File System for Mobile Agents
Cited by 3 (1 self)
A local disk-based file system, LDFS, is an attractive way to speed up distributed applications. Local file access is much faster than accessing data on remote file servers through the network. LDFS is also scalable, as it does not rely on centralized file servers, and it exploits already existing resources (local disks) to provide storage. However, since individual workstations are less reliable and less available than file servers, LDFS must be made fault tolerant. We present an approach that integrates the LDFS with the distributed application. This is particularly suitable for mobile agent systems, because they can easily migrate to access remote files. LDFS avoids logging of individual file accesses, which are regenerated automatically from application messages. Our experiments show that the overhead of checkpointing with LDFS is generally smaller than with NFS, while access time to files decreases dramatically. Keywords: fault tolerant computing, stable storage, mobile agent distributed systems
A study of the reliability of hosts on the Internet
, 1993
Cited by 1 (0 self)
Acknowledgments
1 Introduction
1.1 Overview
1.2 Previous Studies and Related Work
1.2.1 Organization of thesis
2 Theory
2.1 Renewal Processes
2.2 Sampling the inter-event renewal time
2.3 Sampling the backward occurrence
2.3.1 The Exponential model
2.3.2 The Weibull model
2.3.3 The Gamma distribution model
2.3.4 Histogram estimate
2.3.5 Goodness of fit
2.4 Sampling the...
Estimating the Reliability of Hosts Using the Internet
, 1991
Cited by 1 (0 self)
Modeling the reliability of distributed systems, whether through analysis or simulation, requires a good understanding of the reliability of the components. Careful modeling allows highly fault-tolerant distributed databases and similar applications to be constructed at the least cost. It is often assumed that the failure and repair rates of components are exponentially distributed. This hypothesis is testable for failure rates, though the process of gathering and reducing the data to a usable form can be difficult. By applying an appropriate test statistic, some of the samples were found to have a realistic chance of being drawn from an exponential distribution, while others can be confidently classed as non-exponential. For this study, data were collected from a large number of hosts via the Internet with no special privileges or monitoring facilities. Over 350,000 hosts were considered, and more than 68,000 of these that were judged likely to respond were queried. These hosts were sa...
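The abstract does not say which test statistic was applied. As one possible way to check a sample for exponentiality, the sketch below uses a Kolmogorov-Smirnov distance against an exponential distribution whose mean is estimated from the data, with a Monte Carlo p-value (a Lilliefors-style procedure). This is a swapped-in technique for illustration, not the authors' method, and all names and data here are hypothetical.

```python
import math
import random


def ks_exponential_statistic(samples):
    """KS distance between the empirical CDF of the samples and an
    exponential CDF whose mean is estimated from the same samples."""
    xs = sorted(samples)
    n = len(xs)
    mean = sum(xs) / n
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-x / mean)  # fitted exponential CDF at x
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_p_value(samples, trials=200, seed=0):
    """Monte Carlo p-value: the share of simulated exponential samples
    whose statistic is at least as large as the observed one.

    The statistic is scale-invariant because the mean is re-estimated
    for every sample, so simulating with rate 1.0 is sufficient.
    """
    rng = random.Random(seed)
    n = len(samples)
    observed = ks_exponential_statistic(samples)
    hits = 0
    for _ in range(trials):
        simulated = [rng.expovariate(1.0) for _ in range(n)]
        if ks_exponential_statistic(simulated) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical uptimes in hours: one exponential, one nearly constant.
    expo = [rng.expovariate(0.01) for _ in range(500)]
    regular = [100.0 + 0.01 * i for i in range(50)]
    print(ks_exponential_statistic(expo), exponential_p_value(expo))
    print(ks_exponential_statistic(regular), exponential_p_value(regular))
```

A small p-value suggests the sample is unlikely to be exponential; a large one only means the test found no evidence against that assumption.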
Analysis of a Dynamic Voting Algorithm Based on Regeneration with Virtual Copies
A consistency control algorithm for replicated data objects in distributed computing systems, called RVC2, has been extensively analyzed. RVC2 is a voting-based algorithm which utilizes a selective regeneration and recovery mechanism for failed copies of data objects. Virtual copies, which record information about the current state of a copy, but which contain no actual data, are used in addition to real copies to reduce network and storage overhead. A theoretical analysis of the algorithm under normal and exceptional conditions, in terms of the message cost, is presented. The results show that RVC2 has a high message cost under some conditions. Empirical results concerning availability, obtained through simulation, are also discussed. These results show that varying the number of real versus virtual copies, and varying the generation threshold, has no significant impact on availability. The results also suggest that RVC2 is an unnecessarily complex algorithm because regeneration has no significant impact on availability under most circumstances.

