The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing
- Journal of Future Generation Computing Systems
, 1999
Utopia: a Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems
, 1993
Reaching Agreement on Processor Group Membership in Synchronous Distributed Systems
- Distributed Computing
, 1991
Cited by 125 (14 self)
Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays.
Key words: Communication network -- Distributed system -- Failure detection -- Fault tolerance -- Real time system -- Replicated data
1 Introduction
When designing a computing service that must remain available despite component failures, a key idea is to replicate service state information at several servers running on distinct processors. The service state typically consists of the ser...
Voting with witnesses: A consistency scheme for replicated files
- In Proceedings of the 6th International Conference on Distributed Computing Systems
, 1986
Cited by 86 (10 self)
Voting schemes ensure the consistency of replicated files by disallowing all read and write requests that cannot collect an appropriate quorum of copies. This procedure requires a minimum number of three copies to be of any practical use and tends to disallow a relatively high number of read and write requests. We propose to replace some of these copies by mere records of the current state of the file. These records, called witnesses, will be assigned weights and participate in the collection of quorums. We show that, under very general assumptions, the reliability of a replicated file consisting of n copies and m witnesses is the same as the reliability of a replicated file consisting of n + m copies. We also compare the availability of a replicated file consisting of two copies and one witness with that of a file having three copies and show that, under normal circumstances, the two files have similar availabilities.
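The quorum rule described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Site` class, the `quorum_available` function, the version numbers, and the rule that a quorum must contain at least one full copy at the latest reachable version are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record for one replica site (names are assumptions)."""
    name: str
    weight: int
    version: int
    is_witness: bool = False  # a witness records the version but holds no data
    up: bool = True


def quorum_available(sites, threshold):
    """Return True if the reachable sites carry at least `threshold` weight
    and include a full copy at the highest version seen among them."""
    reachable = [s for s in sites if s.up]
    if sum(s.weight for s in reachable) < threshold:
        return False
    latest = max(s.version for s in reachable)
    # A witness can vote, but the data itself must come from a current copy.
    return any(s.version == latest and not s.is_witness for s in reachable)


# Two copies and one witness, each with weight 1; a quorum needs weight 2.
a = Site("a", 1, 3)
b = Site("b", 1, 3)
w = Site("w", 1, 3, is_witness=True)
```

With `a` down, `b` and `w` still form a quorum. If `b` then fails as well, the remaining weight is too low. A stale copy paired with a current witness is also rejected, because no reachable copy holds the current data.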
Fail-Awareness in Timed Asynchronous Systems
, 2003
Cited by 43 (15 self)
We address the problem of the impossibility of implementing synchronous fault-tolerant service specifications in asynchronous distributed systems. We introduce a method for weakening a synchronous service specification so that it becomes implementable in "timed" asynchronous systems, that is, asynchronous systems in which processes have access to local hardware clocks. The method (1) adds to a service interface an exception indicator so that a client knows at any time if a server is currently providing its standard "synchronous" semantics or some other specified exceptional semantics, (2) ensures that the standard behavior provided when the exception indicator does not signal an exception is "similar" to the original synchronous service behavior, and (3) requires a server to provide its standard semantics whenever the underlying communication and process services exhibit "synchronous behavior". To illustrate our method, we show how the specification of a synchronous datagram service and an internal clock synchronization service can be transformed into a fail-aware service specification. Further illustrations of the usefulness of fail-aware services are provided by describing a railway crossing service and a fail-aware weak group membership service.
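The idea of an exception indicator on a service interface can be sketched as follows. This is only an illustration and not one of the paper's specifications: the `FailAwareValue` class, its `max_age` bound, and the staleness rule are assumptions, with the local hardware clock stood in for by a monotonic clock.

```python
import time


class FailAwareValue:
    """Hypothetical service whose reads carry an exception indicator."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age    # assumed bound on how old a value may be
        self.clock = clock        # local clock; injectable for testing
        self._value = None
        self._refreshed_at = None

    def refresh(self, value):
        """Record a new value together with the local time it arrived."""
        self._value = value
        self._refreshed_at = self.clock()

    def read(self):
        """Return (value, exception_indicator).

        The indicator is True when the service cannot currently offer its
        standard semantics, here because the value is missing or too old.
        """
        if self._refreshed_at is None:
            return None, True
        age = self.clock() - self._refreshed_at
        return self._value, age > self.max_age


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
service = FailAwareValue(max_age=1.0, clock=lambda: now[0])
```

A client checks the second element of each `read()` result and falls back to its exceptional handling whenever the indicator is set, rather than assuming the value is always fresh.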
Service Interface and Replica Management Algorithm for Mobile File System Clients
- In Proceedings of the First International Conference on Parallel and Distributed Information Systems
, 1991
Cited by 39 (3 self)
Portable computers are now common, a fact that raises the possibility that file service clients might move on a regular basis. This new development requires rethinking some features of distributed file system design. We argue that existing approaches to file replica management would not cope well with the likely behavior of mobile clients, and we present our solution: a lazy "server-based" update operation. This operation facilitates fast, scalable, and highly fault-tolerant implementations of both read and write operations in the usual case. To cope with the weak semantics of the update operation, we propose a new file system service interface that allows applications to opt for "UNIX semantics" by use of a slower, less fault-tolerant read operation.
1 Introduction
This work investigates how to maintain replicas in a distributed file system, especially one supporting mobile clients. While the topic of replica management within file systems has received so much attention that one mig...
Increasing the Resilience of Atomic Commit, at No Additional Cost
- In Symposium on Principles of Database Systems
, 1995
Cited by 37 (6 self)
This paper presents a new atomic commitment protocol, Enhanced Three Phase Commit (E3PC), that always allows a quorum in the system to make progress. Previously suggested quorum-based protocols (e.g. the quorum-based Three Phase Commit (3PC) [Ske82]) allow a quorum to make progress in case of one failure. If failures cascade, however, and the quorum in the system is "lost" (i.e. at a given time no quorum component exists, e.g. because of a total crash), a quorum can later become connected and still remain blocked. With our protocol, a connected quorum never blocks. E3PC is based on the quorum-based 3PC [Ske82], and it does not require more time or communication than 3PC. The principles demonstrated in this paper can be used to increase the resilience of a variety of distributed services, e.g. replicated database systems, by ensuring that a quorum will always be able to make progress.
1 Introduction
Reliability and availability of loosely coupled distributed database systems is beco...
A Dataflow Approach to Event-based Debugging
- Software - Practice and Experience
, 1991
Cited by 34 (0 self)
This paper describes a novel approach to event-based debugging. The approach is based on a (coarse-grained) dataflow view of events: a high-level event is recognized when an appropriate combination of lower-level events on which it depends has occurred. Event recognition is controlled using familiar programming language constructs. This approach is more flexible and powerful than current ones. It allows arbitrary debugger language commands to be executed when attempting to form higher-level events. It also allows users to specify event recognition in much the same way that they write programs. This paper also describes a prototype, Dalek, that employs the dataflow approach for debugging sequential programs. Dalek demonstrates the feasibility and attractiveness of the dataflow approach. One important motivation for this work is that current sequential debugging tools are inadequate. Dalek contributes toward remedying such inadequacies by providing events and a powerful debugging language.
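The dataflow view of events described here can be sketched in a few lines. This is not Dalek's debugging language: the `EventNode` class, its methods, and the `combine` callback are hypothetical names chosen for illustration, and the sketch assumes a high-level event fires once every one of its inputs has delivered a value.

```python
class EventNode:
    """Hypothetical dataflow node: fires when all of its inputs have fired."""

    def __init__(self, name, inputs=(), combine=None):
        self.name = name
        self.inputs = list(inputs)
        self.combine = combine    # ordinary code that decides recognition
        self.listeners = []
        self.pending = {}         # latest value received from each input
        self.history = []         # values of every event raised on this node
        for node in self.inputs:
            node.listeners.append(self)

    def raise_event(self, value):
        """Record the event and pass it to the nodes that depend on it."""
        self.history.append(value)
        for listener in self.listeners:
            listener._receive(self.name, value)

    def _receive(self, source, value):
        # A repeated event from the same input overwrites the earlier one.
        self.pending[source] = value
        if len(self.pending) == len(self.inputs):
            args = [self.pending[node.name] for node in self.inputs]
            self.pending.clear()
            result = self.combine(*args)
            if result is not None:  # None means "no high-level event"
                self.raise_event(result)


# Low-level events and one high-level event built from them.
enter = EventNode("enter")
leave = EventNode("leave")
call = EventNode("call", [enter, leave],
                 combine=lambda e, x: e if e == x else None)

enter.raise_event("f")
leave.raise_event("f")   # matching pair: "call" is recognized
enter.raise_event("g")
leave.raise_event("h")   # mismatched pair: nothing is recognized
```

Because `combine` is ordinary code, recognition can run arbitrary checks before deciding whether to raise the higher-level event, which is the flexibility the abstract emphasizes.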
Non-Blocking Atomic Commitment
- In Sape Mullender, editor, Distributed Systems
, 1993
Cited by 30 (1 self)
via anonymous FTP from the area ftp.cs.unibo.it:/pub/TR/UBLCS in compressed PostScript format. Abstracts are available from the same host in the directory /pub/TR/ABSTRACTS in plain text format. All local authors can be reached via e-mail at the address last-name@cs.unibo.it.
An Algorithm for Data Replication
- DIGITAL SYSTEMS RESEARCH CENTER TECH. REP
, 1989
Cited by 29 (4 self)
Replication is an important technique for increasing computer system availability. In this paper, we present an algorithm for replicating stored data on multiple server machines. The algorithm organizes the replicated servers in a master/slaves scheme, with one master election being performed at the beginning of each service period. The status of each replica is summarized by a set of monotonically increasing epoch variables. Examining the epoch variables of a majority of the replicas reveals which replicas have up-to-date data. The set of replicas can be changed dynamically. Replicas that have been off-line can be brought up to date in the background, and witness replicas, which store the epoch variables but not the data, can participate in the majority voting. The algorithm does not require distributed atomic transactions. The algorithm also permits client machines to cache copies of data, with strict cache consistency being ensured by having the replicated servers keep track of which cl...
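The majority examination of epoch variables can be illustrated with a deliberately simplified sketch. The paper uses a set of epoch variables per replica; the code below assumes a single integer epoch per replica and assumes every epoch advance was recorded at a majority, so the highest epoch seen in any majority is the current one. The function name and parameters are hypothetical.

```python
def up_to_date_replicas(epochs, total_replicas, witnesses=frozenset()):
    """Return the data replicas holding the latest epoch, or None.

    epochs: mapping from reachable replica name to its reported epoch
            (a single integer here, which is a simplifying assumption).
    total_replicas: size of the full replica set, witnesses included.
    witnesses: names of replicas that store the epoch but no data.
    """
    # Without a strict majority, the latest epoch cannot be determined.
    if 2 * len(epochs) <= total_replicas:
        return None
    latest = max(epochs.values())
    # Witnesses vote on the epoch but cannot serve data.
    return {name for name, epoch in epochs.items()
            if epoch == latest and name not in witnesses}
```

For example, with three replicas of which `w` is a witness, hearing from `a` (epoch 4) and `w` (epoch 4) identifies `a` as current, while hearing from a single replica yields `None` because no majority was consulted.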

