The process group approach to reliable distributed computing
- Communications of the ACM
, 1993
Cited by 501 (16 self)
The difficulty of developing reliable distributed software is an impediment to applying distributed computing technology in many settings. Experience with the Isis system suggests that a structured approach based on virtually synchronous process groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of research on Isis, describing the model, its implementation challenges, and the types of applications to which Isis has been applied. 1 Introduction One might expect the reliability of a distributed system to follow directly from the reliability of its constituents, but this is not always the case. The mechanisms used to structure a distributed system and to implement cooperation between components play a vital role in determining how reliable the system will be. Many contemporary distributed operating systems have placed emphasis on communication performance, overlooking the need for tools to integrate components into a reliable whole. The communication primitives supported give generally reliable behavior, but exhibit problematic semantics when transient failures or system configuration changes occur. The resulting building blocks are, therefore, unsuitable for facilitating the construction of systems where reliability is important. This paper reviews six years of research on Isis, a system that provides tools to support the construction of reliable distributed software. The thesis underlying Isis is that development of reliable distributed software can be simplified using process groups and group programming tools. This paper motivates the approach taken, surveys the system, and discusses our experience with real applications.
Weak-Consistency Group Communication and Membership
, 1992
Cited by 92 (7 self)
Many distributed systems for wide-area networks can be built conveniently, and operate efficiently and correctly, using a weak consistency group communication mechanism. This mechanism organizes a set of principals into a single logical entity, and provides methods to multicast messages to the members. A weak consistency distributed system allows the principals in the group to differ on the value of shared state at any given instant, as long as they will eventually converge to a single, consistent value. A group containing many principals and using weak consistency can provide the reliability, performance, and scalability necessary for wide-area systems. I have developed a framework for constructing group communication systems, for classifying existing distributed system tools, and for constructing and reasoning about a particular group communication model. It has four components: message delivery, message ordering, group membership, and the application. Each component may have a different implementation, so that the group mechanism can be tailored to application requirements. The framework supports a new message delivery protocol, called timestamped anti-entropy, which provides reliable, eventual message delivery; is efficient; and tolerates most transient processor and network failures. It can be combined with message ordering implementations that provide ordering guarantees ranging from unordered to total, causal delivery. A new group membership protocol completes the set, providing temporarily inconsistent membership views resilient to up to k simultaneous principal failures. The Refdbms distributed bibliographic database system, which has been constructed using this framework, is used as an example. Refdbms databases can be replicated on many different sites, using the group communication system described here.
A Market Protocol for Decentralized Task Allocation
- In The Proceedings of the Third International Conference on Multi-Agent Systems (ICMAS-98)
, 1998
Cited by 71 (7 self)
We present a decentralized market protocol for allocating tasks among agents that contend for scarce resources. Agents trade tasks and resources at prices determined by an auction protocol. We specify a simple set of bidding policies that, along with the auction mechanism, exhibits desirable convergence properties. The system always reaches quiescence. If the system reaches quiescence below the consumer's reserve price for the high level task, it will be in a solution state. If the system finds a solution it will reach quiescence in a solution state. Experimental evidence supports our conjecture that the system will converge to a solution when one exists and the consumer bids sufficiently high. We describe the system's application to and implementation in an agent-based digital library. 1. Introduction In a multiagent system (MAS), we must often address the problem of allocating resources and effort in such a way that the resulting collection of agents can accomplish a complex task. T...
Consensus Service: a modular approach for building agreement protocols in distributed systems
- IEEE Transactions on Software Engineering
, 1996
Cited by 45 (19 self)
This paper describes a consensus service and suggests its use for the construction of fault-tolerant agreement protocols. We show how to build agreement protocols, using a classical client-server interaction, where (1) the clients are the processes that must solve the agreement problem, and (2) the servers implement the consensus service. Using a generic notion, called consensus filter, we illustrate our approach on non-blocking atomic commitment and on view synchronous multicast. The approach can trivially be used for total order broadcast. In addition to its modularity, our approach enables efficient implementations of the protocols, and precise characterization of their liveness. 1 Introduction General services, used to build distributed applications, or to implement higher level distributed services, have become common in distributed systems. Examples are numerous: file servers, time servers, name servers, authentication servers, etc. However, there have been very few proposals o...
RTCAST: Lightweight Multicast for Real-Time Process Groups
- in IEEE Real-Time Technology and Applications Symposium
, 1996
Cited by 35 (9 self)
We propose a lightweight fault-tolerant multicast and membership service for real-time process groups which may exchange periodic and aperiodic messages. The service supports bounded-time message transport, atomicity, and order for multicasts within a group of communicating processes in the presence of processor crashes and communication failures. It guarantees agreement on membership among the communicating processors, and ensures that membership changes (e.g., resulting from processor joins or departures) are atomic and ordered with respect to multicast messages. We provide the flexibility of an event-triggered approach with the fast message delivery time of time-triggered protocols, such as TTP [14], where messages are delivered to the application immediately upon reception. This is achieved without compromising agreement, order and atomicity properties. In addition to the design and details of the algorithm, we describe our implementation of the protocol using the x-Kernel protocol architecture running on RT Mach 3.0.
The generic consensus service
- in Proceedings of the 26th IEEE International Symposium on Fault-Tolerant Computing (FTCS-26)
, 1998
Cited by 34 (7 self)
This paper describes a modular approach for the construction of fault-tolerant agreement protocols. The approach is based on a generic consensus service. Fault-tolerant agreement protocols are built using a client-server interaction, where the clients are the processes that must solve the agreement problem and the servers implement the consensus service. This service is accessed through a generic consensus filter, customized for each specific agreement problem. We illustrate our approach on the construction of various fault-tolerant agreement protocols, such as nonblocking atomic commitment, group membership, view synchronous communication, and total order multicast. Through a systematic reduction to consensus, we provide a simple way to solve agreement problems. In addition to its modularity, our approach enables efficient implementations of agreement protocols and precise characterization of the assumptions underlying their liveness and safety properties. Index Terms: Asynchronous distributed systems, consensus, fault-tolerant agreement protocols, failure detectors, modularity, atomic commitment, group membership, view synchrony, total order multicast.
Consensus: the Big Misunderstanding
, 1997
Cited by 30 (4 self)
The paper aims at clarifying some misunderstandings about the consensus problem. These misunderstandings prevent consensus from being considered as it should be, i.e., a fundamental paradigm in the context of fault-tolerant distributed systems, not only from a theoretical point of view, but also from a practical point of view. Six frequent misunderstandings are discussed. Misunderstanding 1: Consensus is for theoreticians only. Consensus can be viewed as a general form of agreement in distributed systems [17]. The problem is defined over a set of processes $\{p_1, p_2, \ldots, p_n\}$: each process $p_i$ has an initial value $v_i$, and the correct processes (those that do not crash) have to decide on a common value $v$ that is the initial value of one of the processes [3]. This problem has attracted theoreticians for over 15 years and has resulted in a large body of work, the best known being the Fischer, Lynch and Paterson result proving that consensus is not solvable in an asynchronous syst...
Conceptual and Implementation Models for the Grid
- In Proceedings of the IEEE, Special Issue on Grid Computing
, 2005
Cited by 21 (11 self)
The Grid is rapidly emerging as the dominant paradigm for wide area distributed application systems. As a result, there is a need for modeling and analyzing the characteristics and requirements of Grid systems and programming models. This paper adopts the well-established body of models for distributed computing systems, which are based upon carefully stated assumptions or axioms, as a basis for defining and characterizing Grids and their programming models and systems. The requirements of programming Grid applications and the resulting requirements on the underlying virtual organizations and virtual machines are investigated. The assumptions underlying some of the programming models and systems currently used for Grid applications are identified and their validity in Grid environments is discussed. A more in-depth analysis of two programming systems, the Imperial College E-Science Networked Infrastructure (ICENI) and Accord, using the proposed definitions’ structure is presented. Keywords: Distributed systems, Grid programming models, Grid programming systems, Grid system definition.
Modeling Replica Divergence in a Weak-Consistency Protocol for Global-Scale Distributed Data Bases
, 1993

