Results 1 - 10 of 62
Transis: A Communication Sub-System for High Availability
, 1992
Cited by 337 (46 self)
This paper describes Transis, a communication sub-system for high availability. Transis is a transport layer package that supports a variety of reliable multicast message passing services between processors. It provides highly tuned multicast and control services for scalable systems with arbitrary topology. The communication domain comprises a set of processors that can initiate multicast messages to a chosen subset. Transis delivers them reliably and maintains the membership of connected processors automatically, in the presence of arbitrary communication delays, message losses, and processor failures and joins. The contribution of this paper is in providing an aggregate definition of communication and control services over broadcast domains. The main benefit is the efficient implementation of these services using the broadcast capability. In addition, the membership algorithm has a novel approach in handling partitions and remerging; in allowing the regular flow of messages...
Group Communication Specifications: A Comprehensive Study
- ACM Computing Surveys
, 1999
Cited by 284 (12 self)
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
Building Secure and Reliable Network Applications
, 1996
Cited by 209 (16 self)
Abstractly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
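The failure-free case described in this abstract can be sketched in a few lines. This is a minimal illustration only, assuming a perfectly reliable in-process "transport" and JSON as the message encoding; the names `PROCEDURES`, `marshal_call`, `serve`, `rpc`, and `RemoteError` are hypothetical and do not come from the book.

```python
import json


def add(a, b):
    return a + b


# Hypothetical server-side table mapping procedure names to functions.
PROCEDURES = {"add": add}


class RemoteError(Exception):
    """Raised on the client when the server reports an exception."""


def marshal_call(name, args):
    """Client side: package the invocation into a request message."""
    return json.dumps({"proc": name, "args": args}).encode("utf-8")


def serve(request):
    """Server side: unpack the request, run the procedure, build a reply."""
    call = json.loads(request.decode("utf-8"))
    try:
        result = PROCEDURES[call["proc"]](*call["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:
        # Send back an indication of the exception instead of a result.
        reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(reply).encode("utf-8")


def rpc(name, *args, transport=serve):
    """Client stub: send the request, wait for the reply, return or raise.

    `transport` stands in for the network and is assumed never to lose,
    duplicate, or reorder messages, and neither side is assumed to crash.
    """
    reply = json.loads(transport(marshal_call(name, list(args))).decode("utf-8"))
    if not reply["ok"]:
        raise RemoteError(reply["error"])
    return reply["result"]
```

Calling `rpc("add", 2, 3)` returns the server's result, while calling an unknown procedure raises `RemoteError` on the client. Nothing here handles lost messages or crashed processes, which is exactly the part the abstract identifies as the real challenge.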
The Totem Single-Ring Ordering and Membership Protocol
, 1995
Cited by 166 (30 self)
Operating Systems]: Organization and Design---distributed systems. General Terms: Protocols, Performance, Reliability. Additional Key Words and Phrases: Flow control, membership, reliable delivery, token passing, total ordering, virtual synchrony. Earlier versions of the Totem single-ring protocol appeared in the Proceedings of the IEEE International Conference on Information Engineering, Singapore (December 1991) and in the Proceedings of the IEEE 13th International Conference on Distributed Computing Systems, Pittsburgh, PA (May 1993). This research was supported by NSF Grant No. NCR-9016361, ARPA Contract No. N00174-93K-0097, and Rockwell CMC/State of California MICRO Grant No. 92-101. Authors' Addresses: Y. Amir, Computer Science Department, The Hebrew University of Jerusalem, Israel; L. E. Moser and P. M. Melliar-Smith, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106; D. A. Agarwal, Lawrence Berkele
Using Process Groups to Implement Failure Detection in Asynchronous Environments
, 1991
Cited by 157 (15 self)
Agreement on the membership of a group of processes in a distributed system is a basic problem that arises in a wide range of applications. Such groups occur when a set of processes co-operate to perform some task, share memory, monitor one another, subdivide a computation, and so forth. In this paper we discuss the Group Membership Problem as it relates to failure detection in asynchronous, distributed systems. We present a rigorous, formal specification for group membership under this interpretation. We then present a solution for this problem that improves upon previous work.
A Dynamic Network Architecture
- ACM Transactions on Computer Systems
, 1992
Cited by 153 (10 self)
Network software is a critical component of any distributed system. Because of its complexity, network software is commonly layered into a hierarchy of protocols, or more generally, into a protocol graph. Typical protocol graphs, including those standardized in the ISO and TCP/IP network architectures, share three important properties: the protocol graph is simple, the nodes of the graph (protocols) encapsulate complex functionality, and the topology of the graph is relatively static. This paper describes a new way to organize network software that differs from conventional architectures in all three of these properties. In our approach, the protocol graph is complex, individual protocols encapsulate a single function, and the topology of the graph is dynamic. The main contribution of this paper is to describe the ideas behind our new architecture, illustrate the advantages of using the architecture, and demonstrate that the architecture results in efficient network software.
Newtop: A Fault-Tolerant Group Communication Protocol
, 1995
Cited by 146 (21 self)
A general purpose group communication protocol suite called Newtop is described. It is assumed that processes can simultaneously belong to many groups, group size could be large, and processes could be communicating over the Internet. An asynchronous communication environment is therefore assumed, where message transmission times cannot be accurately estimated, and the underlying network may well get partitioned, preventing functioning processes from communicating with each other. Newtop can provide causality preserving total order delivery to members of a group, ensuring that total order delivery is preserved for multi-group processes. Both symmetric and asymmetric order protocols are supported, permitting a process to use, say, the symmetric version in one group and the asymmetric version in another. Key words: group communication, group membership, fault tolerance, network protocol, multicast protocol, causal order, total order. 1. Introduction Many fault-tolerant distributed applications can ...
On the Impossibility of Group Membership
, 1996
Cited by 146 (5 self)
We prove that the primary-partition group membership problem cannot be solved in asynchronous systems with crash failures, even if one allows the removal or killing of non-faulty processes that are erroneously suspected to have crashed. 1 Introduction The problem of group membership has been the focus of much theoretical and experimental work on fault-tolerant distributed systems. A group membership protocol manages the formation and maintenance of a set of processes called a group. For example, a group may be a set of processes that are cooperating towards a common task (e.g., the primary and backup servers of a database), a set of processes that share a common interest (e.g., clients that subscribe to a particular newsgroup), or the set of all processes in the system that are currently deemed to be operational. In general, a process may leave a group because it failed, it voluntarily requested to leave, or it is forcibly expelled by other members of the group. Similarly, a proces...
Consul: A Communication Substrate for Fault-Tolerant Distributed Programs
- Distributed Systems Engineering Journal
, 1991
Cited by 118 (22 self)
Replicating important services on multiple processors in a distributed architecture is a common technique for constructing dependable computing systems. This paper describes a communication substrate, called Consul, that facilitates the development of such systems by providing a collection of fundamental abstractions for constructing fault-tolerant programs based on replicated processing. These abstractions include a multicast service, a membership service, and a recovery service. Consul is unique in two respects. First, its services are implemented using a collection of algorithms that exploit the partial (or causal) ordering of messages exchanged in the system. Such algorithms are generally more efficient than those that depend on a total ordering of events. Second, its underlying architecture is configurable, thereby allowing a system to be structured according to the needs of the application. The paper sketches Consul's architecture, presents the algorithms used by its pr...
Coyote: A System for Constructing Fine-Grain Configurable Communication Services
- ACM Transactions on Computer Systems
, 1998
Cited by 85 (15 self)
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent mobile computing can simplify the development of complex applications built on distributed systems. This paper describes Coyote, a system that supports the construction of highly modular and configurable versions of such abstractions. Coyote extends the notion of protocol objects and hierarchical composition found in existing systems with support for finer-grain objects called micro-protocols that implement individual semantic properties of the target service. A customized service is constructed by selecting micro-protocols based on their semantic guarantees and configuring them together with a standard runtime system to form a composite protocol implementing the service. Micro-protocols within a composite protocol can share data and are executed using an event-driven paradigm that enhances configurability. The overall approach is described and illustrated with exampl...
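The composition idea in this abstract (small event-driven units that share data inside one composite protocol) can be illustrated with a toy sketch. This is not Coyote's actual API; `CompositeProtocol`, `build_service`, and the two micro-protocols `sequencer` and `logger` are hypothetical names chosen for illustration, and the runtime is assumed to be a simple in-process event dispatcher.

```python
from collections import defaultdict


class CompositeProtocol:
    """Toy runtime: dispatches events to registered micro-protocol handlers."""

    def __init__(self):
        self.handlers = defaultdict(list)  # event name -> handlers, in order
        self.shared = {}                   # data shared by all micro-protocols

    def register(self, event, handler):
        self.handlers[event].append(handler)

    def raise_event(self, event, msg):
        # Run every handler bound to this event, in registration order.
        for handler in list(self.handlers[event]):
            handler(self, msg)


def sequencer(proto, msg):
    """Hypothetical micro-protocol: stamps each message with a sequence number."""
    seq = proto.shared.get("next_seq", 0)
    msg["seq"] = seq
    proto.shared["next_seq"] = seq + 1


def logger(proto, msg):
    """Hypothetical micro-protocol: records stamped messages in shared state."""
    proto.shared.setdefault("log", []).append((msg["seq"], msg["body"]))


def build_service(micro_protocols):
    """Configure a service by selecting (event, handler) pairs."""
    proto = CompositeProtocol()
    for event, handler in micro_protocols:
        proto.register(event, handler)
    return proto
```

For example, `build_service([("send", sequencer), ("send", logger)])` yields a service in which every `send` event is first numbered and then logged. Dropping or swapping an entry in the list changes the service's behavior without touching the other handlers, which is the kind of configurability the abstract describes.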

