Results 1 - 10
of
411
Reliable Multicast Transport Protocol (RMTP)
"... This paper presents the design, implementation and performance of a reliable multicast transport protocol called RMTP. RMTP is based on a hierarchical structure in which receivers are grouped into local regions or domains and in each domain there is a special receiver called a Designated Receiver (D ..."
Abstract
-
Cited by 554 (9 self)
- Add to MetaCart
This paper presents the design, implementation and performance of a reliable multicast transport protocol called RMTP. RMTP is based on a hierarchical structure in which receivers are grouped into local regions or domains and in each domain there is a special receiver called a Designated Receiver (DR) which is responsible for sending acknowledgments periodically to the sender, for processing acknowledgements from receivers in its domain and for retransmitting lost packets to the corresponding receivers. Since lost packets are recovered by local retransmissions as opposed to retransmissions from the original sender, end-to-end latency is significantly reduced, and the overall throughput is improved as well. Also, since only the DRs send their acknowledgments to the sender, instead of all receivers sending their acknowledgments to the sender, a single acknowledgement is generated per local region, and this prevents acknowledgement implosion. Receivers in RMTP send their acknowledgments to the DRs periodically, thereby simplifying error recovery. In addition, lost packets are recovered by selective repeat retransmissions, leading to improved throughput at the cost of minimal additional buffering at the receivers. This paper also describes the implementation of RMTP and its performance on the Internet.
The process group approach to reliable distributed computing
- Communications of the ACM
, 1993
"... The difficulty of developing reliable distributed softwme is an impediment to applying distributed computing technology in many settings. Expeti _ with the Isis system suggests that a structured approach based on virtually synchronous _ groups yields systems that are substantially easier to develop, ..."
Abstract
-
Cited by 501 (16 self)
- Add to MetaCart
The difficulty of developing reliable distributed softwme is an impediment to applying distributed computing technology in many settings. Expeti _ with the Isis system suggests that a structured approach based on virtually synchronous _ groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of resemr,.hon Isis, describing the model, its impl_nentation challenges, and the types of applicatiom to which Isis has been appfied. 1 In oducfion One might expect the reliability of a distributed system to follow directly from the reliability of its con-stituents, but this is not always the case. The mechanisms used to structure a distributed system and to implement cooperation between components play a vital role in determining how reliable the system will be. Many contemporary distributed operating systems have placed emphasis on communication performance, overlooking the need for tools to integrate components into a reliable whole. The communication primitives supported give generally reliable behavior, but exhibit problematic semantics when transient failures or system configuration changes occur. The resulting building blocks are, therefore, unsuitable for facilitating the construction of systems where reliability is impo/tant. This paper reviews six years of research on Isis, a syg_,,m that provides tools _ support the construction of reliable distributed software. The thesis underlying l._lS is that development of reliable distributed software can be simplified using process groups and group programming too/_. This paper motivates the approach taken, surveys the system, and discusses our experience with real applications.
Transis: A Communication Sub-System for High Availability
, 1992
"... This paper describes Transis, a communication sub-system for high availability. Transis is a transport layer package that supports a variety of reliable multicast message passing services between processors. It provides highly tuned multicast and control services for scalable systems with arbitrary ..."
Abstract
-
Cited by 337 (46 self)
- Add to MetaCart
This paper describes Transis, a communication sub-system for high availability. Transis is a transport layer package that supports a variety of reliable multicast message passing services between processors. It provides highly tuned multicast and control services for scalable systems with arbitrary topology. The communication domain comprises of a set of processors that can initiate multicast messages to a chosen subset. Transis delivers them reliably and maintains the membership of connected processors automatically, in the presence of arbitrary communication delays, of message losses and of processor failures and joins. The contribution of this paper is in providing an aggregate definition of communication and control services over broadcast domains. The main benefit is the efficient implementation of these services using the broadcast capability. In addition, the membership algorithm has a novel approach in handling partitions and remerging; in allowing the regular flow of messages...
Group Communication Specifications: A Comprehensive Study
- ACM Computing Surveys
, 1999
"... View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are for ..."
Abstract
-
Cited by 284 (12 self)
- Add to MetaCart
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
The Transis Approach to High Availability Cluster Communication
- Communications of the ACM
, 1996
"... Introduction In the local elections system of the municipality of "Wiredville" 1 , several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, which rep ..."
Abstract
-
Cited by 225 (13 self)
- Add to MetaCart
Introduction In the local elections system of the municipality of "Wiredville" 1 , several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, which replicated the updates to all of the other computers. Whenever the current tally was desired, any computer could be used to supply an up-to-the-moment count. On the night of an important election, a room with one of the computers became crowded with lobbyists and politicians. Unexpectedly, someone accidentally stepped on the network wire, cutting communication between two parts of the network. The vote counting stopped until the network was repaired, and the entire tally had to be restarted from scratch. This would not have happened if the vote-counting system had been built with partitions in mind. After the unexpected severance, vote counting could have continued at all t
Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail
- In search of the holy grail. Distributed Computing
, 1994
"... : The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results conce ..."
Abstract
-
Cited by 187 (4 self)
- Add to MetaCart
: The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem to master the complexity of causal relationships. Keywords: Distributed Computation, Causality, Distributed System, Causal Ordering, Logical Time, Vector Time, Global Predicate Detection, Distributed Debugging, ...
A High Performance Totally Ordered Multicast Protocol
, 1994
"... This paper presents the Reliable Multicast Protocol (RMP). RMP provides a totally ordered, reliable, atomic multicast service on top of an unreliable multicast datagram service such as IP Multicasting. RMP is fully and symmetrically distributed so that no site bears an undue portion of the communica ..."
Abstract
-
Cited by 183 (2 self)
- Add to MetaCart
This paper presents the Reliable Multicast Protocol (RMP). RMP provides a totally ordered, reliable, atomic multicast service on top of an unreliable multicast datagram service such as IP Multicasting. RMP is fully and symmetrically distributed so that no site bears an undue portion of the communication load. RMP provides a wide range of guarantees, from unreliable delivery to totally ordered delivery, to K-resilient, majority resilient, and totally resilient atomic delivery. These QoS guarantees are selectable on a per packet basis. RMP provides many communication options, including virtual synchrony, a publisher/subscriber model of message delivery, a client/server model of delivery, an implicit naming service, mutually exclusive handlers for messages, and mutually exclusive locks.
Extended Virtual Synchrony
- in Proceedings of the IEEE 14th International Conference on Distributed Computing Systems
, 1994
"... . We formulate a model of extended virtual synchrony that defines a group communication transport service for multicast and broadcast communication in a distributed system. The model extends the virtual synchrony model of the Isis system to support continued operation in all components of a partitio ..."
Abstract
-
Cited by 183 (42 self)
- Add to MetaCart
. We formulate a model of extended virtual synchrony that defines a group communication transport service for multicast and broadcast communication in a distributed system. The model extends the virtual synchrony model of the Isis system to support continued operation in all components of a partitioned network. The significance of extended virtual synchrony is that, during network partitioning and remerging and during process failure and recovery, it maintains a consistent relationship between the delivery of messages and the delivery of configuration changes across all processes in the system and provides well-defined self-delivery and failure atomicity properties. We describe an algorithm that implements extended virtual synchrony and construct a filter that reduces extended virtual synchrony to virtual synchrony. 1 Introduction In many applications in distributed systems messages must be disseminated to multiple destinations. To achieve better performance, protocols have been deve...
Secure agreement protocols: Reliable and atomic group multicast in Rampart
- In Proceedings of the 2nd ACM Conference on Computer and Communications Security
, 1994
"... Reliable and atomic group multicast have been pro-posed as fundamental communication paradigms to sup-port secure distributed computing in systems in which processes may behave maliciously. These protocols en-able messages to be multicast to a group of processes, while ensuring that all honest group ..."
Abstract
-
Cited by 162 (17 self)
- Add to MetaCart
Reliable and atomic group multicast have been pro-posed as fundamental communication paradigms to sup-port secure distributed computing in systems in which processes may behave maliciously. These protocols en-able messages to be multicast to a group of processes, while ensuring that all honest group members deliver the same messages and, in the case of atomic multi-cast, deliver these messages in the same order. We present new reliable and atomic group multicast pro-tocols for asynchronous distributed systems. We also describe their implementation as part of Rampart, a toolkit for building high-integrily distributed services, i.e., services that remain correct and available despite the corruption of some component servers by an at-tacker. To our knowledge, Rampart is the first system to demonstrate reliable and atomic group multicast in asynchronous systems subject to process corruptions. 1

