Results 1 - 10 of 37
Group Communication Specifications: A Comprehensive Study
ACM Computing Surveys, 1999
Cited by 284 (12 self)
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
The Transis Approach to High Availability Cluster Communication
Communications of the ACM, 1996
Cited by 225 (13 self)
Introduction: In the local elections system of the municipality of "Wiredville", several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, which replicated the updates to all of the other computers. Whenever the current tally was desired, any computer could be used to supply an up-to-the-moment count. On the night of an important election, a room with one of the computers became crowded with lobbyists and politicians. Someone accidentally stepped on the network wire, cutting communication between two parts of the network. The vote counting stopped until the network was repaired, and the entire tally had to be restarted from scratch. This would not have happened if the vote-counting system had been built with partitions in mind. After the unexpected severance, vote counting could have continued at all t
Peer-to-Peer Support for Massively Multiplayer Games
2004
Cited by 132 (2 self)
We present an approach to support massively multi-player games (MMGs) on peer-to-peer overlays. Our approach exploits the fact that players in MMGs display locality of interest, and therefore can form self-organizing groups based on their locations in the virtual world. To this end, we have designed scalable mechanisms to distribute the game state to the participating players and to maintain consistency in the face of node failures. The resulting system dynamically scales with the number of online players. It is more flexible and has a lower deployment cost than centralized game servers. We have implemented a simple game we call SimMud, and experimented with up to 4000 players to demonstrate the applicability of this approach.
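The idea of grouping players by locality of interest can be illustrated with a minimal sketch. It assumes a two-dimensional world split into fixed square regions; `REGION_SIZE`, `region_of`, and `group_by_region` are hypothetical names chosen for illustration and are not taken from the paper or from SimMud.

```python
from collections import defaultdict

# Assumed side length of one square region of the virtual world.
REGION_SIZE = 100.0


def region_of(x, y, size=REGION_SIZE):
    """Map a world position to the grid cell (region) that contains it."""
    return (int(x // size), int(y // size))


def group_by_region(positions):
    """Group player ids by region; each group shares the same local game state."""
    groups = defaultdict(set)
    for player, (x, y) in positions.items():
        groups[region_of(x, y)].add(player)
    return dict(groups)


positions = {"ann": (10, 20), "bob": (90, 99), "cy": (150, 20)}
groups = group_by_region(positions)
```

In this sketch, players in the same region form one group, so a state update would only need to reach the members of that group rather than every online player.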
Specifying and Using a Partitionable Group Communication Service
ACM Transactions on Computer Systems, 1997
Cited by 102 (18 self)
Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement about what these specifications should say, especially in cases where the services are partitionable, that is, where communication failures may lead to simultaneous creation of groups with disjoint memberships, such that each group is unaware of the existence of any other group. In this paper
RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks
In DISC, 2002
Cited by 85 (11 self)
This paper presents an algorithm that emulates atomic read/write shared objects in a dynamic network setting. To ensure availability and fault-tolerance, the objects are replicated. To ensure atomicity, reads and writes are performed using quorum configurations, each of which consists of a set of members plus sets of read-quorums and write-quorums. The algorithm is reconfigurable: the quorum configurations may change during computation, and such changes do not cause violations of atomicity. Any quorum configuration may be installed at any time. The algorithm tolerates processor stopping failure and message loss. The algorithm performs three major tasks, all concurrently: reading and writing objects, introducing new configurations, and "garbage-collecting" obsolete configurations.
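A quorum configuration as described above (a set of members plus read-quorums and write-quorums) can be sketched as a small data structure. The sketch assumes the usual quorum-system requirement that every read-quorum intersects every write-quorum; the class and function names are hypothetical, and the reconfiguration protocol itself is not shown.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Configuration:
    members: frozenset
    read_quorums: tuple   # tuple of frozensets of members
    write_quorums: tuple  # tuple of frozensets of members

    def is_valid(self) -> bool:
        """Check that quorums are subsets of the members and that every
        read-quorum shares at least one member with every write-quorum."""
        quorums = self.read_quorums + self.write_quorums
        if not all(q <= self.members for q in quorums):
            return False
        return all(r & w for r, w in product(self.read_quorums, self.write_quorums))


def majority_configuration(members):
    """Build a configuration whose read- and write-quorums are all majorities."""
    members = frozenset(members)
    size = len(members) // 2 + 1
    quorums = tuple(frozenset(c) for c in combinations(sorted(members), size))
    return Configuration(members, quorums, quorums)


good = majority_configuration({"a", "b", "c"})
bad = Configuration(frozenset("abcd"), (frozenset("ab"),), (frozenset("cd"),))
```

The intersection property is what lets a read observe the latest completed write: any read-quorum contains at least one member that took part in the most recent write-quorum.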
Replication Using Group Communication Over a Partitioned Network
1995
Cited by 81 (19 self)
In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replicating data or servers may improve performance. Replication may also improve the availability of information when processors crash or the network partitions. Existing replication methods are often needlessly expensive. They sometimes use point-to-point communication when multicast communication is available; they typically pay the full price of end-to-end acknowledgments for all of the participants for every update; they may claim locks, and therefore, may be vulnerable to faults that can unnecessarily block the system for long periods of time. This thesis presents a new architecture and algorithms for replication over a partitioned network. The architecture is structured into two layers: a replication server and a group communication layer. Each of the replication servers maintains a priva...
Efficient Message Ordering in Dynamic Networks
In 15th ACM Symposium on Principles of Distributed Computing (PODC), 1996
Cited by 60 (18 self)
We present an algorithm for totally ordering messages in the face of network partitions and site failures. The algorithm always allows a majority of connected processors in the network to make progress (i.e., to order messages), if they remain connected for sufficiently long, regardless of past failures. Furthermore, our algorithm always allows processors to initiate messages, even when they are not members of a connected majority component in the network. Thus, messages can eventually become totally ordered even if their initiator is never a member of a majority component. The algorithm guarantees that when a majority is connected, each message is ordered within two communication rounds, if no failures occur during these rounds. 1 Introduction: Consistent order is a powerful paradigm for the design of fault-tolerant applications, e.g., consistent replication [Sch90, Kei94]. We present an efficient algorithm for consistent message ordering in the face of network partitions and site fail...
The Database State Machine Approach
1999
Cited by 58 (3 self)
Database replication protocols have historically been built on top of distributed database systems, and have consequently been designed and implemented using distributed transactional mechanisms, such as atomic commitment. We present the database state machine approach, a new way to deal with database replication in a cluster of servers. This approach relies on a powerful atomic broadcast primitive to propagate transactions between database servers, and no atomic commitment is necessary. Transaction commit is based on a certification test, and abort rate is reduced by the reordering certification test. The approach is evaluated using a detailed simulation model that shows the scalability of the system and the benefits of the reordering certification test.
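A certification test of the kind mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's protocol: transactions are assumed to arrive at every replica in the same atomic broadcast order, a transaction aborts if a transaction committed after its snapshot wrote an item it read, and the reordering variant is not shown. `Transaction`, `Replica`, and `deliver` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: str
    snapshot: int          # number of commits visible when the transaction started
    read_set: frozenset
    write_set: frozenset


@dataclass
class Replica:
    committed: list = field(default_factory=list)  # write sets in commit order

    def deliver(self, txn: Transaction) -> bool:
        """Certify a delivered transaction; return True on commit, False on abort."""
        # Write sets of transactions that committed after txn took its snapshot.
        concurrent = self.committed[txn.snapshot:]
        if any(ws & txn.read_set for ws in concurrent):
            return False  # txn read an item that was overwritten: abort
        self.committed.append(txn.write_set)
        return True


t1 = Transaction("t1", 0, frozenset({"x"}), frozenset({"x"}))
t2 = Transaction("t2", 0, frozenset({"x"}), frozenset({"y"}))
t3 = Transaction("t3", 0, frozenset({"z"}), frozenset({"z"}))

# Two replicas that deliver the same sequence reach the same decisions
# without exchanging any further messages.
r1, r2 = Replica(), Replica()
outcome1 = [r1.deliver(t) for t in (t1, t2, t3)]
outcome2 = [r2.deliver(t) for t in (t1, t2, t3)]
```

Because the test is deterministic and every replica sees the same delivery order, each replica decides commit or abort locally, which is why no atomic commitment round is needed in this sketch.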
A Framework for Partitionable Membership Service
1995
Cited by 54 (6 self)
This paper presents a framework for a membership service that operates in a partitionable environment and supports partitionable operation, which is a form of distributed operation in which multiple network components that are (temporarily) disconnected from each other operate autonomously. The service assumes an asynchronous environment and must tolerate crash failures, omission failures and network partitions. The principles of partitionable operation that we present here have been incorporated in the Transis system [13, 1], the Totem system [3], and the Horus system [19]. The paper discusses applications built in these projects, and relates them to the membership service definition. We introduce a distinction between partial and complete installations of system views that makes feasible what we believe are the strongest possible requirements for causal order and virtual synchrony. We propose our specification of partitionable membership service as a standard against which other memb...
Exploiting Atomic Broadcast in Replicated Databases
1998
Cited by 43 (9 self)
Database replication protocols have historically been built on top of distributed database systems, and have consequently been designed and implemented using distributed transactional mechanisms, such as atomic commitment. We argue in this paper that this approach is not always adequate to efficiently support database replication and that more suitable alternatives, such as atomic broadcast primitives, should be employed instead. More precisely, we show in this paper that fully replicated database systems, based on the deferred update replication model, have better throughput and response time if implemented with an atomic broadcast termination protocol than if implemented with atomic commitment. 1 Introduction: Replication is considered a cheap software-based way to increase data availability when compared to hardware-based specialised techniques [16]. However, designing a replication scheme that provides reasonable performance and maintains data consistency is still an active area of...

