Results 1 - 10 of 36
Lightweight causal and atomic group multicast
- ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1991
Understanding Fault-Tolerant Distributed Systems
- COMMUNICATIONS OF THE ACM, 1993
Cited by 377 (23 self)
We propose a small number of basic concepts that can be used to explain the architecture of fault-tolerant distributed systems and we discuss a list of architectural issues that we find useful to consider when designing or examining such systems. For each issue we present known solutions and design alternatives, we discuss their relative merits and we give examples of systems which adopt one approach or the other. The aim is to introduce some order in the complex discipline of designing and understanding fault-tolerant distributed systems.
Orca: A language for parallel programming of distributed systems
- IEEE Transactions on Software Engineering, 1992
Cited by 332 (46 self)
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy-to-use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
Total order broadcast and multicast algorithms: Taxonomy and survey
- ACM COMPUTING SURVEYS, 2004
An Efficient Reliable Broadcast Protocol
- OPERATING SYSTEMS REVIEW, 1989
Cited by 149 (13 self)
Many distributed and parallel applications can make good use of broadcast communication. In this paper we present a (software) protocol that simulates reliable broadcast, even on an unreliable network. Using this protocol, application programs need not worry about lost messages. Recovery of communication failures is handled automatically and transparently by the protocol. In normal operation, our protocol is more efficient than previously published reliable broadcast protocols. An initial implementation of the protocol on 10 MC68020 CPUs connected by a 10 Mbit/sec Ethernet performs a reliable broadcast in 1.5 msec.
Broadcast protocols for distributed systems
- IEEE Transactions on Parallel and Distributed Systems, 1990
Cited by 147 (19 self)
We present an innovative approach to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations such as locking, update and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms.
Index Terms: Agreement problem, broadcast communication, communication protocols, distributed systems, fault tolerance, local area networks, total order.
The processors agree on exactly the same sequence of broadcast messages. It is easy to demonstrate that placing a total order on broadcast messages, so that every working processor processes the same messages in the same order, provides an immediate solution to the agreement problem. Once this total order is determined, distributed actions can be carried out using simple sequential fault-tolerant algorithms. The strategy is very efficient: for example, locking records in a distributed database typically requires only a single broadcast message to claim a lock and a single broadcast message to release it. Based on this strategy, it is possible to design simple and efficient but very robust distributed systems.
Sequential Consistency versus Linearizability
1994
Cited by 121 (2 self)
The power of two well-known consistency conditions for shared-memory multiprocessors, sequential consistency and linearizability, is compared. The cost measure studied is the worst-case response time in distributed implementations of virtual shared memory supporting one of the two conditions. Three types of shared-memory objects are considered: read/write objects, FIFO queues, and stacks. If clocks are only approximately synchronized (or do not exist), then for all three object types it is shown that linearizability is more expensive than sequential consistency: We present upper bounds for sequential consistency and larger lower bounds for linearizability. We show that, for all three data types, the worst-case response time is very sensitive to the assumptions that are made about the timing information available to the system. Under the strong assumption that processes have perfectly synchronized clocks, it is shown that sequential consistency and linearizability are equally costly: We present upper bounds for linearizability and matching lower bounds for sequential consistency. The upper bounds are shown by presenting algorithms that use atomic broadcast in a modular fashion. The lower-bound proofs for the approximate case use the technique of “shifting,” first introduced for studying the clock synchronization problem.
A System for Constructing Configurable High-Level Protocols
- Proc. Conf. Application, Technologies, Architectures, and Protocols for Computer Communications, 1995
Cited by 62 (8 self)
New distributed computing applications are driving the development of more specialized protocols, as well as demanding greater control over the communication substrate. Here, a network subsystem that supports modular, fine-grained construction of high-level protocols such as atomic multicast and group RPC is described. The approach is based on extending the standard hierarchical model of the x-kernel with composite protocols in which micro-protocol objects are composed within a standard runtime framework. Each micro-protocol realizes a separate semantic property, leading to a highly modular and configurable implementation. In contrast with similar systems, this approach provides finer granularity and more flexible inter-object communication. The design and prototype implementation running on Mach are described. Performance results are also given for a micro-protocol suite implementing variants of group RPC.
xAMp: a Multi-primitive Group Communications Service
1992
Cited by 56 (26 self)
The xAMp is a highly versatile group communications service aimed at supporting the development of distributed applications, with different dependability, functionality, and performance requirements. These range from unreliable and non-ordered to atomic multicast, and are enhanced by efficient group addressing and management support. The basic protocols are synchronous, clock-less and designed to be used over broadcast local-area networks, and portable to a number of them. The functionality provided yields a reasonably complete solution to the problem of reliable group communication. Whilst other protocols exist that offer similar services, we follow a new engineering approach by deriving all qualities of service from a single basic procedure. Thus, their implementation shares data structures, procedures, failure-recovery algorithms and group monitor services, resulting in a highly integrated package. 1 Introduction Distributed systems are widely used today, encouraging the developme...
An Asynchronous Membership Protocol that Tolerates Partitions
1993
Cited by 55 (7 self)
This paper presents a membership protocol for maintaining the set of operational and connected machines in agreement. The protocol operates in an asynchronous environment prone to crash failures, omission failures and network partitions. The protocol is suitable for systems with machines that communicate via broadcast (or multicast) messages. It supports continued operation with partitions and provides the mechanism for merging of partitions. The principles of the protocol presented here have been successfully incorporated into the Transis system [3, 2], the Totem system [4], and the Horus system [29]. The membership protocol presented here is integrated in the communication system, such that the notifications of membership changes are delivered to the application among the stream of regular messages. Changes to the membership are coordinated with the delivery of regular messages in the system. This valuable approach was presented in [7, 9] in the context of a primary-partition system,...