PVM: A Framework for Parallel Distributed Computing
- Concurrency: Practice and Experience, 1990
- Cited by 691 (27 self)
The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It is intended to operate on a collection of heterogeneous computing elements interconnected by one or more networks. The participating processors may be scalar machines, multiprocessors, or special-purpose computers, enabling application components to execute on the architecture most appropriate to the algorithm. PVM provides a straightforward and general interface that permits the description of various types of algorithms (and their interactions), while the underlying infrastructure permits the execution of applications on a virtual computing environment that supports multiple parallel computation models. PVM contains facilities for concurrent, sequential, or conditional execution of application components, is portable to a variety of architectures, and supports certain forms of error dete...
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems, 1991
- Cited by 433 (29 self)
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than
Small Byzantine Quorum Systems
- Distributed Computing, 2001
- Cited by 366 (48 self)
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
The Grid Protocol: A High Performance Scheme for Maintaining Replicated Data
- IEEE Transactions on Knowledge and Data Engineering, 1990
- Cited by 108 (4 self)
We present a new protocol for maintaining replicated data that can provide both high data availability and low response time. In the protocol, the nodes are organized in a logical grid. Existing protocols are designed primarily to achieve high availability by updating a large fraction of the copies, which provides some (although not significant) load sharing. In the new protocol, transaction processing is shared effectively among nodes storing copies of the data, and both the response time experienced by transactions and the system throughput are improved significantly. We present an analysis of the availability of the new protocol and use simulation to study the effect of load sharing on the response time of transactions. We also compare the new protocol with a voting-based scheme. This work was supported in part by NSF grants NCR-8604850 and CCR-8806358, and by the University Research Committee of Emory University. 1 Introduction A distributed system consists of cooperating process...
Paradigms for process interaction in distributed programs
- ACM Computing Surveys, 1991
- Cited by 108 (0 self)
Distributed computations are concurrent programs in which processes communicate by message passing. Such programs typically execute on network architectures such as networks of workstations or distributed-memory parallel machines (i.e., multicomputers such as hypercubes). Several paradigms (examples or models) for process interaction
A tree-based algorithm for distributed mutual exclusion
- ACM Transactions on Computer Systems, 1989
- Cited by 106 (0 self)
We present an algorithm for distributed mutual exclusion in a computer network of N nodes that communicate by messages rather than shared memory. The algorithm uses a spanning tree of the computer network, and the number of messages exchanged per critical section depends on the topology of this tree. However, typically the number of messages exchanged is O(log N) under light demand, and reduces to approximately four messages under saturated demand. Each node holds information only about its immediate neighbors in the spanning tree rather than information about all nodes, and failed nodes can recover necessary information from their neighbors. The algorithm does not require sequence numbers as it operates correctly despite message overtaking.
A new approach to developing and implementing eager database replication protocols
- ACM TODS
- Cited by 101 (12 self)
Database replication is traditionally seen as a way to increase the availability and performance of distributed databases. Although a large number of protocols providing data consistency and fault-tolerance have been proposed, few of these ideas have ever been used in commercial products due to their complexity and performance implications. Instead, current products allow inconsistencies and often resort to centralized approaches, which eliminates some of the advantages of replication. As an alternative, we propose a suite of replication protocols that addresses the main problems related to database replication. On the one hand, our protocols maintain data consistency and the same transactional semantics found in centralized systems. On the other hand, they provide flexibility and reasonable performance. To do so, our protocols take advantage of the rich semantics of group communication primitives and the relaxed isolation guarantees provided by most databases. This allows us to eliminate the possibility of deadlocks, reduce the message overhead, and increase performance. A detailed simulation study shows the feasibility of the approach and the flexibility with which different types of bottlenecks can be circumvented.
The SIFT Information Dissemination System
- ACM Transactions on Database Systems, 2000
- Cited by 97 (1 self)
Information dissemination is a powerful mechanism for finding information in wide-area environments. An information dissemination server accepts long-term user queries, collects new documents from information sources, matches the documents against the queries, and continuously updates the users with relevant information. This paper is a retrospective of the Stanford Information Filtering Service (SIFT), a system that as of April 1996 was processing over 40,000 worldwide subscriptions and over 80,000 daily documents. The paper describes some of the indexing mechanisms that were developed for SIFT, as well as the evaluations that were conducted to select a scheme to implement. It also describes the implementation of SIFT, and experimental results for the actual system. Finally, it also discusses and experimentally evaluates techniques for distributing a service such as SIFT for added performance and availability. Note to Referees: This paper contains material from three earlier...
The Load, Capacity and Availability of Quorum Systems
- 1998
- Cited by 86 (12 self)
A quorum system is a collection of sets (quorums) every two of which intersect. Quorum systems have been used for many applications in the area of distributed systems, including mutual exclusion, data replication, and dissemination of information. Given a strategy to pick quorums, the load L(S) is the minimal access probability of the busiest element, minimizing over the strategies. The capacity Cap(S) is the highest quorum access rate that S can handle, so Cap(S) = 1/L(S).
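The definitions in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `is_quorum_system` and `induced_load` and the three-element majority example are assumptions chosen for illustration. The code evaluates the load induced by one fixed strategy, which is only an upper bound on L(S), since L(S) is defined as the minimum over all strategies.

```python
from fractions import Fraction
from itertools import combinations


def is_quorum_system(quorums):
    """True if every two quorums share at least one element."""
    return all(a & b for a, b in combinations(quorums, 2))


def induced_load(quorums, strategy):
    """Access probability of the busiest element under ONE given strategy.

    `strategy[i]` is the probability of picking `quorums[i]`.
    The result is an upper bound on L(S), not necessarily L(S) itself.
    """
    elements = set().union(*quorums)
    return max(
        sum(p for q, p in zip(quorums, strategy) if e in q)
        for e in elements
    )


# Hypothetical example: majority quorums over a universe of three elements.
majorities = [frozenset(c) for c in combinations(range(3), 2)]
uniform = [Fraction(1, len(majorities))] * len(majorities)

load = induced_load(majorities, uniform)  # each element lies in 2 of 3 quorums
capacity_bound = 1 / load                 # mirrors Cap(S) = 1/L(S)
```

With the uniform strategy every element is accessed with probability 2/3, so the corresponding capacity figure is 3/2 quorum accesses per unit of per-element service rate.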
Ad Hoc Mobility Management with Uniform Quorum Systems
- IEEE/ACM Transactions on Networking, 1999
- Cited by 79 (3 self)
A distributed mobility-management scheme using a class of uniform quorum systems (UQS) is proposed for ad hoc networks. In the proposed scheme, location databases are stored in the network nodes themselves, which form a self-organizing virtual backbone within the flat network structure. The databases are dynamically organized into quorums, every two of which intersect at a constant number of databases. Upon location update or call arrival, a mobile’s location information is written to or read from all the databases of a quorum, chosen in a nondeterministic manner. Compared with a conventional scheme [such as the use of a home location register (HLR)] with fixed associations, this scheme is more suitable for ad hoc networks, where the connectivity of the nodes with the rest of the network can be intermittent and sporadic and the databases are relatively unstable. We introduce UQS, where the size of the

