Results 1 - 10
of
51
Evaluation of Release Consistent Software Distributed Shared Memory on Emerging Network Technology
"... We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidat ..."
Abstract
-
Cited by 413 (43 self)
- Add to MetaCart
We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidate, and a new protocol called lazy hybrid. This lazy hybrid protocol combines the benefits of both lazy update and lazy invalidate. Our simulations indicate that with the processors and networks that are becoming available, coarse-grained applications such as Jacobi and TSP perform well, more or less independent of the protocol used. Medium-grained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, the eager update. Fine-grained applications such as Cholesky achieve little speedup regardless of the protocol used because of the frequency of synchronization operations and the high latency involved. While the use of relaxed memory models, lazy implementations, and multiple-writer protocols has reduced the impact of false sharing, synchronization latency remains a serious problem for software distributed shared memory systems. These results suggest that future work on software DSMs should concentrate on reducing the amount ofsynchronization or its effect.
Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail
- In search of the holy grail. Distributed Computing
, 1994
"... : The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results conce ..."
Abstract
-
Cited by 187 (4 self)
- Add to MetaCart
: The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem to master the complexity of causal relationships. Keywords: Distributed Computation, Causality, Distributed System, Causal Ordering, Logical Time, Vector Time, Global Predicate Detection, Distributed Debugging, ...
A Unified Formalization of Four Shared-Memory Models
- IEEE Transactions on Parallel and Distributed Systems
, 1993
"... This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory model, and datarace -free-0. The most intuitive and commonly assumed shared-memory model, sequential con ..."
Abstract
-
Cited by 105 (9 self)
- Add to MetaCart
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory model, and datarace -free-0. The most intuitive and commonly assumed shared-memory model, sequential consistency, limits performance. The models of weak ordering, release consistency, the VAX, and data-race-free-0 are based on the common intuition that if programs synchronize explicitly and correctly, then sequential consistency can be guaranteed with high performance. However, each model formalizes this intuition differently and has different advantages and disadvantages with respect to the other models. Data-race-free-1 unifies the models of weak ordering, release consistency, the VAX, and data-race-free-0 by formalizing the above intuition in a manner that retains the advantages of each of the four models. A multiprocessor is data-race-free-1 if it guarantees sequential consistency to data-...
Memory Consistency Models
, 1993
"... This paper discusses memory consistency models and their influence on software in the context of parallel machines. In the first part we review previous work on memory consistency models. The second part discusses the issues that arise due to weakening memory consistency. We are especially intereste ..."
Abstract
-
Cited by 98 (0 self)
- Add to MetaCart
This paper discusses memory consistency models and their influence on software in the context of parallel machines. In the first part we review previous work on memory consistency models. The second part discusses the issues that arise due to weakening memory consistency. We are especially interested in the influence that weakened consistency models have on language, compiler, and runtime system design. We conclude that tighter interaction between those parts and the memory system might improve performance considerably.
Techniques for reducing consistency-related communication in distributed shared memory systems
- ACM Transactions on Computer Systems. Toappear
"... Distributed shared memory (DSM) is a software abstraction of shared memory on a distributed memory machine. One of the key problems in building an e cient DSM system is to reduce the amount ofcommunication needed to keep the distributed memories consistent. In this paper we presentfour techniques fo ..."
Abstract
-
Cited by 93 (13 self)
- Add to MetaCart
Distributed shared memory (DSM) is a software abstraction of shared memory on a distributed memory machine. One of the key problems in building an e cient DSM system is to reduce the amount ofcommunication needed to keep the distributed memories consistent. In this paper we presentfour techniques for doing so: (1) software release consistency � (2) multiple consistency protocols � (3) write-shared protocols � and (4) an update-with-timeout mechanism. These techniques have been implemented in the Munin DSM system. We compare the performance of seven application programs on Munin, rst to their performance when implemented using message passing and then to their performance when running on a conventional DSM system that does not embody the above techniques. On a 16-processor cluster of workstations, Munin's performance is within 5 % of message passing for four out of the seven applications. For the other three, performance is within 29 % to 33%. Detailed analysis of two of these three applications indicates that the addition of a function shipping capability would bring their performance within 7 % of the message passing performance. Compared to a conventional DSM system, Munin achieves performance improvements ranging from a few to several hundred percent, depending on the application.
Sequential Consistency versus Linearizability
, 1994
"... The power of two well-known consistency conditions for shared-memory multiprocessors, sequential consistency and linearizability, is compared. The cost measure studied is the worst-case response time in distributed implementations of virtual shared memory supporting one of the two conditions. Three ..."
Abstract
-
Cited by 93 (2 self)
- Add to MetaCart
The power of two well-known consistency conditions for shared-memory multiprocessors, sequential consistency and linearizability, is compared. The cost measure studied is the worst-case response time in distributed implementations of virtual shared memory supporting one of the two conditions. Three types of shared-memory objects are considered: read/write objects, FIFO queues, and stacks. If clocks are only approximately synchronized (or do not exist), then for all three object types it is shown that linearizability is more expensive than sequential consistency: We present upper bounds for sequential consistency and larger lower bounds for linearizability. We show that, for all three data types, the worst-case response time is very sensitive to the assumptions that are made about the timing information available to the system. Under the strong assumption that processes have perfectly synchronized clocks, it is shown that sequential consistency and linearizability are equally costly: We present upper bounds for linearizability and matching lower bounds for sequential consistency. The upper bounds are shown by present-ing algorithms that use atomic broadcast in a modular fashion. The lower-bound proofs for the approximate case use the technique of “shifting,” first introduced for studying the clock synchronization problem.
Causal Memory: Definitions, Implementation and Programming
, 1994
"... The abstraction of a shared memory is of growing importance in distributed computing systems. Traditional memory consistency ensures that all processes agree on a common order of all operations on memory. Unfortunately, providing these guarantees entails access latencies that prevent scaling to larg ..."
Abstract
-
Cited by 78 (9 self)
- Add to MetaCart
The abstraction of a shared memory is of growing importance in distributed computing systems. Traditional memory consistency ensures that all processes agree on a common order of all operations on memory. Unfortunately, providing these guarantees entails access latencies that prevent scaling to large systems. This paper weakens such guarantees by defining causal memory, an abstraction that ensures that processes in a system agree on the relative ordering of operations that are causally related. Because causal memory is weakly consistent, it admits more executions, and hence more concurrency, than either atomic or sequentially consistent memories. This paper provides a formal definition of causal memory and gives an implementation for message-passing systems. In addition, it describes a practical class of programs that, if developed for a strongly consistent memory, run correctly with causal memory. College of Computing Georgia Institute of Technology Atlanta, Georgia 30332-0280 This ...
Wait-Free Data Structures in the Asynchronous PRAM Model
- In Proceedings of the 2nd Annual Symposium on Parallel Algorithms and Architectures
, 2000
"... In the asynchronous PRAM model, processes communicate by atomically reading and writing shared memory locations. This paper investigates the extent to which asynchronous PRAM permits long-lived, highly concurrent data structures. An implementation of a concurrent object is wait-free if every operati ..."
Abstract
-
Cited by 62 (11 self)
- Add to MetaCart
In the asynchronous PRAM model, processes communicate by atomically reading and writing shared memory locations. This paper investigates the extent to which asynchronous PRAM permits long-lived, highly concurrent data structures. An implementation of a concurrent object is wait-free if every operation will complete in a finite number of steps, and it is k-bounded wait-free, for some k > 0, if every operation will complete within k steps. In the first part of this paper, we show that there are objects with wait-free implementations but no k-bounded wait-free implementations for any k, and that there is an infinite hierarchy of objects with implementations that are k-bounded wait-free but not K-bounded wait-free for some K > k. In the second part of the paper, we give an algebraic characterization of a large class of objects that do have wait-free implementations in asynchronous PRAM, as well as a general algorithm for implementing them. Our tools include simple iterative algorithms for wait-free approximate agreement and atomic snapshot.
Efficient Distributed Shared Memory Based On Multi-Protocol Release Consistency
, 1994
"... A distributed shared memory (DSM) system allows shared memory parallel programs to be executed on distributed memory multiprocessors. The challenge in building a DSM system is to achieve good performance over a wide range of shared memory programs without requiring extensive modifications to the s ..."
Abstract
-
Cited by 61 (5 self)
- Add to MetaCart
A distributed shared memory (DSM) system allows shared memory parallel programs to be executed on distributed memory multiprocessors. The challenge in building a DSM system is to achieve good performance over a wide range of shared memory programs without requiring extensive modifications to the source code. The performance challenge translates into reducing the amount of communication performed by the DSM system to that performed by an equivalent message passing program. This thesis describes four novel techniques for reducing the communication overhead of DSM, including: (i) the use of software release consistency, (ii) support for multiple consistency protocols, (iii) a multiple writer protocol, and (iv) an update timeout mechanism. Release consistency allows modifications of shared data to be handled via a delayed update queue, which masks network latencies. Providing multiple cons...

