Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
In Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990
Cited by 628 (17 self)
Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture.
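The reordering effect this abstract describes can be illustrated with a small enumeration. The sketch below is not taken from the paper: the two-thread "store buffering" test, the variable names, and the crude model of a write buffer (letting a thread's load complete before its own earlier store to a different address) are all assumptions chosen for illustration.

```python
from itertools import combinations

# Each operation is ("store", variable, value) or ("load", variable, register).
# Hypothetical two-thread test: each thread writes one flag, then reads the other.
T0 = [("store", "x", 1), ("load", "y", "r0")]
T1 = [("store", "y", 1), ("load", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one total order of operations against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]

def outcomes(t0, t1):
    """Set of (r0, r1) results over all interleavings of the two threads."""
    return {run(s) for s in interleavings(t0, t1)}

# Program order preserved in both threads: the order a programmer expects.
sc = outcomes(T0, T1)

# Assumed relaxation: a load may complete before the same thread's earlier store.
relaxed = (sc | outcomes(T0[::-1], T1) | outcomes(T0, T1[::-1])
           | outcomes(T0[::-1], T1[::-1]))

print(sorted(sc))
print(sorted(relaxed - sc))
```

Under these assumptions, the result where both loads return 0 appears only once the load is allowed to pass the store, which is the kind of ordering a consistency model has to either permit or forbid.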
Weak Ordering -- A New Definition
1990
Cited by 213 (12 self)
A memory model for a shared-memory multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an implementation for cache-coherent systems that is not allowed by the old definition.
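The contract idea can be sketched with a toy model. Everything below is an assumption made for illustration rather than the paper's formal definition: the message-passing program, the `sync` label on operations, and the rule that the toy hardware may swap two operations of a thread only when neither is labeled as synchronization and they touch different locations.

```python
from itertools import combinations, permutations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one total order of operations against a single shared memory."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for kind, var, arg, _sync in schedule:
        if kind == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r_flag"], regs["r_data"]

def allowed_orders(thread):
    """Per-thread orders the toy hardware may use (assumed reordering rule)."""
    result = []
    for perm in permutations(range(len(thread))):
        ok = True
        for p in range(len(perm)):
            for q in range(p + 1, len(perm)):
                if perm[p] > perm[q]:  # pair inverted relative to program order
                    a, b = thread[perm[p]], thread[perm[q]]
                    if a[3] or b[3] or a[1] == b[1]:
                        ok = False
        if ok:
            result.append([thread[k] for k in perm])
    return result

def outcomes(t0, t1, reorder):
    """Set of (r_flag, r_data) results, with or without hardware reordering."""
    orders0 = allowed_orders(t0) if reorder else [t0]
    orders1 = allowed_orders(t1) if reorder else [t1]
    return {run(s) for a in orders0 for b in orders1
            for s in interleavings(a, b)}

def program(sync):
    """Writer publishes data then sets flag; reader loads flag then data."""
    t0 = [("store", "data", 42, False), ("store", "flag", 1, sync)]
    t1 = [("load", "flag", "r_flag", sync), ("load", "data", "r_data", False)]
    return t0, t1

sc = outcomes(*program(sync=False), reorder=False)
unlabeled = outcomes(*program(sync=False), reorder=True)
labeled = outcomes(*program(sync=True), reorder=True)
print(sorted(unlabeled - sc), labeled == sc)
```

In this toy, the unlabeled program can observe the flag set while the data is still stale, whereas labeling the flag accesses restores exactly the sequentially consistent outcomes. A program that is actually free of data races would also read `data` only after seeing `flag == 1`; the sketch omits that branch to stay short.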
Midway: Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors
1991
Cited by 170 (0 self)
Distributed memory multiprocessing offers a cost-effective and scalable solution for a large class of scientific and numeric applications. Unfortunately, the performance of current distributed memory programming environments suffers because the frequency of communication between processors can exceed that required to ensure a correctly functioning program. Midway is a shared memory parallel programming system which addresses the problem of excessive communication in a distributed memory multiprocessor. Midway programs are written using a conventional MIMD-style programming model executing within a single globally shared memory. Local memories on each processor cache recently used data to counter the effects of network latency. Midway is based on a new model of memory consistency called entry consistency. Entry consistency exploits the relationship between synchronization objects and the data which they protect. Updates to shared data are communicated between processors only when not ...
Scope Consistency: A Bridge between Release Consistency and Entry Consistency
In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996
Cited by 135 (12 self)
The large granularity of communication and coherence in shared virtual memory systems causes problems with false sharing and extra communication. Relaxed memory consistency models have been used to alleviate these problems, but at a cost in programming complexity. Release Consistency (RC) and Lazy Release Consistency (LRC) are accepted to offer a reasonable tradeoff between performance and programming complexity. Entry Consistency (EC) offers a more relaxed consistency model, but it requires explicit association of shared data objects with synchronization variables. The programming burden of providing such associations can be substantial. This paper proposes a new consistency model for shared virtual memory, called Scope Consistency (ScC), which offers most of the potential performance advantages of the EC model without requiring explicit bindings between data and synchronization variables. Instead, ScC dynamically detects the bindings implied by the programmer allowing a programming i...
Lazy Caching
ACM Transactions on Programming Languages and Systems, 1993
Cited by 77 (0 self)
This paper examines cache consistency conditions for multiprocessor shared memory systems. It states and motivates a weaker condition than is normally implemented. An algorithm is presented that exploits the weaker condition to achieve greater concurrency. The algorithm is shown to satisfy the weak consistency condition. Other properties of the algorithm and possible extensions are discussed.
Home-based Shared Virtual Memory
1998
Cited by 51 (4 self)
In this dissertation, I investigate how to improve the performance of shared virtual memory (SVM) by examining consistency models, protocols, hardware support and applications. The main conclusion of this research is that the performance of shared virtual memory can be significantly improved when performance-enhancing techniques from all these areas are combined. This dissertation proposes home-based lazy release consistency as a simple, effective, and scalable way to build shared virtual memory systems. In home-based protocols each shared page has a home to which all writes are propagated and from which all copies are derived. Two home-based protocols are described, implemented and evaluated on two hardware and software platforms: Automatic Update Release Consistency (AURC), which requires hardware support for fine-grained remote writes (automatic updates), and Home-based Lazy Release Consistency (HLRC), which is implemented exclusively in software. The dissertation investigates the ...
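The home-based idea (every write reaches the page's home, and every copy is fetched from it) can be sketched in a few lines. The class names, the word-indexed page, and the twin-and-diff bookkeeping are assumptions made for this illustration; the abstract does not specify how the dissertation's protocols track modifications.

```python
class HomeNode:
    """Holds the master copy of one shared page (a list of words)."""

    def __init__(self, size):
        self.page = [0] * size

    def apply_diff(self, diff):
        # Merge only the words a writer actually changed.
        for index, value in diff.items():
            self.page[index] = value

    def fetch(self):
        # Every remote copy is derived from the home's current contents.
        return list(self.page)


class Worker:
    """A node that caches the page and propagates its writes to the home."""

    def __init__(self, home):
        self.home = home
        self.copy = None
        self.twin = None

    def acquire(self):
        self.copy = self.home.fetch()
        self.twin = list(self.copy)  # pristine snapshot used to compute diffs

    def write(self, index, value):
        self.copy[index] = value

    def release(self):
        diff = {i: new for i, (new, old) in enumerate(zip(self.copy, self.twin))
                if new != old}
        self.home.apply_diff(diff)
        self.twin = list(self.copy)
        return diff


home = HomeNode(4)
a, b = Worker(home), Worker(home)
a.acquire()
b.acquire()
a.write(0, 7)   # two workers modify different words of the same page
b.write(3, 9)
a.release()
b.release()

c = Worker(home)
c.acquire()     # a later copy comes from the home and sees both writes
print(home.page, c.copy)
```

Because each worker sends only the words it changed, the two concurrent writers in this toy do not overwrite each other's updates even though they share a page.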
Designing Memory Consistency Models for Shared-Memory Multiprocessors
1993
Cited by 51 (8 self)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for programmers, sequential consistency, restricts the use of many performance-enhancing optimizations exploited by uniprocessors. For higher performance, several alternative models have been proposed. However, many of these are hardware-centric in nature and difficult to program. Further, the multitude of seemingly unrelated memory models inhibits portability. We use the 3P criteria of programmability, portability, and performance to assess memory models, and find current models lacking in one or more of these criteria. This thesis establishes a unifying framework for reasoning about memory models that leads to models that adequately satisfy the 3P criteria. The first contribution of this thesis is a programmer-centric methodology, called sequential consistency normal form (SCNF), for specifying memory models. This methodology is based on the observation that performance-enhancing optimizations can be allowed without violating sequential consistency if the system is given some information about the program. An SCNF model is a contract between the system and the programmer, where the system guarantees both high performance and sequential consistency only if the programmer provides certain information about the program. Insufficient information gives lower performance, but incorrect information
Programming for Different Memory Consistency Models
Journal of Parallel and Distributed Computing, 1992
Cited by 43 (6 self)
The memory consistency model, or memory model, supported by a shared-memory multiprocessor directly affects its performance. The most commonly assumed memory model is sequential consistency (SC). While SC provides a simple model for the programmer, it imposes rigid constraints on the ordering of memory accesses and restricts the use of common hardware and compiler optimizations. To remedy the shortcomings of SC, several relaxed memory models have been proposed in the literature. These include processor consistency (PC), weak ordering (WO), release consistency (RCsc/RCpc), total store ordering (TSO), and partial store ordering (PSO). While the relaxed models provide the potential for higher performance, they present a more complex model for programmers when compared to SC. Our previous research has addressed this tradeoff by taking a programmer-centric approach. We have proposed memory models (DRF0, DRF1, PL) that allow the programmer to reason with SC, but require certain information ...
An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors
1997
Cited by 42 (12 self)
The memory consistency model of a shared-memory multiprocessor determines the extent to which memory operations may be overlapped or reordered for better performance. Studies on previous-generation shared-memory multiprocessors have shown that relaxed memory consistency models like release consistency (RC) can significantly outperform the conceptually simpler model of sequential consistency (SC). Current and next-generation multiprocessors use commodity microprocessors that aggressively exploit instruction-level parallelism (ILP) using methods such as multiple issue, dynamic scheduling, and non-blocking reads. For such processors, researchers have conjectured that two techniques, hardware-controlled non-binding prefetching and speculative reads, have the potential to equalize the hardware performance of memory consistency models. These techniques have recently begun to appear in commercial microprocessors, and re-open the question of whether the performance benefits of release consiste...

