Weak Ordering -- A New Definition
, 1990
Cited by 213 (12 self)
A memory model for a shared-memory multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an implementation for cache-coherent systems that is not allowed by the old definition.
Designing Memory Consistency Models for Shared-Memory Multiprocessors
, 1993
Cited by 51 (8 self)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for programmers, sequential consistency, restricts the use of many performance-enhancing optimizations exploited by uniprocessors. For higher performance, several alternative models have been proposed. However, many of these are hardware-centric in nature and difficult to program. Further, the multitude of seemingly unrelated memory models inhibits portability. We use the 3P criteria of programmability, portability, and performance to assess memory models, and find current models lacking in one or more of these criteria. This thesis establishes a unifying framework for reasoning about memory models that leads to models that adequately satisfy the 3P criteria. The first contribution of this thesis is a programmer-centric methodology, called sequential consistency normal form (SCNF), for specifying memory models. This methodology is based on the observation that performance-enhancing optimizations can be allowed without violating sequential consistency if the system is given some information about the program. An SCNF model is a contract between the system and the programmer, where the system guarantees both high performance and sequential consistency only if the programmer provides certain information about the program. Insufficient information gives lower performance, but incorrect information
Programming for Different Memory Consistency Models
- Journal of Parallel and Distributed Computing
, 1992
Cited by 43 (6 self)
The memory consistency model, or memory model, supported by a shared-memory multiprocessor directly affects its performance. The most commonly assumed memory model is sequential consistency (SC). While SC provides a simple model for the programmer, it imposes rigid constraints on the ordering of memory accesses and restricts the use of common hardware and compiler optimizations. To remedy the shortcomings of SC, several relaxed memory models have been proposed in the literature. These include processor consistency (PC), weak ordering (WO), release consistency (RCsc/RCpc), total store ordering (TSO), and partial store ordering (PSO). While the relaxed models provide the potential for higher performance, they present a more complex model for programmers when compared to SC. Our previous research has addressed this tradeoff by taking a programmer-centric approach. We have proposed memory models (DRF0, DRF1, PL) that allow the programmer to reason with SC, but require certain information ...
Implementing Sequential Consistency In Cache-Based Systems
- In Proceedings of the 1990 International Conference on Parallel Processing
, 1990
Cited by 22 (0 self)
A model for shared-memory systems commonly (and often implicitly) assumed by programmers is that of sequential consistency. For implementing sequential consistency in a cache-based system, it is widely believed that (1) implementing strong ordering is sufficient and (2) restricting a processor to one shared-memory reference at a time is practically necessary. In this paper we show that both beliefs are false. First, we prove that (1) is false with a counter-example. Second, we argue that (2) is false by giving sufficient conditions and an implementation that allow a processor to have simultaneous incomplete shared-memory references. While we do not demonstrate that this implementation is superior, we do believe it is practical and worthy of consideration. Keywords: shared-memory multiprocessors, sequential consistency, strong ordering, cache coherence. 1. Introduction A model of memory for shared-memory MIMD multiprocessor systems commonly (and often implicitly) assumed by programm...
PSCR: a coherence protocol for eliminating passive sharing in shared-bus shared-memory multiprocessors
- IEEE Transactions on Parallel and Distributed Systems
, 1999
Cited by 15 (13 self)
In high-performance general-purpose workstations and servers, the workload typically consists of both sequential and parallel applications. Shared-bus shared-memory multiprocessors can be used to speed up the execution of such workloads. In this environment, the scheduler takes care of the load balancing by allocating a ready process on the first available processor, thus producing process migration. Process migration and the persistence of private data in different caches produce an undesired sharing, named passive sharing. The copies due to passive sharing produce useless coherence traffic on the bus, and coping with this problem may be a challenging design problem for these machines. Many protocols use smart solutions to limit the overhead of maintaining coherence among shared copies. None of these studies treats passive sharing directly, although some indirect effect is present while dealing with the other kinds of sharing. Affinity scheduling can alleviate this problem, but this technique does not adapt to all load conditions, especially when the effects of migration are massive. We present a simple coherence protocol that eliminates passive sharing using information from the compiler that is normally available in operating system kernels. We evaluate the performance of this protocol and compare it against other solutions proposed in the literature by means of enhanced trace-driven simulation. We evaluate the complexity in terms of the number of protocol states, additional bus lines, and required software support. Our protocol further limits the coherence-maintaining overhead by using information about access patterns to shared data exhibited in parallel applications. Index Terms: Cache memory, coherence protocol, multiprocessor, performance evaluation.
Weak Ordering - A New Definition And Some Implications
, 1989
Cited by 8 (0 self)
This paper is primarily concerned with the programmer's model of a shared memory system and its implications on hardware design and performance. A model for correct behavior of programs commonly (and often implicitly) assumed by programmers is that of sequential consistency, formally defined by Lamport [Lam79] as follows: [A system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. While the definition leaves the specific interpretation of the term operations undefined, for a shared memory system, it is usually assumed to refer to memory operations or accesses (e.g., reads and writes). Thus, stated simply for a shared memory system, the above definition translates into the following two conditions - (1) all memory accesses appear to execute atomically in some total order, and (2) all memory accesses of a single process appear to execute in program order. Uniprocessor systems offer the model of sequential consistency almost naturally and without much compromise in performance. In the simplest of architectures, where a processor is allowed to issue a memory access only after the previous access in program order is complete, a total order of memory accesses can be obtained based on the wall-clock time of their issue or execution. More sophisticated architectures allow overlap of instruction execution, out-of-order memory accesses, write buffers, caches (which may be lock-up free [Kro81]), etc. In these machines, an ordering of memory accesses based on wall-clock time of issue or execution may violate program order, but interlock logic assures that accesses appear...
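The two conditions quoted above (a single total order of atomic accesses that respects each process's program order) can be illustrated with a small sketch. It is not taken from any of the listed papers: the two-process program, the names `P1`, `P2`, `interleavings`, `run`, and `sc_outcomes`, and the encoding of operations as tuples are all assumptions made for illustration. The sketch enumerates every interleaving that preserves program order, executes each one atomically against a shared memory, and collects the register values that a sequentially consistent system could produce.

```python
from itertools import combinations

# Hypothetical two-process program (an assumption for illustration).
# Each operation is ("write", variable, value) or ("read", variable, register).
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        # Positions in `chosen` take the next operation of a, the rest of b.
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one total order atomically; return the final (r1, r2)."""
    memory = {"x": 0, "y": 0}  # shared variables start at 0
    regs = {}
    for kind, var, arg in schedule:
        if kind == "write":
            memory[var] = arg
        else:
            regs[arg] = memory[var]
    return regs["r1"], regs["r2"]


def sc_outcomes(p1, p2):
    """Collect the results reachable under sequential consistency."""
    return {run(s) for s in interleavings(p1, p2)}


if __name__ == "__main__":
    print(sorted(sc_outcomes(P1, P2)))  # [(0, 1), (1, 0), (1, 1)]
```

Under these assumptions the outcome (0, 0) never appears, because in any total order at least one write precedes both reads. Hardware that lets a read bypass an earlier buffered write can produce (0, 0) for this program, which is one way to see why write buffers are mentioned as a source of non-sequentially-consistent behavior.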

