Results 1 -
5 of
5
Weak Ordering -- A New Definition
, 1990
"... A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater perfor ..."
Abstract
-
Cited by 213 (12 self)
- Add to MetaCart
A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an imple-mentation for cache-coherent systems chat is not allowed by the old definition.
The Microarchitecture of Superscalar Processors
, 1995
"... Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of ..."
Abstract
-
Cited by 76 (0 self)
- Add to MetaCart
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar processors: converting an ostensibly sequential program into a more parallel one. The principles underlying this process, and the constraints that must be met, are discussed. The paper then provides a description of the specific implementation techniques used in the important phases of superscalar processing. The major phases include: i) instruction fetching and conditional branch processing, ii) the determination of data dependences involving register values, iii) the initiation, or issuing, of instructions for parallel execution, iv) the communication of data values through memory via loads and stores, and v) committing the process state in correct order so that precise interrupts can be supported. Examples of recent superscalar microprocessors, the MIPS R10000, the DEC 21164, and the AMD K5 are used to illustrate a variety of superscalar methods.
RPM: A rapid prototyping engine for multiprocessor systems
- IEEE Computer
, 1995
"... In multiprocessor systems, processing nodes contain a processor, some cache and a share of the system memory, and are connected through a scalable interconnect. The system memory partitions may be shared (shared-memory systems) or disjoint (messagepassing systems). Within each class of systems many ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
In multiprocessor systems, processing nodes contain a processor, some cache and a share of the system memory, and are connected through a scalable interconnect. The system memory partitions may be shared (shared-memory systems) or disjoint (messagepassing systems). Within each class of systems many architectural variations are possible. Fair comparisons among systems are difficult because of the lack of a common hardware platform to implement the different architectures. RPM (Rapid Prototyping engine for Multiprocessors) is a hardware emulator for the rapid prototyping of various multiprocessor architectures. In RPM, the hardware of the target machine is emulated by reprogrammable controllers implemented with Field-Programmable Gate Arrays (FPGAs). The processors, memories and interconnect are off-theshelf and their relative speeds can be modified to emulate various component technologies. Every emulation is an actual incarnation of the target machine and therefore software written for the target machine can be easily ported on it with little modification and without instrumentation of the code. In this paper, we describe the architecture of RPM, its performance and the prototyping methodology. We also compare our approach with simulation and breadboard prototyping. Keywords: Field-Programmable Gate Arrays (FPGAs), message-passing multicomputers, shared-memory multiprocessors, design verification, performance evaluation, simulation.
A Fully Asynchronous Reader/Writer Mechanism for Multiprocessor Real-Time Systems
, 1997
"... Data sharing among tasks within multiprocessor real-time systems is a crucial issue. This report presents a fully asynchronous mechanism of sharing data between a single writer and multiple readers. The writer and all the readers are allowed to access the shared data asynchronously in a loop-free an ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Data sharing among tasks within multiprocessor real-time systems is a crucial issue. This report presents a fully asynchronous mechanism of sharing data between a single writer and multiple readers. The writer and all the readers are allowed to access the shared data asynchronously in a loop-free and wait-free manner because neither locking operations nor repeated actions of read-and-check are involved. Its implementation uses only (n + 2) buffer slots for n readers, and employs an atomic `Store-IfZero ' operation which can be easily simulated with the Compare-and-Swap instruction. Since neither writing nor reading the shared data imposes any effect upon other tasks in the system, this mechanism introduces no impact upon the timing behaviour of tasks. When employed by real-time applications, it helps to reduce blocking and priority inversion problems incurred by the commonly used lock-based synchronization mechanisms. 1 Introduction Data sharing is a basic approach to achieving inter...
A Three-Slot Asynchronous Reader/Writer Mechanism for Multiprocessor Real-Time Systems
, 1997
"... This report presents an approach to realizing a three-slot asynchronous reader/writer mechanism in multiprocessor real-time systems. The mechanism allows both the reader and the writer to access the shared data object at any time without employing any conventional synchronization protocol. Its imple ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
This report presents an approach to realizing a three-slot asynchronous reader/writer mechanism in multiprocessor real-time systems. The mechanism allows both the reader and the writer to access the shared data object at any time without employing any conventional synchronization protocol. Its implementation takes advantage of the hardware supported Compare-and-Swap instruction to coordinate buffer accessing. Since there is no locking mechanism adopted in the approach, it is a non-blocking mechanism such that delay of either the reader or the writer imposes no effect upon the other. In addition, no repeated reading is needed by the reader. For real-time applications, this mechanism introduces no impact upon timing behaviour of tasks. It helps to reduce blocking and priority inversion problems incurred by the commonly used lock-based synchronization mechanisms. 1 Introduction Data sharing, which is a basic approach to achieving intertask communication within a variety of applications,...

