Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors
1981
Cited by 84 (2 self)
In this paper we implement several basic operating system primitives by using a "replace-add" operation, which can supersede the standard "test and set", and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential processors. We also present a hardware implementation of replace-add that permits multiple replace-adds to be processed nearly as efficiently as loads and stores. Moreover, the crucial special case of concurrent replace-adds updating the same variable is handled particularly well: If every PE simultaneously addresses a replace-add at the same variable, all these requests are satisfied in the time required to process just one request.
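As a rough illustration of the primitive this abstract describes, the sketch below models replace-add as an indivisible add that returns the updated value, which is how the operation is usually described (the closely related fetch-and-add returns the old value instead). The names `SharedCell`, `replace_add`, and `claim_slots` are hypothetical, and a `threading.Lock` merely stands in for the hardware atomicity discussed in the paper; none of this is taken from the paper itself.

```python
import threading


class SharedCell:
    """A shared integer with an indivisible replace-add (illustrative only)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # assumption: stands in for hardware atomicity

    def replace_add(self, increment):
        """Atomically add `increment` and return the updated value."""
        with self._lock:
            self._value += increment
            return self._value

    def load(self):
        with self._lock:
            return self._value


def claim_slots(num_workers=8, claims_per_worker=100):
    """Each worker claims distinct slot indices from one shared counter."""
    next_slot = SharedCell(0)
    claimed = [[] for _ in range(num_workers)]

    def worker(worker_id):
        for _ in range(claims_per_worker):
            # Subtract 1 so that slot indices start at 0.
            claimed[worker_id].append(next_slot.replace_add(1) - 1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed, next_slot.load()
```

Because every call returns a different value, the workers need no further coordination to obtain unique slots; this is the kind of use that lets a single primitive replace a test-and-set-guarded critical section.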
Efficient Synchronization on Multiprocessors with Shared Memory
ACM Transactions on Programming Languages and Systems, 1986
Cited by 74 (2 self)
A new formalism is given for read-modify-write (RMW) synchronization operations. This formalism is used to extend the memory reference combining mechanism, introduced in the NYU Ultracomputer, to arbitrary RMW operations. A formal correctness proof of this combining mechanism is given. General requirements for the practicality of combining are discussed. Combining is shown to be practical for many useful memory access operations. This includes memory updates of the form mem_val := mem_val op val, where op need not be associative, and a variety of synchronization primitives. The computation involved is shown to be closely related to parallel prefix evaluation.
1. INTRODUCTION
Shared memory provides convenient communication between processes in a tightly coupled multiprocessing system. Shared variables can be used for data sharing, information transfer between processes, and, in particular, for coordination and synchronization. Constructs such as the semaphore introduced by Dijkstra in ...
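The combining idea summarized in this entry can be sketched for the simplest case, a fetch-and-op whose operator is associative. The function name `combine_and_serve` and the pairwise tree layout are assumptions made for illustration; the paper's formalism is more general (it also covers operators that are not associative), and this sketch does not attempt that.

```python
from operator import add


def combine_and_serve(memory_value, operands, op=add):
    """Serve fetch-and-op requests through a pairwise combining tree.

    Returns (replies, final_memory_value), where replies[i] is the value
    request i would have fetched had the requests run one after another.
    Assumes `op` is associative.
    """
    if not operands:
        return [], memory_value
    if len(operands) == 1:
        # A single request reaches memory: fetch the old value, store the new one.
        return [memory_value], op(memory_value, operands[0])

    # Forward path: a switch merges neighbouring requests into one and
    # remembers the left operand of each merged pair.
    merged, saved_left = [], []
    for i in range(0, len(operands) - 1, 2):
        saved_left.append(operands[i])
        merged.append(op(operands[i], operands[i + 1]))
    if len(operands) % 2:
        merged.append(operands[-1])  # odd request passes through uncombined

    upstream_replies, final_value = combine_and_serve(memory_value, merged, op)

    # Return path: split each reply into replies for the two original requests.
    replies = []
    for j, left in enumerate(saved_left):
        fetched = upstream_replies[j]
        replies.append(fetched)            # left request sees the fetched value
        replies.append(op(fetched, left))  # right request sees it after the left update
    if len(operands) % 2:
        replies.append(upstream_replies[-1])
    return replies, final_value
```

For example, `combine_and_serve(10, [1, 2, 3, 4, 5])` sends only one request to memory, yet the replies are the running totals a serial execution would have produced. That the replies form a prefix computation over the operands is the connection to parallel prefix evaluation mentioned in the abstract.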
Mechanisms for Efficient Shared-Memory, Lock-Based Synchronization
PhD thesis, University of Wisconsin, Madison, 1999
Cited by 7 (1 self)
Efficient locking synchronization primitives are essential for achieving high performance in fine-grain, shared-memory parallel programs. One function of locking primitives is to enable exclusive access to shared data and critical sections of code. In this dissertation, I make the following six contributions. (1) I propose a framework, the synchronization period, in which to reason about the inefficiencies of locking primitives. (2) I identify four previously proposed locking mechanisms (local spinning, queue-based locking, collocation, and synchronous prefetch) and use them to classify existing locking primitives according to which of these mechanisms they incorporate. (3) With detailed simulations, I show the extent to which these four mechanisms can improve the performance of shared-memory programs. I evaluate the space of these mechanisms using sixteen synchronization constructs, which are formed from six base types of locks (test&set, test&test&set, MCS, LH, M, and QOLB). I show t...
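Of the base lock types named in this entry, test&test&set is the simplest one that uses local spinning. The sketch below is a minimal software model of that idea, not code from the dissertation: the names `TTASLock` and `run_counter_demo` are hypothetical, and a `threading.Lock` is assumed as a stand-in for the atomic test&set instruction.

```python
import threading
import time


class TTASLock:
    """Spin lock that polls with plain reads before attempting test&set."""

    def __init__(self):
        self._held = False
        self._atomic = threading.Lock()  # assumption: models instruction atomicity

    def _test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._atomic:
            previous = self._held
            self._held = True
            return previous

    def acquire(self):
        while True:
            # "Test": spin on an ordinary read, which a real machine could
            # satisfy from its local cache without interconnect traffic.
            while self._held:
                time.sleep(0)  # yield so the simulation stays responsive
            # "Test&set": only now issue the expensive atomic operation.
            if not self._test_and_set():
                return

    def release(self):
        # Taken under the same guard so a release cannot interleave with
        # the read and write inside a simulated test&set.
        with self._atomic:
            self._held = False


def run_counter_demo(num_threads=4, increments=200):
    """Increment a shared counter under the lock and return the total."""
    lock = TTASLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                total[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

A plain test&set lock would call `_test_and_set` on every iteration of the spin loop; the extra read loop is the only difference, and it is what keeps waiting processors from generating traffic while the lock is held.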
An Effective Synchronization Network for Hot-spot Accesses
1992
Cited by 7 (0 self)
This paper was presented at the 1991 International Parallel Processing Symposium, Anaheim CA, under the title "An Effective Synchronization Network for Large Multiprocessor Systems".
... Ultracomputer project [GGKM83], for combining fetch-and-op instructions. Some tree-structured hardware has also been proposed for combining hot-spot accesses. Special hardware of this type tends to be too rigid and difficult to use in an environment with process migration or multiprogramming, or when only some of the processors are involved in the synchronization. Lipovski and Vaughan's fetch-and-op tree [LiVa88], for example, is only capable of combining simultaneous accesses to a single hot-spot. It is more suitable for SIMD machines. Other hardware schemes have been proposed for barrier synchronization, such as [BePo90] and [HwSh91], but they are not flexible enough to handle traffic to memory hot-spots. Feedback has been proposed by Scott and Sohi [ScSo89] for avoiding tree saturation, but it does not improve the latency of hot-spot accesses and is not a substitute for combining. Similarly, the intelligent allocation of hardware switch buffers (for example, [Tzen91]) relieves congestion, but does not address the problem of high-latency hot-spot accesses. Some software techniques have been proposed for removing hot-spots (for example, [YeTL87] and [Broo86]). However, software schemes incur a fair amount of overhead. A basic OS operation often requires several hot-spot accesses, and software techniques may not provide the necessary speed. Also, certain compiler optimization techniques (see, for example, cycle shrinking in [Poly88]) are only effective if fast synchronization primitives are available. [MeSc91] introduced efficient algorithms for synchronization, but these are not appli...
A Bounded First-In, First-Enabled Solution to the l-Exclusion Problem
ACM Transactions on Programming Languages and Systems, 1990
Cited by 3 (0 self)
This paper presents a solution to the first-come, first-enabled l-exclusion problem of [?]. Unlike the solution in [?], this solution does not use powerful read-modify-write synchronization primitives, and requires only bounded shared memory. Use of the concurrent timestamp system of [?] is key in solving the problem within bounded shared memory. Categories and Subject Descriptors: D.4.1 [Operating Systems]: Process Management---Mutual

