Results 1–10 of 17
Wait-Free Synchronization
 ACM Transactions on Programming Languages and Systems, 1993
Cited by 732 (27 self)
Abstract:
A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lies at the heart of much recent work in concurrent algorithms, concurrent data structures, and multiprocessor architectures. In the first part of this paper, we introduce a simple and general technique, based on reduction to a consensus protocol, for proving statements of the form "there is no wait-free implementation of X by Y." We derive a hierarchy of objects such that no object at one level has a wait-free implementation in terms of objects at lower levels. In particular, we show that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy: they cannot be used to construct wait-free implementations of many simple and familiar da...
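The consensus-based hierarchy above can be illustrated with a small sketch: an object like compare&swap solves consensus for any number of processes, which is why it sits high in the hierarchy. This is a minimal Python simulation, not the paper's formalism; Python has no hardware compare-and-swap, so a lock stands in for the atomicity of a single CAS, and all class and method names are our own.

```python
import threading

class CASConsensus:
    """Wait-free consensus from a single compare-and-swap cell (sketch).

    Each process performs one CAS attempt and one read -- a constant
    number of steps regardless of how other processes are scheduled,
    which is what "wait-free" requires.
    """
    _UNSET = object()

    def __init__(self):
        self._cell = self._UNSET
        self._cas_lock = threading.Lock()  # simulates atomicity of one CAS

    def _compare_and_swap(self, expected, new):
        with self._cas_lock:
            if self._cell is expected:
                self._cell = new
                return True
            return False

    def decide(self, proposal):
        # First successful CAS wins; every process then reads the winner.
        self._compare_and_swap(self._UNSET, proposal)
        return self._cell

# All processes decide the same value: whichever proposal landed first.
cons = CASConsensus()
results = []
threads = [threading.Thread(target=lambda v=v: results.append(cons.decide(v)))
           for v in ("a", "b", "c")]
for t in threads: t.start()
for t in threads: t.join()
assert len(set(results)) == 1 and results[0] in ("a", "b", "c")
```

Read/write registers, by contrast, cannot implement even two-process consensus, which is the reduction the paper uses for its impossibility results.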
A methodology for implementing highly concurrent data structures
 In 2nd Symp. Principles & Practice of Parallel Programming, 1990
Cited by 320 (12 self)
Abstract:
A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections: ensuring that only one process at a time can operate on the object. Nevertheless, critical sections are poorly suited for asynchronous systems: if one process is halted or delayed in a critical section, other, nonfaulty processes will be unable to progress. By contrast, a concurrent object implementation is nonblocking if it always guarantees that some process will complete an operation in a finite number of steps, and it is wait-free if it guarantees that each process will complete an operation in a finite number of steps. This paper proposes a new methodology for constructing nonblocking and wait-free implementations of concurrent objects. The object's representation and operations are written as stylized sequential programs, with no explicit synchronization. Each sequential operation is automatically transformed into a nonblocking or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying read, write, and compare&swap operations to a shared memory.
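The core of the transformation described above, copy the object, apply the sequential operation to the private copy, then install the copy with compare&swap, can be sketched for a trivial sequential object. This is our illustrative reduction to a counter, not the paper's full protocol (which also covers memory management); as before, a lock simulates the atomicity of the root-pointer CAS.

```python
import threading

class NonblockingCounter:
    """Copy-and-compare&swap sketch of a nonblocking object (a counter)."""

    def __init__(self):
        self._root = {"count": 0}          # current published version
        self._cas_lock = threading.Lock()  # simulates one atomic CAS

    def _cas_root(self, expected, new):
        with self._cas_lock:
            if self._root is expected:
                self._root = new
                return True
            return False

    def increment(self):
        while True:                        # retry loop: nonblocking, not wait-free
            old = self._root               # read the current version
            new = {"count": old["count"] + 1}  # sequential op on a private copy
            if self._cas_root(old, new):   # install unless someone raced ahead
                return new["count"]

c = NonblockingCounter()
threads = [threading.Thread(target=c.increment) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert c._root["count"] == 8
```

A failed CAS means some other process's operation completed, so the system as a whole always makes progress, which is exactly the nonblocking guarantee defined in the abstract.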
The Topological Structure of Asynchronous Computability
 Journal of the ACM, 1996
Cited by 117 (11 self)
Abstract:
We give necessary and sufficient combinatorial conditions characterizing the tasks that can be solved by asynchronous processes, of which all but one can fail, that communicate by reading and writing a shared memory. We introduce a new formalism for tasks, based on notions from classical algebraic and combinatorial topology, in which a task's possible input and output values are each associated with high-dimensional geometric structures called simplicial complexes. We characterize computability in terms of the topological properties of these complexes. This characterization has a surprising geometric interpretation: a task is solvable if and only if the complex representing the task's allowable inputs can be mapped to the complex representing the task's allowable outputs by a function satisfying certain simple regularity properties. Our formalism thus replaces the "operational" notion of a wait-free decision task, expressed in terms of interleaved computations unfolding ...
Efficient Synchronization on Multiprocessors with Shared Memory
 ACM Transactions on Programming Languages and Systems, 1986
Cited by 85 (2 self)
Abstract:
A new formalism is given for read-modify-write (RMW) synchronization operations. This formalism is used to extend the memory reference combining mechanism, introduced in the NYU Ultracomputer, to arbitrary RMW operations. A formal correctness proof of this combining mechanism is given. General requirements for the practicality of combining are discussed. Combining is shown to be practical for many useful memory access operations. This includes memory updates of the form mem_val := mem_val op val, where op need not be associative, and a variety of synchronization primitives. The computation involved is shown to be closely related to parallel prefix evaluation.

1. INTRODUCTION

Shared memory provides convenient communication between processes in a tightly coupled multiprocessing system. Shared variables can be used for data sharing, information transfer between processes, and, in particular, for coordination and synchronization. Constructs such as the semaphore introduced by Dijkstra in ...
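The connection between combining and parallel prefix noted above can be sketched for the simplest RMW operation, fetch-and-add: several requests to the same cell are merged into one memory update, and the individual responses are recovered by a prefix computation over the increments. This is our toy model of the mechanism, not the paper's formalism; the function name and dictionary-as-memory are illustrative choices.

```python
def combine_fetch_and_add(memory, addr, increments):
    """Combine a batch of fetch-and-add requests into one memory update.

    Returns the value each request would have fetched; the responses
    are exactly the prefix sums of the increments over the old value,
    which is the parallel-prefix connection the abstract mentions.
    """
    base = memory[addr]
    fetched = []
    running = base
    for inc in increments:     # prefix sums distribute the responses
        fetched.append(running)
        running += inc
    memory[addr] = running     # a single combined RMW at the memory module
    return fetched

mem = {0: 10}
responses = combine_fetch_and_add(mem, 0, [1, 2, 3])
assert responses == [10, 11, 13]
assert mem[0] == 16
```

In hardware, this merging happens pairwise at network switches on the way to memory, so a cell accessed by n processors sees O(log n) rather than n requests.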
Contention in Shared Memory Algorithms
 1993
Cited by 63 (1 self)
Abstract:
Most complexity measures for concurrent algorithms for asynchronous shared-memory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at the same time. Nevertheless, even though contention is one of the principal considerations affecting the performance of real algorithms on real multiprocessors, there are no formal tools for analyzing the contention of asynchronous shared-memory algorithms. This paper introduces the first formal complexity model for contention in multiprocessors. We focus on the standard multiprocessor architecture in which n asynchronous processes communicate by applying read, write, and read-modify-write operations to a shared memory. We use our model to derive two kinds of results: (1) lower bounds on contention for well known basic problems such as agreement and mutual exclusion, and (2) tradeoffs betwe...
Diffracting trees
 In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, 1994
Cited by 57 (12 self)
Abstract:
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log² w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
Counting Networks and Multi-Processor Coordination (Extended Abstract)
 In Proceedings of the 23rd Annual Symposium on Theory of Computing, 1991
Cited by 43 (7 self)
Abstract:
James Aspnes, Maurice Herlihy, Nir Shavit. Digital Equipment Corporation, Cambridge Research Lab, CRL 90/11, September 18, 1991.

Many fundamental multiprocessor coordination problems can be expressed as counting problems: processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention. Motivated by observations on the behavior of sorting networks, we offer a completely new approach to solving such problems. We introduce a new class of networks called counting networks, i.e., networks that can be used to count. We give a counting network construction of depth log² n using n log² n "gates," avoiding the sequential bottlenecks inherent to former solutions, and having a provably lower contention factor on its gates. Finally, to show that counting networks are not ...
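The "gate" of a counting network is the balancer, a two-wire element that sends arriving tokens alternately to its top and bottom outputs. A single balancer with a local counter on each output wire already forms the width-2 counting network. The sketch below is a sequential simulation under our own names; real balancers are traversed asynchronously and the full construction wires them into a bitonic network.

```python
class Balancer:
    """A balancer: each arriving token leaves on the top and bottom
    output wires alternately (the 'step property' in miniature)."""

    def __init__(self):
        self._toggle = 0

    def traverse(self):
        wire = self._toggle
        self._toggle ^= 1       # flip for the next token
        return wire             # 0 = top output wire, 1 = bottom

class BalancerCounter:
    """Width-2 counting network: one balancer plus a local counter on
    each output wire hands out successive values 0, 1, 2, ..."""

    def __init__(self):
        self._balancer = Balancer()
        self._local = [0, 0]    # per-wire visit counts

    def next_value(self):
        wire = self._balancer.traverse()
        visit = self._local[wire]
        self._local[wire] += 1
        return wire + 2 * visit  # wire index + width * prior visits

counter = BalancerCounter()
values = [counter.next_value() for _ in range(6)]
assert values == [0, 1, 2, 3, 4, 5]
```

The point of the larger construction is that tokens taking different paths through a width-w network still receive distinct successive values, while no single balancer is touched by more than a fraction of the tokens, which is the lower contention factor claimed above.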
Scalable Concurrent Counting
 ACM Transactions on Computer Systems, 1995
Cited by 24 (11 self)
Abstract:
The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. In this paper, we investigate the scalability of a variety of counting techniques for large-scale multiprocessors. We compare counting techniques based on: (1) spin locks, (2) message passing, (3) distributed queues, (4) software combining trees, and (5) counting networks. Our comparison is based on a series of simple benchmarks on a simulated 64-processor Alewife machine, a distributed-memory multiprocessor currently under development at MIT. Although locking techniques are known to perform well on small-scale, bus-based multiprocessors, serialization limits performance and contention can degrade performance. Both counting networks and combining trees substantially outperform the other methods by avoiding serialization and alleviating contention, although combining tree throughput is more sensitive t...
The Decidability of Distributed Decision Tasks
 In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997
Cited by 15 (5 self)
Abstract:
Maurice Herlihy, Computer Science Department, Brown University, Providence RI 02912, herlihy@cs.brown.edu. Sergio Rajsbaum, Instituto de Matemáticas, U.N.A.M., D.F. 04510, México, rajsbaum@servidor.unam.mx.

A task is a distributed coordination problem in which each process starts with a private input value taken from a finite set, communicates with the other processes by applying operations to shared objects, and eventually halts with a private output value, also taken from a finite set. A protocol is a distributed program that solves a task. A protocol is t-resilient if it tolerates failures by t or fewer processes. A task is solvable in a given model of computation if it has a t-resilient protocol in that model. A set of tasks is decidable in a given model of computation if there exists an effective procedure for deciding whether any task in that set has a t-resilient protocol. This paper gives the first necessary and sufficient conditions for task decidability in ...
Simulation of PRAM Models on Meshes
 Nordic Journal on Computing, 2(1):51, 1994
Cited by 14 (9 self)
Abstract:
We analyze the complexity of simulating a PRAM (parallel random access machine) on a mesh-structured distributed-memory machine. By utilizing suitable algorithms for randomized hashing, routing in a mesh, and sorting in a mesh, we prove that simulation of a PRAM on a √N × √N (or ∛N × ∛N × ∛N) mesh is possible with O(√N) (respectively O(∛N)) delay with high probability and a relatively small constant. Furthermore, with more sophisticated simulations further speedups are achieved; experiments show delays as low as √N + o(√N) (respectively ∛N + o(∛N)) per N PRAM processors. These simulations compare quite favorably with PRAM simulations on butterfly and hypercube.

1 Introduction

PRAM (Parallel Random Access Machine) is an abstract model of computation. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of a PRAM is often seen to consist of...
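The randomized-hashing step mentioned above spreads the PRAM's shared-memory cells over the mesh nodes so that any step's accesses are balanced with high probability. A minimal sketch of such a placement, using a linear-congruential hash of our own choosing (the paper does not specify this particular hash family, and the constants are illustrative):

```python
import math

def mesh_node(cell_addr, side, a=1103515245, b=12345, p=2**31 - 1):
    """Map a shared-memory address to a (row, col) node of a side x side
    mesh via a Carter-Wegman-style hash ((a*x + b) mod p, reduced mod
    the number of nodes). Illustrative only."""
    h = (a * cell_addr + b) % p % (side * side)
    return divmod(h, side)  # (row, col) of the node storing this cell

N = 16                      # number of PRAM processors
side = math.isqrt(N)        # a sqrt(N) x sqrt(N) mesh, here 4 x 4
placement = [mesh_node(addr, side) for addr in range(256)]
assert all(0 <= r < side and 0 <= c < side for r, c in placement)
```

With the cells scattered this way, a PRAM step becomes a batch of routing requests on the mesh, and the O(√N) delay bound comes from routing and sorting those requests.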