Results 1–10 of 28
A scalable lock-free stack algorithm
 In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
"... The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are nonblocking but not linearizable. Neither is used in practice since they perform well o ..."
Abstract

Cited by 76 (11 self)
The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are nonblocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is nonblocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and nonblocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
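The core observation lends itself to a short sketch. The Python below is illustrative, not the paper's code: class and method names are invented, the elimination "array" is collapsed to a single slot, and compare-and-set atomicity is simulated with a lock. It shows a Treiber-style lock-free stack whose operations fall back to elimination under contention: a push that loses its CAS parks its value so a concurrent pop can take it directly, letting the pair cancel without touching the top pointer.

```python
import threading

class AtomicRef:
    """Reference cell with compare-and-set; atomicity simulated with a lock."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node

class EliminationBackoffStack:
    EMPTY = object()  # marker: the elimination slot is free

    def __init__(self):
        self.top = AtomicRef(None)         # Treiber stack: CAS on the top pointer
        self.slot = AtomicRef(self.EMPTY)  # one-slot stand-in for the elimination array

    def push(self, value):
        while True:
            old = self.top.get()
            if self.top.compare_and_set(old, Node(value, old)):
                return
            # CAS failed (contention): try to hand the value to a concurrent pop.
            if self.slot.compare_and_set(self.EMPTY, value):
                for _ in range(1000):          # bounded wait for a matching pop
                    if self.slot.get() is not value:
                        return                  # a pop took our value: eliminated
                if self.slot.compare_and_set(value, self.EMPTY):
                    continue                    # timed out: retract and retry the CAS
                return                          # taken at the last moment

    def pop(self):
        while True:
            old = self.top.get()
            if old is not None:
                if self.top.compare_and_set(old, old.next):
                    return old.value
                # CAS failed: fall through and try elimination instead.
            waiting = self.slot.get()
            if waiting is not self.EMPTY and \
               self.slot.compare_and_set(waiting, self.EMPTY):
                return waiting                  # eliminated against a waiting push
            if old is None:
                return None                     # empty, and nobody to eliminate with
```

In the full algorithm the slot is one cell of an array indexed by a randomized backoff scheme, which is what spreads contention and gives scalability; this sketch keeps only the push/pop pairing logic.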
Algorithms adaptive to point contention
 Journal of the ACM
, 2003
"... Abstract. This article introduces the sieve, a novel building block that allows to adapt to the number of simultaneously active processes (the point contention) during the execution of an operation. We present an implementation of the sieve in which each sieve operation requires O(k log k) steps, wh ..."
Abstract

Cited by 21 (8 self)
Abstract. This article introduces the sieve, a novel building block that allows an operation to adapt to the number of simultaneously active processes (the point contention) during its execution. We present an implementation of the sieve in which each sieve operation requires O(k log k) steps, where k is the point contention during the operation. The sieve is the cornerstone of the first wait-free algorithms that adapt to point contention using only read and write operations. Specifically, we present efficient algorithms for long-lived renaming, timestamping and collecting information.
The Complexity of Renaming
"... We study the complexity of renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct names from a given namespace. We prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, whe ..."
Abstract

Cited by 15 (10 self)
We study the complexity of renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct names from a given namespace. We prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. This bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic fetch-and-increment registers, queues and stacks. The proof of the bound is interesting in its own right, for it relies on the first reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement our individual bound with a global lower bound of Ω(k log(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter and fetch-and-increment implementations, all tight within logarithmic factors.
Wait-Free Queues With Multiple Enqueuers and Dequeuers
"... The queue data structure is fundamental and ubiquitous. Lockfree versions of the queue are well known. However, an important open question is whether practical waitfree queues exist. Until now, only versions with limited concurrency were proposed. In this paper we provide a design for a practical w ..."
Abstract

Cited by 14 (1 self)
The queue data structure is fundamental and ubiquitous. Lock-free versions of the queue are well known. However, an important open question is whether practical wait-free queues exist. Until now, only versions with limited concurrency were proposed. In this paper we provide a design for a practical wait-free queue. Our construction is based on the highly efficient lock-free queue of Michael and Scott. To achieve wait-freedom, we employ a priority-based helping scheme in which faster threads help slower peers complete their pending operations. We have implemented our scheme on multicore machines and present performance measurements comparing our implementation with that of Michael and Scott in several system configurations.
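The base object here, the Michael–Scott lock-free queue, is compact enough to sketch. The Python below is illustrative rather than the paper's code: compare-and-set atomicity is simulated with a lock, and the wait-free helping layer is omitted. It shows the two characteristic moves: an enqueuer links its node with one CAS and swings the tail with a second, and any operation that finds the tail lagging helps swing it first, a limited form of the helping that the paper generalizes into a priority scheme.

```python
import threading

class AtomicRef:
    """Reference cell with compare-and-set; atomicity simulated with a lock."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class QNode:
    def __init__(self, value=None):
        self.value = value
        self.next = AtomicRef(None)

class MichaelScottQueue:
    def __init__(self):
        dummy = QNode()                  # sentinel: head always points at a dummy
        self.head = AtomicRef(dummy)
        self.tail = AtomicRef(dummy)

    def enqueue(self, value):
        node = QNode(value)
        while True:
            tail = self.tail.get()
            nxt = tail.next.get()
            if nxt is not None:
                self.tail.compare_and_set(tail, nxt)   # help: swing a lagging tail
                continue
            if tail.next.compare_and_set(None, node):  # link the new node
                self.tail.compare_and_set(tail, node)  # swing tail (failure is
                return                                 # fine: others will help)

    def dequeue(self):
        while True:
            head = self.head.get()
            tail = self.tail.get()
            nxt = head.next.get()
            if nxt is None:
                return None                            # empty
            if head is tail:
                self.tail.compare_and_set(tail, nxt)   # help: tail lags behind
                continue
            if self.head.compare_and_set(head, nxt):
                return nxt.value                       # new dummy carries the value
```

This base is lock-free but not wait-free: a slow thread can retry forever. The paper's contribution is the announce-and-help layer on top, in which each thread publishes its pending operation and faster threads complete it on its behalf.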
Fast Randomized Test-and-Set and Renaming
"... Most people believe that renaming is easy: simply choose a name at random; if more than one process selects the same name, then try again. We highlight the issues that occur when trying to implement such a scheme and shed new light on the readwrite complexity of randomized renaming in an asynchron ..."
Abstract

Cited by 14 (8 self)
Most people believe that renaming is easy: simply choose a name at random; if more than one process selects the same name, then try again. We highlight the issues that occur when trying to implement such a scheme and shed new light on the read-write complexity of randomized renaming in an asynchronous environment. At the heart of our new perspective stands an adaptive implementation of a randomized test-and-set object, that has polylogarithmic step complexity per operation, with high probability. Interestingly, our implementation is anonymous, as it does not require process identifiers. Based on this implementation, we present two new randomized renaming algorithms. The first ensures a tight namespace of n names using O(n log⁴ n) total steps, with high probability. This significantly improves on the complexity of the best previously known namespace-optimal algorithms. The second algorithm achieves a namespace of size k(1 + ε) using O(k log⁴ k / log²(1 + ε)) total steps, both with high probability, where k is the total contention in the execution. It is the first adaptive randomized renaming algorithm, and it improves on existing deterministic solutions by providing a smaller namespace, and by lowering step complexity.
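The "easy" scheme the authors start from can be stated in a few lines. The sketch below uses hypothetical names and a lock-simulated one-shot test-and-set: a process renames by picking a random slot in the namespace and keeping it if it wins that slot's test-and-set, which is exactly the retry loop whose hidden costs the paper analyzes.

```python
import random
import threading

class TestAndSet:
    """One-shot test-and-set object: the first caller wins (lock-simulated)."""
    def __init__(self):
        self._taken = False
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            won = not self._taken
            self._taken = True
            return won

class RandomizedRenaming:
    """Namespace of `size` names, one test-and-set object guarding each name."""
    def __init__(self, size):
        self.slots = [TestAndSet() for _ in range(size)]

    def get_name(self):
        tried = set()
        while len(tried) < len(self.slots):
            i = random.randrange(len(self.slots))
            if i in tried:
                continue          # already lost this slot; sample again
            tried.add(i)
            if self.slots[i].test_and_set():
                return i          # we won slot i: i is our unique new name
        raise RuntimeError("namespace exhausted")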
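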
A Simple Algorithmic Characterization of Uniform Solvability (Extended Abstract)
 Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002)
, 2002
"... The HerlihyShavit (HS) conditions characterizing the solvability of asynchronous tasks over n processors have been a milestone in the development of the theory of distributed computing. Yet, they were of no help when researcher sought algorithms that do not depend on n. To help in this pursuit we i ..."
Abstract

Cited by 11 (6 self)
The Herlihy–Shavit (HS) conditions characterizing the solvability of asynchronous tasks over n processors have been a milestone in the development of the theory of distributed computing. Yet, they were of no help when researchers sought algorithms that do not depend on n. To help in this pursuit we investigate the uniform solvability of an infinite uniform sequence of tasks T₀, T₁, T₂, ..., where Tᵢ is a task over processors p₀, p₁, ..., pᵢ, and Tᵢ extends Tᵢ₋₁. We say that such a sequence is uniformly solvable if there exist protocols to solve each Tᵢ and the protocol for Tᵢ extends the protocol for Tᵢ₋₁. This paper establishes that although each Tᵢ may be solvable, the uniform sequence is not necessarily uniformly solvable. We show this by proposing a novel uniform sequence of solvable tasks and proving that the sequence is not amenable to a uniform solution. We then extend the HS conditions for a task over n processors to uniform solvability in a natural way. The technique we use to accomplish this is to generalize the alternative algorithmic proof, by Borowsky and Gafni, of the HS conditions, by showing that the infinite uniform sequence of Immediate Snapshot tasks is uniformly solvable. A side benefit of the technique is a widely applicable methodology for the development of uniform protocols.
Polylogarithmic Concurrent Data Structures from Monotone Circuits
, 2010
"... A method is given for constructing a max register, a linearizable, waitfree concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an mvalued max register is constructed from onebit multiwriter multireader r ..."
Abstract

Cited by 9 (7 self)
A method is given for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register is constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈lg m⌉ atomic register operations per write or read. An unbounded max register is constructed with cost O(min(log v, n)) to read or write a value v, where n is the number of processes. It is also shown how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. The cost of a write is bounded by O(Sd min(⌈lg m⌉, n)), where m is the size of the alphabet for the circuit, S is the number of gates whose value changes as the result of the write, and d is the number of inputs to each gate; the cost of a read is min(⌈lg m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition. As an application, we obtain a simple, linearizable, wait-free counter implementation with a cost of O(min(log n log v, n)) to perform an increment and O(min(log v, n)) to perform a read, where v is the current value of the counter. For polynomially-many ...
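The bounded construction is simple enough to reproduce as a sketch. The Python below is a sequential illustration (register operations are plain reads and writes; names are invented): an m-valued max register is built recursively, with a one-bit switch routing readers to a "high" half once any value ≥ ⌈m/2⌉ has been written, so every read or write follows one root-to-leaf path of at most ⌈lg m⌉ bit operations.

```python
class OneBitRegister:
    """Stand-in for a one-bit multi-writer multi-reader register."""
    def __init__(self):
        self.bit = 0

class MaxRegister:
    """m-valued max register built recursively from one-bit switch registers."""
    def __init__(self, m):
        assert m >= 1
        self.m = m
        if m > 1:
            self.h = (m + 1) // 2          # split: values < h go left, >= h go right
            self.switch = OneBitRegister()
            self.left = MaxRegister(self.h)
            self.right = MaxRegister(m - self.h)

    def write(self, v):
        assert 0 <= v < self.m
        if self.m == 1:
            return                          # only value 0; nothing to store
        if v >= self.h:
            self.right.write(v - self.h)    # store high part first...
            self.switch.bit = 1             # ...then route all future reads high
        elif self.switch.bit == 0:
            self.left.write(v)              # low writes are dropped once switch is set

    def read(self):
        if self.m == 1:
            return 0
        if self.switch.bit == 1:
            return self.h + self.right.read()
        return self.left.read()
```

Dropping small writes once the switch is set is safe because they can no longer affect the maximum; that pruning is what keeps the per-operation cost at one path of switch bits.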
Optimal-Time Adaptive Strong Renaming, with Applications to Counting (Extended Abstract)
 In PODC 2011, San Jose, USA
, 2011
"... We give two new randomized algorithms for strong renaming, both of which work against an adaptive adversary in asynchronous shared memory. The first uses repeated sampling over a sequence of arrays of decreasing size to assign unique names to each of n processes with step complexity O(log³ n). The s ..."
Abstract

Cited by 7 (4 self)
We give two new randomized algorithms for strong renaming, both of which work against an adaptive adversary in asynchronous shared memory. The first uses repeated sampling over a sequence of arrays of decreasing size to assign unique names to each of n processes with step complexity O(log³ n). The second transforms any sorting network into a strong adaptive renaming protocol, with an expected cost equal to the depth of the sorting network. Using an AKS sorting network, this gives a strong adaptive renaming algorithm with step complexity O(log k), where k is the contention in the current execution. We show this to be optimal based on a classic lower bound of Jayanti. We also show that any such strong renaming protocol can be used to build a monotone-consistent counter with logarithmic step complexity (at the cost of adding a max register) or a linearizable fetch-and-increment register (at the cost of increasing the step complexity by a logarithmic factor).
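The sorting-network transformation has a neat operational reading that a small sketch can capture. The Python below is illustrative, not the paper's algorithm: it uses a simple odd-even transposition network instead of AKS, a lock simulates the two-process test-and-set at each comparator, and the correctness claim in the final comment is only for the sequential simulation exercised here. A process enters the network on some wire; at each comparator on its path it performs a test-and-set and exits on the low wire if it arrived first, the high wire otherwise; its exit wire is its new name.

```python
import threading

class Comparator:
    """Two-use comparator: first arrival exits low, second exits high
    (atomicity of the arrival order is simulated with a lock)."""
    def __init__(self):
        self._arrivals = 0
        self._lock = threading.Lock()

    def arrive(self):
        with self._lock:
            self._arrivals += 1
            return self._arrivals == 1     # True iff we were first

def odd_even_transposition_layers(n):
    """Comparator layout of an n-wire odd-even transposition sorting network."""
    return [[(i, i + 1) for i in range(r % 2, n - 1, 2)] for r in range(n)]

class SortingNetworkRenaming:
    def __init__(self, n):
        self.layers = odd_even_transposition_layers(n)
        self.comparators = [{pair: Comparator() for pair in layer}
                            for layer in self.layers]

    def get_name(self, start_wire):
        wire = start_wire
        for layer, comps in zip(self.layers, self.comparators):
            for (lo, hi) in layer:
                if wire in (lo, hi):
                    # Winner of the test-and-set takes the low output wire.
                    wire = lo if comps[(lo, hi)].arrive() else hi
                    break
        return wire
        # In sequential runs, k participants exit on the top k wires.
```

A lone participant sinks all the way to wire 0 regardless of its entry wire, which is the adaptivity the abstract refers to; the cost per process is the depth of the network, hence the interest in shallow (AKS) networks.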
Asynchronous Exclusive Selection
"... The distributed setting of this paper is an asynchronous system consisting of n processes prone to crashes and a number of shared readwrite registers. We consider problems regarding assigning integer values to processes in an exclusive way, in the sense that no integer is assigned to two distinct p ..."
Abstract

Cited by 7 (0 self)
The distributed setting of this paper is an asynchronous system consisting of n processes prone to crashes and a number of shared read-write registers. We consider problems regarding assigning integer values to processes in an exclusive way, in the sense that no integer is assigned to two distinct processes. In the problem of renaming, any k ≤ n processes, holding original names from a range [N] = {1, ..., N}, contend to acquire unique integers as new names in a smaller range [M] using some r shared registers. When k and N are known, our wait-free solution operates in O(log k(log N + log k log log N)) local steps, for M = O(k), and with r = O(k log N) auxiliary shared registers. Processes obtain new names by exploring their neighbors in bipartite graphs of suitable expansion properties, with nodes representing names and processes competing for the name of each visited node. We also prove lower bounds on the number of local steps required in the worst case to solve renaming in a wait-free manner, when k and N are known and r and M are given constraints. We give a fully adaptive solution, with neither k nor N known, having M = 8k − lg k − 1 as a bound on the range of new names, operating in O(k) steps and using O(n²) registers. We apply renaming algorithms to obtain solutions to the Store&Collect problem. When both k and N are known, then storing can be performed in O(log k(log N + log k log log N)) steps and collecting in O(k) steps, for r = O(k log(N/k)) registers. We consider the problem Unbounded-Naming, in which processes repeatedly require new names, while no name can be reused once assigned, so that infinitely many integers need to be exclusively assigned as names. For no fixed integer i can one guarantee in a wait-free manner that i is eventually assigned as a name, so some integers may never be used; the upper bound on the number of such unused integers is used as a measure of the quality of a solution. We show that Unbounded-Naming is solvable in a nonblocking way with at most n − 1 integers never assigned as names, which is best possible, and in a wait-free manner with at most n(n − 1) values never assigned as names.
Lower bounds for adaptive collect and related objects
 In Proc. 23rd Annual ACM Symposium on Principles of Distributed Computing
, 2004
"... An adaptive algorithm, whose step complexity adjusts to the number of active processes, is attractive for situations in which the number of participating processes is highly variable. This paper studies the number and type of multiwriter registers that are needed for adaptive algorithms. We prove th ..."
Abstract

Cited by 4 (2 self)
An adaptive algorithm, whose step complexity adjusts to the number of active processes, is attractive for situations in which the number of participating processes is highly variable. This paper studies the number and type of multi-writer registers that are needed for adaptive algorithms. We prove that if a collect algorithm is f-adaptive to total contention, namely, its step complexity is f(k), where k is the number of processes that ever took a step, then it uses Ω(f⁻¹(n)) multi-writer registers, where n is the total number of processes in the system. Furthermore, we show that competition for the underlying registers is inherent for adaptive collect algorithms. We consider c-write registers, to which at most c processes can be concurrently about to write. Special attention is given to exclusive-write registers, the case c = 1 where no competition is allowed, and concurrent-write registers, the case c = n where any amount of competition is allowed. A collect algorithm is f-adaptive to point contention if its step complexity is f(k), where k is the maximum number of simultaneously active processes. Such an algorithm is shown to require Ω(f⁻¹(n/c)) concurrent-write registers, even if an unlimited number of c-write registers are available. A smaller lower bound is also obtained in this situation for collect algorithms that are f-adaptive to total contention. The lower bounds also hold for nondeterministic implementations of sensitive objects from historyless objects. Finally, we present lower bounds on the step complexity in solo executions (i.e., without any contention), when only c-write registers are used: for weak test&set objects, we present an Ω(log n / (log c + log log n)) lower bound; our lower bound for collect and sensitive objects is Ω((n − 1)/c).