Results 1–10 of 14
Approximate Shared-Memory Counting Despite a Strong Adversary
Abstract

Cited by 15 (11 self)
A new randomized asynchronous shared-memory data structure is given for implementing an approximate counter that can be incremented up to n times. For any fixed ε, the counter achieves a relative error of δ with high probability, at the cost of O(((1/δ) log n)^{O(1/ε)}) register operations per increment and O(n^{4/5+ε} ((1/δ) log n)^{O(1/ε)}) register operations per read. The counter combines randomized sampling for estimating large values with an expander for estimating small values. This is the first sublinear solution to this problem that works despite a strong adversary scheduler that can observe the internal states of processes. An application of the improved counter is an improved protocol for solving randomized shared-memory consensus, which reduces the best previously known individual work complexity from O(n log n) to an optimal O(n), resolving one of the last remaining open problems concerning consensus in this model.
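The "randomized sampling for estimating large values" idea can be illustrated with a toy sketch. This is our own assumption-laden simplification, not the paper's construction (the class name, field names, and single shared mark count are all hypothetical stand-ins for the register machinery): each increment leaves a mark with some fixed probability p, and a read scales the mark count by 1/p to estimate the true total.

```python
import random

class SampledCounter:
    """Toy illustration of sampling-based approximate counting.
    NOT the paper's data structure: a single integer stands in for
    the marked registers, and there is no adversary or concurrency."""

    def __init__(self, p, seed=None):
        self.p = p                      # per-increment sampling probability
        self.marks = 0                  # stands in for marked registers
        self.rng = random.Random(seed)

    def increment(self):
        # With probability p, perform one cheap "register write" (a mark).
        if self.rng.random() < self.p:
            self.marks += 1

    def read(self):
        # Scaling the mark count by 1/p gives an unbiased estimate
        # of the number of increments performed so far.
        return self.marks / self.p
```

For many increments the relative error concentrates (standard deviation of the estimate is about sqrt(n/p)), which is the intuition for why sampling handles large counter values cheaply while a separate mechanism is needed for small ones.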
The Complexity of Renaming
Abstract

Cited by 9 (8 self)
We study the complexity of renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct names from a given namespace. We prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. This bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic fetch-and-increment registers, queues, and stacks. The proof of the bound is interesting in its own right, for it relies on the first reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement our individual bound with a global lower bound of Ω(k log(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter and fetch-and-increment implementations, all tight within logarithmic factors.
Synchronizing without locks is inherently expensive
 In Proceedings of the ACM Symposium on Principles of Distributed Computing
, 2006
Abstract

Cited by 7 (5 self)
It has been politically correct to blame locks for their fragility, especially since researchers identified obstruction-freedom: a progress condition that precludes locking while being weak enough to raise the hope for good performance. This paper attenuates this hope by establishing lower bounds on the complexity of obstruction-free implementations in contention-free executions: those where obstruction-freedom was precisely claimed to be effective. Through our lower bounds, we argue for an inherent cost of concurrent computing without locks. We first prove that obstruction-free implementations of a large class of objects, using only overwriting or trivial primitives in contention-free executions, have Ω(n) space complexity and Ω(log² n) (obstruction-free) step complexity. These bounds apply to implementations of many popular objects, including variants of fetch&add, counter, compare&swap, and LL/SC. When arbitrary primitives can be applied in contention-free executions, we show that, in any implementation of binary consensus, or of any perturbable object, the number of distinct base objects accessed and memory stalls incurred by some process in a contention-free execution is Ω(√n). All these results hold regardless of the behavior of processes after they become aware of contention. We also prove that, in any obstruction-free implementation of a perturbable object in which processes are not allowed to fail their operations, the number of memory stalls incurred by some process that is unaware of contention is Ω(n).
Polylogarithmic Concurrent Data Structures from Monotone Circuits
, 2010
Abstract

Cited by 7 (6 self)
A method is given for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register is constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈lg m⌉ atomic register operations per write or read. An unbounded max register is constructed with cost O(min(log v, n)) to read or write a value v, where n is the number of processes. It is also shown how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. The cost of a write is bounded by O(Sd·min(⌈lg m⌉, n)), where m is the size of the alphabet for the circuit, S is the number of gates whose value changes as the result of the write, and d is the number of inputs to each gate; the cost of a read is min(⌈lg m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition. As an application, we obtain a simple, linearizable, wait-free counter implementation with a cost of O(min(log n log v, n)) to perform an increment and O(min(log v, n)) to perform a read, where v is the current value of the counter. For polynomially many …
The Complexity of Obstruction-Free Implementations
 J. ACM
, 2009
Abstract

Cited by 6 (4 self)
Obstruction-free implementations of concurrent objects are optimized for the common case where there is no step contention, and were recently advocated as a solution to the costs associated with synchronization without locks. In this paper, we examine this claim, which requires precisely defining the notions of obstruction-freedom and step contention. We consider several classes of obstruction-free implementations, present corresponding generic object implementations, and prove lower bounds on their complexity. Viewed collectively, our results establish that the worst-case operation time complexity of obstruction-free implementations is high, even in the absence of step contention. We also show that lock-based implementations are not subject to some of the time-complexity lower bounds we present.
Max Registers, Counters, and Monotone Circuits
, 2009
Abstract

Cited by 3 (2 self)
A method is given for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register can be constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈lg m⌉ atomic register operations per write or read. The construction takes the form of a binary search tree: applying classic techniques for building unbalanced search trees gives an unbounded max register with cost O(min(log v, n)) to read or write a value v, where n is the number of processes. It is also shown how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. The cost of a write is bounded by O(Sd·min(⌈lg m⌉, n)), where m is the size of the alphabet for the circuit, S is the number of gates whose value changes as the result of the write, and d is the number of inputs to each gate; the cost of a read is min(⌈lg m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition.
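The binary-search-tree shape of the bounded construction can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class and field names are our own, and the one-bit "switch" registers are modeled as plain Python attributes rather than atomic hardware registers, so no concurrency claims are made. Each node holds a switch bit; values below the midpoint go into the left subtree (allowed only while the switch is still 0), larger values go into the right subtree and then set the switch, and a read follows the switch bits down, so both operations touch at most ⌈lg m⌉ bits.

```python
class MaxRegister:
    """Sketch of an m-valued max register built as a binary tree of
    one-bit registers (single-threaded model; names are illustrative)."""

    def __init__(self, m):
        self.m = m
        if m > 1:
            self.half = (m + 1) // 2          # split point of the value range
            self.switch = 0                   # one-bit register at this node
            self.left = MaxRegister(self.half)     # holds values < half
            self.right = MaxRegister(m - self.half)  # holds values - half

    def write(self, v):
        if self.m <= 1:
            return                            # leaf: only value 0 fits
        if v < self.half:
            if self.switch == 0:              # skip if a larger value exists
                self.left.write(v)
        else:
            self.right.write(v - self.half)   # write subtree first ...
            self.switch = 1                   # ... then flip the switch

    def read(self):
        if self.m <= 1:
            return 0
        if self.switch:                       # a value >= half was written
            return self.half + self.right.read()
        return self.left.read()
```

The write order in the right branch (subtree first, switch second) is what makes the real register-based construction linearizable: once a reader sees the switch set, the larger value is already in place.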
Bounded-wait combining: Constructing robust and high-throughput shared objects
 In Proceedings of the 20th International Symposium on Distributed Computing (DISC ’06)
, 2006
Abstract

Cited by 2 (2 self)
Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is linearizable, non-blocking, and can provably achieve high throughput in k-synchronous executions – executions in which process speeds vary by at most a constant factor k. The algorithm is based on a novel variation of the software-combining paradigm that we call bounded-wait combining. It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms, where processes may have to wait for each other indefinitely, in the bounded-wait combining algorithm a process only waits for other processes for a bounded period of time and then ‘takes destiny in its own hands’. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm achieves throughput of Ω(N / log N) in k-synchronous executions, where N is the number of processes that can participate in the algorithm. Our algorithm uses two tools that we believe may prove useful for obtaining highly parallel non-blocking implementations of additional objects. The first is “synchronous locks”: locks that are respected by processes only in k-synchronous executions and are disregarded otherwise; the second is “pseudo-transactions”, a weakening of regular transactions that allows higher parallelism.
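The general combining idea referred to above can be illustrated with a minimal flat-combining-style counter. This is our own simplification, not the paper's bounded-wait algorithm (the class, the per-thread slot array, and the non-blocking lock probe are all illustrative assumptions): each thread publishes its pending increment in a slot, and whichever thread holds the lock applies every published request, so one lock holder serves many operations instead of each thread contending separately.

```python
import threading

class CombiningCounter:
    """Minimal combining sketch (illustrative only): requests are
    published in per-thread slots and applied in batches by whichever
    thread currently acts as the combiner."""

    def __init__(self, n_threads):
        self.slots = [0] * n_threads     # pending increments, one per thread
        self.value = 0                   # the combined counter value
        self.lock = threading.Lock()

    def increment(self, tid, amount=1):
        self.slots[tid] += amount        # publish the request
        if self.lock.acquire(blocking=False):
            try:
                self._combine()          # we won the lock: act as combiner
            finally:
                self.lock.release()
        # if the lock was busy, the current combiner picks up our slot

    def _combine(self):
        # Apply every published request in one traversal.
        for i, pending in enumerate(self.slots):
            if pending:
                self.value += pending
                self.slots[i] = 0

    def read(self):
        with self.lock:
            self._combine()              # flush any unapplied requests
            return self.value
```

In this simple form a thread whose slot is served by another combiner never waits at all, while the paper's contribution is bounding how long a process can be made to wait before it proceeds on its own.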
Solo-Valency and the Cost of Coordination
, 2007
Abstract

Cited by 1 (1 self)
This paper introduces solo-valency, a variation on the valency proof technique originated by Fischer, Lynch, and Paterson. The new technique focuses on critical events that influence the responses of solo runs by individual operations, rather than on critical events that influence a protocol’s single decision value. It allows us to derive √n lower bounds on the time to perform an operation for lock-free implementations of concurrent objects such as linearizable queues, stacks, sets, hash tables, counters, approximate agreement, and more. Time is measured as the number of distinct base objects accessed and the number of stalls caused by contention in accessing memory, incurred by a process as it performs a single operation. We introduce the influence-level metric, which quantifies the extent to which the response of a solo execution of one process can be changed by other processes. We then prove the existence of a relationship between the space complexity, latency, contention, and influence level of all lock-free object implementations. Our results are broad in that they hold for implementations that may use any collection of read-modify-write operations in addition to read and write, and in that they apply even if base objects have unbounded size.
Constructing Shared Objects that are Both Robust and High-Throughput
Abstract

Cited by 1 (1 self)
Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is linearizable, non-blocking, and can provably achieve high throughput in semi-synchronous executions. The algorithm is based on a novel variation of the software-combining paradigm that we call bounded-wait combining. It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms, where processes may have to wait for each other indefinitely, in the bounded-wait combining algorithm a process only waits for other processes for a bounded period of time and then ‘takes destiny in its own hands’. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm can achieve throughput of Ω(N / log N) in executions where process speeds vary only by a constant factor, where N is the number of processes that can participate in the algorithm. We also introduce and use pseudo-transactions, a technique for concurrent execution that may prove useful for other algorithms.
Lower Bounds for Restricted-Use Objects
, 2013
Abstract

Cited by 1 (1 self)
Concurrent objects play a key role in the design of applications for multicore architectures, making it imperative to precisely understand their complexity requirements. For some objects, it is known that implementations can be significantly more efficient when their usage is restricted. However, apart from the specific restriction of one-shot implementations, where each process may apply only a single operation to the object, very little is known about the complexities of objects under general restrictions. This paper draws a more complete picture by defining a large class of objects for which an operation applied to the object can be “perturbed” L consecutive times, and by proving lower bounds on their space complexity and on the time complexity of deterministic implementations of such objects. This class includes bounded-value max registers, limited-use approximate and exact counters, and limited-use collect and compare-and-swap objects; L depends on the number of times the object can be accessed or the maximum value it can support. For n-process implementations that use only historyless primitives, we prove Ω(min(L, n)) space complexity lower bounds, which hold for both deterministic and randomized implementations. For deterministic implementations, we prove lower bounds of Ω(min(log L, n)) on the worst-case step complexity of an operation. When arbitrary primitives can be used, we prove that either some operation incurs Ω(min(log L, n)) memory stalls or some operation performs Ω(min(log L, n)) steps. In addition to our deterministic time lower bounds, the paper establishes lower bounds on the expected step complexity of restricted-use randomized versions of many of these objects in a weak oblivious-adversary model.