Results 1–10 of 34
Adding Networks
, 2001
Cited by 106 (31 self)
Abstract:
An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where a > b > 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes and c is a quantity we call one-shot contention; for a large class of switching networks and for conventional counting networks the one-shot contention is constant. By contrast, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lock-free, low-contention networked data structure that supports arbitrary fetch&add operations.
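The contract being distributed here is just the classic fetch&add / fetch&increment pair. As a semantics-only sketch (not an adding network: the single AtomicLong below is precisely the centralized bottleneck these networks are designed to avoid), the two operations look like this:

```java
import java.util.concurrent.atomic.AtomicLong;

// Semantics of the operations an adding network distributes:
// fetch&add returns the old value and adds a delta;
// fetch&increment is the delta-equals-one special case.
public class FetchAndAddDemo {
    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(0);
        long prev1 = counter.getAndAdd(5);       // fetch&add: returns 0, counter becomes 5
        long prev2 = counter.getAndIncrement();  // fetch&increment: returns 5, counter becomes 6
        System.out.println(prev1 + " " + prev2 + " " + counter.get()); // prints "0 5 6"
    }
}
```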
Diffracting trees
 In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM
, 1994
Cited by 58 (11 self)
Abstract:
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent data structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log² w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
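The core mechanism is a single diffracting balancer: a token first tries to collide with a partner in a "prism"; a colliding pair splits across the two output wires without touching the shared toggle bit, which is consulted only when no partner arrives. A hypothetical minimal sketch of one such balancer (not the paper's full tree; an Exchanger stands in for the prism array, and the class name is our own):

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// One diffracting balancer, sketched: paired tokens are "diffracted" to
// opposite outputs; a lone token falls back to the contended toggle.
public class DiffractingBalancer {
    private static final AtomicLong IDS = new AtomicLong(); // unique token ids
    private final Exchanger<Long> prism = new Exchanger<>();
    private final AtomicLong toggle = new AtomicLong();

    /** Routes one token; returns the output wire, 0 or 1. */
    public int traverse() {
        long myId = IDS.getAndIncrement();
        try {
            // Wait briefly for a partner token to collide with.
            long partnerId = prism.exchange(myId, 10, TimeUnit.MICROSECONDS);
            return myId < partnerId ? 0 : 1; // the pair splits across both wires
        } catch (TimeoutException | InterruptedException e) {
            // No partner arrived: fall back to the shared toggle bit.
            return (int) (toggle.getAndIncrement() & 1);
        }
    }
}
```

In the absence of collisions this degenerates to a plain balancer, alternating 0, 1, 0, 1, ... exactly as the toggle dictates.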
A scalable lockfree stack algorithm
 In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
Cited by 56 (9 self)
Abstract:
The literature describes two high-performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are nonblocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is nonblocking, linearizable, and scales well throughout the concurrency range has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and nonblocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
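The observation can be sketched in a few lines: a Treiber-style CAS stack whose retry path, instead of spinning, tries to meet an opposite operation and cancel with it. This is a hedged sketch under simplifying assumptions (a single Exchanger stands in for the paper's elimination array, values are assumed non-null, and all names are our own):

```java
import java.util.EmptyStackException;
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Lock-free stack with elimination backoff: on CAS failure, a push tries to
// hand its value directly to a pop (and vice versa) instead of retrying hot.
public class EliminationBackoffStack<T> {
    private static final class Node<T> {
        final T value; final Node<T> next;
        Node(T v, Node<T> n) { value = v; next = n; }
    }
    private final AtomicReference<Node<T>> top = new AtomicReference<>();
    private final Exchanger<T> slot = new Exchanger<>(); // stand-in for the array

    public void push(T x) {
        while (true) {
            Node<T> h = top.get();
            if (top.compareAndSet(h, new Node<>(x, h))) return;
            try { // contention: a popper answering with null means x was taken
                if (slot.exchange(x, 10, TimeUnit.MICROSECONDS) == null) return;
            } catch (TimeoutException | InterruptedException ignored) { }
        }
    }

    public T pop() {
        while (true) {
            Node<T> h = top.get();
            if (h == null) throw new EmptyStackException();
            if (top.compareAndSet(h, h.next)) return h.value;
            try { // contention: a pusher's non-null value satisfies this pop
                T x = slot.exchange(null, 10, TimeUnit.MICROSECONDS);
                if (x != null) return x;
            } catch (TimeoutException | InterruptedException ignored) { }
        }
    }
}
```

Two same-type operations that happen to meet in the slot simply see the "wrong" kind of partner and retry, so only opposite pairs eliminate.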
Scalable Synchronous Queues
, 2009
Cited by 18 (0 self)
Abstract:
In a thread-safe concurrent queue, consumers typically wait for producers to make data available. In a synchronous queue, producers similarly wait for consumers to take the data. We present two new nonblocking, contention-free synchronous queues that achieve high performance through a form of dualism: the underlying data structure may hold both data and, symmetrically, requests. We present performance results on 16-processor SPARC and 4-processor Opteron machines. We compare our algorithms to commonly used alternatives from the literature and from the Java SE 5.0 class java.util.concurrent.SynchronousQueue, both directly in synthetic microbenchmarks and indirectly as the core of Java’s ThreadPoolExecutor mechanism. Our new algorithms consistently outperform the Java SE 5.0 SynchronousQueue by factors of three in unfair mode and 14 in fair mode; this translates to factors of two and ten for the ThreadPoolExecutor. Our synchronous queues have been adopted for inclusion in Java 6.
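The rendezvous behavior being described is visible from the standard API alone. In this small usage example, put() blocks until a matching take() arrives because a SynchronousQueue has no internal capacity; the boolean constructor argument selects the fair (FIFO) mode, the one for which the paper reports the largest speedups:

```java
import java.util.concurrent.SynchronousQueue;

// A synchronous queue is a pure rendezvous: the producer's put() cannot
// complete until a consumer's take() pairs with it.
public class SyncQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> q = new SynchronousQueue<>(true); // fair mode
        Thread producer = new Thread(() -> {
            try { q.put("job-1"); } catch (InterruptedException ignored) { }
        });
        producer.start();
        String job = q.take();   // rendezvous: unblocks the producer
        producer.join();
        System.out.println(job); // prints "job-1"
    }
}
```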
A Steady State Analysis of Diffracting Trees
, 1997
Cited by 16 (3 self)
Abstract:
Diffracting trees are an effective and highly scalable distributed/parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting trees, and uses it to answer several critical algorithmic design questions. Our model is simple and sufficiently high-level to overcome many implementation-specific details, and yet, as we will show, it is rich enough to accurately predict empirically observed behaviors. As a result of our analysis we were able to identify starvation problems in the original diffracting tree algorithm and modify it to create a more stable version. We are also able to identify the range in which the diffracting tree performs most efficiently, and the ranges in which its performance degrades. We believe our model and modeling approach open the way to steady-state analysis of other distributed/parallel structures such as counting networks and elimination trees.
Linear lower bounds on realworld implementations of concurrent objects
 In Proceedings of the 46th Annual Symposium on Foundations of Computer Science (FOCS)
, 2005
Cited by 15 (9 self)
Abstract:
This paper proves Ω(n) lower bounds on the time to perform a single instance of an operation in any implementation of a large class of data structures shared by n processes. For standard data structures such as counters, stacks, and queues, the bound is tight. The implementations considered may apply any deterministic primitives to a base object. No bounds are assumed on either the number of base objects or their size. Time is measured as the number of steps a process performs on base objects and the number of stalls it incurs as a result of contention with other processes.
Scalable Concurrent Priority Queue Algorithms
 In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
Cited by 13 (3 self)
Abstract:
This paper addresses the problem of designing bounded-range priority queues, that is, queues that support a fixed range of priorities. Bounded-range priority queues are fundamental in the design of modern multiprocessor algorithms, from the application level to the lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256-node shared-memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range.
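What makes the bounded range exploitable is that a fixed set of priorities allows one concurrent bin per priority instead of a single ordered structure. A hedged sketch of that generic idea (not the paper's LinearFunnels or FunnelTree; the class name is our own, and the deleteMin scan below is not linearizable under concurrent inserts):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Bounded-range priority queue sketch: one lock-free bin per priority,
// deleteMin scanning from the smallest priority value (0) upward.
public class BoundedRangePriorityQueue<T> {
    private final ConcurrentLinkedQueue<T>[] bins;

    @SuppressWarnings("unchecked")
    public BoundedRangePriorityQueue(int range) {
        bins = new ConcurrentLinkedQueue[range];
        for (int i = 0; i < range; i++) bins[i] = new ConcurrentLinkedQueue<>();
    }

    /** Inserts item at a priority in [0, range). */
    public void insert(int priority, T item) { bins[priority].add(item); }

    /** Returns an item of the smallest occupied priority, or null if empty. */
    public T deleteMin() {
        for (ConcurrentLinkedQueue<T> bin : bins) {
            T item = bin.poll();
            if (item != null) return item;
        }
        return null;
    }
}
```

The paper's contribution can be read as replacing the contended per-bin access and the linear scan with funnel-based structures that keep this design scalable at high concurrency.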
Combining funnels: a dynamic approach to software combining
 Journal of Parallel and Distributed Computing
, 2000
Cited by 11 (3 self)
Abstract:
We enhance the well-established software combining synchronization technique to create combining funnels. Previous software combining methods used a statically assigned tree whose depth was logarithmic in the total number of processors in the system. On shared-memory multiprocessors the new method allows one to dynamically build combining trees with depth logarithmic in the actual number of processors concurrently accessing the data structure. The structure is composed of a series of combining layers through which processors' requests are funneled. These layers use randomization instead of a rigid tree structure to allow processors to find partners for combining. By using an adaptive scheme the funnel can change width and depth to accommodate different access frequencies without requiring global agreement as to its size. Rather, processors choose parameters of the protocol privately, making this scheme very simple to implement and tune. When we add an "elimination" mechanism to the funnel structure, the randomly constructed "tree" is transformed into a "forest" of disjoint (and on average ...
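The essence of one combining layer, stripped of the funnels' adaptivity, is that two concurrent adds that meet merge into a single update on the shared object. A hedged sketch under strong simplifications (one Exchanger stands in for a randomized layer, callers get no return value, so this models a statistics counter rather than a full fetch&add object, and all names are our own):

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// One combining layer, sketched: a pair of adds that meet in the slot is
// applied as a single getAndAdd by whichever partner holds the smaller id.
public class CombiningCounter {
    private static final AtomicLong IDS = new AtomicLong(); // unique request ids
    private final AtomicLong counter = new AtomicLong();
    private final Exchanger<long[]> slot = new Exchanger<>();

    public void add(long delta) {
        long myId = IDS.getAndIncrement();
        try {
            // Try briefly to find a partner to combine with: exchange {id, delta}.
            long[] partner = slot.exchange(new long[]{myId, delta},
                                           10, TimeUnit.MICROSECONDS);
            if (myId < partner[0]) {
                counter.getAndAdd(delta + partner[1]); // apply both deltas at once
            } // else: our delta rides with the partner's combined update
        } catch (TimeoutException | InterruptedException e) {
            counter.getAndAdd(delta); // no partner: apply directly
        }
    }

    public long get() { return counter.get(); }
}
```

Returning per-caller results (as true fetch&add requires) is exactly the machinery the funnel layers add on top of this pairing step.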
Supporting Increment and Decrement Operations in Balancing Networks
 Proceedings of the 16th International Symposium on Theoretical Aspects of Computer Science
, 1998
Cited by 10 (8 self)
Abstract:
Counting networks are a class of distributed data structures that support highly concurrent implementations of shared Fetch&Increment counters. Applications of these counters include shared pools and stacks, load balancing, and software barriers [4, 16, 18, 23]. A limitation of counting networks is that the resulting shared counters can be incremented, but not decremented.
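The device this line of work introduces for decrements is the antitoken: a decrement traverses the network stepping each balancer's toggle backward, taking the wire the previous token took and cancelling its effect. A speculative single-balancer sketch of that idea (the class and method names are our own; a real network composes many such balancers):

```java
import java.util.concurrent.atomic.AtomicLong;

// One balancer with antitoken support: tokens go to wires 0,1,0,1,...;
// an antitoken steps the same state backward and exits on the wire the
// most recent token used, undoing its effect downstream.
public class TokenAntitokenBalancer {
    private final AtomicLong state = new AtomicLong(); // net tokens seen

    /** Increment traversal: returns output wire 0 or 1. */
    public int token() {
        long s = state.getAndIncrement();
        return (int) Math.floorMod(s, 2);
    }

    /** Decrement traversal: returns the wire the previous token took. */
    public int antitoken() {
        long s = state.getAndDecrement();
        return (int) Math.floorMod(s - 1, 2);
    }
}
```

A token followed by an antitoken leaves the balancer's state, and hence every downstream balancer, exactly as it was.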
Counting networks are practically linearizable
 In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing
, 1996
Cited by 9 (2 self)
Abstract:
Counting networks are a class of concurrent structures that allow the design of highly scalable concurrent data structures in a way that eliminates sequential bottlenecks and contention. Linearizable counting networks assure that the order of the values returned by the network reflects the real-time order in which they were requested. We argue that in many concurrent systems the worst-case scenarios that violate linearizability require a form of timing anomaly that is uncommon in practice. The linear time cost of designing networks that achieve linearizability under all circumstances may thus prove an unnecessary burden on applications that are willing to trade off occasional nonlinearizability for speed and parallelism. This paper presents a very simple measure that is local to the individual links and nodes of the network, and that quantifies the extent to which a network can suffer from timing anomalies and still remain linearizable. Perhaps counterintuitively, this measure is independent of network depth. We use our measure to mathematically support our experimental results: that in a variety of normal situations tested on a simulated shared-memory multiprocessor, the bitonic counting networks of Aspnes, Herlihy, and Shavit are “for all practical purposes” linearizable.