Results 1 - 10
of
27
Adding Networks
, 2001
"... An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inher ..."
Abstract
-
Cited by 84 (28 self)
- Add to MetaCart
An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where |a |> |b |> 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes, and c is a quantity we call one-shot contention; for a large class of switching networks and for conventional counting networks the one-shot contention is constant. On the contrary, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lock-free, lowcontention networked data structure that supports arbitrary fetch&add operations.
Diffracting trees
- In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM
, 1994
"... Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a dis ..."
Abstract
-
Cited by 52 (10 self)
- Add to MetaCart
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log 2 w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
A scalable lock-free stack algorithm
- In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
"... The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well o ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting eliminationbackoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
Linear lower bounds on real-world implementations of concurrent objects
- In Proceedings of the 46th Annual Symposium on Foundations of Computer Science (FOCS
, 2005
"... Abstract This paper proves \Omega (n) lower bounds on the time to perform a single instance of an operationin any implementation of a large class of data structures shared by n processes. For standarddata structures such as counters, stacks, and queues, the bound is tight. The implementations consid ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
Abstract This paper proves \Omega (n) lower bounds on the time to perform a single instance of an operationin any implementation of a large class of data structures shared by n processes. For standarddata structures such as counters, stacks, and queues, the bound is tight. The implementations considered may apply any deterministic primitives to a base object. No bounds are assumedon either the number of base objects or their size. Time is measured as the number of steps a process performs on base objects and the number of stalls it incurs as a result of contentionwith other processes. 1
Scalable Concurrent Priority Queue Algorithms
- In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
"... This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms -- from the application level to lowest levels of the operating sy ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms -- from the application level to lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range. 1 Introduction Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r...
Supporting Increment and Decrement Operations in Balancing Networks
- Proceedings of the 16th International Symposium on Theoretical Aspects of Computer Science
, 1998
"... Counting networks are a class of distributed data structures that support highly concurrent implementations of shared Fetch&Increment counters. Applications of these counters include shared pools and stacks, load balancing, and software barriers [4, 16, 18, 23]. A limitation of counting networks ..."
Abstract
-
Cited by 10 (8 self)
- Add to MetaCart
Counting networks are a class of distributed data structures that support highly concurrent implementations of shared Fetch&Increment counters. Applications of these counters include shared pools and stacks, load balancing, and software barriers [4, 16, 18, 23]. A limitation of counting networks is that the resulting shared counters can be incremented, but not decremented.
Counting networks are practically linearizable
- In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing
, 1996
"... Counting networks are a class of concurrent structures that allow the design of highly scalable concurrent data structures in a way that eliminates sequential bottlenecks and contention. Linearizable counting networks assure that the order of the values returned by the network reflects the real-time ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Counting networks are a class of concurrent structures that allow the design of highly scalable concurrent data structures in a way that eliminates sequential bottlenecks and contention. Linearizable counting networks assure that the order of the values returned by the network reflects the real-time order in which they were requested. We argue that in many concurrent systems the worst case scenarios that violate linearizability require a form of timing anomaly that is uncommon in practice. The linear time cost of designing networks that achieve linearizability under all circumstances may thus prove an unnecessary burden on applications that are willing to trade-off occasional non-linearizability for speed and parallelism. This paper presents a very simple measure that is iocal to the individual links and nodes of the network, and that quantifies the extent to which a network can suffer from timing anomalies and still remain linearizable. Perhaps counter-intuitively, this measure is independent of network depth. We use our measure to mathematically support our experiment al results: that in a variety of normal situations tested on a simulated shared memory multiprocessor, the Monic counting networks of Aspnes, Herlihy, and Shavit are “for all practical purposes” Iinearizable.
Combining funnels: a dynamic approach to software combining
- Journal of Parallel and Distributed Computing
, 2000
"... We enhance the well-established software combining synchronization technique to create combining funnels. Previous software combining methods used a statically assigned tree whose depth was logarithmic in the total number of processors in the system. On shared memory multiprocessors the new method a ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
We enhance the well-established software combining synchronization technique to create combining funnels. Previous software combining methods used a statically assigned tree whose depth was logarithmic in the total number of processors in the system. On shared memory multiprocessors the new method allows one to dynamically build combining trees with depth logarithmic in the actual number of processors concurrently accessing the data structure. The structure is comprised from a series of combining layers through which processors ' requests are funneled. These layers use randomization instead of a rigid tree structure to allow processors to find partners for combining. By using an adaptive scheme the funnel can change width and depth to accommodate different access frequencies without requiring global agreement as to its size. Rather, processors choose parameters of the protocol privately, making this scheme very simple to implement and tune. When we add an ``elimination' ' mechanism to the funnel structure, the randomly constructed ``tree' ' is transformed into a ``forest' ' of disjoint (and on average
Scalable Synchronous Queues
, 2009
"... In a thread-safe concurrent queue, consumers typically wait for producers to make data available. In a synchronous queue, producers similarly wait for consumers to take the data. We present two new nonblocking, contention-free synchronous queues that achieve high performance through a form of dualis ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
In a thread-safe concurrent queue, consumers typically wait for producers to make data available. In a synchronous queue, producers similarly wait for consumers to take the data. We present two new nonblocking, contention-free synchronous queues that achieve high performance through a form of dualism: The underlying data structure may hold both data and, symmetrically, requests. We present performance results on 16-processor SPARC and 4-processor Opteron machines. We compare our algorithms to commonly used alternatives from the literature and from the Java SE 5.0 class java.util.concurrent.SynchronousQueue both directly in synthetic microbenchmarks and indirectly as the core of Java’s ThreadPoolExecutor mechanism. Our new algorithms consistently outperform the Java SE 5.0 SynchronousQueue by factors of three in unfair mode and 14 in fair mode; this translates to factors of two and ten for the ThreadPoolExecutor. Our synchronous queues have been adopted for inclusion in Java 6.
Room Synchronizations
, 2001
"... We present a class of synchronization called room synchronizations and show how this class can be used to implement asynchronous parallel queues and stacks with constant time access (assuming a fetch-and-add operation). The room synchronization problem involves supporting a set of m mutually exclusi ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
We present a class of synchronization called room synchronizations and show how this class can be used to implement asynchronous parallel queues and stacks with constant time access (assuming a fetch-and-add operation). The room synchronization problem involves supporting a set of m mutually exclusive "rooms" where any number of users can execute code simultaneously in any one of the rooms, but no two users can simultaneously execute code in separate rooms. Users asynchronously request permission to enter specified rooms, and neither the arrival time nor the arrival order nor the desired room of such requests are known ahead of time. We describe an algorithm for room synchronizations, and prove it satisfies a number of desirable properties. We have implemented our algorithm on a Sun UltraEnterprise 10000 multiprocessor. We present experimental results comparing an implementation of a parallel stack using room synchronizations to one using locks, demonstrating a significant scalability advantage for room synchronizations.

