Results 1–10 of 41
Adding Networks
, 2001
Abstract

Cited by 106 (31 self)
An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where a > b > 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes and c is a quantity we call one-shot contention; for a large class of switching networks and for conventional counting networks the one-shot contention is constant. In contrast, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lock-free, low-contention networked data structure that supports arbitrary fetch&add operations.
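The abstract's distinction between fetch&add and fetch&increment can be made concrete with a minimal sketch of the fetch&add primitive as a compare-and-swap retry loop. This is a single centralized word, not the low-contention distributed network the paper constructs; `SimulatedCAS` and `fetch_and_add` are hypothetical names, and the lock below only simulates the atomicity of a hardware CAS instruction:

```python
import threading

class SimulatedCAS:
    """A single memory word with compare-and-swap. The lock merely
    simulates the atomicity of a hardware CAS instruction."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def fetch_and_add(word, delta):
    """Lock-free fetch&add: read, then CAS; retry on interference.
    A counting network supports only the delta == 1 special case
    (fetch&increment); an adding network supports arbitrary delta."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + delta):
            return old  # the value observed before the addition

counter = SimulatedCAS()
assert fetch_and_add(counter, 5) == 0
assert fetch_and_add(counter, 3) == 5
assert counter.load() == 8
```

The paper's point is precisely that a single word like this becomes a hot spot: a scalable adding network must spread the additions across many switching elements.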
A scalable lock-free stack algorithm
 In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
Abstract

Cited by 56 (9 self)
The literature describes two high-performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice, since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
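The "elimination array as a backoff scheme" observation can be sketched in a few lines. This toy version is sequentialized and heavily simplified: the lock stands in for a CAS on the stack's top pointer, a failed non-blocking acquire models a lost CAS race, and an eliminating push simply parks its value rather than waiting briefly for a partner as the real algorithm does. All names are hypothetical:

```python
import random
import threading

class EliminationBackoffStack:
    """Toy sketch of the elimination-backoff idea: try the central stack
    first; if that attempt fails under contention, visit a random slot of
    an elimination array, where a concurrent push and pop can cancel each
    other without touching the stack at all."""

    def __init__(self, width=4):
        self._items = []                       # central stack
        self._lock = threading.Lock()          # simulates CAS on top
        self._elim = [None] * width            # elimination array

    def _try_central(self, op, value=None):
        # A failed non-blocking acquire models a CAS that lost a race.
        if not self._lock.acquire(blocking=False):
            return False, None
        try:
            if op == "push":
                self._items.append(value)
                return True, None
            return True, (self._items.pop() if self._items else None)
        finally:
            self._lock.release()

    def push(self, value):
        while True:
            ok, _ = self._try_central("push", value)
            if ok:
                return
            i = random.randrange(len(self._elim))
            if self._elim[i] is None:          # offer the value to a pop
                self._elim[i] = value
                return                          # real code would time out

    def pop(self):
        while True:
            i = random.randrange(len(self._elim))
            if self._elim[i] is not None:      # eliminate a pending push
                value, self._elim[i] = self._elim[i], None
                return value
            ok, value = self._try_central("pop")
            if ok:
                return value                   # None means empty stack

stack = EliminationBackoffStack()
stack.push(1); stack.push(2); stack.push(3)
assert [stack.pop(), stack.pop(), stack.pop()] == [3, 2, 1]
assert stack.pop() is None
```

The key property the paper proves is that a matched push/pop pair that meets in the array can be linearized at the moment of exchange, so elimination preserves LIFO semantics while diverting traffic away from the contended top pointer.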
Elimination Trees and the Construction of Pools and Stacks
, 1996
Abstract

Cited by 42 (12 self)
Shared pools and stacks are two coordination structures with a history of applications ranging from simple producer/consumer buffers to job schedulers and procedure stacks. This paper introduces elimination trees, a novel form of diffracting trees that offer pool and stack implementations with superior response (on average constant) under high loads, while guaranteeing logarithmic-time "deterministic" termination under sparse request patterns. A preliminary version of this paper appeared in the proceedings of the 7th Annual Symposium on Parallel Algorithms and Architectures (SPAA). Contact author: shanir@theory.lcs.mit.edu

1 Introduction. As multiprocessing breaks away from its traditional number-crunching role, we are likely to see a growing need for highly distributed and parallel coordination structures. A real-time application such as a system of sensors and actuators will require fast response under both sparse and intense activity levels (typical examples could be a ra...
Cloud control with distributed rate limiting
 In SIGCOMM
, 2007
Abstract

Cited by 38 (3 self)
Today’s cloud-based services integrate globally distributed resources into seamless computing platforms. Provisioning and accounting for the resource usage of these Internet-scale applications presents a challenging technical problem. This paper presents the design and implementation of distributed rate limiters, which work together to enforce a global rate limit across traffic aggregates at multiple sites, enabling the coordinated policing of a cloud-based service’s network traffic. Our abstraction not only enforces a global limit, but also ensures that congestion-responsive transport-layer flows behave as if they traversed a single, shared limiter. We present two designs, one general-purpose and one optimized for TCP, that allow service operators to explicitly trade off between communication costs and system accuracy, efficiency, and scalability. Both designs are capable of rate limiting thousands of flows with negligible overhead (less than 3% in the tested configuration). We demonstrate that our TCP-centric design scales to hundreds of nodes while remaining robust to both loss and communication delay, making it practical for deployment in nationwide service providers.
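One way to picture "limiters working together to enforce a global limit" is periodic demand-proportional reapportioning: each site reports its recently observed demand, and the global limit is redivided in proportion. This is a simple illustrative policy under assumed semantics, not the paper's specific protocol; `reapportion` is a hypothetical name:

```python
def reapportion(global_limit, demands):
    """Divide a global rate limit among sites in proportion to each
    site's recently observed demand. One simple policy sketch; the
    paper's designs differ in how demand is estimated and how the
    limiters communicate."""
    total = sum(demands)
    n = len(demands)
    if total == 0:
        return [global_limit / n] * n          # no demand: split evenly
    return [global_limit * d / total for d in demands]

# Three sites, a 100 Mbps global limit, demands of 30, 10, and 0 Mbps:
limits = reapportion(100.0, [30.0, 10.0, 0.0])
assert limits == [75.0, 25.0, 0.0]
assert sum(limits) == 100.0                    # the global limit holds
```

Each site then enforces its local share with an ordinary token bucket until the next exchange; the trade-off the paper studies is between how often sites communicate and how closely the aggregate tracks a single shared limiter.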
An Inherent Bottleneck in Distributed Counting
 Journal of Parallel and Distributed Computing
, 1997
Abstract

Cited by 20 (5 self)
A distributed counter allows each processor in an asynchronous message-passing network to access the counter value and increment it. We study the problem of implementing a distributed counter such that no processor is a communication bottleneck. We prove a lower bound of Ω(n / log log n) on the number of messages that some processor must exchange in a sequence of n counting operations spread over n processors. We propose a counter that achieves this bound when each processor increments the counter exactly once. Hence, the lower bound is tight. Because most algorithms and data structures count in some way, the lower bound holds for many distributed computations. We feel that the proposed concept of a communication bottleneck is a relevant measure of efficiency for distributed algorithms and data structures, because it indicates the achievable degree of distribution.

1 Introduction. Counting is an essential ingredient in virtually any computation. It is therefore highly de...
A Steady State Analysis of Diffracting Trees
, 1997
Abstract

Cited by 16 (3 self)
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting trees, and uses it to answer several critical algorithmic design questions. Our model is simple and sufficiently high-level to abstract away many implementation-specific details, and yet, as we will show, it is rich enough to accurately predict empirically observed behaviors. As a result of our analysis, we were able to identify starvation problems in the original diffracting tree algorithm and modify it to create a more stable version. We are also able to identify the range in which the diffracting tree performs most efficiently, and the ranges in which its performance degrades. We believe our model and modeling approach open the way to steady-state analysis of other distributed-parallel structures such as counting networks and elimination trees.
Linear lower bounds on realworld implementations of concurrent objects
 In Proceedings of the 46th Annual Symposium on Foundations of Computer Science (FOCS)
, 2005
Abstract

Cited by 15 (9 self)
This paper proves Ω(n) lower bounds on the time to perform a single instance of an operation in any implementation of a large class of data structures shared by n processes. For standard data structures such as counters, stacks, and queues, the bound is tight. The implementations considered may apply any deterministic primitives to a base object. No bounds are assumed on either the number of base objects or their size. Time is measured as the number of steps a process performs on base objects and the number of stalls it incurs as a result of contention with other processes.
Scalable Concurrent Priority Queue Algorithms
 In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
Abstract

Cited by 13 (3 self)
This paper addresses the problem of designing bounded-range priority queues, that is, queues that support a fixed range of priorities. Bounded-range priority queues are fundamental in the design of modern multiprocessor algorithms, from the application level to the lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256-node shared-memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range.

1 Introduction. Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r...
Randomized Priority Queues for Fast Parallel Access
 Journal of Parallel and Distributed Computing
, 1997
Abstract

Cited by 11 (1 self)
Applications like parallel search or discrete event simulation often assign priority or importance to pieces of work. An effective way to exploit this for parallelization is to use a priority queue data structure for scheduling the work; but a bottleneck-free implementation of parallel priority queue access by many processors is required to make this approach scalable. We present simple and portable randomized algorithms for parallel priority queues on distributed-memory machines with fully distributed storage. Accessing O(n) out of m elements on an n-processor network with diameter d requires amortized time O(...) with high probability for many network types. On logarithmic-diameter networks, the algorithms are as fast as the best previously known EREW PRAM methods. Implementations demonstrate that the approach is already useful for medium-scale parallelism.
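The "fully distributed storage" idea behind such queues can be sketched sequentially: each processor keeps a local heap, inserts go to a uniformly random local heap, and delete-min takes the smallest of the local minima. This is an illustrative simplification under assumed semantics (the paper's algorithms batch accesses and bound per-node work with high probability); `RandomizedParallelPQ` is a hypothetical name:

```python
import heapq
import random

class RandomizedParallelPQ:
    """Sequential sketch of a randomized distributed priority queue:
    each of n 'processors' keeps a local heap, each insert goes to a
    uniformly random local heap, and delete-min removes the smallest
    of the local minima."""

    def __init__(self, n, seed=0):
        self._heaps = [[] for _ in range(n)]
        self._rng = random.Random(seed)        # seeded only for the demo

    def insert(self, key):
        i = self._rng.randrange(len(self._heaps))
        heapq.heappush(self._heaps[i], key)

    def delete_min(self):
        # Scan the local minima; raises ValueError if the queue is empty.
        i = min((j for j in range(len(self._heaps)) if self._heaps[j]),
                key=lambda j: self._heaps[j][0])
        return heapq.heappop(self._heaps[i])

pq = RandomizedParallelPQ(4)
for key in [5, 1, 3, 2, 4]:
    pq.insert(key)
assert [pq.delete_min() for _ in range(5)] == [1, 2, 3, 4, 5]
```

Random placement is what balances the load: since each insert picks its heap uniformly at random, no processor's local heap grows much larger than the others with high probability.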
Small-Depth Counting Networks and Related Topics
, 1994
Abstract

Cited by 11 (0 self)
In [5], Aspnes, Herlihy, and Shavit generalized the notion of a sorting network by introducing a class of so-called "counting" networks and establishing an O(lg² n) upper bound on the depth complexity of such networks. Their work was motivated by a number of practical applications arising in the domain of asynchronous shared-memory machines. In this thesis, we continue the analysis of counting networks and produce a number of new upper bounds on their depths. Our results are predicated on the rich combinatorial structure which counting networks possess. In particular, we present a simple explicit construction of an O(lg n lg lg n)-depth counting network, a randomized construction of an O(lg n)-depth network (which works with extremely high probability), and an existential proof of a deterministic O(lg n)-depth network. The latter result matches the trivial Ω(lg n) depth lower bound to within a constant factor. Our main result is a uniform polynomial-time construction of an O(lg n)-depth counting network, which depends heavily on the existential result but makes use of the extractor functions introduced in [25]. Using the extractor, we construct regular high-degree bipartite graphs with extremely strong expansion properties. We believe this result is of independent interest.
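Since all of these depth bounds are over networks of balancers, a minimal sequential sketch of a single balancer (the two-input, two-output switching element whose layers make up a counting network) may help make the depth measure concrete. In a real shared-memory implementation the toggle flip must be an atomic operation; this sketch only illustrates the routing rule:

```python
class Balancer:
    """The building block of a counting network: a two-wire switch with
    a toggle bit. Tokens arriving on either input leave on the top and
    bottom output wires alternately, so after k traversals the outputs
    satisfy the step property: ceil(k/2) tokens on top, floor(k/2) on
    the bottom."""

    def __init__(self):
        self._toggle = 0          # 0 = route to top output, 1 = bottom
        self.out = [0, 0]         # tokens emitted on each output wire

    def traverse(self):
        wire = self._toggle
        self._toggle ^= 1         # flip for the next token
        self.out[wire] += 1
        return wire

b = Balancer()
for _ in range(7):
    b.traverse()
assert b.out == [4, 3]            # step property: out[0] - out[1] is 0 or 1
```

A counting network wires many such balancers into layers so that the step property holds across all output wires; its depth is the number of balancers a token crosses, which is what the O(lg² n), O(lg n lg lg n), and O(lg n) bounds above measure.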