Results 1–7 of 7
Tight Bounds for Parallel Randomized Load Balancing
 Computing Research Repository, 1992
Abstract

Cited by 6 (1 self)
We explore the fundamental limits of distributed balls-into-bins algorithms, i.e., algorithms where balls act in parallel, as separate agents. This problem was introduced by Adler et al., who showed that non-adaptive and symmetric algorithms cannot reliably perform better than a maximum bin load of Θ(log log n / log log log n) within the same number of rounds. We present an adaptive symmetric algorithm that achieves a bin load of two in log* n + O(1) communication rounds using O(n) messages in total. Moreover, larger bin loads can be traded in for smaller time complexities. We prove a matching lower bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms that guarantee small bin loads at an asymptotically optimal message complexity of O(n). The essential preconditions of the proof are (i) a limit of O(n) on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls are not globally consistent. To show that our technique indeed yields tight bounds, we provide for each assumption an algorithm violating it, in turn achieving a constant maximum bin load in constant time. As an application, we consider the following problem. Given a fully connected graph of n nodes, where each node needs to send and receive up to n messages, and in each round each node may send one message over each link, deliver all messages as quickly as possible to their destinations. We give a simple and robust algorithm of time complexity O(log* n) for this task and provide a generalization to the case where all nodes initially hold arbitrary sets of messages. Completing the picture, we give a less practical, but asymptotically optimal algorithm terminating within O(1) rounds. All these bounds hold with high probability.
Kinesis: A new approach to replica placement in distributed storage systems
 ACM Transactions on Storage (TOS)
Abstract

Cited by 4 (1 self)
Kinesis is a novel data placement model for distributed storage systems. It exemplifies three design principles: structure (division of servers into a few failure-isolated segments), freedom of choice (freedom to allocate the best servers to store and retrieve data based on current resource availability), and scattered distribution (independent, pseudorandom spread of replicas in the system). These design principles enable storage systems to achieve balanced utilization of storage and network resources in the presence of incremental system expansions, failures of single and shared components, and skewed distributions of data size and popularity. In turn, this ability leads to significantly reduced resource provisioning costs, good user-perceived response times, and fast, parallelized recovery from independent and correlated failures. This paper validates Kinesis through theoretical analysis, simulations, and experiments on a prototype implementation. Evaluations driven by real-world traces show that Kinesis can significantly outperform the widely-used Chain replica-placement strategy in terms of resource requirements, end-to-end delay, and failure recovery.
The (1 + β)-Choice Process and Weighted Balls-into-Bins
Abstract

Cited by 4 (0 self)
Suppose m balls are sequentially thrown into n bins where each ball goes into a random bin. It is well-known that the gap between the load of the most loaded bin and the average is Θ(√(m log n / n)), for large m. If each ball goes to the lesser loaded of two random bins, this gap dramatically reduces to Θ(log log n), independent of m. Consider now the following "(1 + β)-choice" process for some parameter β ∈ (0, 1): each ball goes to a random bin with probability (1 − β) and the lesser loaded of two random bins with probability β. How does the gap for such a process behave? Suppose that the weight of each ball was drawn from a geometric distribution. How is the gap (now defined in terms of weight) affected? In this work, we develop general techniques for analyzing such balls-into-bins processes. Specifically, we show that for the (1 + β)-choice process above, the gap is Θ(log n / β), irrespective of m. Moreover, the gap stays at Θ(log n / β) in the weighted case for a large class of weight distributions. No nontrivial explicit bounds were previously known in the weighted case, even for the 2-choice paradigm.
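The (1 + β)-choice process described in this abstract is straightforward to simulate. The following minimal sketch (function names are illustrative, not taken from the paper) mixes the one-choice and two-choice rules with probability β:

```python
import random

def one_plus_beta_choice(m, n, beta, rng=random):
    """Simulate the (1 + beta)-choice process: each ball goes to one
    uniformly random bin with probability 1 - beta, and to the lesser
    loaded of two uniformly random bins with probability beta.
    Returns the final list of bin loads."""
    loads = [0] * n
    for _ in range(m):
        if rng.random() < beta:
            a, b = rng.randrange(n), rng.randrange(n)
            i = a if loads[a] <= loads[b] else b   # two-choice step
        else:
            i = rng.randrange(n)                   # one-choice step
        loads[i] += 1
    return loads

def gap(loads, m):
    """Gap between the maximum load and the average load m/n."""
    return max(loads) - m / len(loads)
```

Plotting `gap(one_plus_beta_choice(m, n, beta), m)` for growing m would illustrate the paper's claim that the gap stabilizes around Θ(log n / β) rather than growing with m.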
Scal: Non-Linearizable Computing Breaks the Scalability Barrier
, 2010
Abstract

Cited by 2 (1 self)
We propose a relaxed version of linearizability and a set of load balancing algorithms for trading off adherence to concurrent data structure semantics and scalability. We consider data structures that store elements in a given order, such as stacks and queues. Intuitively, a concurrent stack, for example, is linearizable if the effect of push and pop operations on the stack always occurs instantaneously. A linearizable stack guarantees that pop operations return the youngest stack elements first, i.e., the elements in the reverse order in which the operations that pushed them onto the stack took effect. Linearizability allows concurrent (but not sequential) operations to be reordered arbitrarily. We relax linearizability to k-linearizability with k > 0 to also allow sequences of up to k − 1 sequential operations to be reordered arbitrarily and thus execute concurrently. With a k-linearizable stack, for example, a pop operation may not return the youngest but the k-th youngest element on the stack. It turns out that k-linearizability may be tolerated by concurrent applications such as process schedulers and web servers that already use it implicitly. Moreover, k-linearizability does provide positive scalability in some cases because more operations may be executed concurrently, but it may still be too restrictive under high contention. We therefore propose a set of load balancing algorithms, which significantly improve scalability by approximating k-linearizability probabilistically. We introduce Scal, an open-source framework for implementing k-linearizable approximations of concurrent data structures, and show in multiple benchmarks that Scal provides positive scalability for concurrent data structures that typically do not scale under high contention.
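The idea of a probabilistically k-linearizable stack can be sketched as follows. This is an illustrative toy, not Scal's actual implementation: operations are spread over k partial stacks, so a pop may return one of the up-to-k most recently pushed elements rather than strictly the youngest (the class name and probing policy are assumptions for this sketch):

```python
import random

class KRelaxedStack:
    """Toy sketch of a probabilistically k-relaxed stack: elements are
    spread over k partial stacks, so pop may return any of roughly the
    k youngest elements instead of strictly the youngest one."""

    def __init__(self, k, rng=random):
        self.parts = [[] for _ in range(k)]
        self.rng = rng

    def push(self, x):
        # Load balancing by random placement onto one partial stack.
        self.parts[self.rng.randrange(len(self.parts))].append(x)

    def pop(self):
        # Probe the partial stacks in random order and take from the
        # first non-empty one.
        for i in self.rng.sample(range(len(self.parts)), len(self.parts)):
            if self.parts[i]:
                return self.parts[i].pop()
        raise IndexError("pop from empty stack")
```

Within each partial stack, LIFO order is preserved; the relaxation comes only from the random choice among the k partial stacks, which is what removes the single point of contention.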
Multiple-choice Balanced Allocation in (almost) Parallel
Abstract

Cited by 1 (0 self)
We consider the problem of resource allocation in a parallel environment where new incoming resources are arriving online in groups or batches. We study this scenario in an abstract framework of allocating balls into bins. We revisit the allocation algorithm GREEDY[2] due to Azar, Broder, Karlin, and Upfal (SIAM J. Comput. 1999), in which, for sequentially arriving balls, each ball chooses two bins at random, and gets placed into one of those two bins with minimum load. The maximum load of any bin after the last ball is allocated by GREEDY[2] is well understood, as is, indeed, the entire load distribution, for a wide range of settings. The main goal of our paper is to study balls and bins allocation processes in a parallel environment with the balls arriving in batches. In our model, m balls arrive in batches of size n each (with n being also equal to the number of bins), and the balls in each batch are to be distributed among the bins simultaneously. In this setting, we consider an algorithm that uses GREEDY[2] for all balls within a given batch; the answers to those balls' load queries are with respect to the bin loads at the end of the previous batch, and do not in any way depend on decisions made by other balls from the same batch. Our main contribution is a tight analysis of the new process allocating balls in batches: we show that after the allocation of any number of batches, the gap between maximum and minimum load is O(log n) with high probability, and is therefore independent of the number of batches used.
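The batched variant of GREEDY[2] described in this abstract can be sketched directly: every ball in a batch runs the two-choice rule, but both of its load queries are answered from a snapshot of the bin loads frozen at the end of the previous batch (function name is illustrative, not from the paper):

```python
import random

def batched_greedy2(num_batches, n, rng=random):
    """Sketch of GREEDY[2] applied batch-wise: each batch of n balls
    probes two random bins per ball, but load queries are answered
    against the loads frozen at the end of the previous batch."""
    loads = [0] * n
    for _ in range(num_batches):
        frozen = list(loads)          # snapshot visible to the whole batch
        for _ in range(n):            # one batch of n balls
            a, b = rng.randrange(n), rng.randrange(n)
            i = a if frozen[a] <= frozen[b] else b
            loads[i] += 1             # real loads update, snapshot does not
        # next batch sees the updated loads via a fresh snapshot
    return loads
```

Because decisions within a batch are independent given the snapshot, all n balls of a batch could be placed in parallel, which is exactly the point of the model.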
Balanced allocation: memory performance tradeoffs, preprint
Abstract

Cited by 1 (0 self)
Suppose we sequentially put n balls into n bins. If we put each ball into a random bin, then the heaviest bin will contain ∼ log n / log log n balls with high probability. However, Azar, Broder, Karlin and Upfal [2] showed that if each time we choose two bins at random and put the ball in the least loaded bin among the two, then the heaviest bin will contain only ∼ log log n balls with high probability. How much memory do we need to implement this scheme? We need roughly log log log n bits per bin, and n log log log n bits in total. Let us assume now that we have a limited amount of memory. For each ball, we are given two random bins and we have to put the ball into one of them. Our goal is to minimize the load of the heaviest bin. We prove that if we have n^(1−δ) bits then the heaviest bin will contain at least Ω(δ log n / log log n) balls with high probability. The bound is tight in the communication complexity model.
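The one-choice versus two-choice contrast that this abstract builds on is easy to reproduce empirically (in the unrestricted-memory setting; the function name is illustrative):

```python
import random

def heaviest_bin(n, choices, rng=random):
    """Throw n balls into n bins; each ball samples `choices` random
    bins and goes to the least loaded among them. Returns the load of
    the heaviest bin."""
    loads = [0] * n
    for _ in range(n):
        i = min((rng.randrange(n) for _ in range(choices)),
                key=loads.__getitem__)
        loads[i] += 1
    return max(loads)
```

For moderately large n, `heaviest_bin(n, 1)` typically comes out around log n / log log n while `heaviest_bin(n, 2)` comes out around log log n, matching the bounds quoted above; the paper's question is how much of this advantage survives when only n^(1−δ) bits of state are available.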
Tight Bounds for Parallel Randomized Load Balancing
 Distributed Computing
Abstract
 Add to MetaCart
Given a distributed system of n balls and n bins, how evenly can we distribute the balls to the bins, minimizing communication? The fastest non-adaptive and symmetric algorithm achieving a constant maximum bin load requires Θ(log log n) rounds, and any such algorithm running for r ∈ O(1) rounds incurs a bin load of Ω((log n / log log n)^(1/r)). In this work, we explore the fundamental limits of the general problem. We present a simple adaptive symmetric algorithm that achieves a bin load of 2 in log* n + O(1) communication rounds using O(n) messages in total. Our main result, however, is a matching lower bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms that guarantee small bin loads. The essential preconditions of the proof are (i) a limit of O(n) on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls need not be globally consistent. To show that our technique indeed yields tight bounds, we provide for each assumption an algorithm violating it, in turn achieving a constant maximum bin load in constant time. An extended abstract of preliminary work appeared at STOC 2011 [24] and the corresponding article has been published on arXiv [23].