Tight Bounds for Parallel Randomized Load Balancing
 Computing Research Repository, 1992
Abstract

Cited by 18 (7 self)
We explore the fundamental limits of distributed balls-into-bins algorithms, i.e., algorithms where balls act in parallel, as separate agents. This problem was introduced by Adler et al., who showed that non-adaptive and symmetric algorithms cannot reliably perform better than a maximum bin load of Θ(log log n / log log log n) within the same number of rounds. We present an adaptive symmetric algorithm that achieves a bin load of two in log* n + O(1) communication rounds using O(n) messages in total. Moreover, larger bin loads can be traded in for smaller time complexities. We prove a matching lower bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms that guarantee small bin loads at an asymptotically optimal message complexity of O(n). The essential preconditions of the proof are (i) a limit of O(n) on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls are not globally consistent. In order to show that our technique indeed yields tight bounds, we provide for each assumption an algorithm violating it, in turn achieving a constant maximum bin load in constant time. As an application, we consider the following problem. Given a fully connected graph of n nodes, where each node needs to send and receive up to n messages, and in each round each node may send one message over each link, deliver all messages as quickly as possible to their destinations. We give a simple and robust algorithm of time complexity O(log* n) for this task and provide a generalization to the case where all nodes initially hold arbitrary sets of messages. Completing the picture, we give a less practical, but asymptotically optimal algorithm terminating within O(1) rounds. All these bounds hold with high probability.
The (1 + β)-Choice Process and Weighted Balls-into-Bins
Abstract

Cited by 10 (0 self)
Suppose m balls are sequentially thrown into n bins where each ball goes into a random bin. It is well known that the gap between the load of the most loaded bin and the average is Θ(√(m log n / n)), for large m. If each ball goes to the lesser loaded of two random bins, this gap dramatically reduces to Θ(log log n), independent of m. Consider now the following "(1 + β)-choice" process for some parameter β ∈ (0, 1): each ball goes to a random bin with probability (1 − β) and to the lesser loaded of two random bins with probability β. How does the gap for such a process behave? Suppose that the weight of each ball was drawn from a geometric distribution. How is the gap (now defined in terms of weight) affected? In this work, we develop general techniques for analyzing such balls-into-bins processes. Specifically, we show that for the (1 + β)-choice process above, the gap is Θ(log n / β), irrespective of m. Moreover, the gap stays at Θ(log n / β) in the weighted case for a large class of weight distributions. No non-trivial explicit bounds were previously known in the weighted case, even for the 2-choice paradigm.
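As a quick sanity check of the process described above, here is a minimal Python simulation of the (1 + β)-choice rule; the function name and parameters are ours, not from the paper. The observed gap should stay on the order of log n / β as m grows:

```python
import random

def one_plus_beta_gap(n, m, beta, seed=0):
    """Simulate the (1 + beta)-choice process: each ball goes to a uniformly
    random bin with probability 1 - beta, and to the lesser loaded of two
    uniformly random bins with probability beta.  Returns the gap between
    the maximum load and the average load m/n."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        if rng.random() < beta:
            # Two-choice step: pick the lighter of two random bins.
            i, j = rng.randrange(n), rng.randrange(n)
            pick = i if loads[i] <= loads[j] else j
        else:
            # One-choice step: pick a single random bin.
            pick = rng.randrange(n)
        loads[pick] += 1
    return max(loads) - m / n
```

Running this with increasing m while keeping n and β fixed illustrates the paper's claim that the gap does not grow with m.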
Kinesis: A new approach to replica placement in distributed storage systems
 ACM Transactions on Storage (TOS)
Abstract

Cited by 8 (1 self)
Kinesis is a novel data placement model for distributed storage systems. It exemplifies three design principles: structure (division of servers into a few failure-isolated segments), freedom of choice (freedom to allocate the best servers to store and retrieve data based on current resource availability), and scattered distribution (independent, pseudorandom spread of replicas in the system). These design principles enable storage systems to achieve balanced utilization of storage and network resources in the presence of incremental system expansions, failures of single and shared components, and skewed distributions of data size and popularity. In turn, this ability leads to significantly reduced resource provisioning costs, good user-perceived response times, and fast, parallelized recovery from independent and correlated failures. This paper validates Kinesis through theoretical analysis, simulations, and experiments on a prototype implementation. Evaluations driven by real-world traces show that Kinesis can significantly outperform the widely used Chain replica-placement strategy in terms of resource requirements, end-to-end delay, and failure recovery.
Scal: Non-Linearizable Computing Breaks The Scalability Barrier
, 2010
Abstract

Cited by 2 (1 self)
We propose a relaxed version of linearizability and a set of load balancing algorithms for trading off adherence to concurrent data structure semantics against scalability. We consider data structures that store elements in a given order, such as stacks and queues. Intuitively, a concurrent stack, for example, is linearizable if the effect of push and pop operations on the stack always occurs instantaneously. A linearizable stack guarantees that pop operations return the youngest stack elements first, i.e., the elements in the reverse order in which the operations that pushed them onto the stack took effect. Linearizability allows concurrent (but not sequential) operations to be reordered arbitrarily. We relax linearizability to k-linearizability with k > 0 to also allow sequences of up to k − 1 sequential operations to be reordered arbitrarily and thus execute concurrently. With a k-linearizable stack, for example, a pop operation may not return the youngest but the k-th youngest element on the stack. It turns out that k-linearizability may be tolerated by concurrent applications such as process schedulers and web servers that already use it implicitly. Moreover, k-linearizability does provide positive scalability in some cases because more operations may be executed concurrently, but it may still be too restrictive under high contention. We therefore propose a set of load balancing algorithms which significantly improve scalability by approximating k-linearizability probabilistically. We introduce Scal, an open-source framework for implementing k-linearizable approximations of concurrent data structures, and show in multiple benchmarks that Scal provides positive scalability for concurrent data structures that typically do not scale under high contention.
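To make the relaxation concrete, here is a small sequential sketch of a stack whose pop may return any of the k youngest elements. It only illustrates the k-linearizability semantics; it is not Scal's concurrent implementation, and the class and method names are ours:

```python
import random

class KRelaxedStack:
    """Sequential sketch of a k-relaxed ("k-linearizable") stack: pop may
    return any of the k youngest elements instead of strictly the youngest.
    Illustrative only; the real Scal framework targets concurrent scalability."""

    def __init__(self, k, seed=0):
        self.k = k
        self.items = []
        self.rng = random.Random(seed)

    def push(self, x):
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("pop from empty stack")
        # Choose uniformly among the k youngest elements (or fewer if
        # the stack currently holds fewer than k elements).
        window = min(self.k, len(self.items))
        idx = len(self.items) - 1 - self.rng.randrange(window)
        return self.items.pop(idx)
```

With k = 1 this degenerates to an ordinary LIFO stack; larger k widens the window of admissible answers, which is exactly the slack that allows more operations to proceed concurrently.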
Converting Online Algorithms to Local Computation Algorithms
Abstract

Cited by 2 (0 self)
We propose a general method for converting online algorithms to local computation algorithms by selecting a random permutation of the input and simulating running the online algorithm. We bound the number of steps of the algorithm using a query tree, which models the dependencies between queries. We improve previous analyses of query trees on graphs of bounded degree, and extend this improved analysis to the cases where the degrees are distributed binomially, and to a special case of bipartite graphs. Using this method, we give a local computation algorithm for maximal matching in graphs of bounded degree which runs in time and space O(log^3 n). We also show how to convert a large family of load balancing algorithms (related to balls and bins problems) to local computation algorithms. This gives several local load balancing algorithms which achieve the same approximation ratios as the online algorithms, but run in O(log n) time and space. Finally, we modify existing local computation algorithms for hypergraph 2-coloring and k-CNF and use our improved analysis to obtain better time and space bounds of O(log^4 n), removing the dependency on the maximal degree of the graph from the exponent.
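The conversion idea can be illustrated on a toy instance: the online greedy algorithm for a maximal independent set, localized by fixing a random rank per vertex and recursing only along smaller-rank neighbours (the query tree of the abstract). This is a sketch under our own naming, using greedy MIS rather than the paper's matching algorithm:

```python
import random

def local_mis(graph, v, rank, memo=None):
    """Locally decide whether vertex v is in the greedy maximal independent
    set obtained by processing vertices in increasing order of rank.  A
    vertex is in the MIS iff none of its smaller-rank neighbours is; we
    therefore recurse only along the query tree of smaller-rank neighbours
    instead of simulating the whole online run."""
    if memo is None:
        memo = {}
    if v in memo:
        return memo[v]
    ans = all(not local_mis(graph, u, rank, memo)
              for u in graph[v] if rank[u] < rank[v])
    memo[v] = ans
    return ans

# Usage sketch: fix one random rank per vertex, then answer queries locally.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}          # a triangle
rank = {v: random.Random(42).random() + v * 0 for v in graph}
```

The key point, as in the paper, is that the expected depth of the recursion is governed by the query tree, not by the size of the input.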
Balanced allocation: memory performance trade-offs, preprint
Abstract

Cited by 1 (0 self)
Suppose we sequentially put n balls into n bins. If we put each ball into a random bin, then the heaviest bin will contain ∼ log n / log log n balls with high probability. However, Azar, Broder, Karlin and Upfal [2] showed that if each time we choose two bins at random and put the ball in the least loaded bin among the two, then the heaviest bin will contain only ∼ log log n balls with high probability. How much memory do we need to implement this scheme? We need roughly log log log n bits per bin, and n log log log n bits in total. Let us assume now that we have a limited amount of memory. For each ball, we are given two random bins and we have to put the ball into one of them. Our goal is to minimize the load of the heaviest bin. We prove that if we have n^{1−δ} bits then the heaviest bin will contain at least Ω(δ log n / log log n) balls with high probability. The bound is tight in the communication complexity model.
Multiple-choice Balanced Allocation in (almost) Parallel
Abstract

Cited by 1 (0 self)
We consider the problem of resource allocation in a parallel environment where new incoming resources arrive online in groups or batches. We study this scenario in an abstract framework of allocating balls into bins. We revisit the allocation algorithm GREEDY[2] due to Azar, Broder, Karlin, and Upfal (SIAM J. Comput. 1999), in which, for sequentially arriving balls, each ball chooses two bins at random and gets placed into the one of those two bins with minimum load. The maximum load of any bin after the last ball is allocated by GREEDY[2] is well understood, as is, indeed, the entire load distribution, for a wide range of settings. The main goal of our paper is to study balls-and-bins allocation processes in a parallel environment with the balls arriving in batches. In our model, m balls arrive in batches of size n each (with n also equal to the number of bins), and the balls in each batch are to be distributed among the bins simultaneously. In this setting, we consider an algorithm that uses GREEDY[2] for all balls within a given batch, where the answers to those balls' load queries are with respect to the bin loads at the end of the previous batch and do not in any way depend on decisions made by other balls from the same batch. Our main contribution is a tight analysis of the new process allocating balls in batches: we show that after the allocation of any number of batches, the gap between maximum and minimum load is O(log n) with high probability, and is therefore independent of the number of batches used.
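A compact simulation of the batched process makes the stale-snapshot rule explicit (the function name is ours); per the paper's result, the returned gap should stay O(log n) regardless of the number of batches:

```python
import random

def batched_greedy2(num_batches, n, seed=0):
    """Allocate num_batches batches of n balls each into n bins.  Each ball
    picks two random bins and goes to the one that was less loaded at the
    END of the previous batch; decisions within a batch are therefore
    independent of each other.  Returns max load minus min load."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(num_batches):
        snapshot = loads[:]          # stale loads seen by the whole batch
        for _ in range(n):
            i, j = rng.randrange(n), rng.randrange(n)
            pick = i if snapshot[i] <= snapshot[j] else j
            loads[pick] += 1
    return max(loads) - min(loads)   # the gap analysed in the paper
```

Doubling `num_batches` in such an experiment should leave the gap essentially unchanged, which is the batch-independence the paper proves.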
Tight Bounds for Parallel Randomized Load Balancing
 Distributed Computing
"... Abstract Given a distributed system of n balls and n bins, how evenly can we distribute the balls to the bins, minimizing communication? The fastest nonadaptive and symmetric algorithm achieving a constant maximum bin load requires Θ(log log n) rounds, and any such algorithm running for r ∈ O(1) r ..."
Abstract
Given a distributed system of n balls and n bins, how evenly can we distribute the balls to the bins, minimizing communication? The fastest non-adaptive and symmetric algorithm achieving a constant maximum bin load requires Θ(log log n) rounds, and any such algorithm running for r ∈ O(1) rounds incurs a bin load of Ω((log n / log log n)^{1/r}). In this work, we explore the fundamental limits of the general problem. We present a simple adaptive symmetric algorithm that achieves a bin load of 2 in log* n + O(1) communication rounds using O(n) messages in total. Our main result, however, is a matching lower bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms that guarantee small bin loads. The essential preconditions of the proof are (i) a limit of O(n) on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls need not be globally consistent. In order to show that our technique indeed yields tight bounds, we provide for each assumption an algorithm violating it, in turn achieving a constant maximum bin load in constant time. An extended abstract of preliminary work appeared at STOC 2011 [24] and the corresponding article has been published on arXiv [23].
Multidimensional Balanced Allocation for Multiple Choice & (1 + β) Processes
, 2011
Balanced Allocations: A Simple Proof for the Heavily Loaded Case
, 2013
Abstract
We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two-choice process is bounded by (1 + o(1)) log log n, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. in [2]. Their proof uses heavy machinery from Markov chain theory, and some of the calculations are done using computers. In this manuscript we provide a significantly simpler proof that is not aided by computers and is self-contained. The simplification comes at the cost of weaker bounds on the low-order terms and a weaker tail bound for the probability of deviating from the expectation.

1 A Bit of History

In the Greedy[d] process (sometimes called the d-choice process), balls are placed sequentially into [n] bins with the following rule: each ball is placed by uniformly and independently sampling d bins and assigning the ball to the least loaded of the d bins. In other words, the probability that a ball is placed in one of the i heaviest bins (at the time when it is placed) is exactly (i/n)^d. We remark that using this characterization there is no need to assume that d is a natural number (though the process is algorithmically much simpler when d is an integer). The main point is that whenever d > 1 the process is biased: the lighter bins have a higher chance of getting a ball. In this paper we are interested in the gap of the allocation, which is the difference between the number of balls in the heaviest bin and the average.
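The rank characterization above lends itself directly to simulation, including non-integer d: sample a load rank by inverting the CDF F(i) = (i/n)^d. The sketch below uses our own naming and is quadratic-time for clarity, not efficiency:

```python
import math
import random

def greedy_d(m, n, d, seed=0):
    """Greedy[d] via its rank characterization: the probability that a ball
    lands in one of the i currently-heaviest bins is exactly (i/n)**d, which
    remains meaningful even for non-integer d > 1.  Returns the gap between
    the maximum load and the average load m/n."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        # Invert the CDF F(i) = (i/n)**d to sample a load rank (1 = heaviest);
        # the max guard handles the measure-zero case u == 0.0.
        r = max(1, math.ceil(n * rng.random() ** (1.0 / d)))
        order = sorted(range(n), key=lambda b: loads[b], reverse=True)
        loads[order[r - 1]] += 1
    return max(loads) - m / n
```

Setting d = 1 recovers the uniform one-choice process, and any d > 1 biases placement toward lighter bins, exactly as the characterization in the text describes.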