Results 1  10
of
12
How Asymmetry Helps Load Balancing
 Journal of the ACM
, 1999
"... This paper deals with balls and bins processes related to randomized load balancing, dynamic resource allocation, and hashing. Suppose ¡ balls have to be assigned to ¡ bins, where each ball has to be placed without knowledge about the distribution of previously placed balls. The goal is to achieve a ..."
Abstract

Cited by 81 (4 self)
 Add to MetaCart
This paper deals with balls and bins processes related to randomized load balancing, dynamic resource allocation, and hashing. Suppose ¡ balls have to be assigned to ¡ bins, where each ball has to be placed without knowledge about the distribution of previously placed balls. The goal is to achieve an allocation that is as even as possible so that no bin gets much more balls than the average. A well known and good solution for this problem is to choose ¢ possible locations for each ball at random, to look into each of these bins, and to place the ball into the least full among these bins. This class of algorithms has been investigated intensively in the past, but almost all previous analyses assume that the ¢ locations for each ball are chosen uniformly and independently at random from the set of all bins. We investigate whether a nonuniform and possibly dependent choice of the ¢
BALANCED ALLOCATIONS: THE HEAVILY LOADED CASE
, 2006
"... We investigate ballsintobins processes allocating m balls into n bins based on the multiplechoice paradigm. In the classical singlechoice variant each ball is placed into a bin selected uniformly at random. In a multiplechoice process each ball can be placed into one out of d ≥ 2 randomly selec ..."
Abstract

Cited by 57 (7 self)
 Add to MetaCart
We investigate ballsintobins processes allocating m balls into n bins based on the multiplechoice paradigm. In the classical singlechoice variant each ball is placed into a bin selected uniformly at random. In a multiplechoice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiplechoice processes in the heavily loaded case were obtained using majorization by the singlechoice process. This yields an upper bound of the maximum load of bins of m/n + O ( √ m ln n/n) with high probability. We show, however, that the multiplechoice processes are fundamentally different from the singlechoice variant in that they have “short memory. ” The great consequence of this property is that the deviation of the multiplechoice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n ⌋ or ⌈m/n ⌉ balls) does not increase with the number of balls as in the case of the singlechoice process. In particular, we investigate the allocation obtained by two different multiplechoice allocation schemes,
The natural workstealing algorithm is stable
 In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS
, 2001
"... In this paper we analyse a very simple dynamic workstealing algorithm. In the workgeneration model, there are n (work) generators. A generatorallocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generatoralloca ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
In this paper we analyse a very simple dynamic workstealing algorithm. In the workgeneration model, there are n (work) generators. A generatorallocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generatorallocation functions. During each timestep of our process, a generatorallocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unittime task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the workgeneration model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural workstealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
Perfectly balanced allocation
 in Proceedings of the 7th International Workshop on Randomization and Approximation Techniques in Computer Science, Princeton, NJ, 2003, Lecture Notes in Comput. Sci. 2764
, 2003
"... Abstract. We investigate randomized processes underlying load balancing based on the multiplechoice paradigm: m balls have to be placed in n bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previousl ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
Abstract. We investigate randomized processes underlying load balancing based on the multiplechoice paradigm: m balls have to be placed in n bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previously, it was known that a simple process that places the balls one by one in the least loaded bin can achieve a maximum load of m/n + Θ(log log n) with high probability. Furthermore, it was known that it is possible to achieve (with high probability) a maximum load of at most ⌈m/n ⌉ +1using maximum flow computations. In this paper, we extend these results in several aspects. First of all, we show that if m ≥ cn log n for some sufficiently large c, thenaperfect distribution of balls among the bins can be achieved (i.e., the maximum load is ⌈m/n⌉) with high probability. The bound for m is essentially optimal, because it is known that if m ≤ c ′ n log n for some sufficiently small constant c ′ , the best possible maximum load that can be achieved is ⌈m/n ⌉ +1with high probability. Next, we analyze a simple, randomized load balancing process based on a local search paradigm. Our first result here is that this process always converges to a best possible load distribution. Then, we study the convergence speed of the process. We show that if m is sufficiently large compared to n,thenno matter with which ball distribution the system starts, if the imbalance is ∆, then the process needs only ∆·n O(1) steps to reach a perfect distribution, with high probability. We also prove a similar result for m ≈ n, and show that if m = O(n log n / log log n), then an optimal load distribution (which has the maximum load of ⌈m/n ⌉ +1) is reached by the random process after a polynomial number of steps, with high probability.
Efficient hashing with lookups in two memory accesses, in: 16th
 SODA, ACMSIAM
"... The study of hashing is closely related to the analysis of balls and bins. Azar et. al. [1] showed that instead of using a single hash function if we randomly hash a ball into two bins and place it in the smaller of the two, then this dramatically lowers the maximum load on bins. This leads to the c ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
The study of hashing is closely related to the analysis of balls and bins. Azar et. al. [1] showed that instead of using a single hash function if we randomly hash a ball into two bins and place it in the smaller of the two, then this dramatically lowers the maximum load on bins. This leads to the concept of twoway hashing where the largest bucket contains O(log log n) balls with high probability. The hash look up will now search in both the buckets an item hashes to. Since an item may be placed in one of two buckets, we could potentially move an item after it has been initially placed to reduce maximum load. Using this fact, we present a simple, practical hashing scheme that maintains a maximum load of 2, with high probability, while achieving high memory utilization. In fact, with n buckets, even if the space for two items are preallocated per bucket, as may be desirable in hardware implementations, more than n items can be stored giving a high memory utilization. Assuming truly random hash functions, we prove the following properties for our hashing scheme. • Each lookup takes two random memory accesses, and reads at most two items per access. • Each insert takes O(log n) time and up to log log n+ O(1) moves, with high probability, and constant time in expectation. • Maintains 83.75 % memory utilization, without requiring dynamic allocation during inserts. We also analyze the tradeoff between the number of moves performed during inserts and the maximum load on a bucket. By performing at most h moves, we can maintain a maximum load of O(hlogl((~og~og:n/h)). So, even by performing one move, we achieve a better bound than by performing no moves at all. 1
Asynchronous scheduling of redundant disk arrays
, 2000
"... Random redundant allocation of data to parallel disk arrays can be exploited to achieve low access delays. New algorithms are proposed which improve the previously known shortest queue algorithm by systematically exploiting that scheduling decisions can be deferred until a block access is actually s ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
Random redundant allocation of data to parallel disk arrays can be exploited to achieve low access delays. New algorithms are proposed which improve the previously known shortest queue algorithm by systematically exploiting that scheduling decisions can be deferred until a block access is actually started on a disk. These algorithms are also generalized for coding schemes with low redundancy. Using extensive experiments, practically important quantities are measured which have so far eluded an analytical treatment: The delay distribution when a stream of requests approaches the limit of the sytem capacity, the system efficiency for parallel disk applications with bounded prefetching buffers, and the combination of both for mixed traffic. A further step towards practice is taken by outlining the system design for α: automatically loadbalanced parallel harddisk array. 1
Balanced Allocation on Graphs
 In Proc. 7th Symposium on Discrete Algorithms (SODA
, 2006
"... It is well known that if n balls are inserted into n bins, with high probability, the bin with maximum load contains (1 + o(1))log n / loglog n balls. Azar, Broder, Karlin, and Upfal [1] showed that instead of choosing one bin, if d ≥ 2 bins are chosen at random and the ball inserted into the least ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
It is well known that if n balls are inserted into n bins, with high probability, the bin with maximum load contains (1 + o(1))log n / loglog n balls. Azar, Broder, Karlin, and Upfal [1] showed that instead of choosing one bin, if d ≥ 2 bins are chosen at random and the ball inserted into the least loaded of the d bins, the maximum load reduces drastically to log log n / log d+O(1). In this paper, we study the two choice balls and bins process when balls are not allowed to choose any two random bins, but only bins that are connected by an edge in an underlying graph. We show that for n balls and n bins, if the graph is almost regular with degree n ǫ, where ǫ is not too small, the previous bounds on the maximum load continue to hold. Precisely, the maximum load is
Improved Bounds for Finger Search on a RAM
 In Algorithms – ESA 2003, LNCS Vol. 2832 (Springer 2003
, 2003
"... We present a new finger search tree with O(1) worstcase update time and O(log log d) expected search time with high probability in the Random Access Machine (RAM) model of computation for a large class of input distributions. The parameter d represents the number of elements (distance) between ..."
Abstract

Cited by 8 (7 self)
 Add to MetaCart
We present a new finger search tree with O(1) worstcase update time and O(log log d) expected search time with high probability in the Random Access Machine (RAM) model of computation for a large class of input distributions. The parameter d represents the number of elements (distance) between the search element and an element pointed to by a finger, in a finger search tree that stores n elements. For the need of the analysis we model the updates by a "balls and bins" combinatorial game that is interesting in its own right as it involves insertions and deletions of balls according to an unknown distribution.
Balanced Relay Allocation on Heterogeneous Unstructured Overlays
"... Abstract — Due to the increased usage of NAT boxes and firewalls, it has become harder for applications to establish direct connections seamlessly among two endhosts. A recently adopted proposal to mitigate this problem is to use relay nodes, endhosts that act as intermediary points to bridge conn ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
Abstract — Due to the increased usage of NAT boxes and firewalls, it has become harder for applications to establish direct connections seamlessly among two endhosts. A recently adopted proposal to mitigate this problem is to use relay nodes, endhosts that act as intermediary points to bridge connections. Efficiently selecting a relay node is not a trivial problem, specially in a largescale unstructured overlay system where endhosts are heterogeneous. In such environment, heterogeneity among the relay nodes comes from the inherent differences in their capacities and from the way overlay networks are constructed. Despite this fact, good relay selection algorithms should effectively balance the aggregate load across the set of relay nodes. In this paper, we address this problem using algorithms based on the two random choices method. We first prove that the classic loadbased algorithm can effectively balance the load even when relays are heterogeneous, and that its performance depends directly on relay heterogeneity. Second, we propose an utilizationbased random choice algorithm to distribute load in order to balance relay utilization. Numerical evaluations through simulations illustrate the effectiveness of this algorithm, indicating that it might also yield provable performance (which we conjecture). Finally, we support our theoretical findings through simulations of various largescale scenarios, with realistic relay heterogeneity. I.
Distributed, Secure Load Balancing with Skew, Heterogeneity, and Churn
, 2004
"... Numerous proposals exist for load balancing in peertopeer (p2p) networks. Some focus on namespace balancing, making the distance between nodes as uniform as possible. This technique works well under ideal conditions, but not under those found empirically. Instead, researchers have found heavytail ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Numerous proposals exist for load balancing in peertopeer (p2p) networks. Some focus on namespace balancing, making the distance between nodes as uniform as possible. This technique works well under ideal conditions, but not under those found empirically. Instead, researchers have found heavytailed query distributions (skew), high rates of node join and leave (churn), and wide variation in node network and storage capacity (heterogeneity) . Other approaches tackle these lessthanideal conditions, but give up on important security properties. We propose an algorithm that both facilitates good performance and does not dilute security. Our algorithm, kChoices, achieves load balance by greedily matching nodes' target workloads with actual applied workloads through limited sampling, and limits any fundamental decrease in security by basing each nodes' set of potential identifiers on a single certificate. Our algorithm compares favorably to four others in tracedriven simulations. We have implemented our algorithm and found that it improved aggregate throughput by 20% in a widely heterogeneous system in our experiments.