Results 1 - 10
of
51
The Power of Two Random Choices: A Survey of Techniques and Results
- in Handbook of Randomized Computing
, 2000
"... ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately ..."
Abstract
-
Cited by 79 (2 self)
- Add to MetaCart
ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n= log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n= log d + (1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
Using Multiple Hash Functions to Improve IP Lookups
- IN PROCEEDINGS OF IEEE INFOCOM
, 2000
"... High performance Internet routers require a mechanism for very efficient IP address look-ups. Some techniques used to this end, such as binary search on levels, need to construct quickly a good hash table for the appropriate IP prefixes. In this paper we describe an approach for obtaining good hash ..."
Abstract
-
Cited by 61 (11 self)
- Add to MetaCart
High performance Internet routers require a mechanism for very efficient IP address look-ups. Some techniques used to this end, such as binary search on levels, need to construct quickly a good hash table for the appropriate IP prefixes. In this paper we describe an approach for obtaining good hash tables based on using multiple hashes of each input key (which is an IP address). The methods we describe are fast, simple, scalable, parallelizable, and flexible. In particular, in instances where the goal is to have one hash bucket fit into a cache line, using multiple hashes proves extremely suitable. We provide a general analysis of this hashing technique and specifically discuss its application to binary search on levels.
BALANCED ALLOCATIONS: THE HEAVILY LOADED CASE
, 2006
"... We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selec ..."
Abstract
-
Cited by 51 (6 self)
- Add to MetaCart
We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiple-choice processes in the heavily loaded case were obtained using majorization by the single-choice process. This yields an upper bound of the maximum load of bins of m/n + O ( √ m ln n/n) with high probability. We show, however, that the multiple-choice processes are fundamentally different from the single-choice variant in that they have “short memory. ” The great consequence of this property is that the deviation of the multiple-choice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n ⌋ or ⌈m/n ⌉ balls) does not increase with the number of balls as in the case of the single-choice process. In particular, we investigate the allocation obtained by two different multiple-choice allocation schemes,
Fast hash table lookup using extended Bloom filter: an aid to network processing
- In ACM SIGCOMM
, 2005
"... ..."
Why simple hash functions work: Exploiting the entropy in a data stream
- In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
- SIGCOMM '06
, 2006
"... Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Appr ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that can return false positives, false negatives, or a “don’t know” response. We describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation. Our analysis leads us to an extremely efficient hashing-based scheme with several parameters that can be chosen to trade off space, computation, and the impact of errors. Our hashing approach also yields a simple alternative structure with the same functionality as a counting Bloom filter that uses much less space. We show how ACSMs can be used for video congestion control. Using an ACSM, a router can implement sophisticated Active Queue Management (AQM) techniques for video traffic (without the need for standards changes to mark packets or change video formats), with a factor of four reduction in memory compared to full-state schemes and with very little error. We also show that ACSMs show promise for real-time detection of P2P traffic.
The Asymptotics of Selecting the Shortest of Two, Improved
- UNIVERSITY OF ILLINOIS
, 1999
"... We investigate variations of a novel, recently proposed load balancing scheme based on small amounts of choice. The static setting is modeled as a balls-and-bins process. The balls are sequentially placed into bins, with each ball selecting d bins randomly and going to the bin with the fewest balls. ..."
Abstract
-
Cited by 16 (7 self)
- Add to MetaCart
We investigate variations of a novel, recently proposed load balancing scheme based on small amounts of choice. The static setting is modeled as a balls-and-bins process. The balls are sequentially placed into bins, with each ball selecting d bins randomly and going to the bin with the fewest balls. A similar dynamic setting is modeled as a scenario where tasks arrive as a Poisson process at a bank of FIFO servers and queue at one for service. Tasks probe a small random sample of servers in the bank and queue at the server with the fewest tasks. Recently
The natural work-stealing algorithm is stable
- In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS
, 2001
"... In this paper we analyse a very simple dynamic work-stealing algorithm. In the workgeneration model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-alloca ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
In this paper we analyse a very simple dynamic work-stealing algorithm. In the workgeneration model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time-step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
An Improved Construction for Counting Bloom Filters
- 14th Annual European Symposium on Algorithms, LNCS 4168
, 2006
"... Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-bas ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally. 1
Perfectly balanced allocation
- in Proceedings of the 7th International Workshop on Randomization and Approximation Techniques in Computer Science, Princeton, NJ, 2003, Lecture Notes in Comput. Sci. 2764
, 2003
"... Abstract. We investigate randomized processes underlying load balancing based on the multiple-choice paradigm: m balls have to be placed in n bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previousl ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Abstract. We investigate randomized processes underlying load balancing based on the multiple-choice paradigm: m balls have to be placed in n bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previously, it was known that a simple process that places the balls one by one in the least loaded bin can achieve a maximum load of m/n + Θ(log log n) with high probability. Furthermore, it was known that it is possible to achieve (with high probability) a maximum load of at most ⌈m/n ⌉ +1using maximum flow computations. In this paper, we extend these results in several aspects. First of all, we show that if m ≥ cn log n for some sufficiently large c, thenaperfect distribution of balls among the bins can be achieved (i.e., the maximum load is ⌈m/n⌉) with high probability. The bound for m is essentially optimal, because it is known that if m ≤ c ′ n log n for some sufficiently small constant c ′ , the best possible maximum load that can be achieved is ⌈m/n ⌉ +1with high probability. Next, we analyze a simple, randomized load balancing process based on a local search paradigm. Our first result here is that this process always converges to a best possible load distribution. Then, we study the convergence speed of the process. We show that if m is sufficiently large compared to n,thenno matter with which ball distribution the system starts, if the imbalance is ∆, then the process needs only ∆·n O(1) steps to reach a perfect distribution, with high probability. We also prove a similar result for m ≈ n, and show that if m = O(n log n / log log n), then an optimal load distribution (which has the maximum load of ⌈m/n ⌉ +1) is reached by the random process after a polynomial number of steps, with high probability.

