The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
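As a rough illustration of the two processes described in this abstract, the following Python sketch throws n balls into n bins, giving each ball d uniformly random candidate bins and placing it in the least loaded one. It is not taken from the survey; the function name `simulate_choices`, the seeds, and the problem size are assumptions made for the example.

```python
import random


def simulate_choices(n, d, seed=0):
    """Place n balls into n bins; each ball goes to the least loaded of d random bins.

    Returns the list of final bin loads. With d = 1 this is plain random placement.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        # Draw d candidate bins independently and uniformly (with replacement).
        candidates = [rng.randrange(n) for _ in range(d)]
        # Pick the candidate with the smallest current load.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    for d in (1, 2, 3):
        print(f"d={d}: max load = {max(simulate_choices(n, d, seed=1))}")
```

In runs of this kind, the maximum load for d = 1 is typically noticeably larger than for d = 2, in line with the log n / log log n versus log log n / log d behaviour quoted above.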
Using Multiple Hash Functions to Improve IP Lookups
 IN PROCEEDINGS OF IEEE INFOCOM
, 2000
High performance Internet routers require a mechanism for very efficient IP address lookups. Some techniques used to this end, such as binary search on levels, need to construct quickly a good hash table for the appropriate IP prefixes. In this paper we describe an approach for obtaining good hash tables based on using multiple hashes of each input key (which is an IP address). The methods we describe are fast, simple, scalable, parallelizable, and flexible. In particular, in instances where the goal is to have one hash bucket fit into a cache line, using multiple hashes proves extremely suitable. We provide a general analysis of this hashing technique and specifically discuss its application to binary search on levels.
BALANCED ALLOCATIONS: THE HEAVILY LOADED CASE
, 2006
We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiple-choice processes in the heavily loaded case were obtained using majorization by the single-choice process. This yields an upper bound on the maximum load of bins of m/n + O(√(m ln n / n)) with high probability. We show, however, that the multiple-choice processes are fundamentally different from the single-choice variant in that they have “short memory.” The great consequence of this property is that the deviation of the multiple-choice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n⌋ or ⌈m/n⌉ balls) does not increase with the number of balls as in the case of the single-choice process. In particular, we investigate the allocation obtained by two different multiple-choice allocation schemes,
A staffing algorithm for call centers with skill-based routing
 Manufacturing and Service Operations Management
, 2005
doi 10.1287/msom.1050.0086. © 2005 INFORMS. Call centers usually handle several types of calls, but it is usually not possible or cost-effective to have every agent be able to handle every type of call. Thus, the agents tend to have different skills, in different combinations. In such an environment, it is challenging to route calls effectively and determine the staff requirements. This paper addresses both of these routing and staffing problems by exploiting limited cross-training. Consistent with the literature on flexible manufacturing, we find that minimal flexibility can provide great benefits: Simulation experiments show that when (1) the service-time distribution does not depend on the call type or the agent and (2) each agent has only two skills, in appropriate combinations, the performance is almost as good as when each agent has all skills. We apply this flexibility property to develop an algorithm for both routing and staffing, aiming to minimize the total staff subject to per-class performance constraints. With appropriate flexibility, it suffices to use a suboptimal routing algorithm. Simulation experiments show that the overall procedure can be remarkably effective: The required staff with limited cross-training can be nearly the same as if all agents had all skills. Hence, the overall algorithm is nearly optimal for that scenario.
An Improved Construction for Counting Bloom Filters
 14th Annual European Symposium on Algorithms, LNCS 4168
, 2006
Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
Load Balancing with Memory
 In Proc. of the 43rd IEEE Symp. on Foundations of Computer Science (FOCS)
, 2002
A standard load balancing model considers placing n balls into n bins by choosing d possible locations for each ball independently and uniformly at random and sequentially placing each in the least loaded of its chosen bins. It is well known that allowing just a small amount of choice (d = 2) greatly improves performance over random placement (d = 1). In this paper, we show that similar performance gains occur by introducing memory. We focus on the situation where each time a ball is placed, the least loaded of that ball's choices after placement is remembered and used as one of the possible choices for the next ball. For example, we show that when each ball gets just one random choice, but can also choose the best of the last ball's choices, the maximum number of balls in a bin is log log n / (2 log φ) + O(1) with high probability, where φ = (1 + √5)/2 is the golden ratio. The asymptotic performance is therefore better with one random choice and one choice from memory than with two fresh random choices for each ball; the performance with memory asymptotically matches the asymmetric policy using two choices introduced by Vöcking. More generally, we find that a small amount of memory, like a small amount of choice, can dramatically improve the load balancing performance. We also investigate continuous time variations corresponding to queueing systems, where we find similar results.
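The memory-based placement rule in this abstract can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' code: each ball gets one fresh random bin plus the bin remembered from the previous ball, and after placement the less loaded of the two candidates is remembered. The function names `simulate_with_memory` and `simulate_one_choice`, the handling of the very first ball (no memory yet), and the seeds are assumptions.

```python
import random


def simulate_one_choice(n, seed=0):
    """Baseline: each of n balls goes to one uniformly random bin."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return loads


def simulate_with_memory(n, seed=0):
    """One fresh random choice per ball plus one bin remembered from the last ball."""
    rng = random.Random(seed)
    loads = [0] * n
    remembered = None  # assumption: the first ball has no remembered bin
    for _ in range(n):
        fresh = rng.randrange(n)
        candidates = [fresh] if remembered is None else [fresh, remembered]
        # Place the ball in the less loaded candidate.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
        # After placement, remember the less loaded of this ball's candidates.
        remembered = min(candidates, key=loads.__getitem__)
    return loads


if __name__ == "__main__":
    n = 10_000
    print("one choice :", max(simulate_one_choice(n, seed=1)))
    print("with memory:", max(simulate_with_memory(n, seed=1)))
```

The sketch only covers the discrete balls-and-bins setting; the continuous-time queueing variations mentioned in the abstract are not modelled.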
Stochastic Models for the design and management of customer contact centers: some research directions
 February 2005a. URL: http://www.columbia.edu/˜ww2040/submissionREV.pdf
, 2002
A (customer) contact center is a collection of resources providing an interface between a service provider and its customers. The classical contact center is a call center, containing a collection of service representatives (reps) who talk to customers over the telephone. In a call center, the service reps are supported by quite elaborate information-and-communication-technology (ICT) equipment, such as a private branch exchange (PBX), an interactive voice response (IVR) unit, an automatic call distributor (ACD), a personal computer (PC) and assorted databases. With the rapid growth of e-commerce, contact between the service provider and its customers is often made via email or the Internet instead of by telephone. Thus the general interface between a service provider and its customers is now often called a contact center. The design and management of contact centers is important, and worthy of research, because contact centers comprise a large, growing part of the economy and because they are quite complicated. Classic call centers are complicated because
Optimal fast hashing
 In 28th IEEE International Conference on Computer Communications (INFOCOM)
, 2009
Abstract. This paper is about designing optimal high-throughput hashing schemes that minimize the total number of memory accesses needed to build and access a hash table. Recent schemes often promote the use of multiple-choice hashing. However, such a choice also implies a significant increase in the number of memory accesses to the hash table, which translates into higher power consumption and lower throughput. In this paper, we propose to only use choice when needed. Given some target hash table overflow rate, we provide a lower bound on the total number of needed memory accesses. Then, we design and analyze schemes that provably achieve this lower bound over a large range of target overflow values. Further, for the multilevel hash table scheme, we prove that the optimum occurs when its subtable sizes decrease in a geometric way, thus formally confirming a heuristic rule of thumb.
Robust Counting Via Counter Braids: An Error-Resilient Network Measurement Architecture
Abstract. A novel counter architecture, called Counter Braids, has recently been proposed for accurate per-flow measurement on high-speed links. Inspired by sparse random graph codes, Counter Braids solves two central problems of per-flow measurement: one-to-one flow-to-counter association and large amount of unused counter space. It eliminates the one-to-one association by randomly hashing a flow label to multiple counters and minimizes counter space by incrementally compressing counts as they accumulate. The random hash values are reproduced offline from a list of flow labels, with which flow sizes are decoded using a fast message passing algorithm. The decoding of Counter Braids introduces the problem of collecting flow labels active in a measurement epoch. An exact solution to this problem is expensive. This paper complements the previous proposal with an approximate flow label collection scheme and a novel error-resilient decoder that decodes despite missing flow labels. The approximate flow label collection detects new flows with variable-length signature counting Bloom filters in SRAM, and stores flow labels in high-density DRAM. It provides a good tradeoff between space and accuracy: more than 99 percent of the flows are captured with very little SRAM space. The decoding challenge posed by missing flow labels calls for a new algorithm as the original message passing decoder becomes error-prone. In terms of sparse random graph codes, the problem is equivalent to decoding with graph deficiency, a scenario beyond coding theory. The error-resilient decoder employs a new message passing algorithm that recovers most flow sizes exactly despite graph deficiency. Together, our solution achieves a 10-fold reduction in SRAM space compared to hash-table based implementations, as demonstrated with Internet trace evaluations.
Symmetric vs. Asymmetric Multiple-Choice Algorithms
Multiple-choice allocation algorithms have been studied intensively over the last decade. These algorithms have several applications in the areas of load balancing, routing, resource allocation and hashing. The underlying idea is simple and can be explained best in the balls-and-bins model: Instead of assigning balls (jobs, requests, or keys) simply at random to bins (machines, servers, or positions in a hash table), choose first a small set of bins at random, inspect these bins, and place the ball into one of the bins containing the smallest number of balls among them. The simple idea of first selecting a small set of alternatives at random and then making the final choice after careful inspection of these alternatives leads to great improvements over algorithms that make their decisions simply at random. We illustrate the power of this principle in terms of simple balls-and-bins processes. In particular, we study recently presented algorithms that treat bins asymmetrically in order to obtain a better load balancing. We compare the behavior of these asymmetric schemes with symmetric schemes and prove that the asymmetric schemes achieve a better load balancing than their symmetric counterparts.
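To make the symmetric/asymmetric distinction concrete, here is a small Python sketch. The abstract does not spell out the asymmetric rule, so the sketch assumes one commonly described variant: the bins are split into d equal groups ordered left to right, each ball samples one bin per group, and ties are broken in favour of the leftmost group. The function name `allocate` and its parameters are hypothetical.

```python
import random


def allocate(n, d, asymmetric, seed=0):
    """Place n balls into n bins using d candidate bins per ball.

    symmetric:  d bins drawn uniformly from all n bins, ties broken arbitrarily.
    asymmetric: bins split into d equal groups; one bin drawn from each group,
                ties broken toward the leftmost group (assumed rule).
    """
    if asymmetric and n % d != 0:
        raise ValueError("n must be divisible by d for equal-sized groups")
    rng = random.Random(seed)
    loads = [0] * n
    group_size = n // d
    for _ in range(n):
        if asymmetric:
            # Candidate g is drawn from group g, so the list is ordered left to right.
            candidates = [g * group_size + rng.randrange(group_size) for g in range(d)]
        else:
            candidates = [rng.randrange(n) for _ in range(d)]
        # min() returns the first minimal element, i.e. the leftmost group on ties.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    print("single choice :", max(allocate(n, 1, asymmetric=False, seed=1)))
    print("symmetric d=2 :", max(allocate(n, 2, asymmetric=False, seed=1)))
    print("asymmetric d=2:", max(allocate(n, 2, asymmetric=True, seed=1)))
```

At small n the symmetric and asymmetric maximum loads may coincide; the advantage of the asymmetric schemes stated in the abstract is an asymptotic result, which a single small run cannot confirm.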