Results 1–10 of 209
Balanced Allocations
SIAM Journal on Computing, 1994
Cited by 335 (8 self)
Abstract:
Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (1 + o(1)) ln n / ln ln n balls in it. Suppose instead that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. We show that with high probability, the fullest box contains only ln ln n / ln 2 + O(1) balls, exponentially less than before. Furthermore, we show that a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted, and one ball is added in the manner above. We discuss consequences of this and related theorems for dynamic resource allocation, hashing, and online load balancing.
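The one-choice versus two-choice gap described in this abstract is easy to observe empirically. The following sketch (Python; the function name and parameters are illustrative, not from the paper) throws n balls into n bins under both rules and reports the maximum load:

```python
import random

def max_load(n, choices, seed=0):
    """Throw n balls into n bins; each ball goes into the least full of
    `choices` bins sampled uniformly at random. Returns the maximum load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        picks = [rng.randrange(n) for _ in range(choices)]
        best = min(picks, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

n = 100_000
print("one choice :", max_load(n, 1))  # on the order of ln n / ln ln n
print("two choices:", max_load(n, 2))  # on the order of ln ln n / ln 2
```

For n = 100,000 the two-choice maximum load is reliably several balls smaller than the one-choice maximum, matching the exponential gap the theorem predicts.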
The Power of Two Choices in Randomized Load Balancing
IEEE Transactions on Parallel and Distributed Systems, 1996
Cited by 290 (24 self)
Abstract:
Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n / log log n. Suppose instead that each ball is placed sequentially into the least full of d bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only log log n / log d + O(1) with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
Multiple Object Identification with Passive RFID Tags
2002
Cited by 174 (0 self)
Abstract:
We investigate the applicability of passive RFID systems to the task of identifying multiple tagged objects simultaneously, assuming that the number of tags is not known in advance. We present a combinatorial model of the communication mechanism between the reader device and the tags, and use this model to derive the optimal parameter setting for the reading process, based on estimates for the number of tags. Some results on the performance of an implementation are presented.
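The idea of tuning the reading process to an estimated tag count can be illustrated with the standard framed slotted-ALOHA model, a simplification of the paper's combinatorial model (the numbers and the search range below are assumptions for illustration). Each tag replies in one of F time slots chosen uniformly at random, and a tag is identified when its slot holds exactly one reply; the fraction of such singleton slots is maximized when the frame size F matches the number of tags:

```python
def singleton_fraction(n_tags, frame_size):
    """Expected fraction of slots holding exactly one reply when each of
    n_tags independently picks one of frame_size slots uniformly at random."""
    return (n_tags / frame_size) * (1 - 1 / frame_size) ** (n_tags - 1)

# For an estimated population of 200 tags, search for the frame size that
# maximizes reader throughput; the optimum sits at F = n_tags.
n = 200
best_frame = max(range(50, 401), key=lambda f: singleton_fraction(n, f))
print(best_frame)  # 200
```

Differentiating (n/F)(1 - 1/F)^(n-1) in F shows the maximum is attained exactly at F = n, which is why an accurate tag-count estimate directly yields a good frame-size setting.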
Semiparametric Difference-in-Differences Estimators
Review of Economic Studies, 2005
Cited by 158 (5 self)
Abstract:
The difference-in-differences (DID) estimator is one of the most popular tools for applied research in economics to evaluate the effects of public interventions and other treatments of interest on some relevant outcome variables. However, it is well known that the DID estimator is based on strong identifying assumptions. In particular, the conventional DID estimator requires that, in the absence of the treatment, the average outcomes for the treated and control groups would have followed parallel paths over time. This assumption may be implausible if pre-treatment characteristics that are thought to be associated with the dynamics of the outcome variable are unbalanced between the treated and the untreated. That would be the case, for example, if selection for treatment is influenced by individual transitory shocks on past outcomes (Ashenfelter's dip). This paper considers the case in which differences in observed characteristics create non-parallel outcome dynamics between treated and controls. It is shown that, in such a case, a simple two-step strategy can be used to estimate the average effect of the treatment for the treated. In addition, the estimation framework proposed in this paper allows the use of covariates to describe how the average effect of the treatment varies with changes in observed characteristics.
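For contrast with the paper's semiparametric two-step extension, the conventional DID estimator the abstract starts from is just a double difference of group means. A minimal sketch with hypothetical outcome data (not from the paper):

```python
def did_estimate(pre_treated, post_treated, pre_control, post_control):
    """Conventional difference-in-differences point estimate: the treated
    group's mean change minus the control group's mean change. Valid only
    under the parallel-trends assumption the abstract describes."""
    def mean(xs):
        return sum(xs) / len(xs)
    return (mean(post_treated) - mean(pre_treated)) - (mean(post_control) - mean(pre_control))

# Hypothetical outcomes: both groups drift upward by 2 on average,
# and treatment adds 3 on top of the common trend.
print(did_estimate([10, 12], [15, 17], [20, 22], [22, 24]))  # -> 3.0
```

The paper's point is precisely that when observed characteristics differ across groups, this double difference is biased, and the treated-versus-control comparison must first be adjusted for those covariates.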
The Power of Two Random Choices: A Survey of Techniques and Results
In Handbook of Randomized Computing, 2000
Cited by 140 (6 self)
Abstract:
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
"Balls into bins”  A simple and tight analysis
Lecture Notes in Computer Science, 1998
Cited by 118 (2 self)
Abstract:
Suppose we sequentially throw m balls into n bins. It is a natural question to ask for the maximum number of balls in any bin. In this paper we shall derive sharp upper and lower bounds which are reached with high probability. We prove bounds for all values of m(n) >= n/polylog(n) by using the simple and well-known method of the first and second moment.
Tail Bounds for Occupancy and the Satisfiability Threshold Conjecture
1995
Cited by 111 (2 self)
Abstract:
The classical occupancy problem is concerned with studying the number of empty bins resulting from a random allocation of m balls to n bins. We provide a series of tail bounds on the distribution of the number of empty bins. These tail bounds should find application in randomized algorithms and probabilistic analysis. Our motivating application is the following well-known conjecture on the threshold phenomenon for the satisfiability problem. Consider random 3-SAT formulas with cn clauses over n variables, where each clause is chosen uniformly and independently from the space of all clauses of size 3. It has been conjectured that there is a sharp threshold for satisfiability at c ≈ 4.2. We provide a strong upper bound on the value of c, showing that for c > 4.758 a random 3-SAT formula is unsatisfiable with high probability. This result is based on a structural property, possibly of independent interest, whose proof needs several applications of the occupancy tail bounds. Supporte...
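The quantity these tail bounds control, the number of empty bins, concentrates sharply around its mean n(1 - 1/n)^m ≈ n·e^(-m/n). A quick simulation (an illustrative sketch, not the paper's bounds) shows the agreement:

```python
import random

def empty_bins(m_balls, n_bins, seed=0):
    """Throw m_balls into n_bins uniformly at random and count empty bins."""
    rng = random.Random(seed)
    occupied = {rng.randrange(n_bins) for _ in range(m_balls)}
    return n_bins - len(occupied)

n = m = 10_000
expected = n * (1 - 1 / n) ** m           # close to n / e when m = n
print(empty_bins(m, n), round(expected))  # both near 3679
```

With m = n = 10,000 both numbers land within a few standard deviations of n/e, which is the kind of concentration the paper's tail bounds make quantitative.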
Functional Limit Theorems For Multitype Branching Processes And Generalized Pólya Urns
APPL, 2004
Cited by 110 (18 self)
Abstract:
A functional limit theorem is proved for multitype continuous time Markov branching processes. As consequences, we obtain limit theorems for the branching process stopped by some stopping rule, for example when the total number of particles reaches a given level. Using the
A linear-time probabilistic counting algorithm for database applications
ACM Transactions on Database Systems, 1990
Cited by 102 (5 self)
Abstract:
We present a probabilistic algorithm for counting the number of unique values in the presence of duplicates. This algorithm has O(q) time complexity, where q is the number of values including duplicates, and produces an estimate with an arbitrary accuracy prespecified by the user, using only a small amount of space. Traditionally, accurate counts of unique values were obtained by sorting, which has O(q log q) time complexity. Our technique, called linear counting, is based on hashing. We present a comprehensive theoretical and experimental analysis of linear counting. The analysis reveals an interesting result: a load factor (number of unique values / hash table size) much larger than 1.0 (e.g., 12) can be used for accurate estimation (e.g., 1% error). We present this technique with two important applications to database problems: namely, (1) obtaining the column cardinality (the number of unique values in a column of a relation) and (2) obtaining the join selectivity (the number of unique values in the join column resulting from an unconditional join divided by the number of unique join column values in the relation to be joined). These two parameters are important statistics that are used in relational query optimization and physical database design.
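Linear counting is compact enough to sketch as commonly described: hash every value into an m-entry bitmap, then estimate the distinct count as n̂ = -m·ln(V), where V is the fraction of entries left unset. The hash choice below (blake2b) is an assumption for reproducibility, not the paper's:

```python
import hashlib
import math

def linear_count(values, bitmap_size):
    """Linear counting: hash each value into a bitmap, then estimate the
    number of distinct values as -bitmap_size * ln(unset / bitmap_size)."""
    bitmap = [False] * bitmap_size
    for v in values:
        digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "big") % bitmap_size] = True
    unset = bitmap.count(False)
    if unset == 0:
        raise ValueError("bitmap saturated; retry with a larger bitmap")
    return -bitmap_size * math.log(unset / bitmap_size)

# 5,000 distinct keys, each seen twice; the load factor 5000/4096 > 1.0
# is acceptable, as the abstract's analysis indicates.
data = [f"key-{i}" for i in range(5000)] * 2
print(round(linear_count(data, 4096)))  # close to 5000
```

Note the estimate depends only on which bitmap entries are set, so duplicates cost O(1) each and the whole pass is linear in q, matching the O(q) claim.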