Results 1 - 10
of
88
The Power of Two Choices in Randomized Load Balancing
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d ..."
Abstract
-
Cited by 159 (22 self)
- Add to MetaCart
Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only log log n log d +O(1) with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
Tail Bounds for Occupancy and the Satisfiability Threshold Conjecture
, 1995
"... The classical occupancy problem is concerned with studying the number of empty bins resulting from a random allocation of m balls to n bins. We provide a series of tail bounds on the distribution of the number of empty bins. These tail bounds should find application in randomized algorithms and prob ..."
Abstract
-
Cited by 94 (1 self)
- Add to MetaCart
The classical occupancy problem is concerned with studying the number of empty bins resulting from a random allocation of m balls to n bins. We provide a series of tail bounds on the distribution of the number of empty bins. These tail bounds should find application in randomized algorithms and probabilistic analysis. Our motivating application is the following well-known conjecture on threshold phenomenon for the satisfiability problem. Consider random 3-SAT formulas with cn clauses over n variables, where each clause is chosen uniformly and independently from the space of all clauses of size 3. It has been conjectured that there is a sharp threshold for satisfiability at c ß 4:2. We provide a strong upper bound on the value of c , showing that for c ? 4:758 a random 3-SAT formula is unsatisfiable with high probability. This result is based on a structural property, possibly of independent interest, whose proof needs several applications of the occupancy tail bounds. Supporte...
The Power of Two Random Choices: A Survey of Techniques and Results
- in Handbook of Randomized Computing
, 2000
"... ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately ..."
Abstract
-
Cited by 79 (2 self)
- Add to MetaCart
ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n= log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n= log d + (1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
A linear-time probabilistic counting algorithm for database applications
- ACM Transactions on Database Systems
, 1990
"... We present a probabilistic algorithm for counting the number of unique values in the presence of duplicates. This algorithm has O(q) time complexity, where q is the number of values including duplicates, and produces an estimation with an arbitrary accuracy prespecified by the user using only a smal ..."
Abstract
-
Cited by 74 (5 self)
- Add to MetaCart
We present a probabilistic algorithm for counting the number of unique values in the presence of duplicates. This algorithm has O(q) time complexity, where q is the number of values including duplicates, and produces an estimation with an arbitrary accuracy prespecified by the user using only a small amount of space. Traditionally, accurate counts of unique values were obtained by sorting, which has O(q log q) time complexity. Our technique, called linear counting, is based on hashing. We present a comprehensive theoretical and experimental analysis of linear counting. The analysis reveals an interesting result: A load factor (number of unique values/hash table size) much larger than 1.0 (e.g., 12) can be used for accurate estimation (e.g., 1 % of error). We present this technique with two important applications to database problems: namely, (1) obtaining the column cardinality (the number of unique values in a column of a relation) and (2) obtaining the join selectivity (the number of unique values in the join column resulting from an unconditional join divided by the number of unique join column values in the relation to he joined). These two parameters are important statistics that are used in relational query optimization and physical database design.
"Balls into Bins" - A Simple and Tight Analysis
, 1998
"... . Suppose we sequentially throw m balls into n bins. It is a natural question to ask for the maximum number of balls in any bin. In this paper we shall derive sharp upper and lower bounds which are reached with high probability. We prove bounds for all values of m(n) n=polylog(n) by using the simp ..."
Abstract
-
Cited by 52 (2 self)
- Add to MetaCart
. Suppose we sequentially throw m balls into n bins. It is a natural question to ask for the maximum number of balls in any bin. In this paper we shall derive sharp upper and lower bounds which are reached with high probability. We prove bounds for all values of m(n) n=polylog(n) by using the simple and well-known method of the first and second moment. 1 Introduction Suppose that we sequentially throw m balls into n bins by placing each ball into a bin chosen independently and uniformly at random. It is a natural question to ask for the maximum number of balls in any bin. This very simple model has many applications in computer science. Here we name just two of them: Hashing: The balls-into-bins model may be used to analyze the efficiency of hashing-algorithms. In the case of the so called separate chaining, all keys that hash to the same location in the table are stored in a linked list. It is clear that the lengths of these lists are a measure for the complexity. For a well chose...
Parallel Randomized Load Balancing
- In Symposium on Theory of Computing. ACM
, 1995
"... It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds \Theta(log n= log log n) balls with high probability. Recently, Azar et al. analyzed the following: randomly choose d bins for each ball, and then sequentially place each ball in the ..."
Abstract
-
Cited by 51 (8 self)
- Add to MetaCart
It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds \Theta(log n= log log n) balls with high probability. Recently, Azar et al. analyzed the following: randomly choose d bins for each ball, and then sequentially place each ball in the least full of its chosen bins [2]. They show that the fullest bin contains only log log n= log d + \Theta(1) balls with high probability. We explore extensions of this result to parallel and distributed settings. Our results focus on the tradeoff between the amount of communication and the final load. Given r rounds of communication, we provide lower bounds on the maximum load of \Omega\Gamma r p log n= log log n) for a wide class of strategies. Our results extend to the case where the number of rounds is allowed to grow with n. We then demonstrate parallelizations of the sequential strategy presented in Azar et al. that achieve loads within a constant factor of the lower bound for two ...
Functional Limit Theorems For Multitype Branching Processes And Generalized Pólya Urns
- APPL
, 2004
"... A functional limit theorem is proved for multitype continuous time Markov branching processes. As consequences, we obtain limit theorems for the branching process stopped by some stopping rule, for example when the total number of particles reaches a given level. Using the ..."
Abstract
-
Cited by 50 (12 self)
- Add to MetaCart
A functional limit theorem is proved for multitype continuous time Markov branching processes. As consequences, we obtain limit theorems for the branching process stopped by some stopping rule, for example when the total number of particles reaches a given level. Using the
Multiple Object Identification with Passive RFID Tags
, 2002
"... We investigate the applicability of passive RFID systems to the task of identifying multiple tagged objects simultaneously, assuming that the number of tags is not known in advance. We present a combinatorial model of the communication mechanism between the reader device and the tags, and use this m ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
We investigate the applicability of passive RFID systems to the task of identifying multiple tagged objects simultaneously, assuming that the number of tags is not known in advance. We present a combinatorial model of the communication mechanism between the reader device and the tags, and use this model to derive the optimal parameter setting for the reading process, based on estimates for the number of tags. Some results on the performance of an implementation are presented.
Optimizing Result Prefetching in Web Search Engines with Segmented Indices
- In VLDB
, 2001
"... We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages which search engines should prepare during the query processing phase. Search engine users have been observed to browse through very few pages of results for qu ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages which search engines should prepare during the query processing phase. Search engine users have been observed to browse through very few pages of results for queries which they submit. This behavior of users suggests that prefetching many results upon processing an initial query is not efficient, since most of the prefetched results will not be requested by the user who initiated the search. However, a policy which abandons result prefetching in favor of retrieving just the first page of search results might not make optimal use of system resources as well. We argue that for a certain behavior of users, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices, and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant which optimizes the prefetching policy. Our results are mostly applicable to local index partitions of the inverted files, but are also applicable to processing of short queries in global index architectures.

