Results 1  10
of
193
The Power of Two Choices in Randomized Load Balancing
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d ..."
Abstract

Cited by 201 (23 self)
 Add to MetaCart
Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only log log n log d +O(1) with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
Job Scheduling in Multiprogrammed Parallel Systems
, 1997
"... Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of us ..."
Abstract

Cited by 154 (14 self)
 Add to MetaCart
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, involved with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as one or two leveled. Singlelevel scheduling combines the allocation of processing power with the decision of which thread will use it. Two level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
Cuckoo hashing
 Journal of Algorithms
, 2001
"... We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that ..."
Abstract

Cited by 124 (6 self)
 Add to MetaCart
We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee. Key Words: data structures, dictionaries, information retrieval, searching, hashing, experiments * Partially supported by the Future and Emerging Technologies programme of the EU
GraphTheoretic Analysis of Structured PeertoPeer Systems: Routing Distances and Fault Resilience
, 2003
"... This paper examines graphtheoretic properties of existing peertopeer architectures and proposes a new infrastructure based on optimaldiameter de Bruijn graphs. Since generalized de Bruijn graphs possess very short average routing distances and high resilience to node failure, they are well suite ..."
Abstract

Cited by 105 (7 self)
 Add to MetaCart
This paper examines graphtheoretic properties of existing peertopeer architectures and proposes a new infrastructure based on optimaldiameter de Bruijn graphs. Since generalized de Bruijn graphs possess very short average routing distances and high resilience to node failure, they are well suited for structured peertopeer networks. Using the example of Chord, CAN, and de Bruijn, we first study routing performance, graph expansion, and clustering properties of each graph. We then examine bisection width, path overlap, and several other properties that affect routing and resilience of peertopeer networks. Having confirmed that de Bruijn graphs offer the best diameter and highest connectivity among the existing peertopeer structures, we offer a very simple incremental building process that preserves optimal properties of de Bruijn graphs under uniform user joins/departures. We call the combined peertopeer architecture
The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
"... ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately ..."
Abstract

Cited by 99 (2 self)
 Add to MetaCart
ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n= log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n= log d + (1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
Simple Load Balancing for Distributed Hash Tables
, 2002
"... Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argu ..."
Abstract

Cited by 98 (1 self)
 Add to MetaCart
Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argue in this paper that these goals can be achieved more simply and more costeffectively. First, we suggest the direct application of the "power of two choices" paradigm, whereby an item is stored at the less loaded of two (or more) random alterna tives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing methods, including loadstealing or loadshedding schemes, as well as providing natural faulttolerance mechanisms.
How Asymmetry Helps Load Balancing
 Journal of the ACM
, 1999
"... This paper deals with balls and bins processes related to randomized load balancing, dynamic resource allocation, and hashing. Suppose ¡ balls have to be assigned to ¡ bins, where each ball has to be placed without knowledge about the distribution of previously placed balls. The goal is to achieve a ..."
Abstract

Cited by 82 (4 self)
 Add to MetaCart
This paper deals with balls and bins processes related to randomized load balancing, dynamic resource allocation, and hashing. Suppose ¡ balls have to be assigned to ¡ bins, where each ball has to be placed without knowledge about the distribution of previously placed balls. The goal is to achieve an allocation that is as even as possible so that no bin gets much more balls than the average. A well known and good solution for this problem is to choose ¢ possible locations for each ball at random, to look into each of these bins, and to place the ball into the least full among these bins. This class of algorithms has been investigated intensively in the past, but almost all previous analyses assume that the ¢ locations for each ball are chosen uniformly and independently at random from the set of all bins. We investigate whether a nonuniform and possibly dependent choice of the ¢
How Useful Is Old Information
 IEEE Transactions on Parallel and Distributed Systems
, 2000
"... AbstractÐWe consider the problem of load balancing in dynamic distributed systems in cases where new incoming tasks can make use of old information. For example, consider a multiprocessor system where incoming tasks with exponentially distributed service requirements arrive as a Poisson process, the ..."
Abstract

Cited by 80 (10 self)
 Add to MetaCart
AbstractÐWe consider the problem of load balancing in dynamic distributed systems in cases where new incoming tasks can make use of old information. For example, consider a multiprocessor system where incoming tasks with exponentially distributed service requirements arrive as a Poisson process, the tasks must choose a processor for service, and a task knows when making this choice the processor queue lengths from T seconds ago. What is a good strategy for choosing a processor in order for tasks to minimize their expected time in the system? Such models can also be used to describe settings where there is a transfer delay between the time a task enters a system and the time it reaches a processor for service. Our models are based on considering the behavior of limiting systems where the number of processors goes to infinity. The limiting systems can be shown to accurately describe the behavior of sufficiently large systems and simulations demonstrate that they are reasonably accurate even for systems with a small number of processors. Our studies of specific models demonstrate the importance of using randomness to break symmetry in these systems and yield important rules of thumb for system design. The most significant result is that only small amounts of queue length information can be extremely useful in these settings; for example, having incoming tasks choose the least loaded of two randomly chosen processors is extremely effective over a large range of possible system parameters. In contrast, using global information can actually degrade performance unless used carefully; for example, unlike most settings where the load information is current, having tasks go to the apparently least loaded server can significantly hurt performance. Index TermsÐLoad balancing, stale information, old information, queuing theory, large deviations. æ 1
Using Multiple Hash Functions to Improve IP Lookups
 IN PROCEEDINGS OF IEEE INFOCOM
, 2000
"... High performance Internet routers require a mechanism for very efficient IP address lookups. Some techniques used to this end, such as binary search on levels, need to construct quickly a good hash table for the appropriate IP prefixes. In this paper we describe an approach for obtaining good hash ..."
Abstract

Cited by 68 (11 self)
 Add to MetaCart
High performance Internet routers require a mechanism for very efficient IP address lookups. Some techniques used to this end, such as binary search on levels, need to construct quickly a good hash table for the appropriate IP prefixes. In this paper we describe an approach for obtaining good hash tables based on using multiple hashes of each input key (which is an IP address). The methods we describe are fast, simple, scalable, parallelizable, and flexible. In particular, in instances where the goal is to have one hash bucket fit into a cache line, using multiple hashes proves extremely suitable. We provide a general analysis of this hashing technique and specifically discuss its application to binary search on levels.