The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
Cited by 139 (6 self)
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
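The d-choice process described in this abstract can be illustrated with a small simulation. This is only a sketch: the function names `d_choice_loads` and `max_load` and the `seed` parameter are assumptions made for illustration and do not come from the survey, and a single run shows typical behavior rather than the high-probability bound itself.

```python
import random


def d_choice_loads(n, d, seed=0):
    """Place n balls into n bins sequentially.

    Each ball samples d bins independently and uniformly at random
    (with replacement) and goes into the least loaded of them.
    With d = 1 this is the classical single-choice process.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    loads = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        # Ties are broken in favor of the first sampled bin.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


def max_load(n, d, seed=0):
    """Return the largest number of balls in any bin after the process."""
    return max(d_choice_loads(n, d, seed))


if __name__ == "__main__":
    n = 100_000
    for d in (1, 2, 3):
        print(f"d={d}: max load {max_load(n, d)}")
```

Running the script for d = 1, 2, 3 lets the single-choice maximum load be compared with the multiple-choice ones on the same n.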
Packet Routing in Fixed-Connection Networks: A Survey
, 1998
Cited by 35 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Efficient hashing with lookups in two memory accesses
 in: 16th SODA, ACM-SIAM
Cited by 19 (3 self)
The study of hashing is closely related to the analysis of balls and bins. Azar et al. [1] showed that if, instead of using a single hash function, we randomly hash a ball into two bins and place it in the less loaded of the two, the maximum load on the bins drops dramatically. This leads to the concept of two-way hashing, where the largest bucket contains O(log log n) balls with high probability. A hash lookup now searches both of the buckets an item hashes to. Since an item may be placed in one of two buckets, we could potentially move an item after it has been initially placed to reduce the maximum load. Using this fact, we present a simple, practical hashing scheme that maintains a maximum load of 2, with high probability, while achieving high memory utilization. In fact, with n buckets, even if space for two items is preallocated per bucket, as may be desirable in hardware implementations, more than n items can be stored, giving a high memory utilization. Assuming truly random hash functions, we prove the following properties for our hashing scheme.
• Each lookup takes two random memory accesses, and reads at most two items per access.
• Each insert takes O(log n) time and up to log log n + O(1) moves, with high probability, and constant time in expectation.
• The scheme maintains 83.75% memory utilization, without requiring dynamic allocation during inserts.
We also analyze the tradeoff between the number of moves performed during inserts and the maximum load on a bucket. By performing at most h moves, we can maintain a maximum load of O(log log n / (h log((log log n)/h))). So, even by performing one move, we achieve a better bound than by performing no moves at all.
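The baseline two-way hashing idea mentioned at the start of this abstract (insert into the less loaded of two candidate buckets, look up in both) might be sketched as follows. This is not the paper's scheme: it performs no moves after placement and does not cap bucket size at 2. The class name `TwoWayHashTable`, its methods, and the use of salted BLAKE2b digests as stand-ins for truly random hash functions are all assumptions made for illustration.

```python
import hashlib


class TwoWayHashTable:
    """Illustrative two-choice hash table; buckets are unbounded lists."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_index(self, key, salt):
        # Salted BLAKE2b stands in for an independent random hash function.
        digest = hashlib.blake2b(
            repr(key).encode("utf-8"), digest_size=8, salt=salt
        ).digest()
        return int.from_bytes(digest, "big") % self.num_buckets

    def _candidates(self, key):
        return self._bucket_index(key, b"h1"), self._bucket_index(key, b"h2")

    def insert(self, key, value):
        i, j = self._candidates(key)
        # Overwrite the value if the key is already stored in either bucket.
        for b in (i, j):
            bucket = self.buckets[b]
            for pos, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[pos] = (key, value)
                    return
        # Otherwise place the item in the less loaded candidate bucket.
        target = i if len(self.buckets[i]) <= len(self.buckets[j]) else j
        self.buckets[target].append((key, value))

    def lookup(self, key, default=None):
        # A lookup inspects only the two candidate buckets.
        for b in self._candidates(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return default

    def max_load(self):
        return max(len(bucket) for bucket in self.buckets)
```

A table built with `TwoWayHashTable(n)` and filled with n keys can be inspected with `max_load()` to see how full the largest bucket gets; the move-based improvement described in the abstract would be layered on top of `insert`.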
Balanced Allocation on Graphs
 In Proc. 17th Symposium on Discrete Algorithms (SODA)
, 2006
Cited by 17 (2 self)
It is well known that if n balls are inserted into n bins, with high probability, the bin with maximum load contains (1 + o(1)) log n / log log n balls. Azar, Broder, Karlin, and Upfal [1] showed that if, instead of choosing one bin, d ≥ 2 bins are chosen at random and the ball is inserted into the least loaded of the d bins, the maximum load reduces drastically to log log n / log d + O(1). In this paper, we study the two-choice balls and bins process when balls are not allowed to choose any two random bins, but only bins that are connected by an edge in an underlying graph. We show that for n balls and n bins, if the graph is almost regular with degree n^ε, where ε is not too small, the previous bounds on the maximum load continue to hold. Precisely, the maximum load is
CONTENTION RESOLUTION IN HASHING BASED SHARED MEMORY SIMULATIONS
, 2000
Cited by 11 (3 self)
In this paper we study the problem of simulating shared memory on the distributed memory machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. The main aim is to design strategies that resolve contention at the memory modules. Extending results and methods from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n log* n), with high probability. Next we describe a general technique that can be used to turn these simulations into time-processor optimal ones, in the case of EREW PRAMs to be simulated. We obtain a time-processor optimal simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with delay O(log log log n log* n), with high probability. When an (n log log log n log* n)-processor CRCW PRAM is simulated, the delay is larger only by a log* n factor. We further demonstrate that the simulations presented cannot be significantly improved using our techniques. We show an Ω(log log log n / log log log log n) lower bound on the expected delay for a class of PRAM simulations, called topological simulations, that covers all previously known simulations as well as the simulations presented in the paper.
On the effectiveness of D-BSP as a bridging model of parallel computation
 IN PROC. OF THE INT. CONFERENCE ON COMPUTATIONAL SCIENCE, LNCS 2074
, 2001
Cited by 10 (3 self)
This paper surveys and places into perspective a number of results concerning the D-BSP (Decomposable Bulk Synchronous Parallel) model of computation, a variant of the popular BSP model proposed by Valiant in the early nineties. D-BSP captures part of the proximity structure of the computing platform, modeling it by suitable decompositions into clusters, each characterized by its own bandwidth and latency parameters. Quantitative evidence is provided that, when modeling realistic parallel architectures, D-BSP achieves higher effectiveness and portability than BSP, without significantly affecting the ease of use. It is also shown that D-BSP avoids some of the shortcomings of BSP which motivated the definition of other variants of the model. Finally, the paper discusses how the aspects of network proximity incorporated in the model allow for a better management of network congestion and bank contention, when supporting a shared-memory abstraction in a distributed-memory environment.
Shared-Memory Simulations on a Faulty-Memory DMM
, 1996
Cited by 10 (1 self)
this paper are synchronous, and the time performance is our major efficiency criterion. We consider a DMM with faulty memory words, otherwise everything is assumed to be operational. In particular the communication between the processors and the MUs is reliable, and a processor may always attempt to obtain an access to any MU, and, having been granted it, may access any memory word in it, even if all of them are faulty. The only restriction on the distribution of faults among memory words is that their total number is bounded from above by a fraction of the total number of memory words in all the MUs. In particular, some MUs may contain only operational cells, some only faulty cells, and some mixed cells. This report presents fast simulations of the PRAM on a DMM with faulty memory.
Simulating shared memory in real time: On the computation power of reconfigurable meshes
 in Proceedings of the 2nd IEEE Workshop on Reconfigurable Architectures
, 1995
Cited by 6 (1 self)
We consider randomized simulations of shared memory on a distributed memory machine (DMM) where the n processors and the n memory modules of the DMM are connected via a reconfigurable architecture. We first present a randomized simulation of a CRCW PRAM on a reconfigurable DMM having a complete reconfigurable interconnection. It guarantees delay O(log* n), with high probability. Next we study a reconfigurable mesh DMM (RMDMM). Here the n processors and n modules are connected via an n × n reconfigurable mesh. It was already known that an n × m reconfigurable mesh can simulate in constant time an n-processor CRCW PRAM with shared memory of size m. In this paper we present a randomized step-by-step simulation of a CRCW PRAM with arbitrarily large shared memory on an RMDMM. It guarantees constant delay with high probability, i.e., it simulates in real time. Finally we prove a lower bound showing that size Ω(n^2) for the reconfigurable mesh is necessary for real-time simulations. © 1997 Academic Press. * Supported by DFG-Graduiertenkolleg "Parallele Rechnernetzwerke in der Produktionstechnik,"
The Complexity of Deterministic PRAM Simulation on Distributed Memory Machines
, 1997
Cited by 5 (5 self)
In this paper we present lower and upper bounds for the deterministic simulation of a Parallel Random Access Machine (PRAM) with n processors and m variables on a Distributed Memory Machine (DMM) with p ≤ n processors. The bounds are expressed as a function of the redundancy r of the scheme (i.e., the number of copies used to represent each PRAM variable in the DMM), and become tight for any m polynomial in n and r = Θ(1).
Constructive, Deterministic Implementation of Shared Memory on Meshes
 SIAM Journal on Computing
Cited by 3 (3 self)
This paper describes a scheme to implement a shared address space of size m on an n-node mesh, with m polynomial in n, where each mesh node hosts a processor and a memory module. At the core of the simulation is a Hierarchical Memory Organization Scheme (HMOS), which governs the distribution of the shared variables, each replicated into multiple copies, among the memory modules, through a cascade of bipartite graphs. Based on the expansion properties of such graphs, we devise a protocol that accesses any n-tuple of shared variables in worst-case time O(n^{1/2+η}), for any constant η > 0, using O(1/η^{1.59}) copies per variable, or in worst-case time O(n^{1/2} log n), using O(log^{1.59} n) copies per variable. In both cases the access time is close to the natural O(√n) lower bound imposed by the network diameter. A key feature of the scheme is that it can be made fully constructive when m is not too ...