Results 1 - 10
of
30
The Power of Two Choices in Randomized Load Balancing
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d ..."
Abstract
-
Cited by 159 (22 self)
- Add to MetaCart
Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately log n log log n . Suppose instead that each ball is placed sequentially into the least full of d bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only log log n log d +O(1) with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
Parallel Randomized Load Balancing
- In Symposium on Theory of Computing. ACM
, 1995
"... It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds \Theta(log n= log log n) balls with high probability. Recently, Azar et al. analyzed the following: randomly choose d bins for each ball, and then sequentially place each ball in the ..."
Abstract
-
Cited by 51 (8 self)
- Add to MetaCart
It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds \Theta(log n= log log n) balls with high probability. Recently, Azar et al. analyzed the following: randomly choose d bins for each ball, and then sequentially place each ball in the least full of its chosen bins [2]. They show that the fullest bin contains only log log n= log d + \Theta(1) balls with high probability. We explore extensions of this result to parallel and distributed settings. Our results focus on the tradeoff between the amount of communication and the final load. Given r rounds of communication, we provide lower bounds on the maximum load of \Omega\Gamma r p log n= log log n) for a wide class of strategies. Our results extend to the case where the number of rounds is allowed to grow with n. We then demonstrate parallelizations of the sequential strategy presented in Azar et al. that achieve loads within a constant factor of the lower bound for two ...
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract
-
Cited by 41 (11 self)
- Add to MetaCart
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easyto-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
On Contention Resolution Protocols and Associated Probabilistic Phenomena
- IN PROCEEDINGS OF THE 26TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1994
"... ..."
Contention Resolution with Constant Expected Delay
"... We study contention resolution problem in a multiple-access channel such as the Ethernet... ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
We study contention resolution problem in a multiple-access channel such as the Ethernet...
Shared Memory Simulations with Triple-Logarithmic Delay (Extended Abstract)
, 1995
"... ) Artur Czumaj 1 , Friedhelm Meyer auf der Heide 2 , and Volker Stemann 1 1 Heinz Nixdorf Institute, University of Paderborn, D-33095 Paderborn, Germany 2 Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany Abstract. We conside ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
) Artur Czumaj 1 , Friedhelm Meyer auf der Heide 2 , and Volker Stemann 1 1 Heinz Nixdorf Institute, University of Paderborn, D-33095 Paderborn, Germany 2 Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany Abstract. We consider the problem of simulating a PRAM on a distributed memory machine (DMM). Our main result is a randomized algorithm that simulates each step of an n-processor CRCW PRAM on an n-processor DMM with O(log log log n log n) delay, with high probability. This is an exponential improvement on all previously known simulations. It can be extended to a simulation of an (n log log log n log n)- processor EREW PRAM on an n-processor DMM with optimal delay O(log log log n log n), with high probability. Finally a lower bound of \Omega (log log log n=log log log log n) expected time is proved for a large class of randomized simulations that includes all known simulations. 1 Introduction Para...
Parallel Balanced Allocations
- IN PROCEEDINGS OF THE 8TH ANNUAL ACM SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1996
"... We study the well known problem of throwing m balls into n bins. If each ball in the sequential game is allowed to select more than one bin, the maximum load of the bins can be exponentially reduced compared to the `classical balls into bins' game. We consider a static and a dynamic variant of a ra ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We study the well known problem of throwing m balls into n bins. If each ball in the sequential game is allowed to select more than one bin, the maximum load of the bins can be exponentially reduced compared to the `classical balls into bins' game. We consider a static and a dynamic variant of a randomized parallel allocation where each ball can choose a constant number of bins. All results hold with high probability. In the static case all m balls arrive at the same time. We analyze for m = n a very simple optimal class of protocols achieving maximum load O i r q log n log log n j if r rounds of communication are allowed. This matches the lower bound of [ACMR95]. Furthermore, we generalize the protocols to the case of m ? n balls. An optimal load of O(m=n) can be achieved using log log n log(m=n) rounds of communication. Hence, for m = n log log n log log log n balls this slackness allows to hide the amount of communication. In the `classical balls into bins' game this op...
Randomized Protocols for Low-Congestion Circuit Routing in Multistage Interconnection Networks
- IN PROC. OF THE 30TH ACM SYMP. ON THEORY OF COMPUTING (STOC
, 1998
"... In this paper we study randomized algorithms for circuit switching on multistage networks related to the butterfly. We devise algorithms that route messages by constructing circuits (or paths) for the messages with small congestion, dilation, and setup time. Our algorithms are based on the idea of h ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
In this paper we study randomized algorithms for circuit switching on multistage networks related to the butterfly. We devise algorithms that route messages by constructing circuits (or paths) for the messages with small congestion, dilation, and setup time. Our algorithms are based on the idea of having each message choose a route from two possibilities, a technique that has previously proven successful in simpler load balancing settings. As an application of our techniques, we propose a novel design for a data server.
Constructive Deterministic PRAM Simulation on a Mesh-Connected Computer
- In Proc. 6th ACM Symp. on Parallel Algorithms and Architectures
, 1993
"... The PRAM model of computation consists of a collection of sequential RAM machines accessing a shared memory in lock-step fashion. The PRAM is a very high-level abstraction of a parallel computer, and its direct realization in hardware is beyond reach of the current (or even foreseeable) technology. ..."
Abstract
-
Cited by 11 (10 self)
- Add to MetaCart
The PRAM model of computation consists of a collection of sequential RAM machines accessing a shared memory in lock-step fashion. The PRAM is a very high-level abstraction of a parallel computer, and its direct realization in hardware is beyond reach of the current (or even foreseeable) technology. In this paper we present a deterministic simulation scheme to emulate PRAM computation on a mesh-connected computer, a feasible machine where each processor has its own memory module and is connected to at most four other processors via point-to-point links. In order to achieve a good worst-case performance, any deterministic simulation scheme has to replicate each variable in a number of copies. Such copies are stored in the local memory modules according to a Memory Organization Scheme (MOS), which is known to all the processors. A variable is then accessed by routing packets to its copies. All deterministic schemes in the literature make use of a MOS whose existence is proved via the prob...

