Results 1–10 of 19
Stability of load balancing algorithms in dynamic adversarial systems
In Proc. of the 34th ACM Symp. on Theory of Computing (STOC), 2002
Cited by 19 (2 self)
Abstract. In the dynamic load balancing problem, we seek to keep the job load roughly evenly distributed among the processors of a given network. The arrival and departure of jobs is modeled by an adversary restricted in its power. Muthukrishnan and Rajaraman (1998) gave a clean characterization of a restriction on the adversary that can be considered the natural analogue of a cut condition. They proved that a simple local balancing algorithm proposed by Aiello et al. (1993) is stable against such an adversary if the insertion rate is restricted to a (1 − ε) fraction of the cut size. They left as an open question whether the algorithm is stable at rate 1. In this paper, we resolve this question positively, by proving stability of the local algorithm at rate 1. Our proof techniques are very different from the ones used by Muthukrishnan and Rajaraman, and yield a simpler proof and tighter bounds on the difference in loads. In addition, we introduce a multicommodity version of this load balancing model, and show how to extend the result to the case of balancing two different kinds of loads at once (obtaining as a corollary a new proof of the 2-commodity Max-Flow Min-Cut Theorem). We also show how to apply the proof techniques to the problem of routing packets in adversarial systems. Awerbuch et al. (2001) showed that the same load balancing algorithm is stable against an adversary inserting
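A minimal sketch of the kind of local balancing rule discussed above, assuming unit-size tokens on an undirected graph; the trigger condition and names are illustrative, not the exact variant analyzed in the paper:

```python
def local_balance_round(load, edges):
    """One round of local balancing: across each edge, the heavier
    endpoint passes one token to the lighter one whenever their
    loads differ by more than one (illustrative sketch only)."""
    for u, v in edges:
        if load[u] > load[v] + 1:
            load[u] -= 1
            load[v] += 1
        elif load[v] > load[u] + 1:
            load[v] -= 1
            load[u] += 1
    return load

# Example: a path of four nodes with all load initially at one end.
load = {0: 8, 1: 0, 2: 0, 3: 0}
edges = [(0, 1), (1, 2), (2, 3)]
for _ in range(20):
    load = local_balance_round(load, edges)
```

On a path this converges to a state where adjacent loads differ by at most one, so the end-to-end discrepancy can still grow with the diameter — which is why bounding the difference in loads is the technically interesting part.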
Scalable Work Stealing
Cited by 14 (1 self)
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing in order to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem which can be addressed with distributed dynamic load balancing systems. Work stealing is a popular approach to distributed dynamic load balancing; however, its performance on large-scale clusters is not well understood. Prior work on work stealing has largely focused on shared-memory machines. In this work we investigate the design and scalability of work stealing on modern distributed-memory systems. We demonstrate high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the unbalanced tree search benchmark, and a multiresolution analysis kernel.
Load Balancing in Arbitrary Network Topologies with Stochastic Adversarial Input
SIAM Journal on Computing, 2005
Cited by 9 (2 self)
We study the long-term (steady-state) performance of a simple, randomized, local load balancing technique under a broad range of input conditions. We assume a system of n processors connected by an arbitrary network topology. Jobs are placed in the processors by a deterministic or randomized adversary. The adversary knows the current and past load distribution in the network and can use this information to place the new tasks in the processors. A node can execute one job per step, and can also participate in one load balancing operation in which it can move tasks to a direct neighbor in the network. In the protocol we analyze here, a node equalizes its load with a random neighbor in the graph.
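The protocol being analyzed — each node equalizing its load with a uniformly random neighbor — can be sketched as follows; this omits the paper's adversarial arrivals and the one unit of work executed per step, and all names are illustrative:

```python
import random

def equalize_round(load, neighbors, rng):
    """Each node in turn picks a uniformly random neighbor, and the
    pair splits their combined load as evenly as possible (sketch
    of the random-neighbor equalization protocol)."""
    for u in load:
        v = rng.choice(neighbors[u])
        total = load[u] + load[v]
        load[u], load[v] = total // 2, total - total // 2
    return load

rng = random.Random(0)
# A 4-cycle with a skewed initial load.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
load = {0: 12, 1: 0, 2: 0, 3: 0}
for _ in range(10):
    load = equalize_round(load, neighbors, rng)
```

After a handful of rounds the initial spike is spread across the cycle; the paper's contribution is quantifying this under adversarial input rather than a fixed initial load.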
Stability and Efficiency of a Random Local Load Balancing Protocol
In Proceedings FOCS, 2003
Cited by 6 (2 self)
We study the long-term (steady-state) performance of a simple, randomized, local load balancing technique. We assume a system of n processors connected by an arbitrary network topology. Jobs are placed in the processors by a deterministic or randomized adversary. The adversary knows the current and past load distribution in the network and can use this information to place the new tasks in the processors. The adversary can put a number of new jobs in each processor, in each step, as long as the (expected) total number of new jobs arriving at a given step is bounded by λn. A node can execute one job per step, and also participate in one load balancing operation in which it can move tasks to a direct neighbor in the network. In the protocol we analyze here, a node equalizes its load with a random neighbor in the graph.
Load Balancing: Toward the Infinite Network
, 2006
Cited by 3 (0 self)
We present a contribution on dynamic load balancing for distributed and parallel object-oriented applications. We specifically target peer-to-peer systems and their capability to distribute parallel computation. Using an algorithm for active-object load balancing, we simulate the balance of a parallel application over a peer-to-peer infrastructure. We tune the algorithm parameters in order to obtain the best performance, concluding that our IFL algorithm behaves very well and scales to large peer-to-peer networks (around 8,000 nodes). This work was accepted at the 12th Workshop on Job Scheduling Strategies for Parallel Processing. June 26,
Dynamic Diffusion Load Balancing
In Proc. 32nd Intl. Colloq. on Automata, Languages and Programming
Cited by 2 (0 self)
We consider the problem of dynamic load balancing in arbitrary (connected) networks on n nodes. Our load generation model is such that during each round, n tasks are generated on arbitrary nodes, and then (possibly after some balancing) one task is deleted from every non-empty node. Notice that this model fully saturates the resources of the network in the sense that we generate just as many new tasks per round as the network is able to delete. We show that even in this situation the system is stable, in that the total load remains bounded (as a function of n alone) over time. Our proof only requires that the underlying “communication” graph be connected. (It of course also works if we generate less than n new tasks per round, but the major contribution of this paper is the fully saturated case.) We further show that the upper bound we obtain is asymptotically tight (up to a moderate multiplicative constant) by demonstrating a corresponding lower bound on the system load for the particular example of a linear array (or path). We also show some simple negative results (i.e., instability) for work-stealing-based diffusion-type algorithms in this setting.
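The round structure of the saturated model (arrivals, balancing, one deletion per non-empty node) can be sketched as follows; the pairwise-equalization balancing step is an illustrative stand-in for the paper's diffusion algorithm, and all names are invented:

```python
def diffusion_round(load, edges, new_tasks):
    """One round in the fully saturated model: new tasks arrive at
    adversarially chosen nodes, a balancing step equalizes load
    across each edge, then every non-empty node completes one task."""
    for u in new_tasks:
        load[u] += 1
    for u, v in edges:  # balancing step (stand-in for diffusion)
        hi, lo = (u, v) if load[u] >= load[v] else (v, u)
        move = (load[hi] - load[lo]) // 2
        load[hi] -= move
        load[lo] += move
    for u in load:      # one deletion per non-empty node
        if load[u] > 0:
            load[u] -= 1
    return load

# Adversary always dumps all n = 3 new tasks on node 0 of a path.
load = {0: 0, 1: 0, 2: 0}
edges = [(0, 1), (1, 2)]
for _ in range(100):
    load = diffusion_round(load, edges, [0, 0, 0])
```

Even with arrivals matching the network's deletion capacity exactly, the total load in this toy run stays bounded rather than growing with the number of rounds, which is the stability phenomenon the paper proves in general.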
EXPERIENCES WITH MESH-LIKE COMPUTATIONS USING PREDICTION BINARY TREES
Cited by 2 (2 self)
Abstract. In this paper we aim at exploiting the temporal coherence among successive phases of a computation in order to implement a load-balancing technique for mesh-like computations to be mapped on a cluster of processors. A key concept on which the load balancing scheme is built is the use of a Predictor component that is in charge of providing an estimate of the imbalance between successive phases. Using this information, our method partitions the computation into balanced tasks through the Prediction Binary Tree (PBT). At each new phase, the current PBT is updated using each task's computing time in the previous phase as its next-phase cost estimate. The PBT is designed so that it balances the load across the tasks as well as reduces dependency among processors for higher performance. Reduced dependency is obtained by using rectangular tiles of the mesh of almost-square shape (i.e., one dimension is at most twice the other). By reducing dependency, one can reduce inter-processor communication or exploit local dependencies among tasks (such as data locality). Furthermore, we also provide two heuristics which take advantage of data locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows good scalability, and improves performance on both cheaper commodity clusters and high-performance clusters with low-latency networks. We report different measurements showing that task granularity is a key point for the performance of our decomposition/mapping strategy. Key words: scheduling, load balancing, performance prediction, mesh-like computation, performance evaluation.
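The tiling idea — recursively cutting the longer side of a rectangle at a cost-balancing point so tiles stay almost-square — can be sketched as below; this is in the spirit of the PBT but does not reproduce the paper's incremental update rule, and all names are invented:

```python
def balanced_cut(line):
    """Index k (1 <= k < len) that best halves the total of `line`."""
    total, acc = sum(line), 0
    best, best_gap = 1, float("inf")
    for k in range(1, len(line)):
        acc += line[k - 1]
        if abs(2 * acc - total) < best_gap:
            best, best_gap = k, abs(2 * acc - total)
    return best

def pbt_partition(costs, x0, y0, w, h, depth):
    """Recursively split a rectangular region of a cost mesh into
    2**depth cost-balanced tiles, always cutting the longer
    dimension so tiles stay almost-square."""
    if depth == 0:
        return [(x0, y0, w, h)]
    if w >= h:  # cut along x
        line = [sum(costs[y0 + j][x0 + i] for j in range(h)) for i in range(w)]
        c = balanced_cut(line)
        return (pbt_partition(costs, x0, y0, c, h, depth - 1)
                + pbt_partition(costs, x0 + c, y0, w - c, h, depth - 1))
    line = [sum(costs[y0 + j][x0 + i] for i in range(w)) for j in range(h)]
    c = balanced_cut(line)
    return (pbt_partition(costs, x0, y0, w, c, depth - 1)
            + pbt_partition(costs, x0, y0 + c, w, h - c, depth - 1))

# Uniform 8x8 cost mesh split into 8 tiles (depth 3).
costs = [[1] * 8 for _ in range(8)]
tiles = pbt_partition(costs, 0, 0, 8, 8, 3)
```

With uniform costs this yields eight 2x4 tiles that partition the mesh exactly; feeding measured per-task times back in as the next phase's costs would shift the cut points, which is the temporal-coherence idea above.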
Algorithms, Theory
Cited by 1 (0 self)
This paper studies a load balancing game introduced by Koutsoupias and Papadimitriou that is intended to model a set of users who share several internet-based resources. Some of the recent work on this topic has considered the problem of constructing Nash equilibria, which are choices of actions where each user has optimal utility given the actions of the other users. A related (harder) problem is to find sequences of utility-improving moves that lead to a Nash equilibrium, starting from some given assignment of resources to users. We consider the special case where all resources are identical. It is known already that there exist efficient algorithms for finding Nash equilibria; our contribution here is to show furthermore that Nash equilibria for this type of game are reached rapidly by Randomized Local Search, a simple generic method for local optimization. Our motivation for studying Randomized Local Search is that (as we show) it can be realised by a simple distributed network of users that act selfishly, have no central control, and only interact via the effect they have on the cost functions of resources.
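Randomized Local Search on identical resources can be sketched as follows: repeatedly pick a user at random and move it to a least-loaded resource whenever that strictly lowers the load it experiences. Each such move decreases a potential function, so the process settles at a pure Nash equilibrium; the step budget and names here are illustrative:

```python
import random

def randomized_local_search(weights, m, rng, max_steps=10_000):
    """Sketch of Randomized Local Search for the load balancing
    game on m identical resources: a randomly chosen user moves to
    a least-loaded resource iff that strictly improves its cost."""
    assign = [rng.randrange(m) for _ in weights]
    loads = [0] * m
    for i, w in enumerate(weights):
        loads[assign[i]] += w
    for _ in range(max_steps):
        i = rng.randrange(len(weights))
        cur = assign[i]
        best = min(range(m), key=lambda r: loads[r])
        # Improving iff the target's load plus our weight is strictly
        # below the load we currently experience.
        if best != cur and loads[best] + weights[i] < loads[cur]:
            loads[cur] -= weights[i]
            loads[best] += weights[i]
            assign[i] = best
    return assign, loads

rng = random.Random(1)
weights = [4, 3, 3, 2, 2, 2]
assign, loads = randomized_local_search(weights, 2, rng)
```

The paper's result concerns how *rapidly* such a process reaches equilibrium; the sketch only illustrates the move rule itself.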
Scheduling parallel programs by work stealing with private deques
, 2013
Cited by 1 (0 self)
Work stealing has proven to be an effective method for scheduling fine-grained parallel programs on multicore computers. To achieve high performance, work stealing distributes tasks between concurrent queues, called deques, assigned to each processor. Each processor operates on its deque locally except when performing load balancing via steals. Unfortunately, concurrent deques suffer from two limitations: 1) local deque operations require expensive memory fences on modern weak-memory architectures, and 2) they can be very difficult to extend to support various optimizations and flexible forms of task distribution strategies needed by many applications, e.g., those that do not fit nicely into the divide-and-conquer, nested data parallel paradigm. For these reasons, there has been a lot of recent interest in implementations of work stealing with non-concurrent deques, where deques
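The private-deque idea — only the owner touches its deque, and steals go through explicit requests serviced at polling points — can be illustrated with a single-threaded simulation; this sidesteps the question of memory fences entirely, and all names are invented:

```python
import collections
import random

def private_deque_schedule(tasks, n_workers, rng):
    """Round-based simulation of work stealing with private deques:
    each worker pops work only from its own deque; an idle worker
    posts a steal request, and the victim transfers its oldest task
    the next time it polls."""
    deques = [collections.deque() for _ in range(n_workers)]
    deques[0].extend(tasks)          # all work starts on worker 0
    requests = [None] * n_workers    # pending steal request per victim
    done = []
    while any(deques):
        for w in range(n_workers):
            if deques[w]:
                done.append((w, deques[w].pop()))  # local LIFO pop
                # Polling point: serve one pending steal request.
                if requests[w] is not None and deques[w]:
                    thief = requests[w]
                    deques[thief].append(deques[w].popleft())  # FIFO steal
                    requests[w] = None
            else:
                victim = rng.randrange(n_workers)  # idle: ask a random victim
                if victim != w and requests[victim] is None:
                    requests[victim] = w

    return done

rng = random.Random(0)
done = private_deque_schedule(list(range(20)), 4, rng)
```

Local pops take newest work (good locality) while steals take the oldest (typically the largest remaining subcomputation) — the same policy as concurrent-deque work stealing, but with all deque accesses confined to the owner.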
Algorithms, Theory
This paper introduces work-dealing, a new algorithm for “locality-oriented” load distribution on small-scale shared-memory multiprocessors. Its key feature is an unprecedentedly low-overhead mechanism (only a couple of loads and stores per operation, and no costly compare-and-swaps) for dealing out work to processors in a globally balanced way. We believe that for applications in which work-items have process affinity, especially applications running in dedicated mode (“stand-alone”), work-dealing could prove a worthy alternative to the popular work-stealing paradigm.