BALANCED ALLOCATIONS: THE HEAVILY LOADED CASE
, 2006
Cited by 51 (6 self)
We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiple-choice processes in the heavily loaded case were obtained using majorization by the single-choice process. This yields an upper bound on the maximum bin load of m/n + O(√(m ln n / n)) with high probability. We show, however, that the multiple-choice processes are fundamentally different from the single-choice variant in that they have “short memory.” The great consequence of this property is that the deviation of the multiple-choice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n⌋ or ⌈m/n⌉ balls) does not increase with the number of balls as in the case of the single-choice process. In particular, we investigate the allocation obtained by two different multiple-choice allocation schemes,
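The contrast between the single-choice and multiple-choice processes can be explored with a small simulation. The sketch below is illustrative only: the abstract does not spell out its two allocation schemes, so the code assumes the common greedy rule (place each ball in the least loaded of d uniformly sampled bins), and the names `allocate` and `gap` are hypothetical.

```python
import random


def allocate(m, n, d, seed=0):
    """Throw m balls into n bins.

    Assumption: greedy rule. Each ball samples d bins uniformly at random
    (with replacement) and goes to the least loaded one. d = 1 gives the
    classical single-choice process.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


def gap(loads):
    """Deviation of the maximum load above the average load m/n."""
    return max(loads) - sum(loads) / len(loads)


if __name__ == "__main__":
    n = 100
    for m in (n, 100 * n, 1000 * n):
        single = gap(allocate(m, n, 1))
        double = gap(allocate(m, n, 2))
        print(f"m={m}: gap(d=1)={single:.1f}, gap(d=2)={double:.1f}")
```

Running the script for growing m gives a rough feel for how the gap above m/n behaves for d = 1 versus d = 2; it is an experiment, not a substitute for the analysis.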
Network Processor Load Balancing for High-Speed Links
, 2002
Cited by 23 (0 self)
While transmission rates already achieve speeds beyond 40 Gb/s, today's network processors (NPs) are only slowly approaching 10 Gb/s. In this paper we present a load-balancing scheme that enables system designers to bridge the performance gap using multiple slower NPs in parallel to serve high-speed links. The proposed scheme works in a flow-preserving manner to ensure in-sequence packet delivery as well as local validity of connection state information, while avoiding inter-processor communication. The effectiveness of the algorithms is evaluated by simulation with extrapolated workloads, and the impact of specific parameters on system performance is the subject of a factor-relevance analysis.
The natural work-stealing algorithm is stable
- In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS)
, 2001
Cited by 15 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time-step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
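The work-generation model described above (without any stealing, since the abstract is cut off before the stealing rule is stated) can be sketched as follows. All function names are hypothetical, and the random-permutation distribution is added purely as a contrasting example; it is not taken from the paper.

```python
import random


def simulate(n, lam, steps, sample_allocation, seed=0):
    """Run the work-generation model with no load balancing at all.

    sample_allocation(rng) returns a list h where h[g] is the processor
    hosting generator g during the current time step.
    """
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        h = sample_allocation(rng)
        # Each generator creates a unit-time task with probability lam
        # and inserts it into the queue of its host processor.
        for g in range(n):
            if rng.random() < lam:
                queues[h[g]] += 1
        # Each processor then removes and services one task, if it has any.
        for p in range(n):
            if queues[p] > 0:
                queues[p] -= 1
    return queues


def point_distribution(n):
    """D concentrated on one h that maps every generator to processor 0."""
    h = [0] * n
    return lambda rng: h


def random_permutation(n):
    """Contrasting example (an assumption): h is a fresh random bijection."""
    def sample(rng):
        h = list(range(n))
        rng.shuffle(h)
        return h
    return sample


if __name__ == "__main__":
    print(simulate(10, 0.5, 200, point_distribution(10)))
    print(simulate(10, 0.5, 200, random_permutation(10)))
```

Under the point distribution only processor 0 ever receives work, so its queue grows whenever λn exceeds 1, while the other processors stay idle. Under a bijection each processor receives at most one task per step and serves it immediately.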
Enhancing the effective utilisation of Grid clusters by exploiting On-Line Performability Analysis
Cited by 8 (5 self)
In Grid applications the heterogeneity and potential failures of the computing infrastructure pose significant challenges to efficient scheduling. Performance models have been shown to be useful in providing predictions on which schedules can be based [1, 2] and most such techniques can also take account of failures and degraded service. However, when several alternative schedules are to be compared it is vital that the analysis of the models does not become so costly as to outweigh the potential gain of choosing the best schedule. Moreover, it is vital that the modelling approach can scale to match the size and complexity of realistic applications. In this
Load Balancing in Arbitrary Network Topologies with Stochastic Adversarial Input
- SIAM Journal on Computing
, 2005
Cited by 8 (2 self)
We study the long-term (steady state) performance of a simple, randomized, local load balancing technique under a broad range of input conditions. We assume a system of n processors connected by an arbitrary network topology. Jobs are placed in the processors by a deterministic or randomized adversary. The adversary knows the current and past load distribution in the network and can use this information to place the new tasks in the processors. A node can execute one job per step, and can also participate in one load balancing operation in which it can move tasks to a direct neighbor in the network. In the protocol we analyze here, a node equalizes its load with a random neighbor in the graph.
Stability and Efficiency of a Random Local Load Balancing Protocol
- In Proceedings FOCS
, 2003
Cited by 6 (2 self)
We study the long-term (steady state) performance of a simple, randomized, local load balancing technique. We assume a system of n processors connected by an arbitrary network topology. Jobs are placed in the processors by a deterministic or randomized adversary. The adversary knows the current and past load distribution in the network and can use this information to place the new tasks in the processors. The adversary can put a number of new jobs in each processor, in each step, as long as the (expected) total number of new jobs arriving at a given step is bounded by λn. A node can execute one job per step, and also participate in one load balancing operation in which it can move tasks to a direct neighbor in the network. In the protocol we analyze here, a node equalizes its load with a random neighbor in the graph.
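One round of such a protocol might look like the sketch below. It is a simplified reading of the abstract, not the authors' exact procedure: the order of arrivals, service and balancing within a step, the matching rule that limits each node to one balancing operation, and the function name `step` are all assumptions.

```python
import random


def step(loads, neighbors, arrivals, rng):
    """One time step: arrivals, service, then random-neighbor equalization.

    loads[v]     current queue length of node v (modified in place)
    neighbors[v] list of nodes adjacent to v (v itself excluded)
    arrivals[v]  jobs the adversary places on v in this step
    """
    n = len(loads)
    # 1. The adversary places new jobs.
    for v in range(n):
        loads[v] += arrivals[v]
    # 2. Every non-empty node executes one job.
    for v in range(n):
        if loads[v] > 0:
            loads[v] -= 1
    # 3. Balancing (assumed matching rule): visit nodes in random order;
    #    a free node picks a random neighbor and, if that neighbor is also
    #    free, the two split their combined load as evenly as possible.
    busy = set()
    order = list(range(n))
    rng.shuffle(order)
    for v in order:
        if v in busy:
            continue
        u = rng.choice(neighbors[v])
        if u in busy:
            continue
        total = loads[u] + loads[v]
        loads[v] = total // 2
        loads[u] = total - total // 2
        busy.update((u, v))
    return loads


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]  # 4-node cycle
    rng = random.Random(1)
    loads = [8, 0, 0, 0]
    for _ in range(5):
        step(loads, ring, [1, 0, 0, 0], rng)
        print(loads)
```

Equalization never creates or destroys jobs, so the total load changes only through arrivals and service; the balancing phase just moves jobs between matched neighbors.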
Scalable Work Stealing
Cited by 5 (1 self)
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing in order to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem which can be addressed with distributed dynamic load balancing systems. Work stealing is a popular approach to distributed dynamic load balancing; however its performance on large-scale clusters is not well understood. Prior work on work stealing has largely focused on shared memory machines. In this work we investigate the design and scalability of work stealing on modern distributed memory systems. We demonstrate high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the unbalanced tree search benchmark, and a multiresolution analysis kernel.
Dynamic Load Balancing Issues In The Earth Runtime System
, 1999
Cited by 2 (0 self)
Multithreading is a promising approach to address the problems inherent in multiprocessor systems, such as network and synchronization latencies. Moreover, the benefits of multithreading are not limited to loop-based algorithms but apply also to irregular parallelism. EARTH (Efficient Architecture for Running THreads) is a multithreaded model supporting fine-grain, non-preemptive threads. This model is supported by a C-based runtime system which provides the multithreaded environment for the execution of concurrent programs. This thesis describes the design and implementation of a set of dynamic load balancing algorithms, and an in-depth study of their behavior with divide-and-conquer, regular, and irregular classes of applications. The results described in this thesis are based on EARTH-SP2, an implementation of the EARTH program execution model on the IBM SP-2, a distributed memory multiprocessor system. The main results of this study are as follows: • A randomizing load balance...
Asynchronous Random Polling Dynamic Load Balancing
- In Proceedings of ISAAC’99
, 1999
Cited by 1 (0 self)
Many applications in parallel processing have to traverse large, implicitly defined trees with irregular shape. The receiver-initiated load balancing algorithm random polling has long been known to be very efficient for these problems in practice. For any ε > 0, we prove that its parallel execution time is at most (1 + ε) T_seq/P + O(T_atomic + h(1/ε + T_rout + T_split)) with high probability, where T_rout, T_split and T_atomic bound the time for sending a message, splitting a subproblem and finishing a small unsplittable subproblem respectively. The maximum splitting depth h is related to the depth of the computation tree. Previous work did not prove efficiency close to one and used less accurate models. In particular, our machine model allows asynchronous communication with nonconstant message delays and does not assume that communication takes place in rounds. This model is compatible with the LogP model.
Miscellaneous General Terms Algorithms, Theory
This paper introduces work-dealing, a new algorithm for “locality oriented” load distribution on small scale shared memory multi-processors. Its key feature is an unprecedented low overhead mechanism (only a couple of loads and stores per operation, and no costly compare-and-swaps) for dealing out work to processors in a globally balanced way. We believe that for applications in which work-items have process affinity, especially applications running in dedicated mode (“stand alone”), work-dealing could prove a worthy alternative to the popular work-stealing paradigm.

