## Scalable Load Balancing Techniques for Parallel Computers (1994)

Citations: 103 (16 self-citations)

### BibTeX

@MISC{Kumar94scalableload,

author = {Vipin Kumar and Ananth Y. Grama and Vempaty Nageshwara Rao},

title = {Scalable Load Balancing Techniques for Parallel Computers},

year = {1994}

}

### Abstract

In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of the total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as the hypercube, the mesh and networks of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
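To make the setting in the abstract concrete, here is a minimal sketch of dynamic load balancing under the stated assumptions: independent work pieces of variable, unknown size that are redistributed at run time. The donor-selection rule (an idle processor polls one processor chosen uniformly at random and takes half of its pieces) and the function name `simulate_random_polling` are assumptions made for illustration; this is a toy round-based simulation, not the paper's model or any particular scheme it analyzes.

```python
import random
from typing import List


def simulate_random_polling(initial: List[List[int]], seed: int = 0) -> List[int]:
    """Toy round-based simulation of dynamic load balancing by random polling.

    initial[i] holds processor i's work pieces, each given as a positive number
    of unit steps. Returns how many units each processor executed.
    """
    rng = random.Random(seed)
    p = len(initial)
    # Drop empty pieces. Piece sizes are never consulted by the balancing rule,
    # mirroring the assumption that work sizes cannot be estimated.
    stacks = [[w for w in pieces if w > 0] for pieces in initial]
    done = [0] * p
    while any(stacks):
        for i in range(p):
            if stacks[i]:
                # Busy processor: execute one unit of its top piece.
                stacks[i][-1] -= 1
                done[i] += 1
                if stacks[i][-1] == 0:
                    stacks[i].pop()
            else:
                # Idle processor: poll one other processor chosen at random.
                donor = rng.randrange(p - 1)
                if donor >= i:
                    donor += 1
                # The donor shares only if it holds at least two pieces,
                # giving away the bottom half of its stack.
                if len(stacks[donor]) >= 2:
                    half = len(stacks[donor]) // 2
                    stacks[i] = stacks[donor][:half]
                    del stacks[donor][:half]
    return done
```

For example, `simulate_random_polling([[5, 3, 8, 2], [], [], []], seed=1)` starts with all 18 units of work on processor 0; the returned counts always sum to 18, while how they are spread across the four processors depends on the random polls.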

### Citations

762 | A machine program for theorem proving - Davis, Logemann, et al. - 1962

709 | Heuristics: Intelligent Search Strategies for Computer Problem Solving - Pearl - 1984

Citation context: ... upon the degree of load balance achieved and the overheads due to load balancing. Work created in the execution of many tree search algorithms used in artificial intelligence and operations research [22, 31] and many divide-and-conquer algorithms [16] satisfies all the requirements stated above. As an example, consider the problem of searching a state-space tree in depth-first fashion to find a solution. T...

258 | Introduction to Artificial Intelligence - Charniak, McDermott - 1987

Citation context: ...of workstations). For practical problems in depth-first search, it is much cheaper to incrementally build the state associated with each node than to copy and/or create the new node from scratch [39, 4]. This also introduces additional inefficiency. Further, the memory requirement at a processor is potentially unbounded, as a processor may be required to store an arbitrarily large number of work pie...
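The point about incremental state can be illustrated with a small depth-first search. The sketch below counts n-queens solutions (a stand-in problem chosen here for illustration, not one taken from the paper) and keeps a single set of occupancy arrays that it updates when descending and restores when backtracking, instead of copying the board at every node. It also hints at the inefficiency the context mentions: a node's state exists only implicitly in the shared arrays, so handing a node to another processor would require copying or rebuilding that state.

```python
def count_queens(n: int) -> int:
    """Count n-queens solutions by depth-first search with incremental state.

    One set of occupancy arrays is shared by the whole search: it is updated
    on the way down and restored on the way back up, so no node is copied.
    """
    cols = [False] * n
    diag1 = [False] * (2 * n - 1)   # indexed by row + col
    diag2 = [False] * (2 * n - 1)   # indexed by row - col + n - 1
    count = 0

    def dfs(row: int) -> None:
        nonlocal count
        if row == n:
            count += 1
            return
        for col in range(n):
            d1, d2 = row + col, row - col + n - 1
            if cols[col] or diag1[d1] or diag2[d2]:
                continue
            # Incrementally extend the state for the child node...
            cols[col] = diag1[d1] = diag2[d2] = True
            dfs(row + 1)
            # ...and undo the change when backtracking.
            cols[col] = diag1[d1] = diag2[d2] = False

    dfs(0)
    return count
```

`count_queens(8)` returns 92, the well-known number of solutions on an 8x8 board.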

250 | Reevaluating Amdahl's law - Gustafson - 1988

203 | Computer Algorithms - Horowitz, Sahni, et al. - 1998

Citation context: ...e overheads due to load balancing. Work created in the execution of many tree search algorithms used in artificial intelligence and operations research [22, 31] and many divide-and-conquer algorithms [16] satisfies all the requirements stated above. As an example, consider the problem of searching a state-space tree in depth-first fashion to find a solution. The state-space tree can be easily split up i...

118 | Development of Parallel Methods for a 1024-Processor Hypercube - Gustafson, Montry, et al. - 1988

105 | DIB - a distributed implementation of backtracking - Finkel, Manber - 1987

Citation context: ...s, and in general there is no way of estimating the size of a search tree. A number of dynamic load balancing strategies that are applicable to problems with these characteristics have been developed [3, 7, 8, 10, 28, 29, 30, 33, 35, 36, 40, 41]. Many of these schemes have been experimentally tested on some physical parallel architectures. From these experimental results, it is difficult to ascertain the relative merits of different schemes. The...

93 | Analyzing the scalability of parallel algorithms and architectures: A survey - Kumar, Gupta - 1991

Citation context: ...ntly by changes in hardware characteristics (such as the interconnection network, CPU speed, speed of communication channels, etc.), the number of processors, and the size of the problem instance being solved [21]. Hence any conclusions drawn on a set of experimental results are invalidated by changes in any one of the above parameters. Scalability analysis of a parallel algorithm and architecture combination ...

76 | The scalability of FFT on parallel computers - Gupta, Kumar - 1993

Citation context: ...ell as the parallel architecture on which it is implemented, in a single expression. The isoefficiency metric has been found to be quite useful in characterizing the scalability of a number of algorithms [13, 25, 34, 37, 42, 43]. In particular, Kumar and Rao used isoefficiency analysis to characterize the scalability of some load balancing schemes on the shared-memory, ring and hypercube architectures [23] and validated it ex...
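The isoefficiency metric mentioned in this context can be stated compactly. With W the problem size (serial work) and T_o the total overhead incurred by all P processors, efficiency is E = W / (W + T_o); holding E fixed gives W = (E / (1 - E)) T_o, and the rate at which W must grow with P to satisfy this is the isoefficiency function. The sketch below assumes, for simplicity, an overhead that depends on P only; the overhead P log2 P used in the example is hypothetical and not taken from the paper.

```python
import math
from typing import Callable


def efficiency(w: float, t_o: float) -> float:
    """E = W / (W + T_o): W is the serial work, T_o the total parallel overhead."""
    return w / (w + t_o)


def isoefficiency_work(e: float, overhead: Callable[[int], float], p: int) -> float:
    """Problem size W that keeps efficiency at e on p processors.

    Assumes the overhead depends on p only, so W = (e / (1 - e)) * T_o(p)
    can be evaluated directly instead of solved as a fixed point in W.
    """
    k = e / (1.0 - e)
    return k * overhead(p)


def hypothetical_overhead(p: int) -> float:
    """Illustrative overhead T_o(P) = P log2 P (an assumption for this sketch)."""
    return p * math.log2(p)


for p in (16, 64, 256):
    w = isoefficiency_work(0.8, hypothetical_overhead, p)
    print(p, round(w), round(efficiency(w, hypothetical_overhead(p)), 3))
```

With E = 0.8 (so E / (1 - E) = 4), the required problem size is about 256 for P = 16, 1536 for P = 64 and 8192 for P = 256, and the computed efficiency stays at 0.8 in each case.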

49 | Parallel depth-first search, part II: analysis - Kumar, Rao - 1988

Citation context: ...ntal results are invalidated by changes in any one of the above parameters. Scalability analysis of a parallel algorithm and architecture combination is very useful in extrapolating these conclusions [14, 15, 21, 23]. It may be used to select the best architecture-algorithm combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predi...

42 | Comparing the performance of two dynamic load distribution methods - Kalé - 1988

Citation context: ...r work from processors that have work to processors that are idle. Since none of the processors (that have work) know how much work they have, load balancing schemes which require this knowledge (e.g., [17, 19]) are not applicable. The performance of a load balancing scheme is dependent upon the degree of load balance achieved and the overheads due to load balancing. Work created in the execution of many tr...

38 | Unstructured tree search on SIMD parallel computers - Karypis, Kumar - 1992

Citation context: ...ic computations involving the solution of partial differential equations. Dynamic load balancing algorithms for SIMD processors are of a very different nature compared to those for MIMD architectures [9, 27, 32, 18]. Due to architectural constraints in SIMD machines, load balancing needs to be done on a global scale. In contrast, on MIMD machines, load can be balanced among a small subset of processors while the...

35 | Scalability of Parallel Algorithms for the All-Pairs Shortest Path Problem: A Summary of Results - Kumar, Singh - 1990

Citation context: ...ell as the parallel architecture on which it is implemented, in a single expression. The isoefficiency metric has been found to be quite useful in characterizing the scalability of a number of algorithms [13, 25, 34, 37, 42, 43]. In particular, Kumar and Rao used isoefficiency analysis to characterize the scalability of some load balancing schemes on the shared-memory, ring and hypercube architectures [23] and validated it ex...

35 | Optimal speedup for backtrack search on a butterfly network - Ranade - 1991

34 | Parallel depth-first search, part I: implementation - Rao, Kumar - 1988

Citation context: ... The state-space tree can be easily split up into many parts and each part can be assigned to a different processor. Although it is usually possible to come up with a reasonable work-splitting scheme [29], different parts can be of radically different sizes, and in general there is no way of estimating the size of a search tree. A number of dynamic load balancing strategies that are applicable to prob...
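The observation that the parts of a split search tree can differ radically in size, with no way to estimate them in advance, is easy to reproduce. The sketch below defines a hypothetical irregular tree whose branching factor (0 to 3) is derived from the node id, splits the search at the root into one part per child, and measures each part by exhaustive depth-first traversal. Both the tree and the root-level splitting rule are assumptions for illustration only, not the work-splitting scheme of [29].

```python
from typing import List


def children(node: int, depth: int, max_depth: int) -> List[int]:
    """Hypothetical irregular tree: node n has (7n + 3) mod 4 children (0 to 3)."""
    if depth == max_depth:
        return []
    k = (node * 7 + 3) % 4
    # Children of n are labelled 4n+1 .. 4n+k, so every node id is unique.
    return [node * 4 + i + 1 for i in range(k)]


def subtree_size(node: int, depth: int, max_depth: int) -> int:
    """Count the nodes in the subtree rooted at node with an explicit-stack DFS."""
    stack = [(node, depth)]
    count = 0
    while stack:
        n, d = stack.pop()
        count += 1
        stack.extend((c, d + 1) for c in children(n, d, max_depth))
    return count


def split_at_root(max_depth: int) -> List[int]:
    """Split the search at the root (one part per child) and return each part's size."""
    return [subtree_size(c, 1, max_depth) for c in children(0, 0, max_depth)]


print(split_at_root(3))
```

With `max_depth=3`, `split_at_root(3)` returns `[6, 4, 1]`: the third part is a single leaf while the first is six times larger, and nothing short of exploring the subtrees reveals this.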

30 | Distributed tree search and its application to alpha-beta pruning - Ferguson, Korf - 1988

Citation context: ...s, and in general there is no way of estimating the size of a search tree. A number of dynamic load balancing strategies that are applicable to problems with these characteristics have been developed [3, 7, 8, 10, 28, 29, 30, 33, 35, 36, 40, 41]. Many of these schemes have been experimentally tested on some physical parallel architectures. From these experimental results, it is difficult to ascertain the relative merits of different schemes. The...

29 | Hypercube Algorithms for Image Processing and Pattern Recognition - Ranka, Sahni - 1990

Citation context: ...ell as the parallel architecture on which it is implemented, in a single expression. The isoefficiency metric has been found to be quite useful in characterizing the scalability of a number of algorithms [13, 25, 34, 37, 42, 43]. In particular, Kumar and Rao used isoefficiency analysis to characterize the scalability of some load balancing schemes on the shared-memory, ring and hypercube architectures [23] and validated it ex...

27 | IVY: A shared virtual memory system for parallel computing - Li - 1988

Citation context: ...etwork of workstations provides us with a cheap and universally available platform for parallelizing applications. Several applications have been parallelized to run on a small number of workstations [1, 26]. For example, in [1] an implementation of parallel depth-first branch and bound for VLSI floorplan optimization is presented. Linear speedups were obtained on up to 16 processors. The essential part ...

24 | The NYU Ultracomputer - designing an MIMD shared memory parallel computer - Gottlieb, et al. - 1983

Citation context: ... by processor 0 is greatly reduced. This technique of performing atomic increment operations on a shared variable, TARGET, is essentially a software implementation of the fetch-and-add operation of [6]. To the best of our knowledge, GRR-M has not been used for load balancing by any other researcher. We illustrate this scheme by describing its implementation for a hypercube architecture. Figure 4.5 ...
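The shared TARGET counter described in this context can be sketched as a fetch-and-add: each request for a donor atomically reads TARGET and advances it modulo P, so successive requests go to processors in round-robin order. In this minimal sketch a Python `threading.Lock` stands in for the atomicity that GRR-M obtains in software; the class `GlobalTarget`, its methods and `round_robin_targets` are hypothetical names. The `fetch_and_add(k)` method serves k requests with a single update, in the spirit of the combining of increments in [6]; whether GRR-M combines requests exactly this way is an assumption here.

```python
import threading
from typing import List


class GlobalTarget:
    """Round-robin donor counter with an emulated atomic fetch-and-add.

    A lock stands in for the atomicity of the increment on TARGET.
    """

    def __init__(self, p: int) -> None:
        self.p = p              # number of processors
        self._value = 0         # current value of TARGET
        self._lock = threading.Lock()

    def fetch_and_increment(self) -> int:
        """Return the current target and advance TARGET by one (mod p)."""
        with self._lock:
            current = self._value
            self._value = (self._value + 1) % self.p
        return current

    def fetch_and_add(self, k: int) -> List[int]:
        """Serve k combined requests with one atomic update; return their targets."""
        with self._lock:
            base = self._value
            self._value = (self._value + k) % self.p
        return [(base + i) % self.p for i in range(k)]


def round_robin_targets(p: int, n_requests: int) -> List[int]:
    """Issue n_requests concurrent requests and return the targets handed out, sorted."""
    counter = GlobalTarget(p)
    results: List[int] = []
    results_lock = threading.Lock()

    def request() -> None:
        target = counter.fetch_and_increment()
        with results_lock:
            results.append(target)

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

`round_robin_targets(4, 8)` issues eight concurrent requests against a four-processor counter; each target from 0 to 3 is handed out exactly twice, whatever order the threads run in.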

21 | A multi-level load balancing scheme for OR-parallel exhaustive search programs on the Multi-PSI - Furuichi, Taki, et al. - 1990

Citation context: ...s, and in general there is no way of estimating the size of a search tree. A number of dynamic load balancing strategies that are applicable to problems with these characteristics have been developed [3, 7, 8, 10, 28, 29, 30, 33, 35, 36, 40, 41]. Many of these schemes have been experimentally tested on some physical parallel architectures. From these experimental results, it is difficult to ascertain the relative merits of different schemes. The...

21 | Multiprocessing of combinatorial search problems - Wah, Li, et al. - 1985

21 | MANIP - a multicomputer architecture for solving combinatorial extremum-search problems - Wah, Ma - 1984

20 | Efficient parallel algorithms for search problems: Applications in VLSI CAD - Arvindam, Kumar, et al. - 1990

Citation context: ...t of the Satisfiability problem [5]. The Satisfiability problem consists of testing the validity of Boolean formulae. Such problems arise in areas such as VLSI design and theorem proving, among others [2, 5]. The problem is "given a Boolean formula containing binary variables in disjunctive normal form, find out if it is unsatisfiable". The Davis and Putnam algorithm [5] presents a fast and efficient way...

19 | A dynamic scheduling strategy for the Chare-Kernel system - Shu, Kale - 1989

19 | Scalability of parallel sorting on mesh multicomputers - Singh, Kumar, et al. - 1991

18 | The scalability of matrix multiplication algorithms on parallel computers - Gupta, Kumar

Citation context: ...ginal problem size. This shows that the impact of changes in technology-dependent factors is moderate. These can, however, be quite drastic for other algorithms such as FFT [13] and matrix algorithms [12]. Being able to make such predictions is one of the significant advantages of isoefficiency analysis. Two problem characteristics, communication coupling between subtasks and the ability to estimate w...

16 | Load balancing on the hypercube architecture - Kumar, Rao - 1989

16 | A parallel branch and bound algorithm for test generation - Patil, Banerjee - 1990

16 | Consistent linear speedups to a first solution in parallel state-space search - Saletore, Kale - 1990

15 | Experimental evaluation of load balancing techniques for the hypercube - Grama, Kumar, et al. - 1991

Citation context: ...le work transfer cost on overall scalability. Section 8 presents experimental results. Section 9 contains a summary of results and suggestions for future work. Some parts of this paper have appeared in [11] and [24]. Section 2, Definitions and Assumptions: In this section, we introduce some assumptions and basic terminology necessary to understand the isoefficiency analysis. 1. Problem size W: the amount of esse...

14 | Hypercube computing: connected components - Woo, Sahni - 1991

14 | Computing biconnected components on a hypercube - Woo, Sahni - 1991

11 | General branch-and-bound formulation for AND/OR graph and game tree search - Kumar, Nau, et al. - 1988

Citation context: ... upon the degree of load balance achieved and the overheads due to load balancing. Work created in the execution of many tree search algorithms used in artificial intelligence and operations research [22, 31] and many divide-and-conquer algorithms [16] satisfies all the requirements stated above. As an example, consider the problem of searching a state-space tree in depth-first fashion to find a solution. T...

11 | Parallel processing of combinatorial search trees - Monien, Vornberger - 1987

9 | Random trees and the analysis of branch and bound procedures - Smith - 1984

Citation context: ...ms for which the work transfer cost is a function of the amount of work transferred. Instances of such problems are found in tree search applications for domains where strong heuristics are available [38]. For such applications, the search space is polynomial in nature and the size of the stack used to transfer work varies significantly with the amount of work transferred. In this section, we demonstr...

9 | SIMD parallel heuristic search - Mahanti, Daniels - 1992

8 | Floorplan optimization on multiprocessors - Arvindam, Kumar, et al. - 1989

Citation context: ...etwork of workstations provides us with a cheap and universally available platform for parallelizing applications. Several applications have been parallelized to run on a small number of workstations [1, 26]. For example, in [1] an implementation of parallel depth-first branch and bound for VLSI floorplan optimization is presented. Linear speedups were obtained on up to 16 processors. The essential part ...

8 | Automatic test pattern generation on multiprocessors - Arvindam, Kumar, et al. - 1991

8 | Simulated performance of a reduction-based multiprocessor - Keller, Lin - 1984

Citation context: ...r work from processors that have work to processors that are idle. Since none of the processors (that have work) know how much work they have, load balancing schemes which require this knowledge (e.g., [17, 19]) are not applicable. The performance of a load balancing scheme is dependent upon the degree of load balance achieved and the overheads due to load balancing. Work created in the execution of many tr...

6 | Exhaustive search of unstructured trees on the Connection Machine - Frye, Myczkowski - 1990

Citation context: ...ic computations involving the solution of partial differential equations. Dynamic load balancing algorithms for SIMD processors are of a very different nature compared to those for MIMD architectures [9, 27, 32, 18]. Due to architectural constraints in SIMD machines, load balancing needs to be done on a global scale. In contrast, on MIMD machines, load can be balanced among a small subset of processors while the...

6 | Probabilistic analysis of the efficiency of the dynamic load distribution - Kimura, Ichiyoshi - 1991

Citation context: ...e shown that the overall scalability is still $\Omega(P^2)$ for these architectures. In general, subtask sizes (z) can differ widely. Kimura and Ichiyoshi present a detailed analysis in [20] for the case in which subtasks can be of random sizes. They show that in this case, the isoefficiency of SL is given by $\Theta(P^2 \log P)$. Section 6.2, Multi-Level Load Balancing (ML): This scheme tries to c...

4 | IDA* on the Connection Machine - Powley, Korf, et al. - 1992

Citation context: ...ic computations involving the solution of partial differential equations. Dynamic load balancing algorithms for SIMD processors are of a very different nature compared to those for MIMD architectures [9, 27, 32, 18]. Due to architectural constraints in SIMD machines, load balancing needs to be done on a global scale. In contrast, on MIMD machines, load can be balanced among a small subset of processors while the...

4 | Analysis of heuristic search algorithms - Vempaty, Kumar, et al. - 1991

Citation context: ...of workstations). For practical problems in depth-first search, it is much cheaper to incrementally build the state associated with each node than to copy and/or create the new node from scratch [39, 4]. This also introduces additional inefficiency. Further, the memory requirement at a processor is potentially unbounded, as a processor may be required to store an arbitrarily large number of work pie...

2 | Reevaluating Amdahl's law (Communications of the ACM) - Gustafson - 1988

Citation context: ...ntal results are invalidated by changes in any one of the above parameters. Scalability analysis of a parallel algorithm and architecture combination is very useful in extrapolating these conclusions [14, 15, 21, 23]. It may be used to select the best architecture-algorithm combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predi...

1 | SIMD parallel heuristic search (to appear in Artificial Intelligence) - Mahanti, Daniels - 1992