Results 1-10 of 16
Scalable Load Balancing Techniques for Parallel Computers
1994
Cited by 101 (16 self)
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
Unstructured Tree Search on SIMD Parallel Computers
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 35 (14 self)
In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises two major components: (i) a triggering mechanism, which determines when the search space redistribution must occur to balance search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms, and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical...
Chare Kernel - A Runtime Support System For Parallel Computations
1991
Cited by 31 (4 self)
This paper presents the chare kernel system, which supports parallel computations with irregular structure. The chare kernel is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities. Programs written in the chare kernel language can be executed on different parallel machines without change. Users writing such programs concern themselves with creation of parallel actions but not with assigning them to specific processors. We describe the design and implementation of the chare kernel. Performance of chare kernel programs on two hypercube machines, the Intel iPSC/2 and the NCUBE, is also given. 1. Introduction Large parallel computer systems are becoming increasingly available, and larger systems will be built in the near future [27]. For example, the NCUBE/2 with 8K processors has been commercially announced, and a 16K processor MIMD machine is being built [6, 12]. However, programming these machin...
Communication Complexity for Parallel Divide-and-Conquer
In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, 1991
Cited by 29 (2 self)
This paper studies the relationship between parallel computation cost and communication cost for performing divide-and-conquer (D&C) computations on a parallel system of $p$ processors. The parallel computation cost is the maximal number of the D&C nodes that any processor in the parallel system may expand, whereas the communication cost is the total number of cross nodes. A cross node is a node which is generated by one processor but expanded by another processor. A new scheduling algorithm is proposed, whose parallel computation cost and communication cost are at most $\lceil N/p \rceil$ and $pdh$, respectively, for any D&C computation tree with $N$ nodes, height $h$, and degree $d$. Also, lower bounds on the communication cost are derived. In particular, it is shown that for each scheduling algorithm and for each positive $\epsilon_C < 1$, which can be arbitrarily close to 0, there are values of $N$, $h$, $d$, $p$, and $\epsilon_T$ ($> 0$), for which if the parallel computation cost is between $N/p$ (the minimum) and (1 + $\epsilon_T$ ...
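The two cost measures defined in this abstract can be computed directly for any given schedule. The following is a minimal sketch, not the paper's scheduling algorithm: the `parent` and `owner` arrays, the function name `dc_costs`, and the example tree are all hypothetical, and it assumes that a node is "generated" by whichever processor expanded its parent and that the root is never counted as a cross node.

```python
from collections import Counter

def dc_costs(parent, owner):
    """Return (computation_cost, communication_cost) for one schedule.

    parent[i] is the parent of node i (None for the root).
    owner[i] is the processor that expands node i.
    Both arrays are illustrative inputs, not part of the paper.
    """
    # Computation cost: the largest number of nodes expanded by one processor.
    computation_cost = max(Counter(owner).values())
    # Assumption: a node is generated by the processor that expanded its
    # parent, so it is a cross node when the two owners differ.
    communication_cost = sum(
        1
        for node, par in enumerate(parent)
        if par is not None and owner[node] != owner[par]
    )
    return computation_cost, communication_cost

# Hypothetical complete binary tree with N = 7 nodes on p = 2 processors.
parent = [None, 0, 0, 1, 1, 2, 2]
owner = [0, 0, 1, 0, 0, 1, 1]
costs = dc_costs(parent, owner)
```

In this example processor 0 expands four nodes, which equals the ceiling of N/p, and only node 2 crosses between processors.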
Iterative Dynamic Load Balancing in Multicomputers
Journal of Operational Research Society, 1994
Cited by 21 (3 self)
Dynamic load balancing in multicomputers can improve the utilization of processors and the efficiency of parallel computations through migrating workload across processors at runtime. We present a survey and critique of dynamic load balancing strategies that are iterative: workload migration is carried out through transferring processes across nearest neighbor processors. Iterative strategies have become prominent in recent years because of the increasing popularity of point-to-point interconnection networks for multicomputers. Key words: dynamic load balancing, multicomputers, optimization, queueing theory, scheduling. INTRODUCTION Multicomputers are highly concurrent systems that are composed of many autonomous processors connected by a communication network [1, 2]. To improve the utilization of the processors, parallel computations in multicomputers require that processes be distributed to processors in such a way that the computational load is evenly spread among the processors...
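To make the idea of iterative nearest-neighbor balancing concrete, here is a minimal sketch of one well-known scheme of this kind, diffusion on a ring. The abstract does not name this scheme, so the choice is an assumption, as are the ring topology, the function name `diffusion_step`, and the exchange fraction `alpha`.

```python
def diffusion_step(load, alpha=0.25):
    """One synchronous nearest-neighbor exchange on a ring of processors.

    load[i] is the workload of processor i. alpha is a hypothetical
    fraction of each pairwise load difference moved per step.
    """
    p = len(load)
    new_load = []
    for i in range(p):
        left = load[(i - 1) % p]
        right = load[(i + 1) % p]
        # Each processor trades load only with its two ring neighbors.
        new_load.append(
            load[i] + alpha * (left - load[i]) + alpha * (right - load[i])
        )
    return new_load

# Hypothetical start: all work sits on processor 0 of a 4-processor ring.
load = [16.0, 0.0, 0.0, 0.0]
for _ in range(20):
    load = diffusion_step(load)
```

Every step conserves the total workload, and repeated steps drive the four loads toward the common average of 4.0 without any processor needing global information.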
Parallel Processing of Discrete Optimization Problems
In Encyclopedia of Microcomputers, 1993
Cited by 19 (6 self)
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer-aided design, robotics, game playing and constraint-directed reasoning. Often, a DOP is formulated in terms of finding a (minimum cost) solution path in a graph from an initial node to a goal node and solved by graph/tree search methods such as branch-and-bound and dynamic programming. Availability of parallel computers has created substantial interest in exploring the use of parallel processing for solving discrete optimization problems. This article provides an overview of parallel search algorithms for solving discrete optimization problems.
A General Architecture for Load Balancing in a Distributed-Memory Environment
13th IEEE Int. Conf. on Distributed Computing, 1993
Cited by 12 (2 self)
The goal of load balancing is to assign to each node a number of tasks proportional to its performance. On distributed-memory machines, it is important to take data dependencies into account when distributing tasks, since they have a big impact on the communication requirements of the distributed application. Many load balancers have been proposed that deal with applications with homogeneous tasks, but applications with heterogeneous tasks have proven to be far more complex to handle. In this paper we present a load balancing architecture that can deal with applications with heterogeneous tasks. The idea is to provide a set of load balancers that are effective for different types of homogeneous tasks, and to allow users to combine these load balancers for applications with heterogeneous tasks. We implemented this architecture on the Nectar multicomputer and we present performance results for several applications with homogeneous and heterogeneous tasks. Keywords: load balancing, hetero...
Random Seeking: A General, Efficient, and Informed Randomized Scheme for Dynamic Load Balancing
In Proc. of the 10th International Parallel Processing Symposium (IPPS), 1996
Cited by 8 (2 self)
We propose a completely general, informed randomized dynamic load balancing method called random seeking (RS) suitable for parallel algorithms with characteristics found in many search algorithms used in artificial intelligence and operations research and many divide-and-conquer algorithms. In it, source processors randomly seek out sink processors for load balancing by flinging "probe" messages. These probes not only locate sinks, but also collect load distribution information which is used to efficiently regulate load balancing activities. We empirically compare RS with a well-known randomized dynamic load balancing method, the random communication (RC) strategy, by using them in parallel best-first branch-and-bound algorithms on up to 512 processors of an nCUBE2 multicomputer. We find that the RC execution times are more than those of RS by 867% when used to perform combined dynamic quantitative and qualitative load balancing, and by 574% when used to perform just dynamic quant...
Efficient Parallel Divide-and-Conquer for a Class of Interconnection Topologies
In Proceedings of the 2nd International Symposium on Algorithms, number 557 in Lecture Notes in Computer Science, 1991
Cited by 7 (1 self)
In this paper, we propose an efficient scheduling algorithm for expanding any divide-and-conquer (D&C) computation tree on k-dimensional mesh, hypercube, and perfect shuffle networks with $p$ processors. Assume that it takes $t_n$ time steps to expand one node of the tree and $t_c$ time steps to transmit one datum or convey one node. For any D&C computation tree with $N$ nodes, height $h$, and degree $d$ (maximal number of children of any node), our algorithm requires at most $(N/p + h)t_n + \varphi d h t_c$ time steps, where $\varphi$ is O(log 2 p) on a hypercube or perfect shuffle network and is $O(\sqrt[k]{p})$ on an $n_{k-1} \times \cdots \times n_0$ mesh network, where $n_{k-1} = \cdots = n_0 = \sqrt[k]{p}$. This algorithm is general in the sense that it does not know the values of $N$, $h$, and $d$, and the shape of the computation tree as well, a priori. Most importantly, we can easily obtain a linear speedup by nearly a factor of $p$, especially when $N \gg ph(1 + \varphi d t_c / t_n)$.
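The time bound stated in this abstract can be evaluated numerically to see when the speedup approaches p. The sketch below is only arithmetic on that formula. The abstract gives the network-dependent factor (called `phi` here) only asymptotically, so the constant 6 is hypothetical, as are the other parameter values and both function names. It also assumes the sequential time is N times t_n.

```python
def dc_time_bound(N, p, h, d, t_n, t_c, phi):
    """Upper bound (N/p + h) * t_n + phi * d * h * t_c on parallel time.

    phi stands for the network-dependent factor in the abstract; its
    concrete value must be supplied by the caller (an assumption here).
    """
    return (N / p + h) * t_n + phi * d * h * t_c

def speedup_lower_bound(N, p, h, d, t_n, t_c, phi):
    """Assumed sequential time N * t_n divided by the parallel upper bound."""
    return (N * t_n) / dc_time_bound(N, p, h, d, t_n, t_c, phi)

# Hypothetical instance: a large, shallow tree on 64 processors.
params = dict(N=1_000_000, p=64, h=20, d=2, t_n=1.0, t_c=1.0, phi=6)
bound = dc_time_bound(**params)
speedup = speedup_lower_bound(**params)
```

With these values the N/p term (15625 steps) dominates both the height term and the communication term, so the guaranteed speedup stays close to p = 64, which matches the abstract's condition that N be much larger than the combined overhead.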
A Survey of Parallel Search Algorithms for Discrete Optimization Problems
ORSA Journal on Computing, 1993
Cited by 7 (0 self)
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer-aided design, robotics, game playing and constraint-directed reasoning. Often, a DOP is formulated in terms of finding a (minimum cost) solution path in a graph from an initial node to a goal node and solved by graph/tree search methods. Availability of parallel computers has created substantial interest in exploring parallel formulations of these graph and tree search methods. This article provides a survey of various parallel search algorithms such as Backtracking, IDA*, A*, Branch-and-Bound techniques and Dynamic Programming. It addresses issues related to load balancing, communication costs, scalability and the phenomenon of speedup anomalies in parallel search.