Results 1-10 of 34
Scalable Load Balancing Techniques for Parallel Computers
, 1994
Cited by 120 (16 self)
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
APHID Game-Tree Search
 Journal of Parallel and Distributed Computing
, 1997
Cited by 45 (8 self)
This paper introduces the APHID (Asynchronous Parallel Hierarchical Iterative Deepening) game-tree search algorithm. An APHID search is controlled by a master and a series of slave processors. The master searches the first d′ ply of the game-tree repeatedly. The slaves are responsible for the bottom plies of the game-tree. The slaves asynchronously read work lists from the master and return score information to the master. The master uses the returned score information to generate approximate minimax values, until all of the required score information is available. APHID has been programmed as an easy-to-implement, game-independent αβ library, and was integrated into a chess program with one day of programming effort. APHID yields reasonable performance on a network of workstations, an architecture where it is extremely difficult to use a shared transposition table effectively. 1 Introduction The alpha-beta (αβ) minimax tree search algorithm has proven to be a difficult algori...
Unstructured Tree Search on SIMD Parallel Computers
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Cited by 42 (16 self)
In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises two major components: (i) a triggering mechanism, which determines when the search space redistribution must occur to balance search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms, and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical...
Studying Overheads in Massively Parallel Min/Max-Tree Evaluation (Extended Abstract)
 In ACM Symposium on Parallel Architectures and Algorithms
, 1994
Cited by 36 (3 self)
Rainer Feldmann, Peter Mysliwietz, and Burkhard Monien. Email: chess@unipaderborn.de. Department of Mathematics and Computer Science, University of Paderborn, Germany. Abstract: In this paper we study the overheads arising in our algorithm for distributed evaluation of Min/Max trees. The overheads are classified into search overhead, performance loss, and decrease of work load. Several mechanisms are investigated to cope with these overheads in order to achieve high performance. We study a combination of local, medium-range, and global load distribution strategies that not only behaves well in terms of work load, but also has a positive influence on the search overhead. The efficient use of a virtual shared memory, distributed among the processors, also contributes substantially to the overall performance of the system. A carefully restricted application of parallelism using an improved version of the Young Brothers Wait Concept (YBWC) leads to a perfect beha...
Communication Complexity for Parallel Divide-and-Conquer
 In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science
, 1991
Cited by 33 (2 self)
This paper studies the relationship between parallel computation cost and communication cost for performing divide-and-conquer (D&C) computations on a parallel system of p processors. The parallel computation cost is the maximal number of the D&C nodes that any processor in the parallel system may expand, whereas the communication cost is the total number of cross nodes. A cross node is a node which is generated by one processor but expanded by another processor. A new scheduling algorithm is proposed, whose parallel computation cost and communication cost are at most $\lceil N/p \rceil$ and $pdh$, respectively, for any D&C computation tree with N nodes, height h, and degree d. Also, lower bounds on the communication cost are derived. In particular, it is shown that for each scheduling algorithm and for each positive $\epsilon_C < 1$, which can be arbitrarily close to 0, there are values of N, h, d, p, and $\epsilon_T$ ($> 0$), for which if the parallel computation cost is between N/p (the minimum) and $(1 + \epsilon_T$ ...
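As a small worked illustration of the two upper bounds quoted in the abstract above (the ceiling of N/p for parallel computation cost, and p times d times h for communication cost), the sketch below simply evaluates them for given parameters. It does not implement the paper's scheduling algorithm; the function name `dc_cost_upper_bounds` and the sample parameter values are hypothetical.

```python
def dc_cost_upper_bounds(n_nodes, height, degree, processors):
    """Evaluate the upper bounds stated in the abstract for a D&C tree.

    Returns (computation_bound, communication_bound), where
    computation_bound = ceil(N / p) and communication_bound = p * d * h.
    """
    if min(n_nodes, height, degree, processors) <= 0:
        raise ValueError("all parameters must be positive integers")
    # Integer ceiling division avoids floating-point rounding for large N.
    computation_bound = -(-n_nodes // processors)
    communication_bound = processors * degree * height
    return computation_bound, communication_bound

# Hypothetical example: 1023 nodes, height 10, degree 2, 8 processors.
comp_bound, comm_bound = dc_cost_upper_bounds(1023, 10, 2, 8)
print(comp_bound, comm_bound)
```

For these sample values the computation bound is 128 nodes per processor and the communication bound is 160 cross nodes.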
APHID: Asynchronous Parallel Game-Tree Search
, 1999
Cited by 29 (3 self)
Most parallel game-tree search approaches use synchronous methods, where the work is concentrated within a specific part of the tree, or at a given search depth. This article shows that asynchronous game-tree search algorithms can be as efficient as or better than synchronous methods in determining the minimax value. APHID, a new asynchronous parallel game-tree search algorithm, is presented. APHID is implemented as a freely available portable library, making the algorithm easy to integrate into a sequential game-tree searching program. APHID has been added to four programs written by different authors. APHID yields better speedups than synchronous search methods for an Othello and a checkers program, and comparable speedups on two chess programs.
Best-First Heuristic Search for Multi-Core Machines
Cited by 26 (7 self)
eaburns, seth.lemons, ruml at cs.unh.edu; rzhou at parc.com. To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we present a general approach to best-first heuristic search in a shared-memory setting. Each thread attempts to expand the most promising open nodes. By using abstraction to partition the state space, we detect duplicate states without requiring frequent locking. We allow speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, verifying its correctness using temporal logic. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using an 8-core machine, we show that A* implemented in our framework yields faster search than improved versions of previous parallel search proposals. Our approach extends easily to other best-first searches, such as Anytime weighted A*.
Parallel Processing of Discrete Optimization Problems
 In Encyclopedia of Microcomputers
, 1993
Cited by 21 (6 self)
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer-aided design, robotics, game playing and constraint-directed reasoning. Often, a DOP is formulated in terms of finding a (minimum cost) solution path in a graph from an initial node to a goal node and solved by graph/tree search methods such as branch-and-bound and dynamic programming. Availability of parallel computers has created substantial interest in exploring the use of parallel processing for solving discrete optimization problems. This article provides an overview of parallel search algorithms for solving discrete optimization problems.
A Fully Distributed Chess Program
, 1991
Cited by 21 (1 self)
We show how to implement the αβ-enhancements like iterative deepening, transposition tables, history tables etc. used in sequential chess programs in a distributed system such that the distributed algorithm profits from these heuristics as much as the sequential one does. Moreover the methods we describe are suitable for very large distributed systems. We implemented these αβ-enhancements in the distributed chess program ZUGZWANG. For a distributed system of 64 processors we obtain a speedup between 28 and 34 running at tournament speed. The basis for this chess program is a distributed αβ-algorithm with very good load balancing properties combined with the use of a distributed transposition table that grows with the size of the distributed system. 1. INTRODUCTION In this paper we describe a fully distributed chess program ZUGZWANG running on a network of Transputers. We present experimental results that show the efficiency of our implementation. The good behavior of the sequential f...
A Taxonomy Of Parallel Game-Tree Search Algorithms
, 1996
Cited by 19 (3 self)
this paper. The taxonomy will be broken into two major categories: αβ-based algorithms, and algorithms based on other search paradigms (SSS*, ER, and theoretical methods). For the former category, a table is given to isolate the fundamental differences between the algorithms. The table is divided into two parts: the first part contains characteristics of the αβ-based algorithms, while the second part contains details about an implementation of each algorithm. Section 2 describes the various columns given in the table, and then gives some brief details on the algorithms contained therein. The algorithms based on other search paradigms are given in Section 3. Due to the varied nature of the methods, a brief description is given for each of the algorithms and no attempt has been made to categorize them to the same extent as the αβ-based algorithms. The implementation details have not been organized into a table, since some of the algorithms given are of a theoretical nature and have not been implemented or simulated. The final section deals with some conclusions that can be drawn from the taxonomy. 2 αβ-BASED PARALLEL GAME-TREE SEARCH