Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
1999
Abstract

... Devices]: Modes of Computation - Parallelism and concurrency
General Terms: Algorithms, Design, Performance, Theory
Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs
This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E.
Authors' addresses: Y.K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 2000 ACM 0360-0300/99/1200-0406 $5.00
ACM Computing Surveys, Vol. 31, No. 4, December 1999
Hypertool: A Programming Aid for Message-Passing Systems
IEEE Trans. on Parallel and Distributed Systems, 1990
Abstract

As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This paper discusses programming assistance and automation concepts and their application to a program development tool for message-passing systems called Hypertool. It performs scheduling and handles the communication primitive insertion automatically. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs.
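The abstract does not spell out the two scheduling algorithms, so the following is only a minimal sketch of the critical-path idea they build on, not Hypertool's own method. It computes, for every task in a DAG, the length of the longest path from that task to an exit node (often called the b-level) and uses it as a scheduling priority. The input format, the function name `b_levels`, and the example graph are assumptions made for illustration.

```python
from collections import defaultdict

def b_levels(weights, edges):
    """Longest path length from each task to an exit node of the DAG.

    weights: dict task -> computation cost        (hypothetical input format)
    edges:   dict (u, v) -> communication cost of the edge u -> v
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in weights}
    for (u, v), c in edges.items():
        succ[u].append((v, c))
        indeg[v] += 1

    # Kahn's algorithm yields a topological order of the tasks.
    order = []
    ready = [t for t in weights if indeg[t] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(weights):
        raise ValueError("task graph contains a cycle")

    # Visit tasks in reverse order so every successor is already resolved.
    level = {}
    for u in reversed(order):
        level[u] = weights[u] + max((c + level[v] for v, c in succ[u]), default=0)
    return level

# Illustrative four-task graph (made-up costs).
weights = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
levels = b_levels(weights, edges)
critical_path_length = max(levels.values())
priority_order = sorted(weights, key=lambda t: -levels[t])
```

A list scheduler would then pick ready tasks in `priority_order`, so that tasks on the critical path are placed first.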
The Quadratic Assignment Problem: A Survey and Recent Developments
In Proceedings of the DIMACS Workshop on Quadratic Assignment Problems, volume 16 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1994
Abstract

Quadratic Assignment Problems model many applications in diverse areas such as operations research, parallel and distributed computing, and combinatorial data analysis. In this paper we survey some of the most important techniques, applications, and methods regarding the quadratic assignment problem. We focus our attention on recent developments.
1. Introduction
Given a set $N = \{1, 2, \ldots, n\}$ and $n \times n$ matrices $F = (f_{ij})$ and $D = (d_{kl})$, the quadratic assignment problem (QAP) can be stated as follows:
$$\min_{p \in \Pi_N} \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \, d_{p(i)p(j)} + \sum_{i=1}^{n} c_{ip(i)},$$
where $\Pi_N$ is the set of all permutations of $N$. One of the major applications of the QAP is in location theory where the matrix $F = (f_{ij})$ is the flow matrix, i.e. $f_{ij}$ is the flow of materials from facility $i$ to facility $j$, and $D = (d_{kl})$ is the distance matrix, i.e. $d_{kl}$ represents the distance from location $k$ to location $l$ [62, 67, 137]. The cost of simultaneously assigning facility i to locat...
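The objective stated above can be evaluated directly for a given permutation. The sketch below is illustrative only: it uses 0-based indices rather than the 1-based indices of the text, treats the linear term as optional, and finds the optimum by exhaustive enumeration, which is feasible only for very small n. The function names and the 3 x 3 instance are assumptions, not data from the survey.

```python
from itertools import permutations

def qap_cost(F, D, p, C=None):
    """Sum of F[i][j] * D[p[i]][p[j]] over all i, j, plus the optional C[i][p[i]] term."""
    n = len(p)
    cost = sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))
    if C is not None:
        cost += sum(C[i][p[i]] for i in range(n))
    return cost

def brute_force_qap(F, D, C=None):
    """Try all n! assignments of facilities to locations; usable only for tiny n."""
    n = len(F)
    return min(permutations(range(n)), key=lambda p: qap_cost(F, D, p, C))

# Made-up symmetric instance: F holds flows between facilities,
# D holds distances between locations.
F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
best = brute_force_qap(F, D)
best_cost = qap_cost(F, D, best)
```

Here `p[i]` is the location assigned to facility `i`; the optimum pairs the largest flows with the shortest distances.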
VLSI cell placement techniques
ACM Computing Surveys, 1991
Abstract

VLSI cell placement problem is known to be NP-complete. A wide repertoire of heuristic algorithms exists in the literature for efficiently arranging the logic cells on a VLSI chip. The objective of this paper is to present a comprehensive survey of the various cell placement techniques, with emphasis on standard cell and macro
Combining Simulated Annealing with Local Search Heuristics
1993
Abstract

We introduce a metaheuristic to combine simulated annealing with local search methods for CO problems. This new class of Markov chains leads to significantly more powerful optimization methods than either simulated annealing or local search. The main idea is to embed deterministic local search techniques into simulated annealing so that the chain explores only local optima. It makes large, global changes, even at low temperatures, thus overcoming large barriers in configuration space. We have tested this metaheuristic for the traveling salesman and graph partitioning problems. Tests on instances from public libraries and random ensembles quantify the power of the method. Our algorithm is able to solve large instances to optimality, improving upon state-of-the-art local search methods very significantly. For the traveling salesman problem with randomly distributed cities in a square, the procedure improves on 3-opt by 1.6%, and on Lin-Kernighan local search by 1.3%. For the partitioni...
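The main idea in the abstract above (a Markov chain whose states are all local optima) can be sketched for the traveling salesman problem as follows. This is an illustration under stated assumptions, not the authors' implementation: the local search is plain 2-opt rather than 3-opt or Lin-Kernighan, the large move is a double-bridge "kick", the temperature is fixed, and all names and parameter values are hypothetical.

```python
import math
import random

def tour_length(pts, tour):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def two_opt(pts, tour):
    """Deterministic local search: apply improving segment reversals until none remain."""
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(pts, candidate) < tour_length(pts, tour) - 1e-12:
                    tour, improved = candidate, True
    return tour

def double_bridge(tour, rng):
    """Large random change: cut the tour into four pieces and swap the middle two."""
    a, b, c = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]

def anneal_over_local_optima(pts, steps=200, temperature=0.05, seed=0):
    rng = random.Random(seed)
    current = two_opt(pts, list(range(len(pts))))
    best = current
    for _ in range(steps):
        # Kick, then descend again, so every state of the chain is a local optimum.
        candidate = two_opt(pts, double_bridge(current, rng))
        delta = tour_length(pts, candidate) - tour_length(pts, current)
        # Metropolis rule: always accept improvements, sometimes accept worse optima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if tour_length(pts, current) < tour_length(pts, best):
                best = current
    return best

# Small random instance in the unit square (made-up data).
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(12)]
start = two_opt(pts, list(range(len(pts))))
best = anneal_over_local_optima(pts, steps=50)
```

Because the acceptance test compares two local optima, the chain can cross barriers that plain 2-opt cannot, even at a low temperature.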
A Genetic Approach to the Quadratic Assignment Problem
1992
Abstract

The Quadratic Assignment Problem (QAP) is a well-known combinatorial optimization problem with a wide variety of practical applications. Although many heuristics and semi-enumerative procedures for QAP have been proposed, no dominant algorithm has emerged. In this paper, we describe a Genetic Algorithm (GA) approach to QAP. Genetic algorithms are a class of randomized parallel search heuristics which emulate biological natural selection on a population of feasible solutions. We present computational results which show that this GA approach finds solutions competitive with those of the best previously known heuristics, and argue that genetic algorithms provide a particularly robust method for QAP and its more complex extensions.
A Parallel Bottom-up Clustering Algorithm with Applications to Circuit Partitioning in VLSI Design
In Proc. ACM/IEEE Design Automation Conference, 1993
Abstract

In this paper, we present a bottom-up clustering algorithm based on recursive collapsing of small cliques in a graph. The sizes of the small cliques are derived using random graph theory. This clustering algorithm leads to a natural parallel implementation in which multiple processors are used to identify clusters simultaneously. We also present a cluster-based partitioning method in which our clustering algorithm is used as a preprocessing step to both the bisection algorithm by Fiduccia and Mattheyses and a ratio-cut algorithm by Wei and Cheng. Our results show that cluster-based partitioning obtains cut sizes up to 49.6% smaller than the bisection algorithm, and obtains ratio-cut sizes up to 66.8% smaller than the ratio-cut algorithm. Moreover, we show that cluster-based partitioning produces much more stable results than direct partitioning.
Partitioning of Unstructured Meshes for Load Balancing
1995
Abstract

Many large-scale engineering and scientific calculations involve repeated updating of variables on an unstructured mesh. To do these types of computations on distributed memory parallel computers, it is necessary to partition the mesh among the processors so that the load balance is maximized and interprocessor communication time is minimized. This can be approximated by the problem of partitioning a graph so as to obtain a minimum cut, a well-studied combinatorial optimization problem. Graph partitioning is NP-complete, so for real-world applications, one resorts to heuristics, i.e., algorithms that give good but not necessarily optimum solutions. These algorithms include recursive spectral bisection, local search methods such as Kernighan-Lin, and more general-purpose methods such as simulated annealing. We show that a general procedure enables us to combine simulated annealing with Kernighan-Lin. The resulting algorithm is both very fast and extremely effective.
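To make the minimum-cut objective and the Kernighan-Lin style of local search concrete, here is a small sketch. It is a simplification and not the combined annealing procedure of the paper: it computes the cut size of a bisection, the standard gain of exchanging two vertices, and then greedily applies the best positive-gain swap, whereas full Kernighan-Lin works in passes with vertex locking and tentative negative-gain moves. The graph, the names, and the data layout are assumptions.

```python
def cut_size(adj, side):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def swap_gain(adj, side, a, b):
    """Reduction in cut size if a and b (on opposite sides) exchange parts."""
    def d(v):
        external = sum(1 for w in adj[v] if side[w] != side[v])
        return external - (len(adj[v]) - external)  # external minus internal degree
    return d(a) + d(b) - 2 * (1 if b in adj[a] else 0)

def greedy_swaps(adj, side):
    """Repeat the best improving swap; each swap keeps both part sizes unchanged."""
    side = dict(side)
    while True:
        pairs = [(swap_gain(adj, side, a, b), a, b)
                 for a in adj if side[a] == 0
                 for b in adj if side[b] == 1]
        gain, a, b = max(pairs)
        if gain <= 0:
            return side
        side[a], side[b] = 1, 0

# Two triangles joined by the single edge 2-3 (made-up example).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}  # deliberately poor bisection
result = greedy_swaps(adj, start)
```

Swapping equal numbers of vertices preserves the load balance while the gain measures the change in communication volume.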
Compile-Time Scheduling of Dataflow Program Graphs with Dynamic Constructs
University of California, Berkeley, 1992
A Parallel Algorithm for Compile-Time Scheduling of Parallel Programs on Multiprocessors
PACT'97, 1997
Abstract

In this paper, we propose a parallel randomized algorithm, called Parallel Fast Assignment using Search Technique (PFAST), for scheduling parallel programs represented by directed acyclic graphs (DAGs) during compile-time. The PFAST algorithm has $O(e)$ time complexity, where $e$ is the number of edges in the DAG. This linear-time algorithm works by first generating an initial solution and then refining it using a parallel random search. Using a prototype computer-aided parallelization and scheduling tool called CASCH, the algorithm is found to outperform numerous previous algorithms while taking dramatically smaller execution times. The distinctive feature of this research is that, instead of simulations, our proposed algorithm is evaluated and compared with other algorithms using the CASCH tool with real applications running on the Intel Paragon. The PFAST algorithm is also evaluated with randomly generated DAGs for which optimal schedules are known. The algorithm generated optimal solutions for a majority of the test cases and close-to-optimal solutions for the others. The proposed algorithm is the fastest scheduling algorithm known to us and is an attractive choice for scheduling under running time constraints.
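The two-phase structure described above (build an initial schedule, then refine it by random search) can be sketched as below. This is a sequential toy version under loud assumptions and not the PFAST algorithm itself: the initial solution simply puts every task on processor 0, the search moves one randomly chosen task at a time and keeps the move only if the schedule length shrinks, and the task order is a fixed topological order supplied by the caller. All names and costs are hypothetical.

```python
import random

def makespan(order, weights, edges, proc):
    """Schedule length when tasks run in `order` (a topological order) on proc[t]."""
    preds = {t: [] for t in order}
    for (u, v), c in edges.items():
        preds[v].append((u, c))
    finish, free = {}, {}
    for t in order:
        # Data from a predecessor on another processor arrives after the edge delay.
        ready = max((finish[u] + (c if proc[u] != proc[t] else 0)
                     for u, c in preds[t]), default=0)
        start = max(ready, free.get(proc[t], 0))
        finish[t] = start + weights[t]
        free[proc[t]] = finish[t]
    return max(finish.values())

def random_search(order, weights, edges, n_procs, steps=200, seed=0):
    rng = random.Random(seed)
    proc = {t: 0 for t in order}  # assumed initial solution: fully serial schedule
    best = makespan(order, weights, edges, proc)
    for _ in range(steps):
        t = rng.choice(order)
        old = proc[t]
        proc[t] = rng.randrange(n_procs)
        length = makespan(order, weights, edges, proc)
        if length < best:
            best = length   # keep the improving move
        else:
            proc[t] = old   # undo anything that does not improve
    return proc, best

# Fork-join graph with made-up computation and communication costs.
order = ["a", "b", "c", "d"]
weights = {"a": 1, "b": 4, "c": 4, "d": 1}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
serial = makespan(order, weights, edges, {t: 0 for t in order})
proc, best = random_search(order, weights, edges, n_procs=2)
```

Each step re-evaluates the whole schedule, so this sketch does not reproduce the linear running time claimed for PFAST; it only shows the generate-then-refine pattern.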