A Tabu Search Approach to Task Scheduling on Heterogeneous Processors under Precedence Constraints
, 1994
Cited by 43 (9 self)
Parallel programs may be represented as a set of interrelated sequential tasks. When multiprocessors are used to execute such programs, the parallel portion of the application can be sped up by an appropriate allocation of processors to the tasks of the application. Given a parallel application defined by a task precedence graph, the goal of task scheduling (or processor assignment) is thus the minimization of the makespan of the application. In a heterogeneous multiprocessor system, task scheduling consists of determining which tasks will be assigned to each processor, as well as the execution order of the tasks assigned to each processor. In this work, we apply the tabu search metaheuristic to the solution of the task scheduling problem on a heterogeneous multiprocessor environment under precedence constraints. The topology of the Mean Value Analysis solution package for product-form queueing networks is used as the framework for performance evaluation. We show that tabu search ob...
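As a rough illustration of the objective described in this abstract, the following Python sketch evaluates the makespan of one candidate schedule on heterogeneous processors. The function name `makespan`, the data layout, and the four-task example are hypothetical and not taken from the paper, and communication delays between processors are ignored for simplicity. A search method such as tabu search would call an evaluator of this kind on each candidate solution it visits.

```python
from collections import defaultdict

def makespan(exec_time, preds, assignment, order):
    """Return the finish time of the last task in a given schedule.

    exec_time[task][proc]: run time of `task` on processor `proc`
                           (heterogeneous: it may differ per processor).
    preds[task]:           tasks that must finish before `task` starts.
    assignment[task]:      processor chosen for `task`.
    order:                 a topological order of the tasks; tasks sharing
                           a processor run in this relative order.
    Assumption: no communication cost between processors.
    """
    finish = {}
    proc_free = defaultdict(float)  # time at which each processor is idle
    for task in order:
        proc = assignment[task]
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[p] for p in preds.get(task, ())), default=0.0)
        start = max(ready, proc_free[proc])
        finish[task] = start + exec_time[task][proc]
        proc_free[proc] = finish[task]
    return max(finish.values(), default=0.0)

# Hypothetical diamond-shaped precedence graph: a -> {b, c} -> d.
exec_time = {"a": [2, 4], "b": [3, 1], "c": [2, 2], "d": [1, 3]}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = ["a", "b", "c", "d"]

split = {"a": 0, "b": 1, "c": 0, "d": 0}   # b runs on the second processor
serial = {t: 0 for t in order}             # everything on the first processor

print(makespan(exec_time, preds, split, order))
print(makespan(exec_time, preds, serial, order))
```

In this toy instance, moving task `b` to the processor on which it runs faster shortens the schedule relative to running every task on a single processor.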
Task Assignment and Transaction Clustering Heuristics for Distributed Systems
 Information Sciences
, 1997
Cited by 32 (10 self)
In this paper we discuss the task assignment problem for distributed systems. We also show how this problem is very similar to that of clustering transactions for load balancing purposes and for their efficient execution in a distributed environment. The formalization of these problems in terms of a graph-theoretic representation of a distributed program, or of a set of related transactions, is given. The cost function which needs to be minimized by an assignment of tasks to processors or of transactions to clusters is detailed, and we survey related work, as well as work on the dynamic load balancing problem. Since the task assignment problem is NP-hard, we present three novel heuristic algorithms that we have tested for solving it and compare them to the well-known greedy heuristic. These novel heuristics use neural networks, genetic algorithms and simulated annealing. Both the resulting performance and the computational cost for these algorithms are evaluated on a large number o...
A New Mapping Heuristic Based On Mean Field Annealing
, 1992
Cited by 32 (12 self)
A new mapping heuristic is developed, based on the recently proposed Mean Field Annealing (MFA) algorithm. An efficient implementation scheme, which decreases the complexity of the proposed algorithm by asymptotic factors, is also given. Performance of the proposed MFA algorithm is evaluated in comparison with two well-known heuristics: Simulated Annealing and Kernighan-Lin. Results of the experiments indicate that MFA can be used as an alternative heuristic for solving the mapping problem. Inherent parallelism of MFA is exploited by designing an efficient parallel algorithm for the proposed MFA heuristic.
1 Introduction
Today, with the aid of VLSI technology, parallel computers not only exist in research laboratories, but are also available on the market as powerful, general-purpose computers. Wide use of parallel computers in various compute-intensive applications makes the problem of mapping parallel programs to parallel computers more crucial. The mapping problem arises while d...
Decomposing irregularly sparse matrices for parallel matrix-vector multiplications
 LECTURE NOTES IN COMPUTER SCIENCE
, 1996
Cited by 31 (15 self)
In this work, we show the deficiencies of the graph model for decomposing sparse matrices for parallel matrix-vector multiplication. Then, we propose two hypergraph models which avoid all deficiencies of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem widely encountered in circuit partitioning in VLSI. We have implemented fast Kernighan-Lin based graph and hypergraph partitioning heuristics and used the successful multilevel graph partitioning tool (Metis) for the experimental evaluation of the validity of the proposed hypergraph models. We have also developed a multilevel hypergraph partitioning heuristic for experimenting with the performance of the multilevel approach on hypergraph partitioning. Experimental results on sparse matrices, selected from the Harwell-Boeing collection and the NETLIB suite, confirm both the validity of our proposed hypergraph models and the appropriateness of the multilevel approach to hypergraph partitioning.
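The abstract does not spell out the two hypergraph models, so the Python sketch below rests on one assumed reading: rows are vertices, each column is a net joining the rows that have a nonzero in that column, and the cost of a row partition is the sum over nets of the number of parts touched minus one. The function name `column_net_volume` and the small matrix are hypothetical and serve only to show how such a cost could be counted.

```python
from collections import defaultdict

def column_net_volume(nonzeros, row_part):
    """Count a connectivity-minus-one cost for a row-wise decomposition.

    nonzeros: iterable of (row, col) positions of the sparse matrix.
    row_part: row_part[row] is the part (processor) that owns that row.
    Assumption: the vector entry for a column is owned by one of the
    parts that need it, so a column touching k parts costs k - 1.
    """
    parts_per_column = defaultdict(set)
    for i, j in nonzeros:
        parts_per_column[j].add(row_part[i])
    return sum(len(parts) - 1 for parts in parts_per_column.values())

# Hypothetical 4x4 pattern, rows 0-1 on part 0 and rows 2-3 on part 1.
nonzeros = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 3), (3, 0)]
row_part = {0: 0, 1: 0, 2: 1, 3: 1}
print(column_net_volume(nonzeros, row_part))
```

Here columns 0 and 1 each touch both parts, while columns 2 and 3 stay inside one part, so only two columns contribute to the count.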
Topology mapping for Blue Gene/L supercomputer
 In: SC
, 2006
Cited by 25 (1 self)
Mapping virtual processes onto physical processors is one of the most important issues in parallel computing. The problem of mapping processes/tasks onto processors is equivalent to the graph embedding problem, which has been studied extensively. Although many techniques have been proposed for embeddings of two-dimensional grids, hypercubes, etc., there have been few efforts on embeddings of three-dimensional grids and tori. Motivated by the need for better support of task mapping on the Blue Gene/L supercomputer, in this paper we present embedding and integration techniques for the embeddings of three-dimensional grids and tori. The topology mapping library based on these techniques generates high-quality embeddings of two/three-dimensional grids/tori. In addition, the library is used in the BG/L MPI library for scalable support of MPI topology functions. With extensive empirical studies on large-scale systems against popular benchmarks and real applications, we demonstrate that the library can significantly improve the communication performance and the scalability of applications.
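To make the notion of embedding quality on a three-dimensional torus concrete, the following Python sketch computes the hop distance between two torus nodes and the average hop count over the communicating pairs of a placement. This is not the paper's library: the names `torus_hops` and `average_hops`, the choice of average hops as the metric, and the example placement are all assumptions made for illustration.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension wraps around, so the distance along one axis is the
    smaller of the direct path and the path through the wraparound link.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def average_hops(edges, placement, dims):
    """Mean hop distance over all communicating process pairs.

    edges:     list of (process, process) pairs that exchange messages.
    placement: placement[process] is that process's torus coordinate.
    """
    if not edges:
        return 0.0
    total = sum(torus_hops(placement[u], placement[v], dims) for u, v in edges)
    return total / len(edges)

# Hypothetical 4x4x4 torus with three placed processes.
dims = (4, 4, 4)
placement = {0: (0, 0, 0), 1: (3, 0, 0), 2: (2, 2, 2)}
edges = [(0, 1), (0, 2)]
print(average_hops(edges, placement, dims))
```

Processes 0 and 1 are one hop apart thanks to the wraparound link, whereas process 2 sits at the farthest point of the torus from process 0, six hops away.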
Topology-aware task mapping for reducing communication contention on large parallel machines
, 2006
Avoiding hotspots on two-level direct networks
 In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11). ACM
Cited by 17 (4 self)
A low-diameter, fast interconnection network is going to be a prerequisite for building exascale machines. A two-level direct network has been proposed by several groups as a scalable design for future machines. IBM’s PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. The presence of multiple levels in this design leads to hotspots on a few links when processes are grouped together at the lowest level to minimize total communication volume. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can impact the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the tradeoffs between using direct and indirect routing with different mappings. We use simulations to study communication and overall performance of applications since there are no installations of two-level direct networks yet. This study raises interesting issues regarding the choice of job scheduling, routing and mapping for future machines.
Scotch 3.1 User's Guide
, 1996
Cited by 13 (0 self)
The efficient execution of a parallel program on a parallel machine requires good placement of the communicating processes of the program onto the processors of the machine. When both the program and the machine are modeled in terms of weighted unoriented graphs, this problem amounts to static graph mapping. This document describes the capabilities and operations of Scotch, a software package devoted to graph mapping, based on the Dual Recursive Bipartitioning algorithm. Predefined mapping strategies allow for recursive application of any of several graph bipartitioning methods, including Fiduccia-Mattheyses, Gibbs-Poole-Stockmeyer, and multilevel methods. Scotch can map any weighted process graph onto any weighted target graph, whether they are connected or not. We give brief descriptions of the algorithm and bipartitioning methods, detail the input/output formats, instructions for use, and installation procedures, and provide a number of examples.
Hypercube Embedding Heuristics: An evaluation
, 1990
Cited by 10 (3 self)
The hypercube embedding problem, a restricted version of the general mapping problem, is the problem of mapping a set of communicating processes to a hypercube multiprocessor. The goal is to find a mapping that minimizes the length of the paths between communicating processes. Unfortunately, the hypercube embedding problem has been shown to be NP-hard. Thus many heuristics have been proposed for hypercube embedding. This paper evaluates several hypercube embedding heuristics, including simulated annealing, local search, greedy, and recursive min-cut bipartitioning. In addition to known heuristics, we propose a new greedy heuristic, a new Kernighan-Lin-style heuristic, and some new features to enhance local search. We then assess variations of these strategies (e.g., different neighborhood structures) and combinations of them (e.g., greedy as a front end of iterative improvement heuristics). The asymptotic running times of the heuristics are given, based on efficient implementations using a priority-queue data structure.
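The objective in this abstract can be sketched in a few lines of Python. On a hypercube, the shortest path between two nodes equals the Hamming distance of their binary labels, so the cost of a mapping is taken here as the sum of Hamming distances over communicating pairs. The pairwise-swap neighborhood below is only one plausible local search and is an assumption, not one of the heuristics evaluated in the paper; the names `embedding_cost` and `swap_local_search` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of differing bits, i.e. hop distance on a hypercube."""
    return bin(a ^ b).count("1")

def embedding_cost(edges, mapping):
    """Total path length over all communicating process pairs."""
    return sum(hamming(mapping[u], mapping[v]) for u, v in edges)

def swap_local_search(edges, mapping):
    """Swap the nodes of two processes whenever that lowers the cost.

    Stops when a full pass over all pairs yields no improvement, so the
    result is a local optimum for this neighborhood, not a global one.
    """
    mapping = dict(mapping)
    best = embedding_cost(edges, mapping)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(mapping), 2):
            mapping[u], mapping[v] = mapping[v], mapping[u]
            cost = embedding_cost(edges, mapping)
            if cost < best:
                best, improved = cost, True
            else:
                # Undo the swap because it did not help.
                mapping[u], mapping[v] = mapping[v], mapping[u]
    return mapping, best

# Hypothetical example: a ring of four processes on a 2-dimensional cube.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
identity = {0: 0b00, 1: 0b01, 2: 0b10, 3: 0b11}
improved_mapping, improved_cost = swap_local_search(ring, identity)
print(embedding_cost(ring, identity), improved_cost)
```

In this example the identity mapping places two ring neighbors on opposite corners of the cube, and a single swap brings every communicating pair onto adjacent nodes.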