A Tabu Search Approach to Task Scheduling on Heterogeneous Processors under Precedence Constraints
, 1994
Cited by 34 (9 self)
Parallel programs may be represented as a set of interrelated sequential tasks. When multiprocessors are used to execute such programs, the parallel portion of the application can be sped up by an appropriate allocation of processors to the tasks of the application. Given a parallel application defined by a task precedence graph, the goal of task scheduling (or processor assignment) is thus the minimization of the makespan of the application. In a heterogeneous multiprocessor system, task scheduling consists of determining which tasks will be assigned to each processor, as well as the execution order of the tasks assigned to each processor. In this work, we apply the tabu search metaheuristic to the solution of the task scheduling problem on a heterogeneous multiprocessor environment under precedence constraints. The topology of the Mean Value Analysis solution package for product form queueing networks is used as the framework for performance evaluation. We show that tabu search ob...
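The objective named in this abstract, the makespan of a schedule on heterogeneous processors under precedence constraints, can be illustrated with a minimal sketch. Everything below is hypothetical: the function name, the task and processor names, and the timings are made up for illustration, communication delays are assumed to be zero, and this is only the evaluation step a search method such as tabu search would call repeatedly, not the authors' algorithm.

```python
from collections import defaultdict


def makespan(order, assignment, exec_time, preds):
    """Return the completion time of the last task in a schedule.

    order      -- all tasks, listed so that predecessors come first; tasks on
                  the same processor run in this relative order
    assignment -- dict: task -> processor
    exec_time  -- dict: (task, processor) -> execution time (heterogeneous)
    preds      -- dict: task -> iterable of predecessor tasks
    Assumption: communication between processors costs nothing.
    """
    proc_free = defaultdict(float)  # time at which each processor becomes idle
    finish = {}                     # finish time of each scheduled task
    for task in order:
        proc = assignment[task]
        # A task may start once all predecessors are done and its processor is idle.
        ready = max((finish[p] for p in preds.get(task, ())), default=0.0)
        start = max(ready, proc_free[proc])
        finish[task] = start + exec_time[(task, proc)]
        proc_free[proc] = finish[task]
    return max(finish.values(), default=0.0)


# Hypothetical example: a -> {b, c} -> d on a "fast" and a "slow" processor.
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
exec_time = {
    ("a", "fast"): 2, ("b", "fast"): 3, ("c", "fast"): 3, ("d", "fast"): 1,
    ("a", "slow"): 4, ("b", "slow"): 6, ("c", "slow"): 4, ("d", "slow"): 2,
}
order = ["a", "b", "c", "d"]
all_fast = {t: "fast" for t in order}
offload_c = {"a": "fast", "b": "fast", "c": "slow", "d": "fast"}

print(makespan(order, all_fast, exec_time, preds))   # everything on one processor
print(makespan(order, offload_c, exec_time, preds))  # c runs in parallel with b
```

In this toy instance, moving task c to the slower processor still shortens the schedule because c then overlaps with b.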
A New Mapping Heuristic Based On Mean Field Annealing
, 1992
Cited by 30 (12 self)
A new mapping heuristic is developed, based on the recently proposed Mean Field Annealing (MFA) algorithm. An efficient implementation scheme, which decreases the complexity of the proposed algorithm by asymptotical factors, is also given. Performance of the proposed MFA algorithm is evaluated in comparison with two well-known heuristics: Simulated Annealing and Kernighan-Lin. Results of the experiments indicate that MFA can be used as an alternative heuristic for solving the mapping problem. Inherent parallelism of MFA is exploited by designing an efficient parallel algorithm for the proposed MFA heuristic. 1 Introduction Today, with the aid of VLSI technology, parallel computers not only exist in research laboratories, but are also available on the market as powerful, general-purpose computers. Wide use of parallel computers in various compute-intensive applications makes the problem of mapping parallel programs to parallel computers more crucial. The mapping problem arises while d...
Decomposing irregularly sparse matrices for parallel matrix-vector multiplications
 LECTURE NOTES IN COMPUTER SCIENCE
, 1996
Cited by 28 (14 self)
In this work, we show the deficiencies of the graph model for decomposing sparse matrices for parallel matrix-vector multiplication. Then, we propose two hypergraph models which avoid all deficiencies of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem widely encountered in circuit partitioning in VLSI. We have implemented fast Kernighan-Lin based graph and hypergraph partitioning heuristics and used the successful multilevel graph partitioning tool (Metis) for the experimental evaluation of the validity of the proposed hypergraph models. We have also developed a multilevel hypergraph partitioning heuristic to experiment with the performance of the multilevel approach on hypergraph partitioning. Experimental results on sparse matrices, selected from the Harwell-Boeing collection and the NETLIB suite, confirm both the validity of our proposed hypergraph models and the appropriateness of the multilevel approach to hypergraph partitioning.
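The abstract does not spell out its two hypergraph models, so the following is only a sketch of one plausible formulation, stated as an assumption: each row of the matrix is a vertex, each column j is a net connecting the rows that have a nonzero in column j, and the entry x_j of the input vector is stored with row j. Under a rowwise decomposition of y = Ax, the number of words sent is then the sum over nets of (number of parts the net touches minus one). The function name and the small matrix are hypothetical.

```python
def communication_volume(nonzeros, part):
    """Words of x communicated for y = Ax under a rowwise decomposition.

    nonzeros -- iterable of (row, col) positions of nonzero entries
    part     -- dict: row index -> processor owning that row (and x[row])
    Assumption: column j forms a net over the rows with a nonzero in column j,
    plus row j itself, because x[j] is assumed to live with row j.
    """
    nets = {}
    for i, j in nonzeros:
        nets.setdefault(j, {j}).add(i)
    # x[j] must reach every part touched by net j except the one that owns it.
    return sum(len({part[i] for i in rows}) - 1 for rows in nets.values())


# Hypothetical 4x4 sparsity pattern split over two processors.
pattern = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (0, 3)]
two_way = {0: 0, 1: 0, 2: 1, 3: 1}
print(communication_volume(pattern, two_way))
```

Here x[1] has to be sent from processor 0 to processor 1 (row 2 needs it) and x[3] from processor 1 to processor 0 (row 0 needs it); no other entry crosses the cut.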
Task Assignment and Transaction Clustering Heuristics for Distributed Systems
 Information Sciences
, 1997
Cited by 19 (7 self)
In this paper we discuss the task assignment problem for distributed systems. We also show how this problem is very similar to that of clustering transactions for load balancing purposes and for their efficient execution in a distributed environment. The formalization of these problems in terms of a graph theoretic representation of a distributed program, or of a set of related transactions, is given. The cost function which needs to be minimized by an assignment of tasks to processors or of transactions to clusters is detailed, and we survey related work, as well as work on the dynamic load balancing problem. Since the task assignment problem is NP-hard, we present three novel heuristic algorithms that we have tested for solving it and compare them to the well-known greedy heuristic. These novel heuristics use neural networks, genetic algorithms and simulated annealing. Both the resulting performance and the computational cost for these algorithms are evaluated on a large number o...
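The abstract refers to a cost function and a greedy baseline without giving either, so the sketch below rests on assumptions: the cost is taken to be total execution cost plus the communication cost of every pair of communicating tasks placed on different processors, and the greedy rule places tasks one at a time on the processor that adds the least cost. All names and numbers are hypothetical and may differ from the paper's formulation.

```python
def assignment_cost(assign, exec_cost, comm_cost):
    """Assumed cost: execution costs plus costs of cut communication edges."""
    run = sum(exec_cost[t][p] for t, p in assign.items())
    comm = sum(w for (u, v), w in comm_cost.items() if assign[u] != assign[v])
    return run + comm


def greedy_assign(tasks, procs, exec_cost, comm_cost):
    """Place each task, in the given order, where it adds the least cost."""
    assign = {}
    for t in tasks:
        def added_cost(p):
            cost = exec_cost[t][p]
            for (u, v), w in comm_cost.items():
                # Find the other endpoint of any edge that touches t.
                other = v if u == t else (u if v == t else None)
                if other in assign and assign[other] != p:
                    cost += w  # edge to an already placed task would be cut
            return cost
        assign[t] = min(procs, key=added_cost)
    return assign


# Hypothetical instance: exec_cost[task][processor], comm_cost[(task, task)].
tasks = ["t1", "t2", "t3"]
procs = [0, 1]
exec_cost = {"t1": [1, 5], "t2": [4, 2], "t3": [3, 3]}
comm_cost = {("t1", "t2"): 4, ("t2", "t3"): 1}

placement = greedy_assign(tasks, procs, exec_cost, comm_cost)
print(placement, assignment_cost(placement, exec_cost, comm_cost))
```

A cost of this form has no load-balancing term, so the greedy rule tends to co-locate tasks that communicate heavily; a formulation that penalizes imbalance would behave differently.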
Scotch 3.1 User's Guide
, 1996
Cited by 13 (0 self)
The efficient execution of a parallel program on a parallel machine requires good placement of the communicating processes of the program onto the processors of the machine. When both the program and the machine are modeled in terms of weighted unoriented graphs, this problem amounts to static graph mapping. This document describes the capabilities and operations of Scotch, a software package devoted to graph mapping, based on the Dual Recursive Bipartitioning algorithm. Predefined mapping strategies allow for recursive application of any of several graph bipartitioning methods, including Fiduccia-Mattheyses, Gibbs-Poole-Stockmeyer, and multilevel methods. Scotch can map any weighted process graph onto any weighted target graph, whether they are connected or not. We give brief descriptions of the algorithm and bipartitioning methods, detail the input/output formats, instructions for use, and installation procedures, and provide a number of examples.
Transient performance of EPRCA and EPRCA++
 ATM Forum Contribution 941173, November–December
, 1994
Cited by 9 (0 self)
Predicting application performance using supervised
Experimental Analysis of the Dual Recursive Bipartitioning Algorithm for Static Mapping
 TR 103896, LaBRI, URA CNRS 1304, Univ. Bordeaux I
, 1996
Cited by 8 (4 self)
The combinatorial optimization problem of assigning the coexisting communicating processes of a parallel program onto a parallel machine so as to minimize its overall execution time is called static mapping. In this paper, we present a mapping algorithm based on the recursive bipartitioning of both the source process graph and the target architecture graph, whose divide-and-conquer and modular approach allows the handling of many topologies and bipartitioning methods. Specific experimental studies are carried out in order to validate the algorithm, and determine the conditions under which it achieves maximum efficiency. We analyze the interactions between the order in which the recursive bipartitionings are performed and the structure of the graphs to map; we evaluate the features of our implementation of the Fiduccia-Mattheyses algorithm for graph partitioning that allow it to handle weighted graphs; and we demonstrate the influence of the decomposition of the target topology on mapping ...
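The recursive structure described in this abstract, splitting the target processors in two, splitting the processes in two, and recursing on each pair, can be sketched roughly as follows. This is a skeleton only: the function names are hypothetical, and the placeholder bipartitioner simply cuts the process list in order, ignoring edge weights and already mapped neighbors, whereas a real bipartitioning method such as Fiduccia-Mattheyses would minimize the communication crossing the cut.

```python
def dual_recursive_bipartition(procs, tasks, bipartition):
    """Map tasks onto processors by bipartitioning both sets recursively.

    procs       -- list of target processors (at least one)
    tasks       -- list of source processes
    bipartition -- callable (tasks, fraction) -> (left_tasks, right_tasks);
                   a pluggable placeholder for a real graph bipartitioner
    """
    if not tasks:
        return {}
    if len(procs) == 1:
        # A single processor remains: everything left is mapped onto it.
        return {t: procs[0] for t in tasks}
    mid = len(procs) // 2
    left_procs, right_procs = procs[:mid], procs[mid:]
    # Give each half a share of the tasks proportional to its processor count.
    left_tasks, right_tasks = bipartition(tasks, len(left_procs) / len(procs))
    mapping = dual_recursive_bipartition(left_procs, left_tasks, bipartition)
    mapping.update(dual_recursive_bipartition(right_procs, right_tasks, bipartition))
    return mapping


def split_in_order(tasks, fraction):
    """Placeholder bipartitioner: cut the list, ignoring communication."""
    cut = round(len(tasks) * fraction)
    return tasks[:cut], tasks[cut:]


mapping = dual_recursive_bipartition([0, 1, 2, 3], list("abcdefgh"), split_in_order)
print(mapping)
```

Passing the bipartitioner as a parameter mirrors the modularity the abstract mentions: the recursion stays the same while the splitting method is swapped.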
Task Assignment on Distributed-Memory Systems with Adaptive Wormhole Routing
 Proc. Interact 2001
, 1994
Cited by 7 (3 self)
Assignment of tasks of a parallel program onto processors of a distributed-memory system is critical to obtain minimal program completion time by minimizing communication overhead. The wormhole-routing switching technique, with various adaptive routing strategies, is increasingly becoming the trend to build scalable distributed-memory systems. This paper presents task assignment heuristics for such wormhole-routed systems and analyzes the effect of adaptive routing. A Temporal Communication Graph (TCG) is used to model task graphs and to identify communication steps that conflict both temporally and spatially. Heuristics are proposed to capture temporal link contention and derive optimal assignment in an iterative manner by pairwise exchanging of processors, associated with the critical communication edges, within d hops. The interplay between degree of routing adaptivity, topology, application characteristics, and optimal task assignment is studied through simulation experiments using ran...
Clustering and Intra-Processor Scheduling for Explicitly-Parallel Programs on Distributed-Memory Systems
 In International Parallel Processing Symposium
, 1994
Cited by 6 (2 self)
Programs for distributed-memory systems are explicitly-parallel and consist of a set of sequential tasks or processes that communicate via message-passing. The sequence of computation in each task together with the intermediate send and receive communication steps exhibit temporal behavior of the program. In this paper, we show that the two common models of program representation, the precedence graph and the interaction graph models, are insufficient to capture this temporal behavior and hence are not ideal for solving the clustering and the scheduling problems. We use a new Temporal Communication Graph (TCG) model to represent such explicitly-parallel programs. This model captures communication dependency and overlap of communication with computation. This provides flexibility to get a better estimate of the program completion time. New measures are developed for quantifying critical communication and inter-task parallelism on this model. We analyze the importance of intra-processor...