## List Scheduling with and without Communication Delays (1993)

Venue: Parallel Computing

Citations: 37 (6 self)

### BibTeX

```bibtex
@ARTICLE{Yang93listscheduling,
  author  = {Tao Yang and Apostolos Gerasoulis},
  title   = {List Scheduling with and without Communication Delays},
  journal = {Parallel Computing},
  year    = {1993},
  volume  = {19},
  pages   = {1321--1344}
}
```


### Abstract

Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is a fast and practical heuristic when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to near-optimum performance in practice. In the second part we extend the CP analysis to the problem of ordering task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, RCP and RCP 3, that use critical path information and ready-list priority scheduling. We show that the performance properties of RCP and RCP 3 when communication is nonzero are similar to those of CP when communication is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics which verifies our theoretical results.

1 Introduction. The processor scheduling problem is of considerable importance in parallel processing. Given a...
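As a concrete illustration of the classical CP heuristic the abstract refers to, the following is a minimal sketch (not the authors' implementation; task names and weights are invented for the example): each task's priority is its bottom level, the longest weight sum from the task to an exit node, and a greedy scheduler repeatedly assigns the highest-priority ready task to the processor that can start it earliest.

```python
import heapq
from collections import defaultdict

def cp_list_schedule(weights, succ, p):
    """Classical CP list scheduling (zero communication cost) on p processors.

    weights: {task: execution time}; succ: {task: [successors]}.
    Priority = bottom level (longest weight sum from task to an exit node).
    Returns the makespan of the greedy schedule.
    """
    pred = defaultdict(list)
    indeg = {t: 0 for t in weights}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
            indeg[s] += 1

    level = {}
    def bottom_level(t):  # longest path to an exit node, memoized
        if t not in level:
            level[t] = weights[t] + max(
                (bottom_level(s) for s in succ.get(t, ())), default=0)
        return level[t]
    for t in weights:
        bottom_level(t)

    finish = {}
    proc_free = [0] * p
    ready = [(-level[t], t) for t in weights if indeg[t] == 0]
    heapq.heapify(ready)
    while ready:
        _, t = heapq.heappop(ready)  # highest bottom level first
        data_ready = max((finish[u] for u in pred[t]), default=0)
        i = min(range(p), key=lambda j: max(proc_free[j], data_ready))
        start = max(proc_free[i], data_ready)
        finish[t] = start + weights[t]
        proc_free[i] = finish[t]
        for s in succ.get(t, ()):
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-level[s], s))
    return max(finish.values())
```

On a toy DAG, `cp_list_schedule({"a": 2, "b": 3, "c": 2, "d": 4}, {"a": ["c", "d"], "b": ["d"]}, 2)` returns 7, the length of the critical path b, d, which is optimal here.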

### Citations

334 | Bounds for certain multiprocessing anomalies
- Graham
- 1966
Citation Context: ...ost is zero. Even under this simplification the problem remains NP-complete, but there is an interesting upper bound in performance. Any list scheduling heuristic is within 50% of the optimum, Graham [13]. List scheduling has actually an even better average performance. Adam, Chandy and Dickson [1] demonstrated experimentally that the critical path (CP) list scheduling heuristic is within 5% of the op...
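The 50% guarantee quoted above is Graham's bound: any list schedule has makespan at most (2 − 1/p) times the optimum. A quick sanity check on independent tasks, a special case where the lower bound max(total work / p, largest task) is easy to compute (the random instance below is purely illustrative):

```python
import random

def greedy_list_makespan(jobs, p):
    """Assign each job in list order to the currently least-loaded processor."""
    load = [0.0] * p
    for w in jobs:
        i = min(range(p), key=load.__getitem__)
        load[i] += w
    return max(load)

random.seed(1)
p = 4
jobs = [random.uniform(1, 10) for _ in range(20)]
ms = greedy_list_makespan(jobs, p)
# No schedule can beat the average load or the largest single job.
lower = max(sum(jobs) / p, max(jobs))
assert lower <= ms <= (2 - 1 / p) * lower  # Graham's (2 - 1/p) bound
```

Since the optimum lies between `lower` and `ms`, the greedy schedule is provably within 50% of it for any list order.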

182 | A comparison of list scheduling for parallel processing systems
- Adam, Chandy, et al.
- 1974
Citation Context: ...eresting upper bound in performance. Any list scheduling heuristic is within 50% of the optimum, Graham [13]. List scheduling has actually an even better average performance. Adam, Chandy and Dickson [1] demonstrated experimentally that the critical path (CP) list scheduling heuristic is within 5% of the optimum in 90% of the times. In this paper we study the properties of list scheduling that provid...

172 | Parallel Sequencing and Assembly Line Problems
- Hu
- 1961
Citation Context: ...(i.e. within 5 percent of the optimal in 90 percent of random cases.) Theoretically, Coffman and Graham [4] show that CP is optimal for scheduling DAGs of equal task sizes on two processors. Also Hu [15] shows that CP is optimal for scheduling tree DAGs of equal task sizes on p processors. Previous research emphasis has been in uncovering the worst case performance in list scheduling and determining ...

165 | On the mapping problem
- Bokhari
- 1981
Citation Context: ...soulis and Yang [12]. If the architecture is not completely connected then a physical mapping that takes into account the network topology is needed. Currently PYRROS uses a modification of Bokhari's [3] mapping algorithm. Once the processor assignment is fixed then the communication weights must be re-adjusted to incorporate the processor distance. A common approximation to edge weights is the linea...

149 | Introduction to Parallel and Vector Solution of Linear Systems
- Ortega
- 1988
Citation Context: ...ension n, v = n^2/2 and e = n^2, implying that an O(v^2) algorithm will have O(n^4) complexity which is higher than a sequential GE algorithm. PYRROS scheduling algorithms use a multistage method [17, 21, 22, 24] that includes clustering and task ordering. In this paper, we will give a brief discussion of the PYRROS approach and analyze in detail the task ordering problem. The task ordering problem is to dete...
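The operation counts in this snippet can be checked directly: with v = n^2/2 tasks and e = n^2 edges in the Gaussian elimination DAG, an O(v^2) scheduler performs on the order of n^4 work, which swamps the O(n^3) cost of sequential GE itself (the concrete n below is illustrative):

```python
n = 1000
v = n * n // 2           # tasks in the GE DAG (order of magnitude)
e = n * n                # edges
quadratic_sched = v * v  # an O(v^2) scheduler touches ~v^2 task pairs
sequential_ge = n ** 3   # sequential GE performs ~n^3 flops
assert quadratic_sched == n ** 4 // 4
assert quadratic_sched > sequential_ge  # scheduling cost would dominate
```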

127 | A comparison of clustering heuristics for scheduling DAGs on multiprocessors
- Gerasoulis, Yang
- 1992
Citation Context: ...he clustering stage PYRROS uses the DSC algorithm [25]. Here we only describe the DSC algorithm briefly. A detailed experimental and theoretical comparison of DSC with other existing algorithms is in [11]. DSC has been found to outperform several other algorithms. The DSC performs a sequence of clustering refinement steps. Initially all tasks are mapped in separate ...

123 | A general approach to mapping of parallel computation upon multiprocessor architectures
- Kim, Browne
- 1988
Citation Context: ...harder. List scheduling no longer provides the 50% performance guarantee. Only recently considerable attention has been paid to this problem, e.g. Sarkar [22], El-Rewini and Lewis [8], Kim and Browne [17], Wu and Gajski [23], Cosnard et al. [5], Darte [6] and many others. One of the reasons for the renewed emphasis on scheduling with communication is that...

107 | Scheduling parallel program tasks onto arbitrary target machines
- El-Rewini, Lewis
- 1990
Citation Context: ...ing problem is much harder. List scheduling no longer provides the 50% performance guarantee. Only recently considerable attention has been paid to this problem, e.g. Sarkar [22], El-Rewini and Lewis [8], Kim and Browne [17], Wu and Gajski [23], Cosnard et al. [5], Darte [6] and many others. One of the reasons for the renewed emphasis on scheduling with ...

106 | Practical multiprocessor scheduling algorithms for efficient parallel processing
- Kasahara, Narita
- 1984
Citation Context: ...the RCP 3 based on L 3 (x) priorities. Both of these heuristics can be implemented in O(v log v + e) time. For tie breaking between tasks we can use the MISF (Most Immediate Successor First) principle [16]. The δ-lopt definition here is slightly different from the one given for list scheduling with zero communication. The local optimum parallel time PT^i_lopt is derived over all possible schedules of re...
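The data-structure argument behind the O(v log v + e) bound can be sketched generically: keep the ready tasks in a heap keyed by priority, breaking ties by MISF (more immediate successors first); each task is pushed and popped once (O(v log v)) and each edge is relaxed once (O(e)). This is a generic ready-list ordering, not the authors' exact RCP priority function; `prio` stands for whatever priority values (e.g. critical-path lengths) the caller supplies:

```python
import heapq

def priority_order(prio, succ):
    """Topological order by descending priority, MISF tie-break.

    prio: {task: priority value}; succ: {task: [successors]}.
    """
    indeg = {t: 0 for t in prio}
    for t, ss in succ.items():
        for s in ss:
            indeg[s] += 1
    # Heap key: higher priority first, then more immediate successors (MISF).
    key = lambda t: (-prio[t], -len(succ.get(t, ())), t)
    ready = [key(t) for t in prio if indeg[t] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        t = heapq.heappop(ready)[-1]
        order.append(t)
        for s in succ.get(t, ()):
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, key(s))
    return order
```

For example, with `prio = {"a": 5, "b": 5, "c": 3}` and `succ = {"a": ["c"]}`, tasks a and b tie at priority 5, and MISF picks a first because it has one immediate successor versus none, giving the order `["a", "b", "c"]`.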

105 | Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors
- Sarkar
- 1989
Citation Context: ...communication the scheduling problem is much harder. List scheduling no longer provides the 50% performance guarantee. Only recently considerable attention has been paid to this problem, e.g. Sarkar [22], El-Rewini and Lewis [8], Kim and Browne [17], Wu and Gajski [23], Cosnard et al. [5], Darte [6] and many others. One of the reasons for the renewed emphasis on scheduling with ...

87 | PYRROS: Static Task Scheduling and Code Generation for Message-Passing Multiprocessors
- Yang, Gerasoulis
- 1992
Citation Context: ...OREGAMI/LaRCS software tool by Lo et al. [19] is used for the mapping of directed task graphs onto processors. We have also built an automatic static scheduling and code generation system called PYRROS [26] for parallel architectures. PYRROS is targeted at scheduling weighted directed acyclic task graphs, which takes task precedence into account and uses the minimization of parallel time as a cost funct...

86 | Optimal Scheduling of Two-Processor Systems
- Coffman, Graham
- 1972
Citation Context: ...ithms. Their conclusion is that the CP heuristic is superior to others since it is near-optimal (i.e. within 5 percent of the optimal in 90 percent of random cases.) Theoretically, Coffman and Graham [4] show that CP is optimal for scheduling DAGs of equal task sizes on two processors. Also Hu [15] shows that CP is optimal for scheduling tree DAGs of equal task sizes on p processors. Previous researc...

45 | Complexity of scheduling multiprocessor tasks with prespecified processor allocations
- Hoogeveen, Velde, et al.
- 1994
Citation Context: ...also important for other scheduling heuristics and systems. For example, task ordering is useful for HyperTool [23] after its physical mapping stage. This task ordering problem is NP-hard in general [14]. Therefore, the main question is how to devise heuristics with good performance and low computational complexity. Since this problem is closely related to the classical list scheduling, we will first...

26 | Clustering Task Graphs for Message Passing Architectures - Gerasoulis, Venugopal, et al. - 1990

26 | A fast static scheduling algorithm for DAGs on an unbounded number of processors
- Yang, Gerasoulis
- 1991
Citation Context: ...), the delay between task 2 and 10 is 2 2 1 since processor 4 and 0 have a distance of 2 and so is between task 10 and 15. Another comparison to Kruatrachue and Lewis [18]'s DSH heuristic is given in [25] for a tree DAG. For that example PYRROS gives the same parallel time. ...

18 | Static task scheduling and grain packing in parallel processing systems
- Kruatrachue
- 1987
Citation Context: ...communication information becomes deterministic. As a result, a better ordering can be derived. Figure 3(c) shows a schedule that is shorter than that of CP and MCP in Figure 2. Kruatrachue and Lewis [18] and El-Rewini and Lewis [8] have proposed one-stage scheduling methods based on task duplication. They have implemented their algorithms in a software scheduling system named TaskGrapher, which is th...

16 | A programming aid for hypercube architectures
- Wu, Gajski
- 1988
Citation Context: ...The purpose of this paper is to study algorithms for the third stage, the task ordering. The ordering algorithm is also important for other scheduling heuristics and systems. For example, HyperTool [23] uses Bokhari's physical mapping algorithm after a processor assignment is derived. It would produce a better solution if task re-ordering was used based on the new physical mapping information. ...

14 | Experience with an automatic solution to the mapping problem, in The Characteristics of Parallel Algorithms - Berman - 1987

11 | OREGAMI: Software tools for mapping parallel computations to parallel architectures
- Lo, Rajopadhye, et al.
- 1990
Citation Context: ...parallel architectures. PREP-P by Berman [2] is a software system that automatically maps undirected communication graphs onto CHiP machine architectures. The OREGAMI/LaRCS software tool by Lo et al. [19] is used for the mapping of directed task graphs onto processors. We have also built an automatic static scheduling and code generation system called PYRROS [26] for parallel architectures. PYRROS is ...

10 | Parallel Gaussian elimination on an MIMD computer
- Cosnard, Marrakchi, et al.
- 1988
Citation Context: ...s the 50% performance guarantee. Only recently considerable attention has been paid to this problem, e.g. Sarkar [22], El-Rewini and Lewis [8], Kim and Browne [17], Wu and Gajski [23], Cosnard et al. [5], Darte [6] and many others. One of the reasons for the renewed emphasis on scheduling with communication is that it can be used on message passing MIMD ...

8 | A mapping strategy for MIMD computers
- Yang, Bic, et al.
- 1993
Citation Context: ...ension n, v = n^2/2 and e = n^2, implying that an O(v^2) algorithm will have O(n^4) complexity which is higher than a sequential GE algorithm. PYRROS scheduling algorithms use a multistage method [17, 21, 22, 24] that includes clustering and task ordering. In this paper, we will give a brief discussion of the PYRROS approach and analyze in detail the task ordering problem. The task ordering problem is to dete...

7 | Static scheduling for linear algebra DAGs - Gerasoulis, Nelken - 1989

1 | Two Heuristics for Task Scheduling, Laboratoire LIP-IMAG, Ecole Normale Superieure de Lyon
- Darte
- 1991
Citation Context: ...erformance guarantee. Only recently considerable attention has been paid to this problem, e.g. Sarkar [22], El-Rewini and Lewis [8], Kim and Browne [17], Wu and Gajski [23], Cosnard et al. [5], Darte [6] and many others. One of the reasons for the renewed emphasis on scheduling with communication is that it can be used on message passing MIMD architectur...

1 | A static macro-dataflow scheduling tool for scalable parallel architectures
- Gerasoulis, Yang
- 1992
Citation Context: ...e well known arithmetic load balancing heuristic. A comparison of the load balancing algorithm and more sophisticated algorithms proposed by Sarkar [22] for this stage is given in Gerasoulis and Yang [12]. If the architecture is not completely connected then a physical mapping that takes into account the network topology is needed. Currently PYRROS uses a modification of Bokhari's [3] mapping algorith...

1 | Runtime resource management in concurrent systems
- Ngai
- 1992
Citation Context: ...thod. An explanation of this performance improvement lies in the fact that more accurate task priorities can be computed from one stage to the next. This multistage approach was recently used by Ngai [20] in runtime resource management of concurrent systems, demonstrating its versatility beyond compile-time scheduling. In Figure 3, we show how a multistage method works. In Figure 3(a) a clustering of...