## Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks (1996)



Venue: Proceedings of the 29th International Symposium on Microarchitecture

Citations: 36 (4 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Chekuri96profile-driveninstruction,
  author    = {C. Chekuri and R. Johnson and R. Motwani and B. Natarajan and B. R. Rau and M. Schlansker},
  title     = {Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks},
  booktitle = {Proceedings of the 29th International Symposium on Microarchitecture},
  year      = {1996},
  pages     = {58--67}
}
```


### Abstract

Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research, in light of the increased use of long-instruction-word machines. Unfortunately, optimum scheduling is computationally intractable, and one must resort to carefully crafted heuristics in practice. If the scope of application of a scheduling heuristic is limited to basic blocks, considerable performance loss may be incurred at block boundaries. To overcome this obstacle, basic blocks can be coalesced across branches to form larger regions such as super blocks. In the literature, these regions are typically scheduled using algorithms that are either oblivious to profile information (under the assumption that the process of forming the region has fully utilized the profile information), or use the profile information as an addendum to classical scheduling techniques. We believe that even for the simple case of linear code regions such as super blocks, additional performanc...
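The abstract argues that profile information should shape the scheduling objective itself, not only region formation. A minimal Python sketch of one such objective follows, assuming (our assumption; the abstract is truncated) that each exit branch of a superblock carries a profiled exit probability and that a schedule costs the probability-weighted sum of the cycles at which the exit branches complete, in line with the weighted-completion-time problem discussed in the citation contexts. The operation names, latencies, and probabilities are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def respects_precedence(start: Dict[str, int], latency: Dict[str, int],
                        edges: Iterable[Edge]) -> bool:
    """True if, for every dependence u -> v, v issues only after u completes."""
    return all(start[v] >= start[u] + latency[u] for u, v in edges)


def weighted_completion_time(start: Dict[str, int], latency: Dict[str, int],
                             exit_prob: Dict[str, float]) -> float:
    """Assumed profile-weighted cost: sum over exits of p(exit) * completion cycle.

    If the exit probabilities sum to 1, this is the expected cycle at which
    control leaves the superblock (assuming it leaves when the taken exit completes).
    """
    return sum(p * (start[b] + latency[b]) for b, p in exit_prob.items())


# Hypothetical superblock: "a" feeds the side exit "b1" and op "c";
# "b1" and "c" both precede the final exit "b2". Probabilities are made up.
latency = {"a": 1, "b1": 1, "c": 1, "b2": 1}
edges = [("a", "b1"), ("a", "c"), ("b1", "b2"), ("c", "b2")]
exit_prob = {"b1": 0.3, "b2": 0.7}

# Two legal single-issue schedules (operation -> issue cycle).
early_exit = {"a": 0, "b1": 1, "c": 2, "b2": 3}
late_exit = {"a": 0, "c": 1, "b1": 2, "b2": 3}

for name, s in [("early_exit", early_exit), ("late_exit", late_exit)]:
    assert respects_precedence(s, latency, edges)
    print(name, round(weighted_completion_time(s, latency, exit_prob), 3))
```

Both schedules finish the final exit at the same cycle, yet the first costs 3.4 and the second 3.7, because issuing the side exit earlier pays off in proportion to its exit probability. A purely makespan-oriented scheduler would not distinguish the two.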

### References

- Computers and Intractability: A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979 (cited 10964 times)
  - Citation context: ...e guarantees do not carry over.) The general problem in which every node has a weight has been shown to be NP-hard even for m = 1 provided we permit arbitrary precedence constraints on the operations [5, 10]. It is polynomially solvable when the precedence graph is a forest [7] or a generalized series-parallel graph [1, 10]. For m > 1, the problem is NP-hard even without precedence constraints, unless th...

- Trace scheduling: A technique for global microcode compaction - Fisher - 1981 (cited 624 times)

- Scheduling to minimize average completion time: off-line and on-line approximation algorithms - Hall, Schulz, et al. - 1997 (cited 194 times)

- Parallel Sequencing and Assembly Line Problems - Hu - 1961 (cited 164 times)

- Compiling for the Cydra 5 - Dehnert, Towle - 1993 (cited 142 times)

- Sequencing jobs to minimize total weighted completion time subject to precedence constraints (Annals of Discrete Mathematics) - Lawler - 1978 (cited 69 times)
  - Citation context: same passage as quoted for Garey and Johnson ([5, 10]).

- Bounds on multiprocessor timing anomalies - Graham - 1969 (cited 51 times)
  - Citation context: ...tial schedule as the list, is an approximation algorithm with a performance ratio 2. We remark that the preceding theorem is a generalization of Graham's result for the makespan minimization problem [6] and the proof technique is also similar. However, unlike Graham's result, our theorem does not generalize to the case of unequal execution times. Using a more sophisticated scheduling algorithm, we ca...

- Effective compiler support for predicated execution using the hyperblock - Mahlke, Lin, et al. - 1992 (cited 20 times)

- Optimal task sequencing with precedence constraints - Garey - 1973 (cited 19 times)

- Ordering problems approximated: single-processor scheduling and interval graph completion - Ravi, Agrawal, et al. - 1991 (cited 19 times)
  - Citation context: ...t can be practical, if the number of such vertices, i.e., the number of branches in the superblock (or hyperblock), is small. From a theoretical point of view, the best known approximation algorithm [13] for sequential scheduling of weighted DAGs has a performance guarantee of O(log² n). [3.1 The Basic Lemma] The following terms are defined with respect to a specific schedule S. We use the term segmen...

- Enhancing Instruction Level Parallelism through Compiler-Controlled Speculation - Bringmann - 1995 (cited 18 times)

- Single machine job sequencing with precedence constraints - Adolphson - 1977 (cited 14 times)
  - Citation context: ...n for m = 1 provided we permit arbitrary precedence constraints on the operations [5, 10]. It is polynomially solvable when the precedence graph is a forest [7] or a generalized series-parallel graph [1, 10]. For m > 1, the problem is NP-hard even without precedence constraints, unless the weights are all identical in which case it is polynomially solvable; on the other hand, the problem is strongly NP-h...

- Single-machine job sequencing with treelike precedence ordering and linear delay penalties - Horn - 1972 (cited 14 times)

- The superblock: An effective technique for VLIW and superscalar compilation - Hwu, Mahlke, et al. - 1993 (cited 12 times)

- The superblock: An effective technique for VLIW and superscalar compilation - Bringmann, Hank, et al. - 1993 (cited 5 times)
  - Citation context: ...ies. To overcome this limitation, basic blocks that are separated by branch instructions can be combined into bigger blocks. Two recent successful techniques are superblock formation and scheduling [9], and predicated execution using hyperblocks [11]. Both superblocks and hyperblocks have a single-entry, multiple-exit property, and each branch is a conditional exit. We can allocate probabilities to e...

- Global code generation for instruction level parallelism - Fisher - 1993 (cited 2 times)


- Exploiting Instruction Level Parallelism in the Presence of Conditional Branches - Mahlke - 1996 (cited 1 time)

- Scheduling chain structured operations to minimize makespan and mean flow time - Du, Leung, et al. - 1991 (cited 1 time)
  - Citation context: ...are all identical in which case it is polynomially solvable; on the other hand, the problem is strongly NP-hard even when all weights are identical and the precedence graph is a collection of chains [3]. In light of the intractable nature of the problem, we adopt the standard approach for dealing with NP-hard problems [5, 12], i.e., the design of approximation algorithms with a bounded performance r...

- Single-machine job sequencing with treelike precedence ordering and linear delay penalties - Horn - 1972 (cited 1 time)
  - Citation context: ...s a weight has been shown to be NP-hard even for m = 1 provided we permit arbitrary precedence constraints on the operations [5, 10]. It is polynomially solvable when the precedence graph is a forest [7] or a generalized series-parallel graph [1, 10]. For m > 1, the problem is NP-hard even without precedence constraints, unless the weights are all identical in which case it is polynomially solvable; ...

- Approximation Algorithms (Volume I) - Motwani - 1992 (cited 1 time)
  - Citation context: ...n all weights are identical and the precedence graph is a collection of chains [3]. In light of the intractable nature of the problem, we adopt the standard approach for dealing with NP-hard problems [5, 12], i.e., the design of approximation algorithms with a bounded performance ratio. The performance ratio of an approximation algorithm is defined as the worst-case ratio of the cost of the approximate s...
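The contexts relate the paper's guarantee to Graham's list-scheduling result. As an illustration of that technique only, here is a minimal Python sketch of greedy list scheduling for unit-time operations on m identical issue slots. It assumes the operations form a DAG and that a priority list is supplied; the example takes that list from a hypothetical single-issue order, which is one reading of the truncated phrase "...tial schedule as the list". It is not the paper's algorithm and does not reproduce its ratio-2 analysis. The names `list_schedule`, `ops`, `edges`, and `exit_prob` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def list_schedule(ops: List[str], edges: List[Tuple[str, str]], m: int) -> Dict[str, int]:
    """Greedy list scheduling of unit-time operations on m identical issue slots.

    `ops` is the priority list. At each cycle, the first (up to) m unscheduled
    operations in list order whose predecessors have all completed are issued.
    Returns a map from operation to issue cycle.
    """
    preds: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    start: Dict[str, int] = {}
    cycle = 0
    while len(start) < len(ops):
        # Unit latency: a predecessor issued at cycle c has completed by cycle c + 1.
        ready = [op for op in ops
                 if op not in start
                 and all(p in start and start[p] < cycle for p in preds[op])]
        if not ready:
            # Every remaining op waits on another remaining (or unknown) op.
            raise ValueError("precedence graph is not a DAG over `ops`")
        for op in ready[:m]:
            start[op] = cycle
        cycle += 1
    return start


# Hypothetical superblock and priority list (a legal single-issue order).
ops = ["a", "b1", "c", "d", "b2"]
edges = [("a", "b1"), ("a", "c"), ("b1", "b2"), ("c", "b2"), ("d", "b2")]
exit_prob = {"b1": 0.3, "b2": 0.7}  # made-up profile data

for m in (1, 2):
    s = list_schedule(ops, edges, m)
    cost = sum(p * (s[b] + 1) for b, p in exit_prob.items())  # unit latency
    print(m, s, round(cost, 3))
```

With one slot the list is reproduced exactly and the weighted exit cost is 4.1; with two slots `a` and `d` share cycle 0 and the cost drops to 2.7. Under the definition quoted in the last context, the performance ratio is a worst case over all instances relative to the optimum, so a single run like this says nothing about the bound by itself.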