## Software pipelining: An effective scheduling technique for VLIW machines (1988)

Citations: 578 (3 self)

### Citations

679 | Trace scheduling: A technique for global microcode compaction - Fisher - 1981 |

559 | Algorithm 97, Shortest path - Floyd - 1962 |

Citation Context: ... the strongly connected components in the graph [29], and compute the closure of the precedence constraints in each connected component by solving the all-points longest path problem for each component [6,13]. This information is used in the iterative scheduling step. To avoid the cost of recomputing this information for each value of the initiation interval, we compute this information only once in the p...
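The closure described in this context can be illustrated with a Floyd-style all-pairs longest path pass over one strongly connected component. This is a minimal sketch under stated assumptions, not the paper's procedure: the edge format `(src, dst, delay, distance)`, the weight `delay - interval * distance`, and the function name are illustrative choices, and the sketch fixes one initiation interval, whereas the quoted text says the information is computed only once.

```python
from math import inf

def longest_path_closure(nodes, edges, interval):
    """All-pairs longest paths inside one strongly connected component.

    edges: iterable of (src, dst, delay, distance), where `distance` is the
    number of iterations the dependence spans (assumed representation).
    Returns (dist, feasible); feasible is False if some cycle has positive
    total weight, meaning the precedence constraints cannot all hold.
    """
    dist = {u: {v: -inf for v in nodes} for u in nodes}
    for src, dst, delay, distance in edges:
        weight = delay - interval * distance
        dist[src][dst] = max(dist[src][dst], weight)

    # Floyd's triple loop, maximizing instead of minimizing.
    for k in nodes:
        for i in nodes:
            if dist[i][k] == -inf:
                continue
            for j in nodes:
                candidate = dist[i][k] + dist[k][j]
                if candidate > dist[i][j]:
                    dist[i][j] = candidate

    feasible = all(dist[u][u] <= 0 for u in nodes)
    return dist, feasible

# Two operations forming a recurrence: a -> b in the same iteration,
# b -> a carried into the next iteration.
nodes = ["a", "b"]
edges = [("a", "b", 2, 0), ("b", "a", 1, 1)]
closure, ok = longest_path_closure(nodes, edges, interval=3)
```

The sketch assumes the caller has already split the graph into strongly connected components, for example with Tarjan's algorithm cited as [29] in the quoted text.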

264 | Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing - Rau, Glaeser - 1981 |

Citation Context: ...oss basic blocks [12]. In fact, the VLIW architecture is developed from the study of the global code compaction technique, trace scheduling [10]. The thesis of this paper is that software pipelining [24,25,30] is a viable alternative technique for scheduling VLIW processors. In software pipelining, iterations of a loop in a source program are continuously initiated at constant intervals without having to w...
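To make "initiated at constant intervals" concrete, the following sketch computes the start cycle of every operation when each iteration reuses the same per-iteration schedule, shifted by a fixed interval. The operation names, offsets, and the function name are hypothetical and chosen only for illustration.

```python
def pipelined_start_times(offsets, interval, iterations):
    """Map (iteration, op) to its start cycle.

    offsets: {op: cycle offset within a single iteration's schedule}.
    Iteration i starts i * interval cycles after iteration 0.
    """
    return {
        (i, op): i * interval + offset
        for i in range(iterations)
        for op, offset in offsets.items()
    }

# Hypothetical three-operation loop body, one new iteration per cycle.
offsets = {"load": 0, "add": 2, "store": 3}
starts = pipelined_start_times(offsets, interval=1, iterations=3)
```

With these assumed numbers, the load of iteration 1 is issued at cycle 1, while iteration 0 is still two cycles away from its store.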

198 | A VLIW architecture for a trace scheduling compiler - Colwell, Nix, et al. - 1988 |

100 | A Systolic Array Optimizing Compiler - Lam - 1987 |

Citation Context: ...linear search instead. The rationale is as follows: Although the probability that a schedule can be found generally increases with the value of the initiation interval, schedulability is not monotonic [21]. Especially since empirical results show that in the case of Warp, a schedule meeting the lower bound can often be found, sequential search is preferred. A lower bound on the initiation interval can ...
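The sequential search argued for in this context can be sketched as a loop that starts at the lower bound and returns the first interval for which a schedule exists. The names `find_initiation_interval`, `try_schedule`, and `upper_bound` are assumptions introduced for this sketch, and the toy scheduler only mimics non-monotonic schedulability.

```python
def find_initiation_interval(lower_bound, upper_bound, try_schedule):
    """Try each interval in increasing order and return the first success.

    try_schedule(interval) is a caller-supplied function (hypothetical) that
    returns a schedule, or None when no schedule is found at that interval.
    """
    for interval in range(lower_bound, upper_bound + 1):
        schedule = try_schedule(interval)
        if schedule is not None:
            return interval, schedule
    return None

def toy_try_schedule(interval):
    # Succeeds at 4, fails at 5, succeeds again at 6 and 7: not monotonic.
    return {"interval": interval} if interval in (4, 6, 7) else None

result = find_initiation_interval(3, 8, toy_try_schedule)
```

Because success at one interval does not imply success at every larger one, a binary search could discard the half of the range that contains the smallest schedulable interval; the linear scan cannot.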

67 | A compilation technique for software pipelining of loops with conditional jumps - Ebcioglu - 1987 |

Citation Context: ...the schedules for the iterations are given and cannot be changed. Ebcioglu proposed a software pipelining algorithm to generate code for a hypothetical machine with infinitely many hardware resources [7]. Lastly, Weiss and Smith compared the results of using loop unrolling and software pipelining to generate scalar code for the Cray-1s architecture [31]. However, their software pipelining algorithm...

55 | A study of scalar compilation techniques for pipelined supercomputers - Weiss, Smith - 1987 |

Citation Context: ... machine with infinitely many hardware resources [7]. Lastly, Weiss and Smith compared the results of using loop unrolling and software pipelining to generate scalar code for the Cray-1s architecture [31]. However, their software pipelining algorithm only overlaps the computation from at most two iterations. The unfavorable results obtained for software pipelining can be attributed to the particular...

37 | Improving the throughput of a pipeline by insertion of delays - Patel, Davidson - 1976 |

Citation Context: ...oss basic blocks [12]. In fact, the VLIW architecture is developed from the study of the global code compaction technique, trace scheduling [10]. The thesis of this paper is that software pipelining [24,25,30] is a viable alternative technique for scheduling VLIW processors. In software pipelining, iterations of a loop in a source program are continuously initiated at constant intervals without having to w...

35 | Computers and Intractability: A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979 |

32 | The Optimization of Horizontal Microcode Within and Beyond Basic Blocks: An Application of Processor Scheduling - Fisher - 1979 |

Citation Context: ...se to schedule acyclic graphs for a target initiation interval is the same as that used in the FPS compiler, which itself is derived from the list scheduling algorithm used in basic block scheduling [9]. List scheduling is a non-backtracking algorithm, nodes are scheduled in a topological ordering, and are placed in the earliest possible time slot that satisfies all scheduling constraints with the p...
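A non-backtracking list scheduler of the kind this context describes might look like the sketch below. It is illustrative only: the graph encoding, the single-cycle resource usage, and the check of resource usage modulo the target interval are assumptions of this sketch, not details taken from the quoted text.

```python
from graphlib import TopologicalSorter

def list_schedule(preds, resource_of, capacity, interval):
    """Greedy list scheduling for one target initiation interval.

    preds: {node: {predecessor: delay}}; every node must appear as a key.
    resource_of: {node: resource name}; capacity: {resource: units per cycle}.
    Returns {node: start cycle}, or None if some node cannot be placed.
    """
    order = TopologicalSorter({n: set(p) for n, p in preds.items()}).static_order()
    start = {}
    usage = {}  # (resource, cycle mod interval) -> units already claimed

    for node in order:
        # Earliest cycle allowed by the already scheduled predecessors.
        earliest = max((start[p] + d for p, d in preds[node].items()), default=0)
        resource = resource_of[node]
        # Trying `interval` consecutive cycles covers every residue class.
        for cycle in range(earliest, earliest + interval):
            key = (resource, cycle % interval)
            if usage.get(key, 0) < capacity[resource]:
                usage[key] = usage.get(key, 0) + 1
                start[node] = cycle
                break
        else:
            return None  # never backtrack: give up on this interval
    return start

# Hypothetical loop body: two loads share one memory port, then an add.
preds = {"load1": {}, "load2": {}, "add": {"load1": 2, "load2": 2}}
resource_of = {"load1": "mem", "load2": "mem", "add": "alu"}
capacity = {"mem": 1, "alu": 1}
schedule = list_schedule(preds, resource_of, capacity, interval=2)
```

Returning `None` instead of undoing earlier placements keeps the scheduler non-backtracking; a caller can then retry with a larger interval.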

26 | URPR – an extension of URCR for software pipelining - Su, Ding, et al. - 1986 |

16 | Compilation for a High-performance Systolic Array - Gross, Lam - 1986 |

8 | Global optimization of microprograms through modular control constructs - Wood - 1979 |

Citation Context: ...nstruct. The scheduling process is complete when the entire program is reduced to a single node. The hierarchical reduction technique is derived from the scheduling scheme previously proposed by Wood [32]. In Wood’s approach, scheduled constructs are modeled as black boxes taking unit time. Operations outside the construct can move around it but cannot execute concurrently with it. Here, the resource ...

5 | Global compaction of horizontal microprograms based on the generalized data dependency graph - Isoda, Kobayashi, et al. - 1983 |

1 | Perfect Pipelining: A New Loop Parallelization Technique - A, Nicolau - 1987 |

Citation Context: ...e shown by transforming the problem of resource constrained scheduling problem [14] to the software pipelining problem). There have been two approaches in response to the complexity of this problem: (1) change the architecture, and thus the characteristics of the constraints, so that the problem becomes tractable, and (2) use heuristics. The first approach is used in the polycyclic [25] and Cydrome’...

1 | All Shortest Routes from a Fixed Origin in a Graph. Theory of Graphs - Rao - 1967 |

Citation Context: ... the strongly connected components in the graph [29], and compute the closure of the precedence constraints in each connected component by solving the all-points longest path problem for each component [6,13]. This information is used in the iterative scheduling step. To avoid the cost of recomputing this information for each value of the initiation interval, we compute this information only once in the p...

1 | Bulldog: A Compiler for VLIW Architectures - John - 1985 |

1 | Parallel Processing: A Smart Compiler and a Dumb Machine - C, Nicolau - 1984 |

1 | Microcode Compaction: Looking Backward and Looking - D - 1981 |

1 | Highly Concurrent Scalar Processing - Peter - 1986 |

Citation Context: ... However, this hardware feature is expensive; and, when interiteration dependency is present in a loop, exhaustive search on the strongly connected components of the data flow graph is still necessary [16]. The second approach is used in the FPS-164 compiler [30]. Software pipelining is applied to a restricted set of loops, namely those containing a single Fortran statement. In other words, at most one...

1 | Dependence Graphs and Compiler Optimizations - J, Kuhn, et al. - 1981 |

Citation Context: ...s optimization of allocating multiple registers to a variable in the loop modulo variable expansion. This optimization is a variation of the variable expansion technique used in vectorizing compilers [18]. The variable expansion transformation identifies those variables that are redefined at the beginning of every iteration of a loop, and expands the variable into a higher dimension variable, so that e...
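The storage renaming behind variable expansion can be shown on a small loop. The sketch below is illustrative: the loop body, function names, and the `copies` parameter are assumptions, and because Python runs the iterations sequentially it demonstrates only that the expanded forms compute the same values, not any actual overlap of iterations.

```python
def scaled_sum_scalar(a, b):
    out = []
    for i in range(len(a)):
        t = a[i] + b[i]          # t is redefined at the start of every iteration
        out.append(2 * t)
    return out

def scaled_sum_expanded(a, b):
    n = len(a)
    t = [0] * n                  # variable expansion: one location per iteration
    for i in range(n):
        t[i] = a[i] + b[i]
    return [2 * t[i] for i in range(n)]

def scaled_sum_modulo(a, b, copies=2):
    t = [0] * copies             # a small fixed set of locations, reused cyclically
    out = []
    for i in range(len(a)):
        slot = i % copies        # iteration i uses location i mod copies
        t[slot] = a[i] + b[i]
        out.append(2 * t[slot])
    return out
```

The modulo form needs only as many locations as there are iterations whose uses of the variable could be live at once, which is why it maps naturally onto a handful of registers rather than a full array.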

1 | Tree Compaction of Microprograms - J, Atkin - 1982 |

1 | Compiler Optimizations for Asynchronous Systolic Array Programs - Lam - 1988 |

1 | SRDAG Compaction - A Generalization of Trace Scheduling to Increase the Use of Global Context Information - Joseph - 1983 |

1 | An Improvement of Trace Scheduling for Global Microcode Compaction - Su, Ding, et al. - 1984 |

Citation Context: ...n be controlled by inserting additional constraints between branching operations. For example, Su et al. suggested restricting the motions of operations that are not on the critical path of the trace [26]. In our approach to scheduling conditional statements, the objective is to minimize the effect of conditional statements on parallel execution of other constructs. By modeling the conditional stateme...

1 | GURPR – A Method for Global Software Pipelining - Su, Ding, et al. - 1987 |

1 | Depth first search and linear graph algorithms - Tarjan - 1972 |

Citation Context: ...hs. A scheduling algorithm for cyclic graphs that satisfies these properties is presented below. The following preprocessing step is first performed: find the strongly connected components in the graph [29], and compute the closure of the precedence constraints in each connected component by solving the all-points longest path problem for each component [6,13]. This information is used in the iterative ...

1 | A Fortran Compiler for the FPS-164 Scientific Computer - Touzeau - 1984 |

Citation Context: ...oss basic blocks [12]. In fact, the VLIW architecture is developed from the study of the global code compaction technique, trace scheduling [10]. The thesis of this paper is that software pipelining [24,25,30] is a viable alternative technique for scheduling VLIW processors. In software pipelining, iterations of a loop in a source program are continuously initiated at constant intervals without having to w...