Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994
Cited by 263 (2 self)
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
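To make the modulo constraint concrete, here is a minimal sketch of a modulo reservation table: an operation placed at time t occupies its resource in slot t mod II, so the same slot is reused by every overlapped iteration. This is not the paper's iterative algorithm. The function names, the one-unit-per-resource machine model, and the `(name, resource, earliest_start)` operation format are all assumptions made for illustration, and dependences are represented only through the given earliest start times.

```python
def try_place(mrt, resource, time, ii):
    """Reserve `resource` at `time` in a modulo reservation table of period `ii`.

    `mrt` is a set of (resource, slot) pairs already taken. Returns True and
    records the reservation if the slot is free, otherwise returns False.
    """
    slot = time % ii
    if (resource, slot) in mrt:
        return False
    mrt.add((resource, slot))
    return True


def greedy_modulo_place(ops, ii):
    """Place each op at the first conflict-free time at or after its earliest start.

    `ops` is a list of (name, resource, earliest_start) tuples (hypothetical
    format). Only `ii` consecutive times are tried per op, because slots
    repeat with period `ii`. Returns {name: time}, or None if some op cannot
    be placed at this initiation interval.
    """
    mrt = set()
    schedule = {}
    for name, resource, earliest in ops:
        for t in range(earliest, earliest + ii):
            if try_place(mrt, resource, t, ii):
                schedule[name] = t
                break
        else:
            return None  # every slot for this resource is taken
    return schedule


ops = [("load", "mem", 0), ("add", "alu", 1), ("store", "mem", 2)]
print(greedy_modulo_place(ops, 2))  # store is pushed to time 3 by the load
print(greedy_modulo_place(ops, 1))  # two memory ops cannot share one slot
```

With an initiation interval of 2 the store slips one cycle because the load already holds the memory unit in slot 0; with an interval of 1 the sketch reports failure, which is the situation where a scheduler would retry with a larger interval.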
Enhanced Modulo Scheduling for Loops with Conditional Branches
- In Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992
Cited by 69 (6 self)
Loops with conditional branches have multiple execution paths which are difficult to software pipeline. The modulo scheduling technique for software pipelining addresses this problem by converting loops with conditional branches into straight-line code before scheduling. In this paper we present an Enhanced Modulo Scheduling (EMS) technique that can achieve a lower minimum Initiation Interval than modulo scheduling techniques that rely on either Hierarchical Reduction or If-conversion with Predicated Execution. These three modulo scheduling techniques have been implemented in a prototype compiler. We show that for existing architectures which support one branch per cycle, EMS performs approximately 18% better than Hierarchical Reduction. We also show that If-conversion with Predicated Execution outperforms EMS assuming one branch per cycle. However, with hardware support for multiple branches per cycle, EMS should perform as well as or better than If-conversion with Predicated Execution.
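As an illustration of what converting a loop body with a conditional branch into straight-line code means, the sketch below models if-conversion in plain Python. It is not the EMS technique or any compiler's actual output. The function names and the example computation are assumptions, and predicated execution is approximated by computing both arms and selecting on a predicate, which is only valid here because both arms are side-effect free.

```python
def branchy(xs):
    """Loop body with a conditional branch: two execution paths per iteration."""
    out = []
    for x in xs:
        if x < 0:
            y = -x
        else:
            y = x * 2
        out.append(y)
    return out


def if_converted(xs):
    """Same loop after a modeled if-conversion: one straight-line path.

    The branch is replaced by a predicate `p`; both arms are evaluated and
    the predicate selects which result is kept.
    """
    out = []
    for x in xs:
        p = x < 0               # predicate definition replaces the branch
        then_val = -x           # would execute under predicate p
        else_val = x * 2        # would execute under predicate not p
        y = then_val if p else else_val  # select on the predicate
        out.append(y)
    return out


data = [-3, 0, 4]
print(branchy(data))       # [3, 0, 8]
print(if_converted(data))  # [3, 0, 8]
```

Because the second version has a single path through the body, a scheduler can treat every iteration identically, which is what makes overlapping iterations tractable.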
Reverse If-Conversion
- In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, 1993
Cited by 61 (8 self)
In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion complexities of global scheduling are eliminated. This approach relies on a new technique, Reverse If-Conversion (RIC), that transforms scheduled If-Converted code back to the control flow graph representation. This paper presents the predicate internal representation, the algorithms for RIC, and the correctness of RIC. In addition, the scheduling issues are addressed and an application to software pipelining is presented.
1 Introduction
Compilers for processors with instruction level parallelism hardware need a large pool of operations to schedule from. In processors without support for conditional execution, branches present a scheduling barrier that limits the pool of operations to the basic block. Since basic blocks tend to have only a few operations, global scheduling techniques are ...
Modulo Scheduling With Isomorphic Control Transformations
- 1994
Cited by 24 (0 self)
... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierarchical Reduction. Modulo Scheduling with ICTs targets processors with no or limited support for conditional execution such as superscalar processors. However, in processors that do not require instruction set compatibility, support for Predicated Execution can be used. This dissertation shows that Modulo Scheduling with Predicated Execution has better performance and lower code expansion than Modulo Scheduling with ICTs on processors without special hardware support.
The impact of if-conversion and branch prediction on program execution on the Intel Itanium processor
- In MICRO-34, 2001
Cited by 19 (0 self)
The research community has studied if-conversion for many years. However, due to the lack of existing hardware, studies were conducted by simulating code generated by experimental compilers. This paper presents the first comprehensive study of the use of predication to implement if-conversion on production hardware with a near-production compiler. To better understand trends in the measurements, we generated binaries at three increasing levels of if-conversion aggressiveness. For each level, we gathered data regarding the global runtime effects of if-conversion on overall execution time, register pressure, code size, and branch behavior. Furthermore, we studied the inherent characteristics of program control-flow
Profile-Assisted Instruction Scheduling
- International Journal of Parallel Programming, 1994
Cited by 13 (0 self)
Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism for the frequent execution scenarios at the expense of the less frequent ones. Profile information identifies these important execution scenarios in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. These techniques are acyclic global scheduling and software pipelining. This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support fo...
Using Profile Information to Assist Advanced Compiler Optimization and Scheduling
Cited by 11 (2 self)
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along the frequent execution scenario at the expense of the less frequent execution sequences. Profile information identifies these important execution sequences in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-based transformations have been incorporated into the IMPACT compiler. These transformations include global optimization, acyclic global scheduling, and software pipelining. The effectiveness of these profile-based techniques is evaluated for a range of superscalar and VLIW processors.
Software Bubbles: Using Predication to Compensate for Aliasing in Software Pipelines
- In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2002
Cited by 3 (1 self)
This paper describes a technique for utilizing predication to support software pipelining on EPIC architectures in the presence of dynamic memory aliasing. The essential idea is that the compiler generates an optimistic software-pipelined schedule that assumes there is no memory aliasing. The operations in the pipeline kernel are predicated, however, so that if memory aliasing is detected by a run-time check, the predicate registers are set to disable the iterations that are so tightly overlapped as to violate the memory dependences. We refer to these disabled kernel operations as software bubbles.
Profile-assisted instruction scheduling
- International Journal of Parallel Programming, 1994
Cited by 1 (0 self)
Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism for the frequent execution scenarios at the expense of the less frequent ones. Profile information identifies these important execution scenarios in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. These techniques are acyclic global scheduling and software pipelining. This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support for dealing with hazards that arise from aggressive use of profile information. The effectiveness of these profile-based scheduling techniques is evaluated for a range of superscalar and VLIW processors.
TABLE OF CONTENTS
I would like to thank Professor Wen-mei Hwu for all of the time he has taken to advise me in both academics and research, from writing recommendation letters and suggesting coursework to helping me become involved with this project. I would also like to thank Nancy Warter for her guidance throughout my research. Finally, I want to thank my fiancee, Lynn Morstadt, for all of the support she has given in motivating me to complete

