Results 1 - 10
of
22
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
The Multiscalar Architecture
, 1993
"... The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent t ..."
Abstract
-
Cited by 113 (8 self)
- Add to MetaCart
The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocess-ing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multis-calar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequen-tial processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The mul-tiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the
Code Generation Schema for Modulo Scheduled Loops
- in Proceedings of the 25th Annual International Symposium on Microarchitecture
, 1992
"... Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. Modulo scheduling is one approach for generating such schedules. This paper addresses an issue which has received little attention thus far, ..."
Abstract
-
Cited by 80 (6 self)
- Add to MetaCart
Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. Modulo scheduling is one approach for generating such schedules. This paper addresses an issue which has received little attention thus far, but which is non-trivial in its complexity: the task of generating correct, high-performance code once the modulo schedule has been generated, taking into account the nature of the loop and the register allocation strategy that will be used. This issue is studied both with and without hardware features that are specifically aimed at supporting modulo scheduling.
Enhanced Modulo Scheduling for Loops with Conditional Branches
- In Proceedings of the 25th Annual International Symposium on Microarchitecture
, 1992
"... Loops with conditional branches have multiple execution paths which are di cult to software pipeline. The modulo scheduling technique for software pipelining addresses this problem by converting loops with conditional branches into straight-line code before scheduling. In this paper we present an En ..."
Abstract
-
Cited by 69 (6 self)
- Add to MetaCart
Loops with conditional branches have multiple execution paths which are di cult to software pipeline. The modulo scheduling technique for software pipelining addresses this problem by converting loops with conditional branches into straight-line code before scheduling. In this paper we present an Enhanced Modulo Scheduling (EMS) technique that can achieve a lower minimum Initiation Interval than modulo scheduling techniques that rely on either Hierarchical Reduction or If-conversion with Predicated Execution. These three modulo scheduling techniques have been implemented inaprototype compiler. We show that for existing architectures which support one branch per cycle, EMS performs approximately 18 % better than Hierarchical Reduction. We also show that If-conversion with Predicated Execution outperforms EMS assuming one branch per cycle. However, with hardware support for multiple branches per cycle, EMS should perform as well as or better than If-conversion with Predicated Execution. 1
Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs
- in IFIP
, 1993
"... This paper presents a new view on software pipelining, in which we consider software pipelining as an instruction level transformation from a vector of one-dimension to a matrix of two-dimensions. Thus, the software pipelining problem can be naturally decomposed into two subproblems, one is to deter ..."
Abstract
-
Cited by 28 (7 self)
- Add to MetaCart
This paper presents a new view on software pipelining, in which we consider software pipelining as an instruction level transformation from a vector of one-dimension to a matrix of two-dimensions. Thus, the software pipelining problem can be naturally decomposed into two subproblems, one is to determine the row-numbers of operations in the matrix and another is to determine the column-numbers. Using this view-point as a basis, we develop a new loop scheduling approach, called decomposed software pipelining. Keywords: Loop Scheduling; Instruction Level Parallelism; Software Pipelining; Loop-carried Dependence; Problem Decomposition 1 Introduction Since loop execution dominates total execution time of almost all practical programs, the exploitation of instruction level parallelism for loops is a major challenge in the design of optimizing compilers for high-performance computers such as VLIW, superscalar and pipelined processors [Alm89]. Software pipelining is an effective instruction ...
Modulo Scheduling With Isomorphic Control Transformations
, 1994
"... ... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierar ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierarchical Reduction. Modulo Scheduling with ICTs targets processors with no or limited support for conditional execution such as superscalar processors. However, in processors that do not require instruction set compatibility, support for Predicated Execution can be used. This dissertation shows that Modulo Scheduling with Predicated Execution has better performance and lower code expansion than Modulo Scheduling with ICTs on processors without special hardware support.
Profile-Assisted Instruction Scheduling
- International Journal of Parallel Programming
, 1994
"... Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism for the frequent execution scenarios at the expense of the less frequent ones. Profile information identifies these important execution scenarios in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. These techniques are acyclic global scheduling and software pipelining. This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support fo...
The benefit of Predicated Execution for software pipelining
- IN PROCEEDINGS OF THE 23RD HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES
, 1993
"... Software pipelining is a compile-time scheduling technique that overlaps successive loop iterations to expose operation-level parallelism. An important problem with the development of effective software pipelining algorithms is how to handle loops with conditional branches. Conditional branches incr ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Software pipelining is a compile-time scheduling technique that overlaps successive loop iterations to expose operation-level parallelism. An important problem with the development of effective software pipelining algorithms is how to handle loops with conditional branches. Conditional branches increase the complexity and decrease the effectiveness of software pipelining algorithms by introducing many possible execution paths into the scheduling scope. This paper presents an empirical study of the importance of an architectural support, referred to as predicated execution, on the effectiveness of software pipelining. In order to perform an in-depth analysis, we focus on Rau's modulo scheduling algorithm for software pipelining. Three versions of the modulo scheduling algorithm, one with and two without predicated execution support, are implemented in a prototype compiler. Experiments based on important loops from numeric applications show that predicated execution support substantially improves the effectiveness of the modulo scheduling algorithm.
Reducing The Impact Of Register Pressure On Software Pipelined Loops
, 1996
"... This work deals with the problems caused by the high register requirements of software pipelined loops. The main contributions of this work are: * Register requirements of software pipelined loops are evaluated. * Several heuristics to perform register-constrained software pipelining are proposed * ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
This work deals with the problems caused by the high register requirements of software pipelined loops. The main contributions of this work are: * Register requirements of software pipelined loops are evaluated. * Several heuristics to perform register-constrained software pipelining are proposed * The effects of register requirements on performance under register constraints are evaluated * HRMS is proposed to perform software pipelining with resource constraints and reduced register requirements * Two new register file organizations are proposed to allow for a large number of registerse with low area cost and fast access time.

