Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
Cited by 166 (0 self)
Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Automatic Program Parallelization
, 1993
Cited by 97 (8 self)
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
Code Generation Schema for Modulo Scheduled Loops
- in Proceedings of the 25th Annual International Symposium on Microarchitecture
, 1992
Cited by 80 (6 self)
Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. Modulo scheduling is one approach for generating such schedules. This paper addresses an issue which has received little attention thus far, but which is non-trivial in its complexity: the task of generating correct, high-performance code once the modulo schedule has been generated, taking into account the nature of the loop and the register allocation strategy that will be used. This issue is studied both with and without hardware features that are specifically aimed at supporting modulo scheduling.
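To make the code-generation problem in the abstract above concrete, here is a minimal sketch of how a finished modulo schedule unrolls into a prologue, a steady-state kernel, and an epilogue. It is not the schema from the paper: the function name `expand_modulo_schedule`, the dictionary-based schedule, and the `op[iteration]` labels are illustrative assumptions, and register allocation, hardware support, and loops with unknown trip counts are ignored.

```python
from math import ceil


def expand_modulo_schedule(schedule, ii, trip_count):
    """Expand a modulo schedule into prologue, kernel and epilogue cycles.

    schedule   -- hypothetical mapping from operation name to its issue
                  cycle within a single iteration
    ii         -- initiation interval (cycles between iteration starts)
    trip_count -- number of loop iterations, assumed known at compile time
    """
    length = max(schedule.values()) + 1   # cycles spanned by one iteration
    stages = ceil(length / ii)            # iterations overlapped in steady state
    if trip_count < stages:
        raise ValueError("this sketch assumes trip_count >= number of stages")

    # Iteration i issues each operation at cycle i * ii + schedule[op].
    total_cycles = (trip_count - 1) * ii + length
    cycles = [[] for _ in range(total_cycles)]
    for i in range(trip_count):
        for op, t in schedule.items():
            cycles[i * ii + t].append(f"{op}[{i}]")

    ramp_up_end = (stages - 1) * ii       # pipeline is full from this cycle on
    steady_end = trip_count * ii          # last iteration has started by here
    return {
        "prologue": cycles[:ramp_up_end],
        "kernel": cycles[ramp_up_end:steady_end],
        "epilogue": cycles[steady_end:],
    }


if __name__ == "__main__":
    phases = expand_modulo_schedule({"load": 0, "mul": 1, "store": 2}, ii=1, trip_count=4)
    for name, rows in phases.items():
        print(name, rows)
```

In the kernel every operation appears exactly once per `ii` cycles, each from a different iteration. A compiler would emit that window once as the loop body rather than unrolling it as this sketch does.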
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
Cited by 73 (13 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are:
• First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints.
• Secondly, we show that a precise mathematical formulation...
Enhanced Modulo Scheduling for Loops with Conditional Branches
- In Proceedings of the 25th Annual International Symposium on Microarchitecture
, 1992
Cited by 69 (6 self)
Loops with conditional branches have multiple execution paths which are difficult to software pipeline. The modulo scheduling technique for software pipelining addresses this problem by converting loops with conditional branches into straight-line code before scheduling. In this paper we present an Enhanced Modulo Scheduling (EMS) technique that can achieve a lower minimum Initiation Interval than modulo scheduling techniques that rely on either Hierarchical Reduction or If-conversion with Predicated Execution. These three modulo scheduling techniques have been implemented in a prototype compiler. We show that for existing architectures which support one branch per cycle, EMS performs approximately 18% better than Hierarchical Reduction. We also show that If-conversion with Predicated Execution outperforms EMS assuming one branch per cycle. However, with hardware support for multiple branches per cycle, EMS should perform as well as or better than If-conversion with Predicated Execution.
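As a rough illustration of what "converting loops with conditional branches into straight-line code" means, the sketch below contrasts a branching loop body with an if-converted one. Python has no predicated execution, so the predicate is emulated with a boolean and a select. The function names and the toy computation are assumptions made for this example and do not come from the paper.

```python
def branchy(xs):
    """Loop body with two execution paths selected by a branch."""
    out = []
    for x in xs:
        if x > 0:
            y = x * 2
        else:
            y = -x
        out.append(y)
    return out


def if_converted(xs):
    """Same computation with the branch replaced by a predicate.

    Both arms are evaluated unconditionally and the predicate p picks the
    result, so the body is a single straight-line block that a modulo
    scheduler could treat like any other branch-free loop.
    """
    out = []
    for x in xs:
        p = x > 0                      # predicate definition
        y_then = x * 2                 # would be guarded by p on real hardware
        y_else = -x                    # would be guarded by not p
        y = y_then if p else y_else    # select stands in for predicated writes
        out.append(y)
    return out


if __name__ == "__main__":
    print(if_converted([3, -2, 0]))
```

The trade-off is visible even in this toy: the if-converted body always spends resources on both arms, which is one reason the choice between these approaches depends on how many branches per cycle the hardware supports.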
A Novel Framework of Register Allocation for Software Pipelining
, 1993
Cited by 59 (11 self)
Qi Ning and Guang R. Gao, School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2A7 (email: ning@cs.mcgill.ca, gao@cs.mcgill.ca)
Although software pipelining has been proposed as one of the most important loop scheduling methods, simultaneous scheduling and register allocation is less understood and remains an open problem [28]. The objective of this paper is to develop a unified algorithmic framework for concurrent scheduling and register allocation to support time-optimal software pipelining. A key intuition leading to this surprisingly simple formulation and its efficient solution is the association of maximum computation rate of a program graph with its critical cycles due to Reiter's pioneering work...
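The link between critical cycles and the maximum computation rate mentioned above can be sketched as follows. For a loop's dependence graph, each cycle has a ratio of total latency to total iteration distance. The largest such ratio is a lower bound on the initiation interval, and its reciprocal is the best achievable rate when resources are unlimited. The code is a brute-force illustration for small graphs, not the algorithm from the paper. The edge format `(src, dst, latency, distance)` and the function name are assumptions.

```python
from fractions import Fraction


def critical_cycle_ratio(edges):
    """Return max over simple cycles of sum(latency) / sum(distance).

    edges -- list of (src, dst, latency, distance) tuples, where distance
             is the number of iterations the dependence spans.
    Returns None for an acyclic graph. Node names must be comparable,
    because each cycle is enumerated once, rooted at its smallest node.
    """
    succ = {}
    for u, v, lat, dist in edges:
        succ.setdefault(u, []).append((v, lat, dist))

    best = None

    def dfs(start, node, lat, dist, visited):
        nonlocal best
        for nxt, l, d in succ.get(node, []):
            if nxt == start:
                total_dist = dist + d
                if total_dist == 0:
                    raise ValueError("cycle with zero distance: no valid schedule")
                ratio = Fraction(lat + l, total_dist)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and nxt > start:
                dfs(start, nxt, lat + l, dist + d, visited | {nxt})

    for s in succ:
        dfs(s, s, 0, 0, {s})
    return best


if __name__ == "__main__":
    graph = [
        ("a", "b", 2, 0),
        ("b", "c", 1, 0),
        ("c", "a", 1, 1),   # loop-carried dependence closing the cycle a-b-c
        ("b", "b", 3, 2),   # self-dependence spanning two iterations
    ]
    print(critical_cycle_ratio(graph))
```

Enumerating simple cycles is exponential in the worst case. Practical compilers typically use a shortest-path or parametric search formulation instead, which this sketch does not attempt.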
Software pipelining showdown: Optimal vs. heuristic methods in a production compiler
- In Proc. of the ACM SIGPLAN'96 Conf. on Programming Languages Design and Implementation
, 1996
Cited by 53 (9 self)
This paper is a scientific comparison of two code generation techniques with identical goals: generation of the best possible software pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of software pipelines pioneered by Rau and Glaser [RaGl81], but are otherwise quite dissimilar. One technique was developed at Silicon Graphics and is used in the MIPSpro compiler. This is the production compiler for SGI's systems which are based on the MIPS R8000 processor [Hsu94]. It is essentially a branch-and-bound enumeration of possible schedules with extensive pruning. This method is heuristic because of the way it prunes and also because of the interaction between register allocation and scheduling. The second technique aims to produce optimal results by formulat-
Resource-Constrained Software Pipelining
- Advances in Languages and Compilers for Parallel Processing, Res. Monographs in Parallel and Distrib. Computing, chapter 14
, 1995
Cited by 38 (2 self)
This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.
1 Introduction
Recently there has been considerable interest in a class of compiler parallelization techniques known collectively as software pipelining. Software pipelining algorithms compute a static parallel schedule overlapping the operations of a loop body in much the same way that a hardware pipeline overlaps operations in a dynamic instruction stream. The schedule computed by a software pipelining algorithm is suitable for execution on a ...
A Polynomial Time Method for Optimal Software Pipelining
- In Proc. of the Conf. on Vector and Parallel Processing, CONPAR-92, number 634 in Lec. Notes in Comp. Sci
, 1992
Cited by 28 (7 self)
Software pipelining is one of the most important loop scheduling methods used by parallelizing compilers. It determines a static parallel schedule -- a periodic pattern -- to overlap instructions of a loop body from different iterations. The main contributions of this paper are the following: First, we propose to express the fine-grain loop scheduling problem (in particular, software pipelining) on the basis of the mathematical formulation of r-periodic scheduling. This formulation overcomes some of the problems encountered by existing software pipelining methods. Second, we demonstrate the feasibility of the proposed method by (1) presenting a polynomial time algorithm to find an optimal schedule in this r-periodic form that maximizes the computation rate (in fact, we show that this schedule maximizes the computation rate theoretically possible), and by (2) establishing polynomial bounds for the optimal schedule, i.e. bounds on its period, its periodicity, the pattern size, and the c...
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
- IEEE Transactions on Parallel and Distributed Systems
, 1996
Cited by 25 (10 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are:
• First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints.
• Secondly, we show that a precise mathematical formulation...

