Results 11 - 20 of 39
Using Iterative Compilation for Managing Software Pipeline-Unrolling Trade-offs
, 1999
Cited by 8 (0 self)
Traditional optimizing compilers for embedded applications usually only optimize for speed or code size but never try to search for a trade-off between these two issues. We propose a different compilation strategy, called iterative compilation, to achieve this. Iterative compilation explores multiple sets of optimizing code transformations and a high degree of control between high and low-level optimizations. Feedback from the low-level transformations and the ability to undo high-level choices are the key features of our approach. Our preliminary results show that iterative compilation has the potential to outperform traditional compilers by making clever decisions and considering trade-offs.
Speculative Prefetching of Induction Pointers
, 2001
Cited by 8 (2 self)
We present an automatic approach for prefetching data for linked list data structures. The main idea is based on the observation that linked list elements are frequently allocated at constant distance from one another in the heap. When linked lists are traversed, a regular pattern of memory accesses with constant stride emerges. This regularity in the memory footprint of linked lists enables the development of a prefetching framework where the address of the element accessed in one of the future iterations of the loop is dynamically predicted based on its previous regular behavior. We automatically identify pointer-chasing recurrences in loops that access linked lists. This identification uses a surprisingly simple method that looks for induction pointers: pointers that are updated in each loop iteration by a load with a constant offset. We integrate induction pointer prefetching with loop scheduling. A key intuition incorporated in our framework is to insert prefetches ...
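The constant-stride prediction described in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: the function name `predict_prefetch_address`, the `lookahead` parameter, and the sample addresses are all hypothetical, and the sketch only models the address arithmetic, not the compiler analysis or the prefetch instructions themselves.

```python
def predict_prefetch_address(history, lookahead=2):
    """Predict the node address `lookahead` iterations ahead.

    `history` is the list of node addresses seen so far in a traversal
    (hypothetical input; a real system would observe these at run time).
    Returns None when the observed strides are not constant.
    """
    if len(history) < 3:
        return None  # too few samples to trust a stride
    strides = [b - a for a, b in zip(history, history[1:])]
    if len(set(strides)) != 1:
        return None  # irregular layout: no prediction
    return history[-1] + lookahead * strides[0]


# Nodes allocated 0x40 bytes apart, as in the constant-distance observation.
addresses = [0x1000, 0x1040, 0x1080, 0x10C0]
print(hex(predict_prefetch_address(addresses, lookahead=2)))  # 0x1140
```

Returning None for irregular strides is one possible design choice here: it lets a caller skip the prefetch rather than fetch a wrongly predicted address.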
Compilation Techniques for Parallel Systems
- PARALLEL COMPUTING
, 1999
Cited by 8 (0 self)
Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through which parallelism is detected and expressed in the form of a program representation. Next, compilation techniques for scheduling instruction level parallelism are discussed along with the relationship between the nature of compiler support and the type of processor architecture. Compilation techniques for exploiting loop and task level parallelism on shared memory multiprocessors are summarized. Locality optimizations that must be used in conjunction with parallelization techniques for achieving high performance on machines with complex memory hierarchies are also discussed. Finally we provide an...
Modulo Scheduling for Control-Intensive General-Purpose Programs
, 1997
Cited by 7 (0 self)
It is increasingly necessary for the compiler to overlap successive loop iterations in order to find sufficient instruction-level parallelism to effectively utilize the resources of high-performance processors. Two competing methods have been developed for moving instructions across iteration boundaries: unrolling followed by global acyclic scheduling, and software pipelining. This dissertation investigates modulo scheduling, a software pipelining technique. Much of the previous work on modulo scheduling has targeted the relatively well-behaved loops in numeric programs. This dissertation develops new techniques that allow modulo scheduling to be effectively applied to control-intensive non-numeric programs. These techniques overcome the restrictions imposed by problematic control flow and loop exits. This dissertation also demonstrates that unrolling-based optimization prior to scheduling improves the performance of modulo scheduled loops and is, in fact, necessary to allow modulo scheduling to surpass the performance of acyclic scheduling for control-intensive general-purpose programs. Modulo scheduling has the following advantages over the acyclic scheduling approach for control-intensive general-purpose programs. First, modulo scheduling increases performance by maintaining the overlap of loop iterations throughout the execution of the loop. Second,
Optimal And Near-Optimal Solutions For Hard Compilation Problems
, 1998
Cited by 7 (2 self)
An optimizing compiler typically uses multiple program representations at different levels of program and performance abstractions in order to be able to perform transformations that, at least in the majority of cases, will lead to an overall improvement in program performance. The complexities of the program and performance abstractions used to formulate compiler optimization problems have to match the complexities of the high-level programming model and of the underlying target system. Scalable parallel systems typically have multi-level memory hierarchies and are able to exploit coarse-grain and fine-grain parallelism. Most likely, future systems will have even deeper memory hierarchies and more granularities of parallelism. As a result, future compiler optimizations will have to use more and more complex, multi-level computation and performance models in order to keep up with the complexities of their future target systems. Most of the optimization problems encountered in ...
Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures
- In Proc. of the International Workshop on Advanced Compiler Technology for High Performance and Embedded Systems (IWACT
, 2001
Cited by 7 (0 self)
High-performance microprocessors are currently designed with the purpose of exploiting the inherent instruction level parallelism (ILP) available in applications. The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the loops. In this paper we overview some hardware and software techniques proposed in the literature to alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can stretch lifetimes and reduce the register pressure. If more registers than those available in the architecture are required, some actions (such as the injection of spill code) have to be applied to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be avoided if a high-capacity register file were included without causing a negative impact on the design of the processor (cycle time, area and power dissipation). Future scalable VLIW cores will require the use of clustering to decentralize the design and to meet the technology constraints. New aggressive instruction scheduling techniques will be required to minimize the negative effect of this resource clustering and delays to move data around. Keywords: Modulo scheduling, Register requirements, Spill code, Register file organization, clustered organization.
An Integer Linear Programming Model of Software Pipelining for the MIPS R8000 Processor
- In PaCT’97, Parallel Computing Technologies, 4th International Conference
, 1997
Cited by 4 (0 self)
In parallelizing the code for high-performance processors, software pipelining of innermost loops is of fundamental importance. In order to benefit from software pipelining, two separate tasks need to be performed: (i) software pipelining proper (find the rate-optimal legal schedule), and (ii) register allocation (allocate registers to the found schedule). Software pipelining and register allocation can be formulated as an integer linear programming (ILP) problem, aiming to produce optimal schedules. In this paper, we discuss the application of integer linear programming to software pipelining on the MIPS R8000 superscalar microprocessor. Some of the results were presented at PLDI96 [14], where they were compared to the MIPSpro software pipeliner. In this paper we further extend the ILP model for the MIPS R8000 by including memory optimization and present the entire model in detail. 1 Introduction In the recent years, the concept of instruction-level parallelism played a cen...
Efficient State-Diagram Construction Methods for Software Pipelining
- In Proc. of the 8th Intl. Conf. on Compiler Construction
, 1999
Cited by 4 (4 self)
A state-diagram-based approach has been proposed as an effective way to model resource constraints in traditional instruction scheduling and software pipelining methods. However, the state diagram constructed for software pipelining (i) is very large and (ii) contains a significant amount of replicated, and hence redundant, information on legal latency sequences. As a result, the construction of state diagrams can take a very long computation time. For example, in modeling the resource constraints of the DEC Alpha 21064 processor, it took more than 24 hours on a 250MHz high-performance workstation to construct, say, 100,000 distinct latency sequences. In another experiment, out of the 224,400 latency sequences generated, only 30 were distinct. These costs make the state-diagram-based approach impractical in real compiler implementations. In this paper, we propose two methods for the efficient construction of state diagrams. In the first method, we relate the construction of state dia...
An Experimental Study of Algorithms for Weighted Completion Time Scheduling
- Algorithmica
, 2002
Cited by 4 (0 self)
We consider the total weighted completion time scheduling problem for parallel identical machines and precedence constraints, $P \mid \mathrm{prec} \mid \sum w_i C_i$. This important and broad class of problems is known to be NP-hard, even for restricted special cases, and the best known approximation algorithms have worst-case performance that is far from optimal. However, little is known about the experimental behavior of algorithms for the general problem. This paper represents the first attempt to comprehensively describe and evaluate a range of weighted completion time scheduling algorithms.
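To make the objective concrete, the following minimal sketch evaluates the total weighted completion time of one simple list schedule on identical machines with precedence constraints. It is an illustration only and not one of the algorithms evaluated in the paper; the function names, the job data, and the assumption that the priority order is already a valid topological order are all hypothetical.

```python
def list_schedule(jobs, preds, order, num_machines):
    """Greedy list scheduling on identical machines.

    jobs:  {name: (processing_time, weight)}  (hypothetical data layout)
    preds: {name: [predecessor names]}
    order: priority list, assumed to respect the precedence constraints
    Returns {name: completion_time}.
    """
    free_at = [0] * num_machines  # time at which each machine becomes idle
    completion = {}
    for job in order:
        processing_time, _weight = jobs[job]
        # A job may start only after all of its predecessors complete.
        ready = max((completion[q] for q in preds.get(job, [])), default=0)
        machine = min(range(num_machines), key=lambda i: free_at[i])
        start = max(ready, free_at[machine])
        completion[job] = start + processing_time
        free_at[machine] = completion[job]
    return completion


def total_weighted_completion_time(jobs, completion):
    """Objective value: sum over jobs of weight * completion time."""
    return sum(w * completion[name] for name, (_p, w) in jobs.items())


jobs = {"a": (2, 3), "b": (1, 1), "c": (2, 2)}
preds = {"c": ["a"]}  # c cannot start before a finishes
completion = list_schedule(jobs, preds, ["a", "b", "c"], num_machines=2)
print(total_weighted_completion_time(jobs, completion))  # 15
```

In this example, job a completes at time 2, b at 1, and c at 4, giving 3*2 + 1*1 + 2*4 = 15.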
Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining
- In Proc. of Workshop on Compilers and Operating Systems for Low Power (COLP-2002
, 2002
Cited by 3 (0 self)
Increasing power consumption in high performance processors and the proliferation of embedded systems demand new compiler techniques geared toward both high performance and low power. Software pipelining, an effective compiler optimization to exploit instruction level parallelism across loop iterations, has been studied extensively. However, previous software pipelining methods focus on performance only. This paper presents a software pipelining method that reduces power consumption while keeping performance optimality. This is accomplished as schedule slacks exist for non-critical instructions even in performance optimal schedules, and by exploiting the slack appropriately, it may be possible to reduce the number of functional units used in the schedule.

