Resource Spackling: A Framework for Integrating Register Allocation in Local and Global Schedulers
In PACT '94: International Conference on Parallel Architectures and Compilation Techniques, 1994
Cited by 32 (7 self)
We present Resource Spackling, a framework for integrating register allocation and instruction scheduling that is based on a Measure and Reduce paradigm. The technique first measures the resource requirements of a program and then uses these measurements to distribute code for better resource allocation. The technique is general in that it is applicable to both local and global scheduling and the allocation of different types of resources. A program's resource requirements for both register and functional unit resources are first measured using a unified representation. These measurements are used to find areas where resources are either under-utilized or over-utilized, called resource holes and excessive sets, respectively. Conditions are determined for increasing resource utilization in the resource holes. A local scheduler that moves sets of instructions into resource holes to reduce the excessive sets for all resources is presented. We develop a global scheduling algorithm that mo...
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
IEEE Transactions on Computers, 2003
Cited by 8 (0 self)
Abstract — In this paper we address the problem of generating an optimal
Phase-Coupled Mapping of Data Flow Graphs to Irregular Data Paths
Embedded Systems, 1999
Cited by 8 (2 self)
Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language. In order to eliminate this bottleneck and to enable the use of high-level language compilers also for embedded software, new code generation and optimization techniques are required. This paper describes a novel code generation technique for embedded processors with irregular data path architectures, such as typically found in fixed-point DSPs. The proposed code generation technique maps a data flow graph representation of a program into highly efficient machine code for a target processor modeled by instruction set behavior. High code quality is ensured by tight coupling of different code generation phases. In contrast to earlier works, mainly based on heuristics, our approach is constraint-based. An initial set of constraints on code generati...
Using Integer Linear Programming for Instruction Scheduling and Register Allocation in Multi-issue Processors
Computers and Mathematics with Applications, 1997
Cited by 8 (0 self)
Instruction scheduling and register allocation are two very important optimizations in modern compilers for advanced processors. These two optimizations must be performed simultaneously in order to maximize the instruction-level parallelism and to fully utilize the registers [11]. In this paper we solve register allocation and instruction scheduling simultaneously using integer linear programming (ILP). We have successfully worked out the ILP formulations for the problem with and without register spilling. Two kinds of optimizations are considered: (1) Fix the number of free registers and then solve for the minimum number of cycles to execute the instructions, or (2) fix the maximum execution cycles for the instructions and solve for the minimum number of registers needed. Besides being theoretically interesting, our solution serves as a reference point for other heuristic solutions. The formulations are also applicable to high-level synthesis of ASICs and designs for embedde...
Scheduling Expression DAGs for Minimal Register Need
1998
Cited by 4 (2 self)
Generating schedules for expression DAGs that use a minimal number of registers is a classical NP-complete optimization problem. Up to now an exact solution could only be computed for small DAGs (with up to 20 nodes), using a trivial O(n!) enumeration algorithm. We present a new algorithm with worst-case complexity O(n 2^{2n}) and very good average behaviour. Applying a dynamic programming scheme and reordering techniques, our algorithm is able to defer the combinatorial explosion and to generate an optimal schedule not only for small DAGs but also for medium-sized ones with up to 50 nodes, a class that contains nearly all DAGs encountered in typical application programs. Experiments with randomly generated DAGs and large DAGs from real application programs confirm that the new algorithm generates optimal schedules quite fast. We extend our algorithm to cope with delay slots and multiple functional units, two common features of modern superscalar processors. Key words:...
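To make the problem statement in this abstract concrete, here is a minimal Python sketch of a dynamic program over sets of already computed nodes. It is not the paper's algorithm and does not reproduce its reordering techniques or its complexity bound. The function name `min_register_need`, the input format (a dict from each node to its operand tuple, with leaves mapped to an empty tuple), and the cost model (one fresh register per node, a value stays live until all of its users are computed, outputs stay live to the end) are all assumptions made for illustration.

```python
from functools import lru_cache


def min_register_need(operands):
    """Return the smallest peak register count over all schedules of a DAG.

    `operands` maps each node to the tuple of nodes whose values it reads.
    Assumed cost model (not from the paper): every node writes one fresh
    register, and a value stays live until all of its users are computed;
    values without users are outputs and stay live to the end.
    """
    nodes = sorted(operands)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    users = [0] * n  # users[i]: bitmask of the nodes that read node i
    for v, ops in operands.items():
        for u in ops:
            users[index[u]] |= 1 << index[v]

    def live(done):
        # Count computed values that are outputs or still have a pending user.
        return sum(
            1
            for i in range(n)
            if done >> i & 1 and (users[i] == 0 or users[i] & ~done)
        )

    @lru_cache(maxsize=None)
    def best(done):
        # Minimal peak register count needed to compute exactly the set `done`.
        if done == 0:
            return 0
        result = n + 1  # upper bound: no schedule needs more than n registers
        for i in range(n):
            bit = 1 << i
            # Node i can be scheduled last only if none of its users is in `done`.
            if done & bit and not users[i] & done:
                prev = done ^ bit
                # While i is computed, all live values plus its result are held.
                result = min(result, max(best(prev), live(prev) + 1))
        return result

    return best((1 << n) - 1)


# Hypothetical example DAG for the expression (a + b) * (c + d).
dag = {
    "a": (), "b": (), "c": (), "d": (),
    "s1": ("a", "b"), "s2": ("c", "d"),
    "p": ("s1", "s2"),
}
print(min_register_need(dag))
```

The sketch memoizes on the set of computed nodes, so it visits at most 2^n states instead of enumerating all n! orderings; this works because, under the assumed cost model, the set of live values depends only on which nodes have been computed, not on their order.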
Effective Instruction Scheduling with Limited Registers
2001
Cited by 2 (0 self)
Effective global instruction scheduling techniques have become an important component in modern compilers for exposing more instruction-level parallelism (ILP) and exploiting the ever-increasing number of parallel function units. Effective register allocation has long been an essential component of a good compiler for reducing memory references. While instruction scheduling and register allocation are both essential compiler optimizations for fully exploiting the capability of modern high-performance microprocessors, there is a phase-ordering problem when we perform these two optimizations separately: instruction scheduling before register allocation may create insatiable demands for registers; register allocation before instruction scheduling may reduce the amount of parallelism that instruction scheduling can exploit. In this thesis, we propose to solve this phase-ordering problem by inserting a moderating optimization called code reorganization between prepass instruction scheduling and register allocation. Code reorganization adjusts the prepass scheduling results to make them demand fewer registers (i.e., exhibit lower register pressure) and guides register allocation to insert spill code that has less impact on schedule length. Our new approach avoids the complexity of simultaneous instruction scheduling and register allocation algorithms. In fact, it does not modify either instruction scheduling or register allocation algorithms. Therefore instruction scheduling can focus on maximizing instruction-level parallelism, and register allocation can focus on minimizing the cost of spill code. We compare the performance of our approach with a particular successful register-pressure-sensitive scheduling algorithm, and show an average of 18% improvement in speedup for an 8...
The Design of the YAP Compiler: An Optimizing Compiler for Logic Programming Languages
Cited by 1 (0 self)
Abstract: Several techniques for implementing Prolog in an efficient manner have been devised since the original interpreter, many of them aimed at achieving more speed. There are two main approaches to efficient Prolog implementation: (1) compilers to bytecode, which is then interpreted (emulators), or (2) compilers to native code. Emulators have smaller load/compilation time and are a good solution for their simplicity when speed is not a priority. Compilers are more complex than emulators, and the difference is much more acute if some form of code analysis is performed as part of the compilation, which impacts development time. Generation of low-level code promises faster programs at the expense of using more resources during the compilation phase. In our work, besides using a mixed execution mode, we design an optimizing compiler that uses type feedback profiling, dynamic compilation, and dynamic deoptimization to improve the performance of logic programming languages.
Generating Optimal Contiguous Evaluations for Expression DAGs
Cited by 1 (1 self)
We consider the NP-complete problem of generating contiguous evaluations for expression DAGs with a minimal number of registers. We present two algorithms that generate an optimal contiguous evaluation for a given DAG. The first is a modification of a complete search algorithm that omits the generation of redundant evaluations. The second algorithm generates only the most promising evaluations by splitting the DAG into trees with import and export nodes and evaluating the trees with a modified labeling scheme. Experiments with randomly generated DAGs and large DAGs from real application programs confirm that the new algorithms generate optimal contiguous evaluations quite fast.
URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures
In Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, 1992
The division of instruction scheduling and register allocation and assignment into separate phases can adversely affect the performance of these tasks and thus the quality of the code generated for load/store fine grained parallel architectures. Improved performance in one phase can deteriorate the performance of the other phase, possibly resulting in poorer overall performance. In this paper we present an approach that partitions instruction scheduling and register allocation and assignment into a new set of phases in an attempt to construct phases with minimal interaction. This approach uses a technique that unifies the problems of allocating registers and functional units. The technique, which consists of three phases, operates on a dependence DAG representation of the program. The first phase carries out the measurement of resource requirements and identifies regions with excess requirements. The second phase applies transformations that reduce the requirements to levels supported ...
Genetic Instruction Scheduling and Register Allocation
The construction of efficient compilers is very complex, since it has to contend with various optimization problems and depends on the characteristics of the architecture of the machine for which they generate code. Many of these problems are NP-hard. Genetic algorithms have been shown to be effective in solving difficult problems; however, their use in compilation is practically nonexistent. In this paper we propose a solution to the problems of register allocation and instruction scheduling. We carry out a performance analysis by comparing with the more traditional approaches for these problems, and we obtain gains in the speed of the generated code ranging between -2% and 26%. Keywords: optimizing compiler, genetic algorithms, instruction scheduling, register allocation.

