Results 1–10 of 12
Resource Spackling: A Framework for Integrating Register Allocation in Local and Global Schedulers
In PACT '94: International Conference on Parallel Architectures and Compilation Techniques
, 1994
Cited by 32 (7 self)
We present Resource Spackling, a framework for integrating register allocation and instruction scheduling that is based on a Measure and Reduce paradigm. The technique first measures the resource requirements of a program and then uses these measurements to distribute code for better resource allocation. The technique is general in that it is applicable to both local and global scheduling and the allocation of different types of resources. A program's resource requirements for both register and functional unit resources are first measured using a unified representation. These measurements are used to find areas where resources are either underutilized or overutilized, called resource holes and excessive sets, respectively. Conditions are determined for increasing resource utilization in the resource holes. A local scheduler that moves sets of instructions into resource holes to reduce the excessive sets for all resources is presented. We develop a global scheduling algorithm that mo...
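The "measure" step described above can be sketched in a few lines; the data layout (a per-cycle usage count for one resource and a single capacity number) is a hypothetical stand-in for the paper's unified representation, not its actual data structure:

```python
def measure(usage, capacity):
    """Sketch of Resource Spackling's measure step (illustrative layout):
    usage[cycle] counts how many units of one resource (registers or
    functional units) the current schedule demands in that cycle.
    Cycles with spare capacity are resource holes; cycles over capacity
    belong to excessive sets."""
    holes = [(c, capacity - u) for c, u in enumerate(usage) if u < capacity]
    excessive = [(c, u - capacity) for c, u in enumerate(usage) if u > capacity]
    return holes, excessive
```

For example, `measure([1, 3, 2, 4], capacity=3)` reports holes at cycles 0 and 2 and an excessive set at cycle 3; the "reduce" step would then move instructions from cycle 3 into those holes.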
Using Integer Linear Programming for Instruction Scheduling and Register Allocation in Multi-Issue Processors
In Computers and Mathematics with Applications
, 1997
Cited by 10 (0 self)
Instruction scheduling and register allocation are two very important optimizations in modern compilers for advanced processors. These two optimizations must be performed simultaneously in order to maximize the instruction-level parallelism and to fully utilize the registers [11]. In this paper we solve register allocation and instruction scheduling simultaneously using integer linear programming (ILP). We have successfully worked out the ILP formulations for the problem with and without register spilling. Two kinds of optimizations are considered: (1) fix the number of free registers and then solve for the minimum number of cycles to execute the instructions, or (2) fix the maximum execution cycles for the instructions and solve for the minimum number of registers needed. Besides being theoretically interesting, our solution serves as a reference point for other heuristic solutions. The formulations are also applicable to high-level synthesis of ASICs and designs for embedde...
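Optimization (1) can be made concrete with a tiny brute-force reference; this is deliberately not an ILP solver (an ILP would explore the same space through 0/1 variables x[v][t]), and the dependence encoding and all names are illustrative:

```python
from itertools import product

def min_cycles(deps, n, width, k):
    """Brute-force reference for optimization (1): with k registers free,
    find the minimum number of cycles to issue n instructions on a
    width-issue machine.  deps[v] is the set of instructions whose
    results v consumes.  Only usable on tiny examples."""
    for ncycles in range(1, 2 * n + 2):
        for sched in product(range(ncycles), repeat=n):
            # data dependences: an operand must be produced in an earlier cycle
            if any(sched[p] >= sched[v] for v in deps for p in deps[v]):
                continue
            # issue width: at most `width` instructions in any cycle
            if any(sum(1 for c in sched if c == t) > width
                   for t in range(ncycles)):
                continue
            # register pressure: a value is live from its defining cycle
            # until the cycle of its last consumer
            last = {}
            for v, preds in deps.items():
                for p in preds:
                    last[p] = max(last.get(p, sched[p]), sched[v])
            pressure = max(
                sum(1 for v in range(n)
                    if sched[v] <= t <= last.get(v, sched[v]))
                for t in range(ncycles)
            )
            if pressure <= k:
                return ncycles
    return None
```

With enough registers, four independent instructions fit into two cycles on a 2-issue machine; cutting the register budget to one forces the same two instructions into separate cycles, which is exactly the scheduling/allocation interaction the paper formulates.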
Phase-Coupled Mapping of Data Flow Graphs to Irregular Data Paths
 Embedded Systems
, 1999
Cited by 10 (2 self)
Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language. In order to eliminate this bottleneck and to enable the use of high-level language compilers also for embedded software, new code generation and optimization techniques are required. This paper describes a novel code generation technique for embedded processors with irregular data path architectures, such as typically found in fixed-point DSPs. The proposed code generation technique maps a data flow graph representation of a program into highly efficient machine code for a target processor modeled by its instruction set behavior. High code quality is ensured by tight coupling of the different code generation phases. In contrast to earlier works, mainly based on heuristics, our approach is constraint-based. An initial set of constraints on code generati...
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
 IEEE Transactions on Computers
, 2003
Cited by 10 (0 self)
In this paper we address the problem of generating an optimal ...
Scheduling Expression DAGs for Minimal Register Need
, 1998
Cited by 4 (2 self)
Generating schedules for expression DAGs that use a minimal number of registers is a classical NP-complete optimization problem. Up to now an exact solution could only be computed for small DAGs (with up to 20 nodes), using a trivial O(n!) enumeration algorithm. We present a new algorithm with worst-case complexity O(n^2 * 2^n) and very good average behaviour. Applying a dynamic programming scheme and reordering techniques, our algorithm is able to defer the combinatorial explosion and to generate an optimal schedule not only for small DAGs but also for medium-sized ones with up to 50 nodes, a class that contains nearly all DAGs encountered in typical application programs. Experiments with randomly generated DAGs and large DAGs from real application programs confirm that the new algorithm generates optimal schedules quite fast. We extend our algorithm to cope with delay slots and multiple functional units, two common features of modern superscalar processors. Key words:...
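The core of an O(n^2 * 2^n) subset dynamic program can be illustrated as follows. This is a simplified sketch, not the paper's algorithm: it assumes a result may be written into the register of a dead operand, and uses a coarse liveness measure (a value is live while any consumer is uncomputed; DAG roots stay live to the end):

```python
from functools import lru_cache

def min_registers(succs, n):
    """Subset DP sketch: a state is the bitmask of already-computed DAG
    nodes; its value is the minimum, over all topological completions,
    of the peak number of simultaneously live values.  succs[v] is the
    set of consumers of node v."""
    preds = {v: set() for v in range(n)}
    for u, ss in succs.items():
        for v in ss:
            preds[v].add(u)

    def live(mask):
        # a computed value is live while some consumer is uncomputed;
        # values with no consumers (roots) remain live to the end
        total = 0
        for v in range(n):
            if mask >> v & 1:
                ss = succs.get(v, ())
                if not ss or any(not (mask >> s & 1) for s in ss):
                    total += 1
        return total

    @lru_cache(maxsize=None)
    def f(mask):
        if mask == (1 << n) - 1:
            return live(mask)
        best = None
        for v in range(n):
            if mask >> v & 1:
                continue
            if not all(mask >> p & 1 for p in preds[v]):
                continue  # an operand has not been computed yet
            nxt = mask | 1 << v
            cost = max(live(nxt), f(nxt))
            if best is None or cost < best:
                best = cost
        return best

    return f(0)
```

The 2^n states (rather than n! orders) are what defers the combinatorial explosion: for the balanced 7-node expression tree, `min_registers({0: {4}, 1: {4}, 2: {5}, 3: {5}, 4: {6}, 5: {6}}, 7)` finds the familiar Sethi-Ullman answer of 3.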
Effective Instruction Scheduling with Limited Registers
, 2001
Cited by 3 (0 self)
Effective global instruction scheduling techniques have become an important component in modern compilers for exposing more instruction-level parallelism (ILP) and exploiting the ever-increasing number of parallel function units. Effective register allocation has long been an essential component of a good compiler for reducing memory references. While instruction scheduling and register allocation are both essential compiler optimizations for fully exploiting the capability of modern high-performance microprocessors, there is a phase-ordering problem when we perform these two optimizations separately: instruction scheduling before register allocation may create insatiable demands for registers; register allocation before instruction scheduling may reduce the amount of parallelism that instruction scheduling can exploit. In this thesis, we propose to solve this phase-ordering problem by inserting a moderating optimization called code reorganization between prepass instruction scheduling and register allocation. Code reorganization adjusts the prepass scheduling results to make them demand fewer registers (i.e., exhibit lower register pressure) and guides register allocation to insert spill code that has less impact on schedule length. Our new approach avoids the complexity of simultaneous instruction scheduling and register allocation algorithms. In fact, it does not modify either the instruction scheduling or the register allocation algorithm. Therefore instruction scheduling can focus on maximizing instruction-level parallelism, and register allocation can focus on minimizing the cost of spill code. We compare the performance of our approach with a particular successful register-pressure-sensitive scheduling algorithm, and show an average of 18% improvement in speedup for an 8...
Genetic Instruction Scheduling and Register Allocation
Cited by 2 (0 self)
The construction of efficient compilers is very complex, since it has to contend with various optimization problems and depends on the characteristics of the architecture of the machine for which it generates code. Many of these problems are NP-hard. Genetic algorithms have been shown to be effective in the resolution of difficult problems; however, their use in compilation is practically nonexistent. In this paper we propose a solution to the problems of register allocation and instruction scheduling. We carry out a performance analysis by comparing with the more traditional approaches to these problems, and we obtain gains in the speed of the generated code varying between 2% and 26%. Keywords: optimizing compiler, genetic algorithms, instruction scheduling, register allocation.
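A toy version of the genetic approach makes the idea concrete: chromosomes are instruction orders, repaired to respect data dependences, with one-point crossover, swap mutation, and peak register pressure as the fitness to minimize. Every detail here (encoding, operators, parameters) is illustrative, not the paper's actual design:

```python
import random

def ga_schedule(deps, n, pop_size=30, gens=40, seed=0):
    """Toy genetic algorithm for instruction sequencing.  deps[v] is the
    set of instructions whose results v consumes; returns the best order
    found and its peak register pressure (simultaneously live values)."""
    rng = random.Random(seed)
    preds = {v: set(deps.get(v, ())) for v in range(n)}

    def repair(order):
        # emit, at each step, the earliest ready instruction in the chromosome
        done, out, remaining = set(), [], list(order)
        while remaining:
            v = next(x for x in remaining if preds[x] <= done)
            remaining.remove(v)
            done.add(v)
            out.append(v)
        return out

    def pressure(order):
        pos = {v: i for i, v in enumerate(order)}
        last = {}  # last position at which each value is consumed
        for v in range(n):
            for p in preds[v]:
                last[p] = max(last.get(p, pos[p]), pos[v])
        # values with no consumers stay live to the end of the sequence
        return max(
            sum(1 for v in range(n) if pos[v] <= t <= last.get(v, n - 1))
            for t in range(n)
        )

    population = [repair(rng.sample(range(n), n)) for _ in range(pop_size)]
    for _ in range(gens):
        population.sort(key=pressure)
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point order crossover
            child = a[:cut] + [v for v in b if v not in a[:cut]]
            if rng.random() < 0.2:  # swap mutation
                i, j = rng.randrange(n), rng.randrange(n)
                child[i], child[j] = child[j], child[i]
            children.append(repair(child))
        population = survivors + children
    best = min(population, key=pressure)
    return best, pressure(best)
```

Because every chromosome is repaired into a valid topological order, the search only ever moves between legal schedules; the fitness function is where a register-allocation or code-speed objective would plug in.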
From Machine Scheduling to VLIW Instruction Scheduling
Cited by 2 (0 self)
... and instruction scheduling problems on modern VLIW processors such as the STMicroelectronics ST200. Our motivations are to apply the machine scheduling techniques that are relevant to instruction scheduling in VLIW compilers, and to understand how processor microarchitecture features impact advanced instruction scheduling techniques. Based on this discussion, we present our theoretical contributions to the field of instruction scheduling that are applied in the STMicroelectronics ST200 production compiler, and we introduce a new time-indexed formulation for the register-constrained instruction scheduling problems.
The Design of the YAP Compiler: An Optimizing Compiler for Logic Programming Languages
Cited by 1 (0 self)
Several techniques for implementing Prolog in an efficient manner have been devised since the original interpreter, many of them aimed at achieving more speed. There are two main approaches to efficient Prolog implementation: (1) compiling to bytecode and then interpreting it (emulators) or (2) compiling to native code. Emulators have smaller load/compilation times and, given their simplicity, are a good solution when speed is not a priority. Compilers are more complex than emulators, and the difference is much more acute if some form of code analysis is performed as part of the compilation, which impacts development time. Generation of low-level code promises faster programs at the expense of using more resources during the compilation phase. In our work, besides using a mixed execution mode, we design an optimizing compiler that uses type-feedback profiling, dynamic compilation, and dynamic deoptimization to improve the performance of logic programming languages.
Generating Optimal Contiguous Evaluations for Expression DAGs
Cited by 1 (1 self)
We consider the NP-complete problem of generating contiguous evaluations for expression DAGs with a minimal number of registers. We present two algorithms that generate an optimal contiguous evaluation for a given DAG. The first is a modification of a complete search algorithm that omits the generation of redundant evaluations. The second algorithm generates only the most promising evaluations by splitting the DAG into trees with import and export nodes and evaluating the trees with a modified labeling scheme. Experiments with randomly generated DAGs and large DAGs from real application programs confirm that the new algorithms generate optimal contiguous evaluations quite fast.
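The labeling scheme being modified here builds on classical Sethi-Ullman labeling for expression trees; a minimal sketch of the classical scheme follows (the tuple encoding of trees is an assumption of this example, and the paper's extension for import/export nodes is not shown):

```python
def label(tree):
    """Classical Sethi-Ullman labeling for expression trees: label(t) is
    the number of registers needed to evaluate t contiguously when the
    register-hungrier subtree is always evaluated first.
    Trees are tuples: ('op', left, right, ...) or a leaf ('name',)."""
    _, *kids = tree
    if not kids:
        return 1  # a leaf occupies one register
    needs = sorted((label(k) for k in kids), reverse=True)
    best = needs[0]
    for held, need in enumerate(needs[1:], start=1):
        # `held` registers keep earlier results while this subtree runs
        best = max(best, need + held)
    return best
```

For example, a balanced tree of two additions under a third, `('+', ('+', ('a',), ('b',)), ('+', ('c',), ('d',)))`, labels to 3 registers, while the left-leaning chain `('+', ('+', ('a',), ('b',)), ('c',))` needs only 2.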