Results 1 - 10
of
26
Compiler-Controlled Memory
- In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems
, 1998
"... Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a. reasonable level of success. The primary limit on the compiler’s ability to improve memory behavior is its im-perfect knowledge about the ..."
Abstract
-
Cited by 46 (0 self)
- Add to MetaCart
Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a. reasonable level of success. The primary limit on the compiler’s ability to improve memory behavior is its im-perfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns. There is an exception to this rule. During the reg-ister allocation phase, the compiler often must insert substantial amount,s of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than other memory operations in the program. Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs-a small compiler-con-trolled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierar-chy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traf-fic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such mem-ories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips. This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocation’s coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program. Permtsslon to make dIgItal or hard copies of all or part of this work for personal or classroom “se IS granted wthout fee prowded that copnes are not made or distributed for profn or commercial advan-tage and that copws bear thts notice and the full c!tatmn on the Hurst page.
Optimistic Register Coalescing
- In Proceedings of the 1998 International Conference on Parallel Architecture and Compilation Techniques
, 1998
"... Graph-coloring register allocators eliminate copies by coalescing the source and target node of a copy if they do not interfere in the interference graph. Coalescing is, however, known to be harmful to the colorability of the graph because it tends to yield a graph with nodes of higher degrees. Unli ..."
Abstract
-
Cited by 34 (1 self)
- Add to MetaCart
Graph-coloring register allocators eliminate copies by coalescing the source and target node of a copy if they do not interfere in the interference graph. Coalescing is, however, known to be harmful to the colorability of the graph because it tends to yield a graph with nodes of higher degrees. Unlike aggressive coalescing which coalesces any pair of non-interfering copyrelated nodes, conservative coalescing or iterated coalescing perform safe coalescing that preserves the colorability. Unfortunately, these heuristics give up coalescing too early, losing many opportunities of coalescing that would turn out to be safe. Moreover, they ignore the fact that coalescing may even improve the colorability of the graph by reducing the degree of neighbor nodes that are interfering with both the source and target nodes being coalesced. This paper proposes a new heuristic called optimistic coalescing which optimistically performs aggressive coalescing, thus fully exploiting the positive impact of ...
Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor
- In Proc. 26th Ann. Int’l Symp. on Computer Architecture
, 1998
"... Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add to the hardware complexity significantly. This paper takes ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add to the hardware complexity significantly. This paper takes an alternative or complementary approach for providing more data bandwidth, called the data-decoupled architecture. The approach, with support from the compiler and/or hardware, partitions the memory stream into two independent streams early in the processor pipeline, and feeds each stream to a separate memory access queue and cache. Under this model, the paper studies the potential of decoupling memory accesses to program's local variables that are allocated on the run-time stack. Using a set of integer and floating-point programs from the SPEC95 benchmark suite, it is shown that local variable accesses constitute a large portion of all the memory references, while their reference space is ...
Live Range Splitting in a Graph Coloring Register Allocator
, 1998
"... Graph coloring is the dominant paradigm for global register allocation [8, 7, 4]. Coloring allocators use an interference graph to model the conflicts that prevent two values from sharing a register. Nodes in the graph represent live ranges, or values. An edge between two nodes indicates that they a ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
Graph coloring is the dominant paradigm for global register allocation [8, 7, 4]. Coloring allocators use an interference graph to model the conflicts that prevent two values from sharing a register. Nodes in the graph represent live ranges, or values. An edge between two nodes indicates that they are simultaneously live and, thus, cannot share a register.
Register allocation for programs in ssa-form
- In Compiler Construction 2006, volume 3923 of LNCS
, 2006
"... In this technical report, we present an architecture for register allocation on the SSA-form. We show, how the properties of SSA-form programs and their interference graphs can be exploited to develop new methods for spilling, coloring and coalescing. We present heuristic and optimal solution method ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
In this technical report, we present an architecture for register allocation on the SSA-form. We show, how the properties of SSA-form programs and their interference graphs can be exploited to develop new methods for spilling, coloring and coalescing. We present heuristic and optimal solution methods for these three subtasks. 1
The Power of Belady's Algorithm in Register Allocation for Long Basic Blocks
- The 16th International Workshop on Languages and Compilers for Parallel Computing
, 2003
"... Optimization techniques such as loop-unrolling and trace-scheduling can result in long straight-line codes. It is, however, unclear how well the register allocation algorithms of current compilers perform on these codes. Compilers may well have been optimized for human written codes, which are li ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Optimization techniques such as loop-unrolling and trace-scheduling can result in long straight-line codes. It is, however, unclear how well the register allocation algorithms of current compilers perform on these codes. Compilers may well have been optimized for human written codes, which are likely to have short basic blocks. To evaluate how the state of the art compilers behave on long straight-line codes, we wrote a compiler that implements the simple Belady's MIN algorithm.
A Progressive Register Allocator for Irregular Architectures
"... ... a compiler performs. Conventional graph-coloring based register allocators are fast and do well on regular, RISC-like, architectures, but perform poorly on irregular, CISC-like, architectures with few registers and nonorthogonal instruction sets. At the other extreme, optimal register allocators ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
... a compiler performs. Conventional graph-coloring based register allocators are fast and do well on regular, RISC-like, architectures, but perform poorly on irregular, CISC-like, architectures with few registers and nonorthogonal instruction sets. At the other extreme, optimal register allocators based on integer linear programming are capable of fully modeling and exploiting the peculiarities of irregular architectures but do not scale well. We introduce the idea of a progressive allocator. A progressive allocator finds an initial allocation of quality comparable to a conventional allocator, but as more time is allowed for computation the quality of the allocation approaches optimal. This paper presents a progressive register allocator which uses a multi-commodity network flow model to elegantly represent the intricacies of irregular architectures. We evaluate our allocator as a substitute for gcc's local register allocation pass.
Register Saturation in Superscalar and VLIW Codes
- IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION, LECTURE NOTES IN COMPUTER SCIENCE
, 2001
"... The registers constraints can be taken into account during the scheduling phase of an acyclic data dependence graph (DAG): any schedule must minimize the register requirement. In this work, we mathematically study and extend the approach which consists of computing the exact upper-bound of the r ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
The registers constraints can be taken into account during the scheduling phase of an acyclic data dependence graph (DAG): any schedule must minimize the register requirement. In this work, we mathematically study and extend the approach which consists of computing the exact upper-bound of the register need for all the valid schedules, independently of the functional unit constraints. A previous work (URSA) was presented in [5, 4]. Its aim was to add some serial arcs to the original DAG such that the worst register need does not exceed the number of available registers. We write an appropriate mathematical formalism for this problem and extend the DAG model to take into account delayed read from and write into registers with multiple registers types. This
Hardware-managed register allocation for embedded processors
- in LCTES ’04: Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and
, 2004
"... Most modern processors (either embedded or general purpose) contain higher number of physical registers than those exposed in the ISA. Due to a variety of reasons, this phenomenon is likely to continue especially on embedded systems where encoding space is very limited. Saving the encoding space lea ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Most modern processors (either embedded or general purpose) contain higher number of physical registers than those exposed in the ISA. Due to a variety of reasons, this phenomenon is likely to continue especially on embedded systems where encoding space is very limited. Saving the encoding space leads to lower power consumption in the I-cache; on the other hand, harnessing more physical registers saves power in the memory subsystem and reduces latency as well. These design decisions however result in the difficulty of register allocation for a compiler due to limited number of exposed registers at ISA level. In this paper, we therefore propose a hardware managed register allocation scheme to allocate more physical registers at runtime and to utilize them. As a byproduct, we also show that hardware managed register allocation has other merits such as better exploitation of low register pressure regions, more flexible management of caller-save/callee-save registers, etc. Our approach consists of both compiler and hardware enhancements. On the compiler side we assign variables at various stack offsets such that the offsets indicate the relative allocation priorities at runtime. The hardware is modified to identify such spills and make decisions whether they should be put in the (invisible) physical registers based on their relative priorities. Finally, our results show slight improvement in the instructions per cycle counts (IPC) but significant power consumption reduction in the cache.
Revisiting graph coloring register allocation: A study of the Chaitin-Briggs and CallahanKoblenz algorithms
- In Proc. of the Workshop on Languages and Compilers for Parallel Computing (LCPC’05
, 2005
"... Abstract. Techniques for global register allocation via graph coloring have been extensively studied and widely implemented in compiler frameworks. This paper examines a particular variant – the Callahan Koblenz allocator – and compares it to the Chaitin-Briggs graph coloring register allocator. Bot ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. Techniques for global register allocation via graph coloring have been extensively studied and widely implemented in compiler frameworks. This paper examines a particular variant – the Callahan Koblenz allocator – and compares it to the Chaitin-Briggs graph coloring register allocator. Both algorithms were published in the 1990’s, yet the academic literature does not contain an assessment of the Callahan-Koblenz allocator. This paper evaluates and contrasts the allocation decisions made by both algorithms. In particular, we focus on two key differences between the allocators: Spill code: The Callahan-Koblenz allocator attempts to minimize the effect of spill code by using program structure to guide allocation and spill code placement. We evaluate the impact of this strategy on allocated code. Copy elimination: Effective register-to-register copy removal is important for producing good code. The allocators use different techniques to eliminate these copies. We compare the mechanisms and provide insights into the relative performance of the contrasting techniques. The Callahan-Koblenz allocator may potentially insert extra branches as part of the allocation process. We also measure the performance overhead due to these branches. 1

