An Evaluation of the Potential Benefits of Register Allocation for Array References
- In Workshop on Interaction between Compilers and Computer Architectures, in conjunction with HPCA-2
, 1996
Cited by 10 (1 self)
Array references are common in scientific and engineering programs. However, current compilers for scalar processors do not keep array elements in registers for reuse. This is mainly because it is unclear whether the potential benefits would justify the more difficult array data flow analysis and also because processor architectures have yet to provide efficient mechanisms for addressing array elements in registers. The goal of this paper is to evaluate the potential benefits of register allocation for array elements in scalar processors. Using trace-driven simulations of a set of benchmarking programs, which include nine from the Perfect Club Suite, four from SPEC92, and four from an image processing benchmark suite, we study the potential benefit. We relate the register reference percentage to the data reference characteristics of different programs and examine the effect of varying the number of registers from 16 to 512. We also discuss the effect of code reordering and compiler ana...
Interprocedural Register Allocation for Lazy Functional Languages
- In Proceedings of the 1995 Conference on Functional Programming Languages and Computer Architecture
, 1995
Cited by 7 (1 self)
The aim of this paper is two-fold. First, we develop an interprocedural register allocation algorithm, an extended variant of Briggs' optimistic graph colouring. We use interprocedural coalescing as an elegant way to achieve a custom-made calling convention for each function. We add a restricted, and cheap, form of live range splitting in a way that is particularly useful for call-intensive languages. Second, we apply our interprocedural register allocation algorithm to code generated from a lazy functional language. In doing this we use a monadic intermediate code, well suited for analysis and program transformation. We use program transformation techniques and the result of an abstract interpretation analysis on the intermediate code to eliminate all unknown control flow in the program. This improves the chances that the register allocator will produce good results. Preliminary measurements show that we are able to eliminate up to 95% of all stack references compared to the Chalmers ...
Catching and identifying bugs in register allocation
- In Static Analysis, 13th Int. Symp., SAS 2006
, 2006
Cited by 7 (0 self)
Although there are many register allocation algorithms that work well, it can be difficult to implement these algorithms correctly. As a result, it is common for bugs to remain in the register allocator, even after the compiler is released. The register allocator may run, but bugs can cause it to produce incorrect output code. The output program may even execute properly on some test data, but errors can remain. In this paper, we propose novel data flow analyses to statically check that the output code from the register allocator is correct in terms of its data dependences. The approach is accurate, fast, and can identify and report error locations and types. No false alarms are produced. The paper describes our approach, called SARAC, and a tool, called ra-analyzer, that statically checks a register allocation and reports the errors it finds. The tool has an average compile-time overhead of only 8% and a modest average memory overhead of 85KB.
Zero-cost Range Splitting
, 1994
Cited by 6 (0 self)
This paper presents a new optimization technique that uses empty delay slots to improve code scheduling. We are able to split live ranges for free by inserting spill code into empty delay slots. Splitting a live range can reduce interferences with other live ranges and can sometimes free registers. Live ranges that no longer interfere with the split live range can sometimes make use of the extra register. Our algorithm, as a final pass over the code, exploits empty delay slots that would remain unused if spill code were not inserted. This paper proposes a variety of optimizations that use the extra registers generated from live range splitting, including coalescing live ranges and improving code scheduling. We present an algorithm for improving code scheduling and present implementation results.
1 Introduction
Compiler writers use heuristics for register allocation and instruction scheduling, as both are NP-Complete [Set75] [HG82] [PS90]. Instruction scheduling tries to minimize the numb...
Profile-Based Optimization with Statistical Profiles
, 1997
Cited by 4 (1 self)
An important barrier to the use of profile-based optimization is the generation of high-quality profile data. In this paper we describe our experience with a prototype system for continuous and automatic generation of profile data. Our system collects profile data continuously for all computation on a system by sampling the program counter once per hardware clock interrupt. We have found that the overhead of our prototype monitor is negligible, making it practical to collect profile information continuously for all activity on the system. We used several qualitative and quantitative methods to explore how the quality of these sampled profiles improves over time, and demonstrate that profiles of adequate quality for optimization can be obtained after a small number of monitored runs. For a selection of instruction-cache intensive benchmarks, statistical monitoring provides profiles of a quality comparable to that of complete profiles obtained with program instrumentation while avoiding ...
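The sampling idea in this abstract can be illustrated with a small sketch. It is not the authors' system: the trace is a hypothetical list of program-counter labels, a fixed `interval` stands in for the hardware clock interrupt, and the names `full_profile`, `sampled_profile`, and `overlap` are invented for this example. It compares a sampled profile with the complete one, assuming a non-empty trace.

```python
from collections import Counter

def full_profile(trace):
    """Fraction of execution time spent at each program-counter label."""
    total = len(trace)
    return {pc: n / total for pc, n in Counter(trace).items()}

def sampled_profile(trace, interval):
    """Profile estimated by looking at the trace only every `interval` steps.

    The fixed interval is an assumption standing in for a clock interrupt.
    """
    samples = trace[::interval]
    return {pc: n / len(samples) for pc, n in Counter(samples).items()}

def overlap(p, q):
    """Shared probability mass of two profiles: 1.0 means identical."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))

# Hypothetical trace: 90 steps in a hot loop, then 10 steps elsewhere.
trace = ["loop"] * 90 + ["init"] * 10
exact = full_profile(trace)
estimate = sampled_profile(trace, 10)
quality = overlap(exact, estimate)
```

An overlap close to 1.0 suggests the sampled profile ranks hot code much as the complete profile does, which is the property a profile-based optimizer would rely on.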
Link-time Improvement of Scheme Programs
- Proc. 8th International Conference on Compiler Construction (CC'99)
, 1999
Cited by 2 (1 self)
Optimizing compilers typically limit the scope of their analyses and optimizations to individual modules. This has two drawbacks: first, library code cannot be optimized together with its callers, which implies that reusing code through libraries incurs a penalty; and second, the results of analysis and optimization cannot be propagated from an application module written in one language to a module written in another. A possible solution is to carry out (additional) program optimization at link time. This paper describes our experiences with such optimization using two different optimizing Scheme compilers, and several benchmark programs, via alto, a link-time optimizer we have developed for the DEC Alpha architecture. Experiments indicate that significant performance improvements are possible via link-time optimization even when the input programs have already been subjected to high levels of compile-time optimization.
1 Introduction
The traditional model of compilation usually l...
Binary translation to improve energy efficiency through post-pass register reallocation
- In Proceedings of the Fourth ACM International Conference on Embedded Software
, 2004
Cited by 2 (0 self)
Energy efficiency is rapidly becoming a first-class optimization parameter for modern systems. Caches are critical to overall performance, and thus modern processors (both high-end and low-end) tend to deploy large caches with a high degree of associativity. Because of this large size, cache power takes up a significant percentage of total system power. One important way to reduce cache power consumption is to reduce dynamic activity in the cache by reducing the dynamic load/store counts. In this work, we focus on programs that are available only as binaries and need to be improved for energy efficiency. To adapt these programs for energy-constrained devices, we propose a feedback-directed post-pass solution that performs register reallocation to reduce dynamic load/store counts and improve energy efficiency. Our approach assumes zero knowledge of the original code generator or compiler and performs a post-pass register allocation to obtain a more power-efficient binary. We attempt to find the dead as well as unused registers in the binary and then reallocate them on hot paths to reduce dynamic load/store counts. The static code size increase due to our framework is shown to be minimal. Our experiments on SPEC2000 and MediaBench show that our technique is effective: dynamic spill loads/stores in the data cache are reduced by 0% to 26.4%. Overall, our approach improves the energy-delay product of the program.
alto: A Link-Time Optimizer for the DEC Alpha
, 1998
Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before link time such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole-program optimization at link time. This paper describes alto, a link-time optimizer for the DEC Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster than that obtained using the OM link-time optimizer, even when the latter is used in conjunction with profile-guided and inter-file compile-...
Issues in Register Allocation by Graph Coloring
, 1996
This technical report addresses some issues in register allocation by graph coloring and presents three improvements: storage-class analysis, priority-based simplification, and preference decision. The influence of the three improvements on graph coloring is discussed in this report. Comparisons of various register allocations are discussed as well. This research was sponsored in part by the Advanced Research Projects Agency/ITO monitored by SPAWAR under contract N00039-93-C-0152. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARPA, SPAWAR, or the U.S. Government.
Keywords: Register allocation, graph coloring, compiler optimization, code generation
1 Introduction
Graph coloring is a common approach to model register allocation. A register allocator assigns registers (colors) to live ranges in such a manner that conflicting live ranges --- i.e., live rang...
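The graph-coloring model mentioned in this abstract can be sketched as follows. This is a generic Chaitin-style simplify/select pass, not the report's storage-class analysis, priority-based simplification, or preference decision; the function name `color_interference_graph`, the edge-list input, and the "spill the highest-degree node" rule are all assumptions made for illustration.

```python
def color_interference_graph(edges, k):
    """Assign one of k registers to each live range, or mark it spilled.

    edges: pairs of live ranges that are live at the same time (they conflict).
    Returns (assignment, spilled).
    """
    # Build adjacency sets for the interference graph.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    remaining = {n: set(ns) for n, ns in adj.items()}
    stack, spilled = [], []
    while remaining:
        # Simplify: a node with fewer than k neighbours can always be coloured.
        easy = [n for n in sorted(remaining) if len(remaining[n]) < k]
        if easy:
            node = easy[0]
            stack.append(node)
        else:
            # Assumed heuristic: spill the node with the most conflicts.
            node = max(sorted(remaining), key=lambda n: len(remaining[n]))
            spilled.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)

    # Select: pop nodes and give each the lowest register its neighbours lack.
    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[m] for m in adj[node] if m in assignment}
        assignment[node] = next(c for c in range(k) if c not in used)
    return assignment, spilled

# Three mutually conflicting live ranges, coloured with 2 and then 3 registers.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
two_regs, two_spills = color_interference_graph(triangle, 2)
three_regs, three_spills = color_interference_graph(triangle, 3)
```

With two registers one of the three live ranges has to be spilled; with three, each gets its own register.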
Back End Issues for Modern Microprocessors: The State of the Art
, 1997
This paper discusses some issues in the construction of compiler back ends targeted towards modern microprocessors. Our focus is on those aspects of optimization which deal directly with adapting the compiled code to the specific characteristics of these processors, specifically memory hierarchies and instruction level parallelism, and which are applicable to a wide range of programs.

