Target-specific Global Code Improvement: Principles and Applications
, 1994
Cited by 8 (0 self)
This article describes the key principles behind the design and implementation of a global code improver that has been used to construct several high-quality compilers and other program transformation and analysis tools. The code improver, called vpo, employs a paradigm of compilation that has proven to be flexible and adaptable: all code-improving transformations are performed on a target-specific representation of the program. The aggressive use of this paradigm yields a code improver with several valuable properties. Four properties stand out. First, vpo is language and compiler independent. That is, it has been used to implement compilers for several different computer languages. For the C programming language, it has been used with several front ends, each of which generates a different intermediate language. Second, because all code improvements are applied to a single low-level intermediate representation, phase ordering problems are minimized. Third, vpo is easily retargeted and handles a wide variety of architectures. In particular, vpo's structure allows new architectures and new implementations of existing architectures to be accommodated quickly and easily. Fourth and finally, because of its flexible structure, vpo has several other interesting uses in addition to its primary use in an optimizing compiler. This article describes the principles that have driven the design of vpo and the implications of these principles on vpo's implementation. The article concludes with a brief description of vpo's use as a back end with front ends for several different languages, and its use as a key component
Handling Irreducible Loops: Optimized Node Splitting vs. DJ-Graphs
- Lecture Notes in Computer Science
, 2001
Cited by 7 (1 self)
This paper addresses the question of how to handle irreducible regions during optimization, which has become even more relevant for contemporary processors since recent VLIW-like architectures rely heavily on instruction scheduling. The contributions of this paper are twofold. First, a method of optimized node splitting to transform irreducible regions of control flow into reducible regions is derived. This method is superior to approaches previously published since it reduces the number of replicated nodes by comparison. Second, three methods that handle regions of irreducible control flow are evaluated with respect to their impact on compiler optimizations: traditional and optimized node splitting as well as loop analysis through DJ graphs. Measurements show improvements of 1-40% for these methods of handling irreducible loops over the unoptimized case.
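To make the notion of an irreducible region concrete, the following is a minimal sketch of the classic T1/T2 reducibility test, not the paper's optimized node-splitting method. A control-flow graph is reducible if repeatedly removing self-loops (T1) and merging a node into its unique predecessor (T2) collapses it to a single node. The graph encoding (a dict mapping each node to a set of successors) and the name `is_reducible` are assumptions made for illustration.

```python
def is_reducible(succ, entry):
    """Classic T1/T2 test on a CFG given as {node: set_of_successors}."""
    # Keep only nodes reachable from the entry.
    reachable, stack = {entry}, [entry]
    while stack:
        n = stack.pop()
        for s in succ.get(n, ()):
            if s not in reachable:
                reachable.add(s)
                stack.append(s)
    g = {n: set(succ.get(n, ())) for n in reachable}

    changed = True
    while changed and len(g) > 1:
        changed = False
        # T1: remove self-loops.
        for n in g:
            if n in g[n]:
                g[n].discard(n)
                changed = True
        # T2: merge a non-entry node with exactly one predecessor into it.
        for n in list(g):
            if n == entry:
                continue
            preds = [p for p in g if n in g[p]]
            if len(preds) == 1:
                p = preds[0]
                g[p].discard(n)
                g[p] |= g[n]  # p inherits n's successors (may create a self-loop)
                del g[n]
                changed = True
                break
    return len(g) == 1


# Two-entry loop between a and b: the textbook irreducible region.
irreducible = {"e": {"a", "b"}, "a": {"b"}, "b": {"a"}}
# Naive node splitting: duplicate a so each copy has a single predecessor.
split = {"e": {"a1", "b"}, "a1": {"b"}, "b": {"a2"}, "a2": {"b"}}
print(is_reducible(irreducible, "e"), is_reducible(split, "e"))
```

The second graph illustrates why node splitting works: duplicating `a` removes the second loop entry, at the cost of one replicated node. Reducing the number of such replicas is the quantity the abstract says the optimized method improves.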
Techniques for Fast Instruction Cache Performance Evaluation
, 1993
Cited by 6 (2 self)
This paper evaluates techniques that attempt to overcome these problems for instruction cache performance evaluation. For each technique, variations with and without periodic context switches are examined. Information calculated during the compilation is used to reduce the number of references in the trace. Thus, in effect references are stripped before the initial trace is generated. These techniques are shown to significantly reduce the time required for evaluating instruction caches with no loss of accuracy.
Relating Static and Dynamic Machine Code Measurements
- IEEE Transactions on Computers
, 1992
Cited by 6 (0 self)
In an effort to relate static measurements of machine code instructions and addressing modes to their dynamic counterparts, both types of measurements were made on nine different machines using a large and varied suite of programs. Using classical regression analysis techniques, the relationship between static architecture measurements and dynamic architecture measurements was explored. The statistical analysis showed that many static and dynamic measurements are strongly correlated and that it is possible to use the more easily obtained static measurements to predict dynamic usage of instructions and addressing modes. With few exceptions, the predictions are accurate for most architectural features. Index Terms---Instruction sets, performance evaluation, computer architecture, dynamic measurements, static measurements, machine design. I. Introduction Static measurements of program code at machine level are generally thought to be useful for determining textual space needs while dynam...
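As a rough illustration of the kind of regression analysis the abstract mentions, the sketch below fits a straight line that predicts a dynamic measurement from a static one and reports the Pearson correlation. The data values and the name `fit_line` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x, plus Pearson correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    r = sxy / sqrt(sxx * syy)
    return a, b, r


# Hypothetical data: share of load instructions (percent) on five machines,
# counted statically in the code and dynamically during execution.
static_pct = [10.0, 15.0, 20.0, 25.0, 30.0]
dynamic_pct = [12.0, 17.5, 21.0, 27.5, 32.0]

a, b, r = fit_line(static_pct, dynamic_pct)
predicted = a + b * 22.0   # predicted dynamic share for a static share of 22%
print(f"dynamic ~ {a:.2f} + {b:.2f} * static, r = {r:.3f}, prediction = {predicted:.2f}")
```

A correlation close to 1 is what would justify using the cheaper static count as a predictor of dynamic usage.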
CSDL: Reusable Computing System Descriptions for Retargetable System Software
, 2000
Cited by 3 (0 self)
In an era of rapid design of microprocessors for desktop systems, embedded systems, and handheld computing devices, the timely construction of systems software is essential. Systems software, such as assemblers, compilers, and debuggers, must be constructed before development of application software for a microprocessor can commence. However, the implementation of such machine-specific applications is difficult and time consuming. Therefore, to remain competitive, it is imperative that systems software designs focus on portability to reduce implementation time and ensure rapid delivery of complete systems to the market. This dissertation presents the Computing System Description Language (CSDL) framework that addresses these rapid development requirements. We illustrate the CSDL framework by developing an instruction-set description component (RTL), an optional procedure calling convention description component (CCL), and the mechanism we use to extend extant descriptions (CSDL). RTL and its accompanying microinstruction descriptions (RTL) further the state-of-the-art in specifying semantics of machine instructions. RTL adds a new type system and abstract syntax that facilitates more accurate specification and automatic detection of errors by RTL manipulators. RTL machine descriptions are also application independent---they completely separate the specification of semantics from the application's implementation. The CCL specification language is the first work to formally describe procedure calling conventions. We demonstrate two distinct uses for CCL descriptions: code generation and fault detection. Using CCL we have built compilers that are more robust, and found and diagnosed faults in production compilers. CCL, RTL, and RTL descriptions are bound together u...
Using a swap instruction to coalesce loads and stores
- In Proceedings of the European Conference on Parallel Computing
, 2001
Cited by 2 (0 self)
A swap instruction, which exchanges a value in memory with a value of a register, is available on many architectures. The primary application of a swap instruction has been for process synchronization. As an experiment we wished to see how often a swap instruction can be used to coalesce loads and stores to improve the performance of a variety of applications. The results show that both the number of accesses to the memory system (data cache) and the number of executed instructions are reduced. 1 INTRODUCTION An instruction that exchanges a value in memory with a value in a register has been used on a variety of machines. The primary purpose for these swap instructions is to provide an atomic operation for reading from and writing to memory, which has been used to construct mutual-exclusion mechanisms in software for process synchronization. In fact, there are other forms of hardware instructions that have been used to support mutual exclusion, which include the classic test-and-set instruction. We thought it would be interesting to see if a swap instruction could be exploited in a more conventional manner. In this paper we show that a swap instruction can also be used by a low-level code-improving transformation to coalesce loads and stores into a single instruction, which results in a reduction of memory references and executed instructions.
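The effect of coalescing can be sketched with a toy interpreter. This is an illustrative model, not the paper's transformation: the instruction tuples, register names, and the `run` helper are all assumptions. It compares a load followed by a store to the same address against a single swap, assuming the stored register's old value is not needed afterwards and that later uses of the loaded register can be renamed.

```python
def run(code, regs, mem):
    """Execute toy instructions (op, reg, addr) on copies of regs and mem."""
    regs, mem = dict(regs), dict(mem)
    for op, reg, addr in code:
        if op == "load":        # reg <- M[addr]
            regs[reg] = mem[addr]
        elif op == "store":     # M[addr] <- reg
            mem[addr] = regs[reg]
        elif op == "swap":      # exchange reg and M[addr]
            regs[reg], mem[addr] = mem[addr], regs[reg]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, mem


# Original sequence: two instructions, two memory references.
before = [("load", "r1", "a"), ("store", "r2", "a")]
# Coalesced sequence: one instruction, one memory reference.
after = [("swap", "r2", "a")]

regs0, mem0 = {"r1": 0, "r2": 7}, {"a": 42}
regs_b, mem_b = run(before, regs0, mem0)
regs_a, mem_a = run(after, regs0, mem0)

# Memory ends up identical; the loaded value lands in r2 instead of r1.
print(mem_b == mem_a, regs_b["r1"] == regs_a["r2"])
```

The two sequences agree on memory, and the value that the original code loaded into `r1` is available in `r2` after the swap. A real compiler would need liveness information to confirm that overwriting `r2` is safe.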
Memory Bandwidth Optimization for Wide-Bus Machines
, 1993
One of the critical problems facing designers of high performance processors is the disparity between processor speed and memory speed. This has occurred because innovation and technological improvements in processor design have outpaced advances in memory design. While not a panacea, some gains in memory performance can be had by simply increasing the width of the bus from the processor to memory. Indeed, high performance microprocessors with wide buses (i.e., capable of transferring 64 bits or more between the CPU and memory) are beginning to become commercially available (e.g., MIPS R4000, DEC Alpha, and Motorola 88110). This paper discusses some compiler optimizations that take advantage of the increased bandwidth available from a wide bus. We have found that very simple strategies can reduce the number of memory requests by 10 to 15 percent. For some data- and compute-intensive algorithms, more aggressive optimizations can yield significantly higher reductions. The paper describes t...
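One simple strategy of this kind can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: two consecutive 32-bit loads from adjacent addresses are merged into a single 64-bit request when the first address is 8-byte aligned. The function name `coalesce_loads` and the assumptions that alignment is known at compile time and that no intervening store touches the same locations are mine.

```python
def coalesce_loads(addrs, narrow=4, wide=8):
    """Merge adjacent narrow loads into wide ones where alignment allows.

    addrs: byte addresses of consecutive narrow loads, in program order.
    Returns a list of (address, width_in_bytes) memory requests.
    """
    requests, i = [], 0
    while i < len(addrs):
        a = addrs[i]
        pairs_up = (
            i + 1 < len(addrs)
            and a % wide == 0              # wide access must be aligned
            and addrs[i + 1] == a + narrow  # next load is the adjacent word
        )
        if pairs_up:
            requests.append((a, wide))
            i += 2
        else:
            requests.append((a, narrow))
            i += 1
    return requests


loads = [0, 4, 8, 12, 20]
requests = coalesce_loads(loads)
print(len(loads), "loads become", len(requests), "memory requests:", requests)
```

In this toy input, five narrow loads shrink to three requests. In real code the achievable reduction depends on how often adjacent, suitably aligned accesses occur.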

