Results 1–10 of 36
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
Abstract

Cited by 171 (0 self)
Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Register Allocation via Graph Coloring
, 1992
Abstract

Cited by 136 (4 self)
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results. Optimistic coloring: Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs: The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization: Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
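The optimistic twist described above can be sketched in a few lines of Python. This is a hypothetical toy (the function name, graph encoding, and tie-breaking choices are mine, not the thesis's): high-degree nodes are pushed onto the simplification stack anyway, and a spill is recorded only if no color turns out to be free when the node is popped.

```python
def optimistic_color(graph, k):
    """Try to k-color an interference graph; return (coloring, spills).

    `graph` maps each node to the set of nodes it interferes with.
    """
    nodes = set(graph)
    stack, removed = [], set()
    while len(removed) < len(nodes):
        live = nodes - removed
        # Prefer a trivially colorable node (fewer than k live neighbors)...
        cand = next((n for n in sorted(live)
                     if sum(1 for m in graph[n] if m not in removed) < k), None)
        if cand is None:
            # ...otherwise push a high-degree node optimistically instead of
            # spilling it right away, as Chaitin's original heuristic would.
            cand = min(live)
        stack.append(cand)
        removed.add(cand)
    coloring, spills = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spills.append(n)  # a spill is needed only if no color is free
    return coloring, spills

# demo: a 4-cycle interferes pairwise but is 2-colorable
g = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
coloring, spills = optimistic_color(g, 2)
```

Note that Chaitin's original heuristic would spill a node of the 4-cycle immediately (every node has degree 2, which is not below k = 2), while the optimistic pass colors it with no spills at all.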
Instruction Selection Using Binate Covering for Code Size Optimization
 Int. Conf. on Computer-Aided Design (ICCAD)
, 1995
Abstract

Cited by 54 (1 self)
We address the problem of instruction selection in code generation for embedded DSP microprocessors. Such processors have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. Instruction selection can be formulated as directed acyclic graph (DAG) covering. Conventional methods for instruction selection use heuristics that break up the DAG into a forest of trees and then cover them independently. This breakup can result in suboptimal solutions for the original DAG. Alternatively, the DAG covering problem can be formulated as a binate covering problem, and solved exactly or heuristically using branch-and-bound methods. We show that optimal instruction selection on a DAG in the case of accumulator-based architectures requires a partial scheduling of nodes in the DAG, and we augment the binate covering formulation to minimize spills and reloads. We show how the irregular data transfer costs of typical DSP datapaths can be modeled in th...
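The flavor of the binate covering formulation can be illustrated with an exhaustive toy (a stand-in for the paper's branch-and-bound solver; the match encoding and example instructions are hypothetical). Unate clauses require every DAG node to be covered; binate clauses say that selecting a match implies its input nodes are produced by other selected matches.

```python
from itertools import combinations

def best_cover(dag_nodes, matches):
    """Exhaustive stand-in for branch-and-bound binate covering.

    Each match is (root, covered_nodes, input_nodes, cost).
    """
    best = None
    for r in range(1, len(matches) + 1):
        for sel in combinations(matches, r):
            covered = set().union(*(set(m[1]) for m in sel))
            roots = {m[0] for m in sel}
            if covered != set(dag_nodes):
                continue  # unate clauses: every DAG node must be covered
            if any(i not in roots for m in sel for i in m[2]):
                continue  # binate clauses: an input must be produced
            cost = sum(m[3] for m in sel)
            if best is None or cost < best[0]:
                best = (cost, sel)
    return best

# demo: c = a + b, with a fused instruction covering both a and c
matches = [
    ('a', {'a'}, [], 1),           # load a
    ('b', {'b'}, [], 1),           # load b
    ('c', {'c'}, ['a', 'b'], 1),   # plain add, needs a and b available
    ('c', {'a', 'c'}, ['b'], 1),   # fused op covering a and c, needs b
]
cost, selection = best_cover({'a', 'b', 'c'}, matches)
```

Here the fused match plus a single load covers the whole DAG at cost 2, whereas covering node by node costs 3; a tree-by-tree heuristic that had already committed to the plain add could not recover this.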
Fast Module Mapping and Placement for Datapaths in FPGAs
 In ACM/SIGDA International Symposium on Field Programmable Gate Arrays
, 1998
Abstract

Cited by 42 (2 self)
By tailoring a compiler tree-parsing tool for datapath module mapping, we produce good quality results for datapath synthesis in very fast run time. Rather than flattening the design to gates, we preserve the datapath structure; this allows exploitation of specialized datapath features in FPGAs, retains regularity, and also results in a smaller problem size. To further achieve high mapping speed, we formulate the problem as tree covering and solve it efficiently with a linear-time dynamic programming algorithm. In a novel extension to the tree-covering algorithm, we perform module placement simultaneously with the mapping, still in linear time. Integrating placement has the potential to increase the quality of the result since we can optimize total delay including routing delays. To our knowledge this is the first effort to leverage a grammar-based tree covering tool for datapath module mapping. Further, it is the first work to integrate simultaneous placement with module mapping in a w...
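The tree-covering dynamic program at the heart of such tools can be sketched compactly. This is a minimal illustration, not the paper's implementation: trees are nested tuples, '*' is a wildcard leaf whose subtree is covered separately, and the patterns and costs are invented.

```python
def match(pat, node):
    """If pattern matches at node, return the leftover subtrees; else None."""
    if pat == '*':                      # wildcard: subtree covered separately
        return [node]
    pop, pkids = pat
    nop, nkids = node
    if pop != nop or len(pkids) != len(nkids):
        return None
    rest = []
    for p, k in zip(pkids, nkids):
        sub = match(p, k)
        if sub is None:
            return None
        rest += sub
    return rest

def cover(node, patterns):
    """Min-cost cover of an expression tree, bottom-up over subtrees."""
    best = float('inf')
    for pat, cost in patterns:
        rest = match(pat, node)
        if rest is not None:
            best = min(best, cost + sum(cover(r, patterns) for r in rest))
    return best

# demo: a*b + c, with a fused multiply-add available as one module
tree = ('add', (('mul', (('reg', ()), ('reg', ()))), ('reg', ())))
patterns = [
    (('reg', ()), 0),                            # operand already in place
    (('add', ('*', '*')), 1),
    (('mul', ('*', '*')), 1),
    (('add', (('mul', ('*', '*')), '*')), 1),    # fused multiply-add
]
```

With the fused pattern the whole tree is covered at cost 1; without it the best cover costs 2. Each node is examined against each pattern once, which is the source of the linear-time behavior the abstract refers to.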
On Optimizing A Class Of Multi-Dimensional Loops With Reductions For Parallel Execution
 Parallel Processing Letters
, 1997
Abstract

Cited by 28 (22 self)
This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering of the operations using the commutativity and associativity properties of addition and multiplication, and the application of the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and proof of its NP-completeness is provided. A pruning search strategy for determination of an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determination of optimal distribution of the arrays are also provided. Keywords: loop parallelization, operation minimization, communication op...
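The kind of operation reduction the distributive law enables can be seen in a one-line example (a generic illustration, not drawn from the paper): a double sum of products of two arrays factors into a product of two sums.

```python
def naive_sum_of_products(A, B):
    """Direct evaluation: len(A) * len(B) multiplications."""
    return sum(a * b for a in A for b in B)

def factored_sum_of_products(A, B):
    """Distributive law: sum(A) * sum(B) -- a single multiplication."""
    return sum(A) * sum(B)
```

For arrays of length N the factored form performs one multiplication instead of N^2; the paper's contribution is finding such factorizations automatically for products of many array terms, where the search space makes the problem NP-complete.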
Lazy Strength Reduction
 Journal of Programming Languages
Abstract

Cited by 23 (8 self)
We present a bit-vector algorithm that uniformly combines code motion and strength reduction, avoids superfluous register pressure due to unnecessary code motion, and is as efficient as standard unidirectional analyses. The point of this algorithm is to combine the concept of lazy code motion of [1] with the concept of unifying code motion and strength reduction of [2, 3, 4, 5]. This results in an algorithm for lazy strength reduction, which consists of a sequence of unidirectional analyses, and is unique in its transformational power. Keywords: Data flow analysis, program optimization, partial redundancy elimination, code motion, strength reduction, bit-vector data flow analyses.
1 Motivation
Code motion improves the runtime efficiency of a program by avoiding unnecessary recomputations of a value at runtime. Strength reduction improves runtime efficiency by reducing "expensive" recomputations to less expensive ones, e.g., by reducing computations involving multiplication to computat...
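The classic strength-reduction transformation the abstract alludes to, multiplication replaced by repeated addition, can be shown with a hypothetical address-computation loop (names and values are illustrative only):

```python
def addresses_mul(base, stride, n):
    """Each iteration recomputes i * stride with a multiply."""
    return [base + i * stride for i in range(n)]

def addresses_reduced(base, stride, n):
    """Strength-reduced: the multiply becomes a running addition."""
    out, addr = [], base
    for _ in range(n):
        out.append(addr)
        addr += stride             # one add per iteration, no multiply
    return out
```

Both functions produce the same address sequence; the "lazy" part of the paper's algorithm is deciding where to place the initialization and update of the running value so that no register is occupied longer than necessary.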
Automatic Design of Computer Instruction Sets
, 1993
Abstract

Cited by 21 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multi-step process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
Probabilistic Arithmetic
, 1989
Abstract

Cited by 14 (0 self)
This thesis develops the idea of probabilistic arithmetic. The aim is to replace arithmetic operations on numbers with arithmetic operations on random variables. Specifically, we are interested in numerical methods of calculating convolutions of probability distributions. The long-term goal is to be able to handle random problems (such as the determination of the distribution of the roots of random algebraic equations) using algorithms which have been developed for the deterministic case. To this end, in this thesis we survey a number of previously proposed methods for calculating convolutions and representing probability distributions and examine their defects. We develop some new results for some of these methods (the Laguerre transform and the histogram method), but ultimately find them unsuitable. We find that the details on how the ordinary convolution equations are calculated are
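For discrete distributions, the convolution underlying "adding two random variables" is short enough to sketch directly (a generic illustration of the operation, not any of the thesis's numerical methods):

```python
def convolve_pmf(p, q):
    """PMF of X + Y for independent X ~ p, Y ~ q (dicts value -> probability)."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, 0.0) + px * qy
    return r

# demo: sum of two fair coin flips
two_flips = convolve_pmf({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5})
```

The hard part the thesis addresses is doing this for continuous distributions, where the convolution integral must be discretized or transformed, and where dependence between operands breaks the simple product rule used above.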
Using Program Checking to Ensure the Correctness of Compiler Implementations
 Journal of Universal Computer Science (J.UCS)
, 2003
Abstract

Cited by 11 (2 self)
Abstract: We evaluate the use of program checking to ensure the correctness of compiler implementations. Our contributions in this paper are threefold: Firstly, we extend the classical notion of black-box program checking to program checking with certificates. Our checking approach with certificates relies on the observation that the correctness of solutions of NP-complete problems can be checked in polynomial time whereas their computation itself is believed to be much harder. Our second contribution is the application of program checking with certificates to optimizing compiler backends, in particular code generators, thus answering the open question of how program checking for such compiler backends can be achieved. In particular, we state a checking algorithm for code generation based on bottom-up rewrite systems from static single assignment representations. We have implemented this algorithm in a checker for a code generator used in an industrial project. Our last contribution in this paper is an integrated view on all compiler passes, in particular a comparison between frontend and backend phases, with respect to the applicable methods of program checking.
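The asymmetry the certificate idea exploits, verifying a solution is far cheaper than finding one, is easy to demonstrate with graph coloring as the NP-complete stand-in (a generic sketch, not the paper's code-generation checker):

```python
def check_coloring(graph, coloring, k):
    """Polynomial-time checker: accept iff `coloring` (the certificate)
    is a valid k-coloring of `graph` -- far cheaper than finding one."""
    if any(n not in coloring or not 0 <= coloring[n] < k for n in graph):
        return False
    return all(coloring[n] != coloring[m] for n in graph for m in graph[n])

# demo: a single edge, two colors
g = {'a': {'b'}, 'b': {'a'}}
ok = check_coloring(g, {'a': 0, 'b': 1}, 2)
bad = check_coloring(g, {'a': 0, 'b': 0}, 2)
```

In the compiler setting the "solution" is the generated code and the certificate is auxiliary information emitted by the code generator; the checker validates each compilation run rather than attempting to verify the generator itself.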