Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
Cited by 166 (0 self)
Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Register Allocation via Graph Coloring
, 1992
Cited by 133 (4 self)
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results. Optimistic coloring: Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs: The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization: Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
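The difference between pessimistic and optimistic coloring described in this abstract can be illustrated with a small sketch. It is not the thesis's allocator: the function name `color_graph`, the highest-degree spill-candidate choice, and the dictionary-based interference graph are all assumptions made for illustration (real allocators typically weigh spill cost against degree).

```python
def color_graph(adj, k, optimistic=True):
    """Color an interference graph with k registers.

    adj maps each node to the set of nodes it interferes with (symmetric).
    Returns (colors, spilled). Hypothetical helper, for illustration only.
    """
    work = {n: set(nb) for n, nb in adj.items()}
    stack, spilled = [], set()

    def remove(node):
        # Delete the node and its edges from the working graph.
        for m in work.pop(node):
            work[m].discard(node)

    # Simplify phase: repeatedly remove nodes from the graph.
    while work:
        # A node with fewer than k neighbours can always be colored later.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # Stuck: every remaining node has degree >= k.
            # Assumption: pick the highest-degree node as the candidate.
            node = max(sorted(work), key=lambda n: len(work[n]))
            if not optimistic:
                # Pessimistic: spill immediately without trying a color.
                spilled.add(node)
                remove(node)
                continue
            # Optimistic: push it anyway and defer the decision.
        stack.append(node)
        remove(node)

    # Select phase: pop nodes and assign the lowest free color.
    colors = {}
    while stack:
        node = stack.pop()
        used = {colors[m] for m in adj[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.add(node)  # only now does the optimistic variant spill
    return colors, spilled


# Four-node cycle: every node has degree 2, yet two colors suffice.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
opt_colors, opt_spilled = color_graph(cycle, 2, optimistic=True)
pes_colors, pes_spilled = color_graph(cycle, 2, optimistic=False)
```

On this four-node cycle with two registers, the pessimistic variant spills one node, while the optimistic variant colors all four because the deferred node's neighbours happen to share a color.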
Instruction Selection Using Binate Covering for Code Size Optimization
- Int. Conf. on Computer-Aided Design (ICCAD)
, 1995
Cited by 51 (1 self)
We address the problem of instruction selection in code generation for embedded DSP microprocessors. Such processors have highly irregular data-paths, and conventional code generation methods typically result in inefficient code. Instruction selection can be formulated as directed acyclic graph (DAG) covering. Conventional methods for instruction selection use heuristics that break up the DAG into a forest of trees and then cover them independently. This breakup can result in suboptimal solutions for the original DAG. Alternatively, the DAG covering problem can be formulated as a binate covering problem, and solved exactly or heuristically using branch-and-bound methods. We show that optimal instruction selection on a DAG in the case of accumulator-based architectures requires a partial scheduling of nodes in the DAG, and we augment the binate covering formulation to minimize spills and reloads. We show how the irregular data transfer costs of typical DSP data-paths can be modeled in th...
Fast Module Mapping and Placement for Datapaths in FPGAs
- In ACM/SIGDA International Symposium on Field Programmable Gate Arrays
, 1998
Cited by 40 (2 self)
By tailoring a compiler tree-parsing tool for datapath module mapping, we produce good quality results for datapath synthesis in very fast run time. Rather than flattening the design to gates, we preserve the datapath structure; this allows exploitation of specialized datapath features in FPGAs, retains regularity, and also results in a smaller problem size. To further achieve high mapping speed, we formulate the problem as tree covering and solve it efficiently with a linear-time dynamic programming algorithm. In a novel extension to the tree-covering algorithm, we perform module placement simultaneously with the mapping, still in linear time. Integrating placement has the potential to increase the quality of the result since we can optimize total delay including routing delays. To our knowledge this is the first effort to leverage a grammar-based tree covering tool for datapath module mapping. Further, it is the first work to integrate simultaneous placement with module mapping in a w...
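Tree covering by dynamic programming, as mentioned in this abstract, can be sketched roughly as follows. This is a generic illustration and not the paper's tool: the pattern names (`ADD`, `MUL`, `MAC`), their costs, and the tuple encoding of trees are invented for the example, and the placement extension is not modeled.

```python
from functools import lru_cache

# Hypothetical module library: (name, pattern, cost). "_" matches any subtree.
PATTERNS = [
    ("ADD", ("add", "_", "_"), 1),
    ("MUL", ("mul", "_", "_"), 2),
    ("MAC", ("add", ("mul", "_", "_"), "_"), 2),  # fused multiply-add
]


def match(pattern, tree):
    """Return the subtrees bound to wildcards, or None if there is no match."""
    if pattern == "_":
        return [tree]
    if isinstance(tree, str) or tree[0] != pattern[0] or len(tree) != len(pattern):
        return None
    bound = []
    for p, t in zip(pattern[1:], tree[1:]):
        sub = match(p, t)
        if sub is None:
            return None
        bound.extend(sub)
    return bound


@lru_cache(maxsize=None)
def best_cover(tree):
    """Return (cost, module names) of a cheapest cover of the tree.

    Leaves (plain strings) are inputs and cost nothing. Memoization means
    each distinct subtree is solved once, so for a fixed pattern set the
    work grows linearly with the number of tree nodes.
    """
    if isinstance(tree, str):
        return 0, ()
    best = None
    for name, pattern, cost in PATTERNS:
        operands = match(pattern, tree)
        if operands is None:
            continue
        total, chosen = cost, (name,)
        for sub in operands:
            sub_cost, sub_names = best_cover(sub)
            total += sub_cost
            chosen += sub_names
        if best is None or total < best[0]:
            best = (total, chosen)
    if best is None:
        raise ValueError(f"no pattern covers {tree!r}")
    return best


fused = best_cover(("add", ("mul", "a", "b"), "c"))
plain = best_cover(("add", "a", "b"))
```

For the tree `add(mul(a, b), c)`, the single `MAC` module (cost 2) beats `ADD` plus `MUL` (cost 3) under these assumed costs.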
On Optimizing A Class Of Multi-Dimensional Loops With Reductions For Parallel Execution
- Parallel Processing Letters
, 1997
Cited by 27 (21 self)
This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering of the operations using the commutativity and associativity properties of addition and multiplication, and the application of the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and proof of its NP-completeness is provided. A pruning search strategy for determination of an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determination of optimal distribution of the arrays are also provided. Keywords: loop parallelization, operation minimization, communication op...
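The effect of applying the distributive law to a sum of products can be seen in a toy case. The one-dimensional lists and the function names below are assumptions for illustration; the paper's computations involve multi-dimensional arrays and a search over many possible reorderings.

```python
def naive_sum(a, b):
    """Sum of a[i] * b[j] over all i, j; also counts multiplications."""
    total, mults = 0, 0
    for x in a:
        for y in b:
            total += x * y
            mults += 1
    return total, mults


def factored_sum(a, b):
    """Same value via the distributive law: (sum of a) * (sum of b)."""
    return sum(a) * sum(b), 1  # a single multiplication


a, b = [1, 2, 3], [4, 5]
naive = naive_sum(a, b)
factored = factored_sum(a, b)
```

Both forms give the same value here, but the nested loop performs `len(a) * len(b)` multiplications while the factored form performs one.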
Lazy Strength Reduction
- Journal of Programming Languages
Cited by 23 (8 self)
We present a bit-vector algorithm that uniformly combines code motion and strength reduction, avoids superfluous register pressure due to unnecessary code motion, and is as efficient as standard unidirectional analyses. The point of this algorithm is to combine the concept of lazy code motion of [1] with the concept of unifying code motion and strength reduction of [2, 3, 4, 5]. This results in an algorithm for lazy strength reduction, which consists of a sequence of unidirectional analyses, and is unique in its transformational power.
Keywords: Data flow analysis, program optimization, partial redundancy elimination, code motion, strength reduction, bit-vector data flow analyses.
1 Motivation
Code motion improves the runtime efficiency of a program by avoiding unnecessary recomputations of a value at runtime. Strength reduction improves runtime efficiency by reducing "expensive" recomputations to less expensive ones, e.g., by reducing computations involving multiplication to computat...
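As a hand-written illustration of what strength reduction achieves (not of the bit-vector analysis the paper presents), the multiplication inside a loop can be replaced by a running addition. The function names and the stride example are hypothetical.

```python
def offsets_multiply(n, stride):
    """Compute i * stride on every iteration (one multiplication each)."""
    return [i * stride for i in range(n)]


def offsets_reduced(n, stride):
    """Strength-reduced form: keep a running offset and add the stride."""
    out = []
    offset = 0
    for _ in range(n):
        out.append(offset)
        offset += stride  # addition replaces the per-iteration multiplication
    return out


before = offsets_multiply(5, 8)
after = offsets_reduced(5, 8)
```

Both functions return the same sequence; an optimizing compiler performs this rewrite automatically where its analyses show it is safe and profitable.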
Automatic Design of Computer Instruction Sets
, 1993
Cited by 19 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
TEMPLATE: A generic TEchnology Mapping PLATform
- IN PREPARATION, PREPRINT-REIHE, INSTITUT FÜR INFORMATIK, UNIVERSITÄT WÜRZBURG
, 1997
Cited by 8 (2 self)
Technology mapping problems arise in logic synthesis systems when the gap between a synthesized Boolean network and the implementation of that network within a given target technology has to be bridged. This paper presents a modular, versatile technology mapping system that supports many different target technologies. Guided by a complexity analysis of the problem, we develop a variety of efficient, exact or heuristic methods for technology-driven network clustering. Depending on the target technology and optimization methods and goals, different subnetworks must be provided as candidates for clustering. Methods to achieve this are also included. We conclude with experimental results we obtained with several configurations of the system for different target technologies.
Automatic Compilation of C for Hybrid Reconfigurable Architectures
- UNIVERSITY OF CALIFORNIA BERKELEY
, 2002

