Results 1  10
of
102
Efficiently computing static single assignment form and the control dependence graph
 ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1991
"... In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single ass ..."
Abstract

Cited by 839 (7 self)
 Add to MetaCart
In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single assignment form and the control dependence graph have been proposed to represent data flow and control flow propertiee of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimization. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications. We also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program. This paper thus presents strong evidence that these structures can be of practical use in optimization.
The LRPD Test: Speculative RunTime Parallelization of Loops with Privatization and Reduction Parallelization
, 1995
"... Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively e ..."
Abstract

Cited by 212 (38 self)
 Add to MetaCart
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any crossiteration dependences; if the test fails, then the loop is reexecuted serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at runtime. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at runtime if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods. The methods presented in this paper differ from and extend our previous work on several important points. First, instead of distributing the loop into inspector and executor loops (the approach taken in all previous work on run time parallelization) we advocate the use of runtime tests to validate the execution of a loop that is speculatively executed in parallel. Second, in addition to array privatization, the new techniques are capa...
Hardwaresoftware codesign of embedded systems
 PROCEEDINGS OF THE IEEE
, 1994
"... This paper surveys the design of embedded computer systems, which use software running on programmable computers to implement system functions. Creating an embedded computer system which meets its performance, cost, and design time goals is a hardwaresoftware codesign problewhe design of the hard ..."
Abstract

Cited by 152 (5 self)
 Add to MetaCart
This paper surveys the design of embedded computer systems, which use software running on programmable computers to implement system functions. Creating an embedded computer system which meets its performance, cost, and design time goals is a hardwaresoftware codesign problewhe design of the hardware and software components influence each other. This paper emphasizes a historical approach to show the relationships between wellunderstood design problems and the asyet unsolved problems in codesign. We describe the relationship between hardware and sofhvare architecture in the early stages of embedded system design. We describe analysis techniques for hardware and software relevant to the architectural choices required for hardwaresoftware codesign. We also describe design and synthesis techniques for codesign and related problems.
Beyond Induction Variables: Detecting and Classifying Sequences Using a Demanddriven SSA Form
 ACM Transactions on Programming Languages and Systems
, 1995
"... this paper we present a practical technique for detecting a broader class of linear induction variables than is usually recognized, as well as several other sequence forms, including periodic, polynomial, geometric, monotonic, and wraparound variables. Our method is based on Factored UseDef (FUD) ..."
Abstract

Cited by 106 (5 self)
 Add to MetaCart
this paper we present a practical technique for detecting a broader class of linear induction variables than is usually recognized, as well as several other sequence forms, including periodic, polynomial, geometric, monotonic, and wraparound variables. Our method is based on Factored UseDef (FUD) chains, a demanddriven representation of the popular Static Single Assignment form. In this form, strongly connected components of the associated SSA graph correspond to sequences in the source program: we describe a simple yet efficient algorithm for detecting and classifying these sequences. We have implemented this algorithm in Nascent, our restructuring Fortran 90+ compiler, and we present some results showing the effectiveness of our approach.
Beyond Induction Variables
, 1992
"... Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been ..."
Abstract

Cited by 90 (6 self)
 Add to MetaCart
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been used successfully up to now, we have found a simple algorithm based on the the Static Single Assignment form of a program that finds all linear induction variables in a loop. Moreover, this algorithm is easily extended to find induction variables in multiple nested loops, to find nonlinear induction variables, and to classify other integer scalar assignments in loops, such as monotonic, periodic and wraparound variables. Some of these other variables are now classified using ad hoc pattern recognition, while others are not analyzed by current compilers. Giving a unified approach improves the speed of compilers and allows a more general classification scheme. We also show how to use these va...
The AlignmentDistribution Graph
 In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing
, 1993
"... Implementing a dataparallel language such as Fortran 90 on a distributedmemory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communicatio ..."
Abstract

Cited by 80 (3 self)
 Add to MetaCart
Implementing a dataparallel language such as Fortran 90 on a distributedmemory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignmentdistribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.
Combining Analyses, Combining Optimizations
, 1995
"... This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iter ..."
Abstract

Cited by 74 (4 self)
 Add to MetaCart
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass to make the optimized code correct, also presented. The global code motion algorithm removes some Partially Dead Code as a sideeffect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing runtime speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parsetime peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
Gated SSABased DemandDriven Symbolic Analysis for Parallelizing Compilers
, 1995
"... In this paper, we present a GSAbased technique that performs more efficient and more precise symbolic analysis of predicated assignments, recurrences and index arrays. The efficiency is improved by using a backward substitution scheme that performs resolution of assertions ondemand and uses heuris ..."
Abstract

Cited by 73 (10 self)
 Add to MetaCart
In this paper, we present a GSAbased technique that performs more efficient and more precise symbolic analysis of predicated assignments, recurrences and index arrays. The efficiency is improved by using a backward substitution scheme that performs resolution of assertions ondemand and uses heuristics to limit the number of substitution. The precision is increased by utilizing the gating predicate information embedded in the GSA and the control dependence information in the program flow graph. Examples from array privatization are used to illustrate how the technique aids loop parallelization.
DependenceBased Program Analysis
 In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
"... Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes defuse chains and static single assignment (SSA) form. In this paper, we give a simple graphtheoretic description of the DFG and show ho ..."
Abstract

Cited by 60 (6 self)
 Add to MetaCart
Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes defuse chains and static single assignment (SSA) form. In this paper, we give a simple graphtheoretic description of the DFG and show how the DFG for a program can be constructed in O(EV ) time. We then show how forward and backward dataflow analyses can be performed efficiently on the DFG, using constant propagation and elimination of partial redundancies as examples. These analyses can be framed as solutions of dataflow equations in the DFG. Our construction algorithm is of independent interest because it can be used to construct a program's control dependence graph in O(E) time and its SSA representation in O(EV ) time, which are improvements over existing algorithms. 1 Introduction Anumber of recent papers have focused attention on the problem of speeding up program optimization [FOW87, BMO90, CCF91, PBJ + 91, CFR +...
The Program Structure Tree: Computing Control Regions in Linear Time
, 1994
"... In this paper, we describe the program structure tree (PST), a hierarchical representation of program structure based on single entry single exit (SESE) regions of the control flow graph. We give a lineartime algorithm for finding SESE regions and for building the PST of arbitrary control flow grap ..."
Abstract

Cited by 58 (2 self)
 Add to MetaCart
In this paper, we describe the program structure tree (PST), a hierarchical representation of program structure based on single entry single exit (SESE) regions of the control flow graph. We give a lineartime algorithm for finding SESE regions and for building the PST of arbitrary control flow graphs (including irreducible ones). Next, we establish a connection between SESE regions and control dependence equivalence classes, and show how to use the algorithm to find control regions in linear time. Finally, we discuss some applications of the PST. Many controlflow algorithms, such as construction of Static Single Assignment form, can be speeded up by applying the algorithms in a divideandconquer style to each SESE region on its own. The PST is also used to speed up data flow analysis by exploiting `sparsity'. Experimental results from the Perfect Club and SPEC89 benchmarks confirm that the PST approach finds and exploits program structure.