Code specialization based on value profiles
- In Static Analysis Symposium, 2000
Cited by 22 (7 self)
Abstract. It is often the case at runtime that variables and registers in programs are “quasi-invariant,” i.e., the distribution of the values they take on is very skewed, with a small number of values occurring most of the time. Knowledge of such frequently occurring values can be exploited by a compiler to generate code that optimizes for the common cases without sacrificing the ability to handle the general case. The idea can be generalized to the notion of expression profiles, which profile the runtime values of arbitrary expressions and can permit optimizations that may not be possible using simple value profiles. Since this involves the introduction of runtime tests, a careful cost-benefit analysis is necessary to make sure that the benefits from executing the code specialized for the common values outweigh the cost of testing for these values. This paper describes a static cost-benefit analysis that allows us to discover when such specialization is profitable. Experimental results, using such an analysis and an implementation of low-level code specialization based on value and expression profiles within a link-time code optimizer, are given to validate our approach.
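The trade-off described in this abstract can be illustrated with a small sketch. It is not the paper's analysis: the function names, the single hot value, and the flat per-execution costs are all assumptions made for illustration. Under those assumptions, guarding a specialized path with a runtime test pays off only when the expected cost with the test is lower than the cost of always running the general code.

```python
def specialization_is_profitable(p_hot, cost_general, cost_specialized, cost_test):
    """Hypothetical cost model: compare expected cost with and without the guard.

    p_hot            -- profiled probability that the variable holds the hot value
    cost_general     -- cost of the general code (assumed constant)
    cost_specialized -- cost of the code specialized for the hot value
    cost_test        -- cost of the runtime test that selects the version
    """
    expected_with_guard = (
        cost_test
        + p_hot * cost_specialized
        + (1.0 - p_hot) * cost_general
    )
    return expected_with_guard < cost_general


def power_general(base, exponent):
    """General case: works for any non-negative integer exponent."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def power_specialized(base, exponent):
    """Guarded version, assuming a profile showed exponent == 2 is the hot value."""
    if exponent == 2:          # runtime test introduced by specialization
        return base * base     # code specialized for the common value
    return power_general(base, exponent)  # fall back to the general case
```

With these made-up costs, a hot value seen 90% of the time justifies the guard, while one seen 5% of the time does not, because the test is paid on every execution.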
Effectively Exploiting Indirect Jumps
- Software Practice and Experience, 1997
Cited by 13 (2 self)
This dissertation describes a general code-improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Second, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In fact, over twice the benefit was achieved from exploiting indirect jumps as a general code-improving transformation instead of using the traditional approach of producing indirect jumps as an intermediate code generation decision. In addition, the author shows that with comparable branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced.
Improving Performance By Branch Reordering
- 1998
Cited by 11 (4 self)
1 INTRODUCTION
2 RELATED WORK
3 DETECTING A SEQUENCE OF REORDERABLE BRANCHES
3.1 Detecting a Sequence of Reorderable Branches with a Common Successor
3.2 Detecting a Sequence of Reorderable Range Conditions Comparing a Common Variable to Constants
4 HANDLING SIDE EFFECTS IN A COMMON VARIABLE SEQUENCE
5 PERFORMING PROFILING
5.1 Producing Profile Information for Common Successor Sequence
5.2 Producing Profile Information for Common Variable Sequence
6 SELECTING THE ORDERING OF BRANCHES
6.1 Selecting the Order of a Sequence of Branches with Common Successors
6.2 Selecting the Order of a Sequence of Range Conditions Comparing a Common Variable
7 IMPROVING THE SELECTED SEQUENCE OF RANGE CONDITIONS
8 APPLYING THE REORDERING TRANSFORMATION
9 RES...
Efficient and effective branch reordering using profile data
- ACM Transactions on Programming Languages and Systems (TOPLAS), 2002
Cited by 11 (1 self)
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed often results in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches that compare a common variable to constants. The goal is to obtain an ordering that minimizes the average number of branches executed in the sequence. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often includes the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. On average, applying the transformation reduces the number of instructions executed by about 8% and the number of branches performed by about 13%, and decreases execution time by about 4%.
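The selection step in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm. It assumes the branch conditions are mutually exclusive (as when one variable is compared to distinct constants), that each branch has a fixed positive cost, and that no extra branches are inserted. Under those assumptions, sorting by exit probability divided by cost, highest first, minimizes the expected cost of the sequence. The names `expected_cost` and `reorder` are hypothetical.

```python
def expected_cost(branches):
    """Expected cost of executing a branch sequence in the given order.

    branches -- list of (label, exit_probability, cost) tuples; a branch is
                executed only if no earlier branch transferred control out.
    """
    reach_probability = 1.0   # probability that control reaches this branch
    total = 0.0
    for _label, exit_probability, cost in branches:
        total += reach_probability * cost
        reach_probability -= exit_probability
    return total


def reorder(branches):
    """Order branches by exit probability per unit cost, highest first."""
    return sorted(branches, key=lambda b: b[1] / b[2], reverse=True)


# Made-up profile data: (condition, probability of exiting here, cost).
original = [
    ("x == 1", 0.1, 1.0),
    ("x == 2", 0.2, 1.0),
    ("x == 3", 0.6, 1.0),
]
reordered = reorder(original)
```

With this profile data, the rarely taken `x == 1` test moves to the end, so most executions leave the sequence after a single comparison.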
Techniques for Effectively Exploiting a Zero Overhead Loop Buffer
Cited by 8 (0 self)
A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common improving transformations used by optimizing compilers to improve code on conventional architectures can be exploited (1) to allow more loops to be placed in a ZOLB, (2) to further reduce loop overhead of the loops placed in a ZOLB, and (3) to avoid redundant loading of ZOLB loops. The results given in this paper demonstrate that this architectural feature can often...
Coalescing Conditional Branches into Efficient Indirect Jumps
- Proceedings of the International Static Analysis Symposium, 1997
Cited by 5 (4 self)
Indirect jumps from tables are traditionally only generated by compilers as an intermediate code generation decision when translating multiway selection statements. However, making this decision during intermediate code generation poses problems. The research described in this paper resolves these problems by using several types of static analysis as a framework for a code improving transformation that exploits indirect jumps from tables. First, control-flow analysis is performed that provides opportunities for coalescing branches generated from other control statements besides multiway selection statements. Second, the optimizer uses various techniques to reduce the cost of indirect jump operations by statically analyzing the context of the surrounding code. Finally, path and branch prediction analysis is used to provide a more accurate estimation of the benefit of coalescing a detected set of branches into a single indirect jump. The results indicate that the coalescing transformation can be frequently applied with significant reductions in the number of instructions executed and total cache work. This paper shows that static analysis can be used to implement an effective improving transformation for exploiting indirect jumps.
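As a rough source-level analogy for the coalescing idea, the sketch below replaces a chain of comparisons on a common value with one bounds check and one table lookup. Python has no jump instructions, so a tuple of handlers stands in for the jump table. The opcode numbering and function names are hypothetical, and the sketch says nothing about the paper's analyses or cost estimates.

```python
import operator


def dispatch_with_branches(opcode, a, b):
    """Sequence of conditional branches comparing one value to constants."""
    if opcode == 0:
        return a + b
    if opcode == 1:
        return a - b
    if opcode == 2:
        return a * b
    if opcode == 3:
        return max(a, b)
    raise ValueError(f"unknown opcode: {opcode}")


# Table indexed by opcode; each entry plays the role of a jump target.
_HANDLERS = (operator.add, operator.sub, operator.mul, max)


def dispatch_with_table(opcode, a, b):
    """Coalesced form: one range check, then an indexed (indirect) call."""
    if 0 <= opcode < len(_HANDLERS):
        return _HANDLERS[opcode](a, b)
    raise ValueError(f"unknown opcode: {opcode}")
```

Both functions return the same result for every opcode, but the table version does a fixed amount of dispatch work regardless of which opcode is selected.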
Decreasing process memory requirements by overlapping program portions
- In Proceedings of the Hawaii International Conference on System Sciences, 1998
Cited by 2 (0 self)
Most of the time, faced with a time/space trade-off, a compiler writer will choose to optimize time, even at the cost of space. This was not always the case. Early in the history of computers, programmers would try everything they could think of to reduce the size of their code to get it to fit in the computer’s constrained space. As memory and
Atomic Block Formation for Explicit Data Graph Execution Architectures
I would like to thank the numerous people who have helped me along the way to completing this dissertation. I thank my committee, Kathryn McKinley, Doug Burger, Steve Keckler, Keshav Pingali, and Scott Mahlke, for their valuable feedback on my research. Their insight has unquestionably improved this dissertation. I am particularly indebted to my advisors, Doug Burger and Kathryn McKinley, for their supervision of this work. I count myself lucky to have had scientists of their caliber to oversee my research. Doug and Kathryn have always challenged me to improve my ideas, yet also encouraged me that those ideas are worth pursuing. I hope that my career in computer science will make them proud. I thank Steve Keckler for his leadership of the TRIPS project along with Doug and Kathryn. My career thus far has been enriched by working on such an ambitious research project as a graduate student. I thank the TRIPS team without whom this work would have been impossible.

