Results 1 - 10
of
33
Branch prediction, instruction-window size, and cache size: Performance tradeoffs and simulation techniques
- IEEE Transactions on Computers
, 1999
"... Design parameters interact in complex ways in modern processors, especially because out-of-order issue and decoupling buffers allow latencies to be overlapped. Tradeoffs among instruction-window size, branch-prediction accuracy, and instruction- and datacache size can change as these parameters move ..."
Abstract
-
Cited by 57 (18 self)
- Add to MetaCart
Design parameters interact in complex ways in modern processors, especially because out-of-order issue and decoupling buffers allow latencies to be overlapped. Tradeoffs among instruction-window size, branch-prediction accuracy, and instruction- and datacache size can change as these parameters move through different domains. For example, modeling unrealistic caches can under- or over-state the benefits of better prediction or a larger instruction window. Avoiding such pitfalls requires understanding how all these parameters interact. Because such methodological mistakes are common, this paper provides a comprehensive set of SimpleScalar simulation results from SPECint95 programs, showing the interactions among these major structures. In addition to presenting this database of simulation results, major mechanisms driving the observed tradeoffs are described. The paper also considers appropriate simulation techniques when sampling full-length runs with the SPEC reference inputs. In particular, the results show that branch mispredictions limit the benefits of larger instruction windows, that better branch prediction and better instruction cache behavior have synergistic effects, and that the benefits of larger instruction windows and larger data caches trade off and have overlapping effects. In addition, simulations of only 50 million instructions can yield representative results if these short windows are carefully selected.
A Framework for Balancing Control Flow and Predication
- IN PROCEEDINGS OF THE 30TH INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE
, 1997
"... Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. I ..."
Abstract
-
Cited by 47 (4 self)
- Add to MetaCart
Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. In order to effectively apply if-conversion, one must address two major issues: what should be if-converted and when the if-conversion should be applied. A compiler's use of predication as a representation is most effective when large amounts of code are if-converted and if-conversion is performed early in the compilation procedure. On the other hand, the final code generated for a processor with predicated execution requires a delicate balance between control flow and predication to achieve efficient execution. The appropriate balance is tightly coupled with scheduling decisions and detailed processor characteristics. This paper presents an effective compilation framework that allows the compiler to maximize the benefits of predication as a compiler representation while delaying the final balancing of control flow and predication to schedule time.
Exploiting Instruction Level Parallelism in the Presence of Conditional Branches
, 1996
"... Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP ..."
Abstract
-
Cited by 37 (3 self)
- Add to MetaCart
Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP, enforce strict ordering conditions in programs to ensure correct execution. Therefore, it is difficult to achieve the desired overlap of instruction execution with branches in the instruction stream. To effectively exploit ILP in the presence of branches requires efficient handling of branches and the dependences they impose. This dissertation investigates two techniques for exposing and enhancing ILP in the presence of branches, speculative execution and predicated execution. Speculative execution enables an ILP compiler to remove dependences between instructions and prior branches. In this manner, the execution of instructions and predicted future instructions may be overlapped. Compiler-controlled speculative execution is employed using an efficient structure called the superblock. The formation and optimization of superblocks increase ILP along important execution paths by systematically removing constraints due to unimportant paths. In conjunction with superblock optimizations, speculative execution is utilized to remove control dependences in the superblock
Global Predicate Analysis and its Application to Register Allocation
- in Proceedings of the 29th International Symposium on Microarchitecture
, 1996
"... To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different b ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents significant challenges to an optimizing compiler. For example, in live range analysis, a predicated definition does not necessarily end the live range of a virtual register. This paper describes techniques to analyze the relations among predicates in order to improve the precision and effectiveness of various compiler analysis and transformation phases in the presence of predicated code. Our predicate analysis operates globally to obtain relations among predicates. Moreover, we analyze control flow and predication in a single unified framework. The result can be queri...
Predicated Static Single Assignment
- IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES
, 1999
"... Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multipl ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose such parallelism, traditional compiler data-flow analysis needs to be extended to predicated code. In this paper, we present
Path-Sensitive Value-Flow Analysis
- In Symposium on Principles of Programming Languages
, 1998
"... When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the val ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the values whose flow is of interest to a given data-flow problem. Furthermore, to exploit recomputation of a value with multiple, synonymous names, path-sensitive value numbering on the synthetic name space is developed. Optimizations that rely on value flow to detect redundant computations, such as partial redundancy elimination and constant propagation, become more powerful when phrased in our framework. The framework is built on a new program representation called Value Name Graph (VNG) which gains its power from integrating three orthogonal techniques: symbolic back-substitution, value numbering, and data-flow analysis. Our experiments with the implementation show that analysis on the VNG is p...
Value Speculation Scheduling for High Performance Processors
- in Eighth International Conference on Architectural Support for Programming Languages and Operating Systems
, 1998
"... Recent research in value prediction shows a surprising amount of predictability for the values produced by register-writing instructions. Several hardware based value predictor designs have been proposed to exploit this predictability by eliminating flow dependencies for highly predictable values. T ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
Recent research in value prediction shows a surprising amount of predictability for the values produced by register-writing instructions. Several hardware based value predictor designs have been proposed to exploit this predictability by eliminating flow dependencies for highly predictable values. This paper proposed a hardware and software based scheme for value speculation scheduling (VSS). Static VLIW scheduling techniques are used to speculate value dependent instructions by scheduling them above the instructions whose results they are dependent on. Prediction hardware is used to provide value predictions for allowing the execution of speculated instructions to continue. In the case of miss-predicted values, control flow is redirected to patch-up code so that execution can proceed with the correct results. In this paper, experiments in VSS for load operations in the SPECint95 benchmarks are performed. Speedup of up to 17% has been shown for using VSS. Empirical results on the value...
Path Analysis and Renaming for Predicated Instruction Scheduling
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
, 2000
"... Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multipl ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper,
Control CPR: A Branch Height Reduction Optimization for EPIC Architectures
- In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 1999
"... The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Cont ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Control CPR can reduce the dependence height of critical paths through branch operations as well as decrease the number of executed branches. In this paper, we present an approach to control CPR that recognizes sequences of branches using profiling statistics. The control CPR transformation is applied to the predominant path through this sequence. Our approach, its implementation, and experimental results are presented. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors. 1 Introduction Increases in microprocessor performance are driven by both increased clock speed and...
The impact of if-conversion and branch prediction on program execution on the intel itanium processor
- In MICRO-34
, 2001
"... The research community has studied if-conversion for many years. However, due to the lack of existing hardware, studies were conducted by simulating code generated by experimental compilers. This paper presents the first comprehensive study of the use of predication to implement if-conversion on pro ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
The research community has studied if-conversion for many years. However, due to the lack of existing hardware, studies were conducted by simulating code generated by experimental compilers. This paper presents the first comprehensive study of the use of predication to implement if-conversion on production hardware with a near-production compiler. To better understand trends in the measurements, we generated binaries at three increasing levels of if-conversion aggressiveness. For each level, we gathered data regarding the global runtime effects of if-conversion on overall execution time, register pressure, code size, and branch behavior. Furthermore, we studied the inherent characteristics of program control-flow

