Results 1 - 10
of
33
Dynamic Prediction of Critical Path Instructions
- IN PROCEEDINGS OF THE SEVENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE
, 2001
"... Modern processors come close to executing as fast as true dependences allow. The particular dependences that constrain execution speed constitute the critical path of execution. To optimize the performance of the processor, we either have to reduce the critical path or execute it more efficiently. I ..."
Abstract
-
Cited by 61 (5 self)
- Add to MetaCart
Modern processors come close to executing as fast as true dependences allow. The particular dependences that constrain execution speed constitute the critical path of execution. To optimize the performance of the processor, we either have to reduce the critical path or execute it more efficiently. In both cases, it can be done more effectively if we know the actual instructions that constitute that path. This paper
Exploiting Instruction Level Parallelism in the Presence of Conditional Branches
, 1996
"... Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP ..."
Abstract
-
Cited by 37 (3 self)
- Add to MetaCart
Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP, enforce strict ordering conditions in programs to ensure correct execution. Therefore, it is difficult to achieve the desired overlap of instruction execution with branches in the instruction stream. To effectively exploit ILP in the presence of branches requires efficient handling of branches and the dependences they impose. This dissertation investigates two techniques for exposing and enhancing ILP in the presence of branches, speculative execution and predicated execution. Speculative execution enables an ILP compiler to remove dependences between instructions and prior branches. In this manner, the execution of instructions and predicted future instructions may be overlapped. Compiler-controlled speculative execution is employed using an efficient structure called the superblock. The formation and optimization of superblocks increase ILP along important execution paths by systematically removing constraints due to unimportant paths. In conjunction with superblock optimizations, speculative execution is utilized to remove control dependences in the superblock
Partial dead code elimination using slicing transformations
- In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation
, 1997
"... ..."
Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results
- In Proceedings of the Third International Symposium on High-Performance Computer Architecture
, 1997
"... This paper introduces a new architectural approach that supports compiler-synthesized dynamic branch predication. In compiler-synthesized dynamic branch prediction, the compiler generates code sequences that, when executed, digest relevant state information and execution statistics into a condition ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
This paper introduces a new architectural approach that supports compiler-synthesized dynamic branch predication. In compiler-synthesized dynamic branch prediction, the compiler generates code sequences that, when executed, digest relevant state information and execution statistics into a condition bit, or predicate. The hardware then utilizes this information to make predictions. Two categories of such architectures are proposed and evaluated. In Predicate Only Prediction (POP), the hardware simply uses the condition generated by the code sequence as a prediction. In Predicate Enhanced Prediction (PEP), the hardware uses the generated condition to enhance the accuracy of conventional branch prediction hardware. The IMPACT compiler currently provides a minimal level of compiler support for the proposed approach. Experiments based on current predicated code show that the proposed predictors achieve better performance than conventional branch predictors. Furthermore, they enable future c...
Predicated Static Single Assignment
- IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES
, 1999
"... Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multipl ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose such parallelism, traditional compiler data-flow analysis needs to be extended to predicated code. In this paper, we present
Speculative hedge: Regulating compile-time speculation against profile variations
- In Proceedings of the 29th International Symposium on Microarchitecture
, 1996
"... Path-oriented scheduling methods, such as trace scheduling and hyperblock scheduling, use speculation to extract instruction-level parallelism from control-intensive programs. These methods predict important execution paths in the current scheduling scope using execution pro ling or frequency estima ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
Path-oriented scheduling methods, such as trace scheduling and hyperblock scheduling, use speculation to extract instruction-level parallelism from control-intensive programs. These methods predict important execution paths in the current scheduling scope using execution pro ling or frequency estimation. Aggressive speculation is then applied to the important execution paths, possibly at the cost of degraded performance along other paths. Therefore, the speed of the output code can be sensitive to the compiler's ability to accurately predict the important execution paths. Prior work in this area has utilized the speculative yield function by Fisher, coupled with dependence height, to distribute instruction priority among execution paths in the scheduling scope. While this technique provides more stability of performance by paying attention to the needs of all paths, it does not directly address the problem of mismatch between compile-time prediction and run-time behavior. The work presented in this paper extends the speculative yield and dependence height heuristic to explicitly minimize the penalty su ered by other paths when instructions are speculated along a path. Since the execution time of a path is determined by the number of cycles spent between a path's entrance and exit in the scheduling scope, the heuristic attempts to eliminate unnecessary speculation that delays any path's exit. Such control of speculation makes the performance much less sensitive to the actual path taken at run time. The proposed method has a strong emphasis on achieving minimal delay to all exits. Thus the name, speculative hedge, is used. This paper presents the speculative hedge heuristic, and shows how it controls over-speculation in a superblock/hyperblock scheduler. The stability of out-
Path Analysis and Renaming for Predicated Instruction Scheduling
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
, 2000
"... Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multipl ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper,
Control CPR: A Branch Height Reduction Optimization for EPIC Architectures
- In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 1999
"... The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Cont ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Control CPR can reduce the dependence height of critical paths through branch operations as well as decrease the number of executed branches. In this paper, we present an approach to control CPR that recognizes sequences of branches using profiling statistics. The control CPR transformation is applied to the predominant path through this sequence. Our approach, its implementation, and experimental results are presented. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors. 1 Introduction Increases in microprocessor performance are driven by both increased clock speed and...
Machine-description driven compilers for EPIC and VLIW processors. Design Automation for Embedded Systems
, 1999
"... retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, EPIC compilers, VLIW compilers, code generation, scheduling, register allocation In the past, due to the restricted gate count available on an ..."
Abstract
-
Cited by 18 (9 self)
- Add to MetaCart
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, EPIC compilers, VLIW compilers, code generation, scheduling, register allocation In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively regular interconnect between the registers and the functional units. The central challenges faced by a code generator for an EPIC (Explicitly Parallel Instruction Computing) or VLIW processor are quite different from those for the earlier DSPs and, consequently, so is the structure of a code generator that is designed to be easily retargetable. In this report, we explain the nature of the challenges faced by an EPIC or VLIW compiler and present a strategy for performing code generation in an incremental fashion that is best suited to generating high-quality code efficiently. We also describe the Operation Binding Lattice, a formal model for incrementally binding the opcodes and register assignments in an EPIC code generator. As we show, this reflects the phase structure of the EPIC code generator. It also defines the structure of the machine-description database, which is queried by the code generator for the information that it needs about the target processor. Lastly, we discuss general features of our implementation of these ideas and techniques in Elcor, our EPIC compiler research infrastructure.
Elcor’s Machine Description System: Version 3.0
, 1998
"... retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors,

