Transparent dynamic optimization: The design and implementation of Dynamo
, 1999
Cited by 49 (4 self)
Keywords: dynamic optimization, compiler, trace selection, binary translation. © Copyright Hewlett-Packard Company 1999. Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HP-UX operating system. Dynamo does not depend on any special programming language,
On embedding a microarchitectural design language within Haskell
- In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP '99)
, 1999
Cited by 32 (4 self)
Based on our experience with modelling and verifying microarchitectural designs within Haskell, this paper examines our use of Haskell as host for an embedded language. In particular, we highlight our use of Haskell's lazy lists, type classes, lazy state monad, and unsafePerformIO, and point to several areas where Haskell could be improved in the future. We end with an example of a benefit gained by bringing the functional perspective to microarchitectural modelling.
Predicated Static Single Assignment
- In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
, 1999
Cited by 25 (2 self)
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose such parallelism, traditional compiler data-flow analysis needs to be extended to predicated code. In this paper, we present
Value Speculation Scheduling for High Performance Processors
- In Eighth International Conference on Architectural Support for Programming Languages and Operating Systems
, 1998
Cited by 23 (5 self)
Recent research in value prediction shows a surprising amount of predictability for the values produced by register-writing instructions. Several hardware-based value predictor designs have been proposed to exploit this predictability by eliminating flow dependencies for highly predictable values. This paper proposes a hardware- and software-based scheme for value speculation scheduling (VSS). Static VLIW scheduling techniques are used to speculate value-dependent instructions by scheduling them above the instructions whose results they depend on. Prediction hardware provides value predictions so that execution of the speculated instructions can continue. In the case of mispredicted values, control flow is redirected to patch-up code so that execution can proceed with the correct results. In this paper, experiments in VSS for load operations in the SPECint95 benchmarks are performed. Speedup of up to 17% has been shown for using VSS. Empirical results on the value...
Path Analysis and Renaming for Predicated Instruction Scheduling
- International Journal of Parallel Programming
, 2000
Cited by 20 (1 self)
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper,
Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture
- In ISCA
, 1998
Cited by 16 (2 self)
Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruction level parallelism directly to the hardware. EPIC techniques which enable the compiler to represent control speculation, data dependence speculation, and predication have individually been shown to be very effective. However, these techniques have not been studied in combination with each other. This paper presents the IMPACT EPIC Architecture to address the issues involved in designing processors based on these EPIC concepts. In particular, we focus on new execution and recovery models in which microarchitectural support for predicated execution is also used to enable efficient recovery from exceptions caused by speculatively executed instructions. This paper demonstrates that a coherent framework to integrate the three techniques can be elegantly designed to achieve much better performance than each individual technique could alone provide.
Incorporating Predicate Information into Branch Predictors
- In HPCA-9
, 2003
Cited by 10 (0 self)
Predicated Execution can be used to alleviate the costs associated with frequently mispredicted branches. This is accomplished by trading the cost of a mispredicted branch for execution of both paths following the conditional branch. In this paper we examine two enhancements for branch prediction in the presence of predicated code. Both of the techniques use recently calculated predicate definitions to provide a more intelligent branch prediction. The first branch predictor, called the Squash False Path Filter, recognizes fetched branches known to be guarded with a false predicate and predicts them as not-taken with 100% accuracy. The second technique, called the Predicate Global Update branch predictor, improves prediction by incorporating recent predicate information into the branch predictor. We use these techniques to aid the prediction of region-based branches. A region-based branch is a branch that is left in a predicated region of code. A region-based branch may be correlated with predicate definitions in the region in addition to those that define the branch's guarding predicate.
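The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's hardware design: the function name `predict_branch`, the `known_predicates` table, and the `fallback` callback are all hypothetical names chosen for illustration. The sketch only shows the rule that a branch whose guarding predicate is already known to be false can be predicted not-taken, while every other branch is left to an ordinary predictor.

```python
from typing import Callable, Dict, Optional


def predict_branch(guard: str,
                   known_predicates: Dict[str, bool],
                   fallback: Callable[[], bool]) -> bool:
    """Return True for "predict taken", False for "predict not-taken".

    guard            -- name of the predicate guarding the branch (hypothetical)
    known_predicates -- predicate values already computed at fetch time (assumed)
    fallback         -- any conventional branch predictor (assumed)
    """
    value: Optional[bool] = known_predicates.get(guard)
    if value is False:
        # The guard is already resolved to false, so the branch is
        # nullified and cannot be taken.
        return False
    # Guard is true or not yet computed: defer to the ordinary predictor.
    return fallback()


if __name__ == "__main__":
    always_taken = lambda: True
    print(predict_branch("p1", {"p1": False}, always_taken))
    print(predict_branch("p2", {"p1": False}, always_taken))
```

The filter never overrides the fallback when the guard is true or unknown, which is why it can only remove mispredictions in this simplified model.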
Interactive Source-Level Debugging of Optimized Code
, 2000
Cited by 4 (0 self)
With an increasing number of executable binaries generated by optimizing compilers today to fully utilize advanced architecture features, it has become a necessity to support debugging optimized code. One of the most difficult problems in debugging globally optimized code is to recover the expected variable values at source breakpoints. To solve this problem, the debugger not only has to stop the execution at appropriate places to preserve necessary program state, but also needs to be able to correctly associate storage locations with source variables. In this dissertation, a new framework for debugging globally optimized code is proposed. This framework consists of a novel breakpoint implementation scheme and a new data location tracking mechanism. In the proposed breakpoint implementation scheme, the debugger takes over the control of execution early and executes instructions under a new forward recovery model. This enables the debugger to recover the expected behavior of a program even in the presence of optimization. Also, the source breakpoints are reported to the user in the order specified by the original source program and the behavior of exceptions meets what the user expects. The new data
A Systematic Approach to Delivering Instruction-Level Parallelism in EPIC Systems
, 2005
Cited by 3 (0 self)
Computer systems designed under the explicitly parallel instruction computing (EPIC) paradigm rely extensively on compiler technology to deliver the instruction-level parallelism (ILP) required for them to achieve high levels of performance. While manifold techniques have been proposed in the literature for delivering such parallelism, this dissertation is unique in integrating and applying a comprehensive suite of techniques, embodied in the IMPACT Research Compiler, to a concrete system, comprising the SPEC CINT2000 benchmarks and the Intel Itanium 2 platform. These techniques include advanced pointer analysis, aggressive cross-file procedure inlining, targeted region formation, profile-guided optimizations, and, most importantly, aggressive and pervasive use of predication and control speculation. The collective effect of these techniques is evaluated with real-system measurements, showing them to achieve a 1.20 average (up to 1.59) speedup relative to classically optimized code and a 1.70 average (up to 2.51) speedup relative to code compiled with the GNU GCC compiler. Achieving these results in the real-machine environment required advances in region formation heuristics, optimization, and speculation methods. Modern
Low-Power VLIW Processors: A High-Level Evaluation
- In Proc. of PATMOS '98, October
, 1998
Cited by 3 (0 self)
Processors that combine low power consumption with high performance are increasingly required in the portable systems market. Although it is easy to find processors with one of these characteristics, it is harder to find a processor having both of them at the same time. In this paper, we evaluate the possibility of designing a high-performance, low-consumption processor and investigate whether instruction-level parallelism architectures can be adapted to low-power processors. We find that adapting a high-performance architecture, such as the VLIW architecture, to low-power 8-bit or 16-bit microprocessors yields a significant improvement in the processor's performance while keeping the same energy consumption.

