Results 1 - 10
of
76
ABCD: Eliminating Array Bounds Checks on Demand
, 2000
"... To guarantee execution, Java and other strongly typed languages require bounds checking of array accesses. Because bounds checks may raise exceptions, they block code motion of instructions with side effects, thus preventing many useful code optimizations, such as partial redundancy elimination or i ..."
Abstract
-
Cited by 148 (7 self)
- Add to MetaCart
To guarantee execution, Java and other strongly typed languages require bounds checking of array accesses. Because bounds checks may raise exceptions, they block code motion of instructions with side effects, thus preventing many useful code optimizations, such as partial redundancy elimination or instruction scheduling of memory operations. Furthermore, because it is not expressible at level, the elimination of bounds checks can only be performed at run time, after the program is loaded. Using existing powerful bounds-check optimizers at run time is not feasible, however, because they are too heavyweight for the dynamic compilation setting. ABCD is a light-weight algorithm for elimination of &ray Checks on Demand. Its design emphasizes simplicity and efficiency. In essence, ABCD works by adding a few edges to the SSA value graph and performing a simple traversal of the graph. Despite its simplicity, ABCD is surprisingly powerful. On our benchmarks, ABCD removes on average 45 % of dynamic bound check instructions, sometimes achieving near-ideal optimization. The efficiency of ABCD stems from two factors. First, ABCD works on a representation. As a result, it requires on average fewer than 10 simple analysis steps per bounds check. Second, ABCD is demand-driven. It can be applied to a set of frequently executed (hot) bounds checks, which makes it suitable for the dynamic-compilation setting, in which compile-time cost is constrained but hot statements are known.
Online Feedback-Directed Optimization of Java
, 2002
"... This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior... ..."
Abstract
-
Cited by 59 (5 self)
- Add to MetaCart
(Show Context)
This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior...
Transparent dynamic optimization: The design and implementation of Dynamo
, 1999
"... Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, t ..."
Abstract
-
Cited by 53 (2 self)
- Add to MetaCart
(Show Context)
Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language, compiler, operating system or hardware support. Contrary to
Timestamped Whole Program Path Representation and its Applications
- in Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, Snowbird
, 2001
"... A whole program path (WPP) is a complete control ow trace of a program's execution. Recently Larus [18] showed that although WPP is expected to be very large (100's of MBytes), it can be greatly compressed (to 10's of MBytes) and therefore saved for future analysis. While the compress ..."
Abstract
-
Cited by 42 (6 self)
- Add to MetaCart
(Show Context)
A whole program path (WPP) is a complete control ow trace of a program's execution. Recently Larus [18] showed that although WPP is expected to be very large (100's of MBytes), it can be greatly compressed (to 10's of MBytes) and therefore saved for future analysis. While the compression algorithm proposed by Larus is highly eective, the compression is accompanied with a loss in the ease with which subsets of information can be accessed. In particular, path traces pertaining to a particular function cannot generally be obtained without examining the entire compressed WPP representation. To solve this problem we advocate the application of compaction techniques aimed at providing easy access to path traces on a per function basis.
Whole Execution Traces
, 2004
"... Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data produced by realistic program runs, most work has focused on separately collecting and compressing different types of profiles. In this paper we present a unified representation of profiles called Whole Execution Trace (WET) which includes the complete information contained in each of the above types of traces. Thus WETs provide a basis for a next generation software tool that will enable mining of program profiles to identify program characteristics that require understanding of relationships among various types of profiles. The key features of our WET representation are: WET is constructed by labeling a static program representation with profile information such that relavent and related profile information can be directly accessed by analysis algorithms as they traverse the representation; a highly effective two tier strategy is used to significantly compress the WET; and compression techniques are designed such that they do not adversely affect the ability to rapidly traverse WET for extracting subsets of information corresponding to individual profile types as well as a combination of profile types (e.g., in form of dynamic slices of WETs). Our experimentation shows that on an average execution traces resulting from execution of 647 Million statements can be stored in 331 Megabytes of storage after compression. The compression factors range from 16 to 83. Moreover the rates at which different types of profiles can be individually or simultaneously extracted are high.
Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines
, 2000
"... Virtual machine service threads can perform many tasks in parallel with program execution such as garbage collection, dynamic compilation, and profile collection and analysis. Hardware-assisted profiling is essential for providing service threads with needed information in a flexible and efficient w ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
Virtual machine service threads can perform many tasks in parallel with program execution such as garbage collection, dynamic compilation, and profile collection and analysis. Hardware-assisted profiling is essential for providing service threads with needed information in a flexible and efficient way. A relational profiling architecture (RPA) is proposed for meeting this goal.
Achieving High Performance via Co-Designed Virtual Machines
- In International Workshop on Innovative Architecture
, 1999
"... Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, vi ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
(Show Context)
Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, virtual machines also open many new opportunities for enhancing performance. The co-design of virtual machine software and the underlying hardware microarchitecture will enable enhanced instruction level parallelism and more adaptable performance mechanisms than are possible when hardware and application software are separated by instruction set architectures as is traditionally done. In future high performance computers, a virtual instruction set architecture (V-ISA) will be the level for maintaining architectural compatibility. The V-ISA will be implemented with a virtual machine that blends software and hardware in a symbiotic manner via co-design. The hardware will support an implementationdep
A compiler framework for speculative analysis and optimizations
- In Proceedings of the ACM SIGPLAN ’03 Conference on Programming Language Design and Implementation
, 2003
"... Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately incorporate and exploit control speculation. However, very little ha ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
(Show Context)
Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately incorporate and exploit control speculation. However, very little has been done so far to allow existing compiler frameworks to incorporate and exploit data speculation effectively in various program transformations beyond instruction scheduling. This paper proposes a speculative SSA form to incorporate information from alias profiling and/or heuristic rules for data speculation, thus allowing existing program analysis frameworks to be easily extended to support both control and data speculation. Such a general framework is very useful for EPIC architectures that provide checking (such as advanced load address table (ALAT) [10]) on data speculation to guarantee the correctness of program execution. We use SSAPRE [21] as one example to illustrate how to incorporate data speculation in those important compiler optimizations such as partial redundancy elimination (PRE), register promotion, strength reduction and linear function test replacement. Our extended framework allows both control and data speculation to be performed on top of SSAPRE and, thus, enables more aggressive speculative optimizations. The proposed framework has been implemented on Intel's Open Research Compiler (ORC). We present experimental data on some SPEC2000 benchmark programs to demonstrate the usefulness of this framework and how data speculation benefits partial redundancy elimination.
The Case for Virtual Register Machines
, 2004
"... Virtual machines (VMs) are a popular target for language implementers. A longrunning question in the design of virtual machines has been whether stack or register architectures can be implemented more efficiently with an interpreter. Many designers favour stack architectures since the location of op ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Virtual machines (VMs) are a popular target for language implementers. A longrunning question in the design of virtual machines has been whether stack or register architectures can be implemented more efficiently with an interpreter. Many designers favour stack architectures since the location of operands is implicit in the stack pointer. In contrast, the operands of register machine instructions must be specified explicitly. In this paper, we present a working system for translating stackbased Java virtual machine (JVM) code to a simple register code. We describe the translation process, the complicated parts of the JVM which make translation more difficult, and the optimisations needed to eliminate copy instructions. Experimental results show that a register format reduced the number of executed instructions by 34.88%, while increasing the number of bytecode loads by an average of 44.81%. Overall, this corresponds to an increase of 2.32 loads for each dispatch removed. We believe that the high cost of dispatches makes register machines attractive even at the cost of increased loads.
Verified Validation of Lazy Code Motion
, 2008
"... Translation validation establishes a posteriori the correctness of a run of a compilation pass or other program transformation. In this paper, we develop an efficient translation validation algorithm for the Lazy Code Motion (LCM) optimization. LCM is an interesting challenge for validation because ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
(Show Context)
Translation validation establishes a posteriori the correctness of a run of a compilation pass or other program transformation. In this paper, we develop an efficient translation validation algorithm for the Lazy Code Motion (LCM) optimization. LCM is an interesting challenge for validation because it is a global optimization that moves code across loops. Consequently, care must be taken not to move computations that may fail before loops that may not terminate. Our validator includes a specific check for anticipability to rule out such incorrect moves. We present a mechanicallychecked proof of correctness of the validation algorithm, using the Coq proof assistant. Combining our validator with an unverified implementation of LCM, we obtain a LCM pass that is provably semantics-preserving and was integrated in the CompCert formally verified compiler.