Results 1 - 10
of
28
An Infrastructure for Adaptive Dynamic Optimization
, 2003
"... Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework ..."
Abstract
-
Cited by 130 (5 self)
- Add to MetaCart
Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc.. To demonstrate
ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors
, 1997
"... Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and ..."
Abstract
-
Cited by 118 (2 self)
- Add to MetaCart
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also support paired sampling, which captures information about the interactions between concurrent instructions, revealing information about useful concurrency and the utilization of various pipeline stages while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software...
A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization
- In Proceedings of the 26th Annual International Symposium on Computer Architecture
, 1999
"... This paper presents a novel hardware-based approach for identifying, profiling, and monitoring hot spots in order to support runtime optimization of generalpurpose programs. The proposed approach consists of a set of tightly coupled hardware tables and control logic modules that are placed in the re ..."
Abstract
-
Cited by 68 (4 self)
- Add to MetaCart
This paper presents a novel hardware-based approach for identifying, profiling, and monitoring hot spots in order to support runtime optimization of generalpurpose programs. The proposed approach consists of a set of tightly coupled hardware tables and control logic modules that are placed in the retirement stage of a processor pipeline removed from the critical path. The features of the proposed design include rapid detection of program hot spots after changes in execution behavior, runtime-tunable selection criteria for hot spot detection, and negligible overhead during application execution. Experiments using several SPEC95 benchmarks, as well as several large WindowsNT applications, demonstrate the promise of the proposed design. 1 Introduction Optimizing compilers can gain significant performance benefits by performing code transformations based on a program's runtime profile. Traditionally, profiles are collected by running an instrumented version of the executable. However, bec...
Transparent dynamic optimization: The design and implementation of Dynamo
, 1999
"... dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capabl ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language,
Design and Implementation of a Dynamic Optimization Framework for Windows
- In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4
, 2000
"... In this paper we explicitly address the challenges of building a dynamic optimization infrastructure. Our goal is to provide an engineering manual for researchers to motivate and facilitate engagement in dynamic optimization technology. The dynamic optimization framework we discuss has been implemen ..."
Abstract
-
Cited by 41 (5 self)
- Add to MetaCart
In this paper we explicitly address the challenges of building a dynamic optimization infrastructure. Our goal is to provide an engineering manual for researchers to motivate and facilitate engagement in dynamic optimization technology. The dynamic optimization framework we discuss has been implemented for the IA-32 family of architectures in a Windows environment. We describe implementation challenges specific to Windows and issues with multiple threads and cache management. We also give performance results for our implementation and indicate the overheads involved. This framework opens up opportunities for program introspection and performance enhancement as all program information is available at runtime.
alto: A Link-Time Optimizer for the Compaq Alpha
- Software - Practice and Experience
, 1999
"... Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before linktime such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole-program optimization at link time. This paper describes alto, a link-time optimizer for the Compaq Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster that that obtained using the OM link-time optimizer, even when the latter is used in conjunction with profile-guided and inter-fi...
Optimizing Alpha Executables on Windows NT with Spike
, 1997
"... This paper discusses the Spike performance tool and its use in optimizing Windows NT--based applications running on Alpha processors. In the following section, we describe the characteristics of Windows NT--based ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
This paper discusses the Spike performance tool and its use in optimizing Windows NT--based applications running on Alpha processors. In the following section, we describe the characteristics of Windows NT--based
Efficient, Transparent and Comprehensive Runtime Code Manipulation
, 2004
"... This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every i ..."
Abstract
-
Cited by 28 (1 self)
- Add to MetaCart
This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction — which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools — it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the
A region-based compilation technique for a Java just-in-time compiler
- In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation
, 2003
"... Method inlining and data flow analysis are two major optimization components for effective program transformations, however they often suffer from the existence of rarely or never executed code contained in the target method. One major problem lies in the assumption that the compilation unit is part ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Method inlining and data flow analysis are two major optimization components for effective program transformations, however they often suffer from the existence of rarely or never executed code contained in the target method. One major problem lies in the assumption that the compilation unit is partitioned at method boundaries. This paper describes the design and implementation of a region-based compilation technique in our dynamic compilation system, in which the compiled regions are selected as code portions without rarely executed code. The key part of this technique is the region selection, partial inlining, and region exit handling. For region selection, we employ both static heuristics and dynamic profiles to identify rare sections of code. The region selection process and method inlining decision are interwoven, so that method inlining exposes other targets for region selection, while the region selection in the inline target conserves the inlining budget, leading to more method inlining. Thus the inlining process can be performed for parts of a method, not for the entire body of the method. When the program attempts to exit from a region boundary, we trigger recompilation and then rely on on-stack replacement to continue the execution from the corresponding entry point in the recompiled code. We have implemented these techniques in our Java JIT compiler, and conducted a comprehensive evaluation. The experimental results show that the approach of region-based compilation achieves approximately 5 % performance improvement on average, while reducing the compilation overhead by 20 to 30%, in comparison to the traditional functionbased compilation techniques.
A region-based compilation technique for dynamic compilers
- ACM Trans. Program. Lang. Syst
"... Method inlining and data flow analysis are two major optimization components for effective program transformations, but they often suffer from the existence of rarely or never executed code contained in the target method. One major problem lies in the assumption that the compilation unit is partitio ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
Method inlining and data flow analysis are two major optimization components for effective program transformations, but they often suffer from the existence of rarely or never executed code contained in the target method. One major problem lies in the assumption that the compilation unit is partitioned at method boundaries. This article describes the design and implementation of a region-based compilation technique in our dynamic optimization framework, in which the compiled regions are selected as code portions without rarely executed code. The key parts of this technique are the region selection, partial inlining, and region exit handling. For region selection, we employ both static heuristics and dynamic profiles to identify and eliminate rare sections of code. The region selection process and method inlining decisions are interwoven, so that method inlining exposes other targets for region selection, while the region selection in the inline target conserves the inlining budget, allowing more method inlining to be performed. The inlining process can be performed for parts of a method, not just for the entire body of the method. When the program attempts to exit from a region boundary, we trigger recompilation and then use on-stack replacement to continue the execution from the corresponding entry point in the recompiled code. We have implemented these techniques in our Java JIT compiler, and conducted a comprehensive evaluation. The experimental results show that our region-based compilation approach achieves approximately 4 % performance improvement on average, while reducing the compilation overhead by 10 % to 30%, in comparison to the traditional method-based compilation techniques.

