Efficient, Transparent and Comprehensive Runtime Code Manipulation
, 2004
Cited by 28 (1 self)
This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamically-generated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction (which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools), it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the
Walkabout - A Retargetable Dynamic Binary Translation Framework
- In Proceedings of the 2002 Workshop on Binary Translation
, 2002
Cited by 25 (1 self)
Dynamic compilation techniques have found a renaissance in recent years due to their use in high-performance implementations of the Java(TM) language. Techniques originally developed for use in virtual machines for such object-oriented languages as Smalltalk are now commonly used in Java virtual machines (JVM(TM)) and Java just-in-time compilers. These techniques have also been applied to binary translation in recent years, most commonly appearing in binary optimizers for a given platform that improve the performance of binary programs while they execute.
IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems
- In 36th International Symposium on Microarchitecture
, 2003
Cited by 23 (0 self)
IA-32 Execution Layer (IA-32 EL) is a new technology that executes IA-32 applications on Intel Itanium processor family systems. Currently, support for IA-32 applications on Itanium-based platforms is achieved using hardware circuitry on the Itanium processors. This capability will be enhanced with IA-32 EL, software that will ship with Itanium-based operating systems and will convert IA-32 instructions into Itanium instructions via dynamic translation. In this paper, we describe aspects of the IA-32 Execution Layer technology, including the general two-phase translation architecture and the usage of a single translator for multiple operating systems. The paper provides details of some of the technical challenges, such as precise exceptions, emulation of FP, MMX(TM), and Intel Streaming SIMD Extension instructions, and misalignment handling. Finally, the paper presents some performance results.
Maintaining consistency and bounding capacity of software code caches
- Int’l. Symp. on Code Generation and Optimization
, 2005
Cited by 17 (1 self)
Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching frequently executed fragments of code provides significant performance boosts, reducing the overhead of translation and emulation and meeting or exceeding native performance in dynamic optimizers. One disadvantage of caching, memory expansion, can sometimes be ignored when executing a single application. However, as optimizers and translators are applied more and more in production systems, the memory expansion from running multiple applications simultaneously becomes problematic. A second drawback to caching is the added requirement of maintaining consistency between the code cache and the original code. On architectures like IA-32 that do not require explicit application actions when modifying code, detecting code changes is challenging. Again, consistency can be ignored for certain sets of applications, but as caching systems scale up to executing large, modern, complex programs, consistency becomes critical. This paper presents efficient schemes for keeping a software code cache consistent and for dynamically bounding code cache size to match the current working set of the application. These schemes are evaluated in the DynamoRIO runtime code manipulation system, and operate on stock hardware in the presence of multiple threads and dynamic behavior, including dynamically-loaded, generated, and even modified code.
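The two concerns named in this abstract, capacity and consistency, can be illustrated with a toy model. The sketch below is not the paper's scheme and not DynamoRIO's API: the class `CodeCache`, its fixed `capacity`, the FIFO eviction policy, and the explicit `invalidate` call are all assumptions made for illustration. The paper sizes the cache to the working set dynamically and has to detect code writes itself.

```python
from collections import OrderedDict


class CodeCache:
    """Toy code cache mapping a source address to a 'translated' fragment.

    Hypothetical illustration only: fixed capacity and FIFO eviction are
    simplifying assumptions, not the schemes evaluated in the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity        # maximum number of cached fragments
        self.fragments = OrderedDict()  # source address -> translated fragment
        self.translations = 0           # number of translations performed

    def _translate(self, addr, memory):
        # Stand-in for real translation: tag the source instruction.
        self.translations += 1
        return ("translated", memory[addr])

    def lookup(self, addr, memory):
        # Cache hit: reuse the fragment and skip translation.
        if addr in self.fragments:
            return self.fragments[addr]
        # Cache miss: translate, insert, and evict the oldest entry if full.
        fragment = self._translate(addr, memory)
        self.fragments[addr] = fragment
        if len(self.fragments) > self.capacity:
            self.fragments.popitem(last=False)
        return fragment

    def invalidate(self, addr):
        # Consistency: drop the fragment when the original code changes.
        self.fragments.pop(addr, None)


memory = {0x10: "add", 0x20: "mul", 0x30: "jmp"}
cache = CodeCache(capacity=2)
cache.lookup(0x10, memory)   # miss: translated and cached
cache.lookup(0x10, memory)   # hit: no new translation
memory[0x10] = "sub"         # the application modifies its own code
cache.invalidate(0x10)       # the stale fragment must be discarded
cache.lookup(0x10, memory)   # miss again: retranslated from the new code
```

In this model the caller reports code writes by hand; the hard part described in the abstract is that on IA-32 nothing tells the cache a write happened.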
Dynamic Binary Translation for Accumulator-Oriented Architectures
, 2003
Cited by 16 (3 self)
A dynamic binary translation system for a co-designed virtual machine is described and evaluated. The underlying hardware directly executes an accumulator-oriented instruction set that exposes instruction dependence chains (strands) to a distributed microarchitecture containing a simple instruction pipeline and issue logic. To support conventional program binaries, a source instruction set (Compaq Alpha in our study) is dynamically translated to the target accumulator instruction set. The binary translator identifies chains of inter-instruction dependences and assigns them to dependence-carrying accumulators. Because the underlying superscalar microarchitecture is capable of dynamic instruction scheduling, the binary translation system does not perform aggressive optimizations or re-schedule code; this significantly reduces binary translation overhead.
Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems
- Int’l. Symp. on Code Generation and Optimization
, 2007
Cited by 8 (3 self)
Software Dynamic Translation (SDT) systems are used for program instrumentation, dynamic optimization, security, intrusion detection, and many other uses. As noted by many researchers, a major source of SDT overhead is the execution of code which is needed to translate an indirect branch’s target address into the address of the translated destination block. This paper discusses the sources of indirect branch (IB) overhead in SDT systems and evaluates several techniques for overhead reduction. Measurements using SPEC CPU2000 show that the appropriate choice and configuration of IB translation mechanisms can significantly reduce the IB handling overhead. In addition, cross-architecture evaluation of IB handling mechanisms reveals that the most efficient implementation and configuration can be highly dependent on the implementation of the
Persistent Code Caching: Exploiting Code Reuse Across Executions and Applications
- In Proceedings of the international symposium on Code Generation and Optimization
, 2007
Cited by 4 (0 self)
Run-time compilation systems are challenged with the task of translating a program’s instruction stream while maintaining low overhead. While software managed code caches are utilized to amortize translation costs, they are ineffective for programs with short run times or large amounts of cold code. Such program characteristics are prevalent in real-life computing environments, ranging from Graphical User Interface (GUI) programs to large-scale applications such as database management systems. Persistent code caching addresses these issues. It is described and evaluated in an industry-strength dynamic binary instrumentation system, Pin. The proposed approach improves the intra-execution model of code reuse by storing and reusing translations across executions, thereby achieving inter-execution persistence. Dynamically linked programs leverage inter-application persistence by using persistent translations of library code generated by other programs. New translations discovered across executions are automatically accumulated into the persistent code caches, thereby improving performance over time. Inter-execution persistence improves the performance of GUI applications by nearly 90%, while inter-application persistence achieves a 59% improvement. In more specialized uses, the SPEC2K INT benchmark suite experiences a 26% improvement under dynamic binary instrumentation. Finally, a 400% speedup is achieved in translating the Oracle database in a regression testing environment.
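The idea of reusing translations across executions and across applications can be sketched in a few lines. This is a rough model, not Pin's mechanism: the names `PersistentCache`, `translate`, and `run`, the JSON file format, and keying translations by a SHA-256 hash of the code bytes are all assumptions chosen to keep the example self-contained.

```python
import hashlib
import json
import os
import tempfile


def translate(code: bytes) -> str:
    # Stand-in for an expensive translation step (hypothetical).
    return code.hex().upper()


class PersistentCache:
    """Toy persistent translation cache, stored as a JSON file on disk."""

    def __init__(self, path):
        self.path = path
        self.misses = 0  # blocks translated during this execution
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)  # reuse earlier translations
        else:
            self.entries = {}

    def get(self, code: bytes) -> str:
        # Key by content hash (an assumption) so any program that contains
        # the same code bytes can reuse the stored translation.
        key = hashlib.sha256(code).hexdigest()
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = translate(code)
        return self.entries[key]

    def save(self):
        # New translations accumulate in the file across executions.
        with open(self.path, "w") as f:
            json.dump(self.entries, f)


def run(path, blocks):
    """Simulate one execution; return how many blocks needed translation."""
    cache = PersistentCache(path)
    for block in blocks:
        cache.get(block)
    cache.save()
    return cache.misses


cache_path = os.path.join(tempfile.mkdtemp(), "cache.json")
first = run(cache_path, [b"\x01\x02", b"\x03"])   # cold start
second = run(cache_path, [b"\x01\x02", b"\x03"])  # same program again
third = run(cache_path, [b"\x03", b"\x04"])       # another program sharing b"\x03"
print(first, second, third)
```

The second run models inter-execution persistence (nothing is retranslated), and the third models inter-application persistence: only the block that no earlier run had seen is translated.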
CSDL: Reusable Computing System Descriptions for Retargetable System Software
, 2000
Cited by 3 (0 self)
In an era of rapid design of microprocessors for desktop systems, embedded systems, and handheld computing devices, the timely construction of systems software is essential. Systems software, such as assemblers, compilers, and debuggers, must be constructed before development of application software for a microprocessor can commence. However, the implementation of such machine-specific applications is difficult and time consuming. Therefore, to remain competitive, it is imperative that systems software designs focus on portability to reduce implementation time and ensure rapid delivery of complete systems to the market. This dissertation presents the Computing System Description Language (CSDL) framework that addresses these rapid development requirements. We illustrate the CSDL framework by developing an instruction-set description component (RTL), an optional procedure calling convention description component (CCL), and the mechanism we use to extend extant descriptions (CSDL). RTL and its accompanying microinstruction descriptions (RTL) further the state-of-the-art in specifying semantics of machine instructions. RTL adds a new type system and abstract syntax that facilitates more accurate specification and automatic detection of errors by RTL manipulators. RTL machine descriptions are also application independent: they completely separate the specification of semantics from the application's implementation. The CCL specification language is the first work to formally describe procedure calling conventions. We demonstrate two distinct uses for CCL descriptions: code generation and fault detection. Using CCL we have built compilers that are more robust, and found and diagnosed faults in production compilers. CCL, RTL, and RTL descriptions are bound together u...
Dynamic Software Trace Caching
Cited by 2 (0 self)
Caching basic blocks in the most frequent order greatly increases fetch bandwidth. Traditional compile-time code reordering requires a profile feedback step, which is an obstacle in itself, and is susceptible to run-time program behavior changes. On the other hand, hardware trace caches are limited both in capacity and trace construction window size. We propose a software-managed trace cache mechanism that improves instruction fetch performance by dynamic code straightening and provides dynamic binary translation/optimization opportunities based on runtime program behavior.
Addressing the Energy Crisis in Mobile Computing with Developing Power Aware Software
, 2003
Cited by 2 (0 self)
Reducing the power that programs consume on resource-restricted devices has recently become a very active research area. The driving force behind this interest is the widespread popularity of portable computers, handheld devices, and cell phones. Consequently, there is an accelerating demand for increased battery life in mobile devices. One way in which we can increase battery life is to improve battery technology and to produce devices that consume less power. Alternatively, we can take a software-based approach. For example, many devices and hardware components are designed with multiple levels of operating power. Application management software (compilers, runtime, and operating systems) can adjust these levels using static and dynamic techniques to reduce program power consumption. Alternatively, such systems can choose to off-load computation from mobile devices to more capable, wall-powered computers.

