Efficient, Transparent and Comprehensive Runtime Code Manipulation
2004
Cited by 28 (1 self)
This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamically-generated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction (which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools), it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the ...
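To make the code-caching idea concrete, here is a minimal Python sketch of the dispatch loop at the heart of such a system: look up a block's address in the cache, translate it on a miss, and execute the cached copy. It is a toy under stated assumptions: the "program" is a hypothetical dict mapping addresses to Python operations rather than IA-32 machine code, `translate` is a placeholder for the decode/instrument/emit step, and real systems such as DynamoRIO additionally link cached blocks directly to one another so that most control transfers bypass the lookup.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A block is (operations, successor function); the successor returns the
# next block address, or None when the program exits.
Block = Tuple[List[Callable[[dict], None]], Callable[[dict], Optional[int]]]


class CodeCache:
    """Toy code cache: each block is translated once and then reused."""

    def __init__(self, program: Dict[int, Block]) -> None:
        self.program = program
        self.cache: Dict[int, Block] = {}
        self.translations = 0  # number of times translation cost was paid

    def translate(self, addr: int) -> Block:
        # Hypothetical stand-in for decoding, instrumenting, and emitting a
        # block; here it only copies the operation list.
        self.translations += 1
        ops, successor = self.program[addr]
        return list(ops), successor

    def run(self, entry: int, state: dict) -> dict:
        addr: Optional[int] = entry
        while addr is not None:
            block = self.cache.get(addr)
            if block is None:  # cache miss: translate once, then store
                block = self.translate(addr)
                self.cache[addr] = block
            ops, successor = block
            for op in ops:  # execute the cached copy, never the original
                op(state)
            addr = successor(state)
        return state
```

For example, a two-block program whose second block loops eleven times is translated only twice; every later iteration runs straight from the cache.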
Exploring Code Cache Eviction Granularities in Dynamic Optimization Systems
In International Symposium on Code Generation and Optimization, 2004
Cited by 20 (4 self)
Dynamic optimization systems store optimized or translated code in a software-managed code cache in order to maximize reuse of transformed code. Code caches store superblocks that are not fixed in size, may contain links to other superblocks, and carry a high replacement overhead. These additional constraints reduce the effectiveness of conventional hardware-based cache management policies. In this paper, we explore code cache management policies that evict large blocks of code from the code cache, thus avoiding the bookkeeping overhead of managing single cache blocks. Through a combined simulation and analytical study of cache management overheads, we show that employing a medium-grained FIFO eviction policy results in an effective balance of cache management complexity and cache miss rates. Under high cache pressure the choice of medium granularity translates into a significant reduction in overall execution time versus both coarse and fine granularities.
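The granularity trade-off can be explored with a small simulator. The sketch below is an assumption-laden model, not the paper's simulator: the cache is split into `units` equal FIFO regions, each eviction empties an entire region, every block must fit in one region, and the cost of unlinking superblocks that point into the evicted region is not modeled. With `units=1` it behaves like a full cache flush; as `units` grows it approaches block-by-block FIFO.

```python
from typing import Dict, List, Sequence


def simulate_fifo_units(trace: Sequence[str], sizes: Dict[str, int],
                        capacity: int, units: int) -> Dict[str, int]:
    """Simulate a code cache split into `units` equal FIFO eviction units.

    units == 1 models a coarse full flush; larger values approach
    fine-grained FIFO. Each eviction empties one whole unit.
    """
    unit_size = capacity // units
    unit_blocks: List[List[str]] = [[] for _ in range(units)]
    unit_used = [0] * units
    resident: Dict[str, int] = {}  # block -> index of the unit holding it
    current = 0
    misses = evictions = 0
    for block in trace:
        if block in resident:
            continue  # hit: execute from the cache
        misses += 1
        size = sizes[block]
        if size > unit_size:
            raise ValueError(f"block {block!r} is larger than one unit")
        if unit_used[current] + size > unit_size:
            current = (current + 1) % units  # advance the FIFO pointer
            if unit_blocks[current]:  # evict the whole unit at once
                evictions += 1
                for victim in unit_blocks[current]:
                    del resident[victim]
                unit_blocks[current] = []
                unit_used[current] = 0
        unit_blocks[current].append(block)
        unit_used[current] += size
        resident[block] = current
    return {"misses": misses, "evictions": evictions}
```

On the trace A B C D E B with unit-size blocks and capacity 4, a single unit flushes everything when E arrives and misses on B again (6 misses), while four units evict only A and keep B resident (5 misses).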
Maintaining consistency and bounding capacity of software code caches
International Symposium on Code Generation and Optimization, 2005
Cited by 17 (1 self)
Software code caches are becoming ubiquitous in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching frequently executed fragments of code provides significant performance boosts, reducing the overhead of translation and emulation and meeting or exceeding native performance in dynamic optimizers. One disadvantage of caching, memory expansion, can sometimes be ignored when executing a single application. However, as optimizers and translators are increasingly applied in production systems, the memory expansion from running multiple applications simultaneously becomes problematic. A second drawback to caching is the added requirement of maintaining consistency between the code cache and the original code. On architectures like IA-32 that do not require explicit application actions when modifying code, detecting code changes is challenging. Again, consistency can be ignored for certain sets of applications, but as caching systems scale up to executing large, modern, complex programs, consistency becomes critical. This paper presents efficient schemes for keeping a software code cache consistent and for dynamically bounding code cache size to match the current working set of the application. These schemes are evaluated in the DynamoRIO runtime code manipulation system, and operate on stock hardware in the presence of multiple threads and dynamic behavior, including dynamically-loaded, generated, and even modified code.
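Bounding the cache to the application's working set needs a signal for when the bound is too tight. One plausible heuristic, assumed here rather than transcribed from the paper, is to watch how often recently evicted fragments are regenerated: a high regeneration ratio suggests the bound is cutting into the working set. The class and its parameters (`window`, `grow_threshold`, `growth`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AdaptiveCacheSizer:
    """Hypothetical working-set heuristic for a bounded code cache.

    After every `window` evictions, if the number of evicted fragments that
    were re-translated ("regenerated") during that window, divided by the
    window's evictions, exceeds `grow_threshold`, the capacity is
    multiplied by `growth`.
    """
    capacity: int
    window: int = 50
    grow_threshold: float = 0.5
    growth: int = 2
    evicted: Set[str] = field(default_factory=set)
    evictions_in_window: int = 0
    regenerations_in_window: int = 0

    def on_evict(self, fragment: str) -> None:
        # Called by the (assumed) eviction routine for each evicted fragment.
        self.evicted.add(fragment)
        self.evictions_in_window += 1
        if self.evictions_in_window >= self.window:
            self._maybe_resize()

    def on_translate(self, fragment: str) -> None:
        # Translating a previously evicted fragment counts as a regeneration.
        if fragment in self.evicted:
            self.evicted.discard(fragment)
            self.regenerations_in_window += 1

    def _maybe_resize(self) -> None:
        ratio = self.regenerations_in_window / self.evictions_in_window
        if ratio > self.grow_threshold:
            self.capacity *= self.growth
        # Start a fresh measurement window.
        self.evictions_in_window = 0
        self.regenerations_in_window = 0
```

A tool would call `on_evict` from its eviction routine and `on_translate` whenever it builds a fragment; shrinking the cache when the ratio stays low would be the symmetric extension.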
Generational Cache Management of Code Traces in Dynamic Optimization Systems
2003
Cited by 17 (5 self)
A dynamic optimizer is a runtime software system that groups a program's instruction sequences into traces, optimizes those traces, stores the optimized traces in a software-based code cache, and then executes the optimized code in the code cache. To maximize performance, the vast majority of the program's execution should occur in the code cache rather than in the various components of the dynamic optimization system itself. In the past, designers of dynamic optimizers have used the SPEC2000 benchmark suite to justify their use of simple code cache management schemes. In this paper, we show that the problem and importance of code cache management change dramatically as we move from SPEC2000, with its relatively small number of dynamically generated code traces, to large interactive Windows applications. We also propose and evaluate a new cache management algorithm based on generational code caches that results in an average miss rate reduction of 18% over a unified cache, which translates into 19% fewer instructions spent in the dynamic optimizer. The algorithm categorizes code traces based on their expected lifetimes and groups traces with similar lifetimes together in separate storage areas. Using this algorithm, short-lived code traces can easily be removed from a code cache without introducing fragmentation and without suffering the performance penalties associated with evicting long-lived code traces.
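The generational idea can be sketched as three FIFO regions. This Python version is a simplification under stated assumptions: capacities are counted in traces rather than bytes, the region names (nursery, probation, persistent) and the promotion threshold are illustrative choices, and trace linking is ignored. Short-lived traces die in the nursery or probation region without fragmenting the persistent region, which only receives traces that were executed again after leaving the nursery.

```python
from collections import OrderedDict
from typing import Optional


class GenerationalCodeCache:
    """Hypothetical three-generation trace cache (capacities in traces).

    New traces enter the nursery. Traces evicted from the nursery move to
    probation; a probation trace executed `promote_hits` times is promoted
    to the persistent region, otherwise it is dropped when evicted.
    Each region evicts in FIFO order.
    """

    def __init__(self, nursery: int, probation: int, persistent: int,
                 promote_hits: int = 1) -> None:
        self.caps = {"nursery": nursery, "probation": probation,
                     "persistent": persistent}
        # Each region maps trace -> hit count, in insertion (FIFO) order.
        self.gens = {name: OrderedDict() for name in self.caps}
        self.promote_hits = promote_hits
        self.misses = 0

    def location(self, trace: str) -> Optional[str]:
        for name, gen in self.gens.items():
            if trace in gen:
                return name
        return None

    def _insert(self, name: str, trace: str) -> None:
        gen = self.gens[name]
        if len(gen) >= self.caps[name]:
            victim, _ = gen.popitem(last=False)  # FIFO eviction
            if name == "nursery":
                self._insert("probation", victim)  # second chance
            # probation and persistent victims are simply discarded
        gen[trace] = 0

    def execute(self, trace: str) -> None:
        where = self.location(trace)
        if where is None:
            self.misses += 1
            self._insert("nursery", trace)
        elif where == "probation":
            self.gens["probation"][trace] += 1
            if self.gens["probation"][trace] >= self.promote_hits:
                del self.gens["probation"][trace]
                self._insert("persistent", trace)
```

In a run where a trace `L` is re-executed while on probation, it is promoted and survives the later churn of one-shot traces that cycle through the nursery and probation regions.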
Dynamic Binary Translation for Accumulator-Oriented Architectures
2003
Cited by 16 (3 self)
A dynamic binary translation system for a co-designed virtual machine is described and evaluated. The underlying hardware directly executes an accumulator-oriented instruction set that exposes instruction dependence chains (strands) to a distributed microarchitecture containing a simple instruction pipeline and issue logic. To support conventional program binaries, a source instruction set (Compaq Alpha in our study) is dynamically translated to the target accumulator instruction set. The binary translator identifies chains of inter-instruction dependences and assigns them to dependence-carrying accumulators. Because the underlying superscalar microarchitecture is capable of dynamic instruction scheduling, the binary translation system does not perform aggressive optimizations or re-schedule code; this significantly reduces binary translation overhead.
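The strand identification step can be illustrated with a greedy pass over a straight-line instruction sequence. This is a deliberately simplified sketch, not the paper's translator: instructions are hypothetical `(destination, sources)` pairs instead of Alpha encodings, and an instruction extends a producer's strand only when that producer currently ends the strand and its result has exactly one reader, so the value can stay in a single accumulator.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)


def form_strands(code: Sequence[Instr]) -> List[int]:
    """Assign each instruction a strand (accumulator) id.

    Simplified greedy rule (an assumption, not the paper's algorithm):
    extend a producer's strand when that producer is the strand's most
    recent instruction and its result is read exactly once.
    """
    # Pass 1: find the producing instruction of each source and count reads.
    uses = [0] * len(code)
    producer: Dict[str, int] = {}
    readers: List[List[int]] = []
    for i, (dest, srcs) in enumerate(code):
        reads = [producer[s] for s in srcs if s in producer]
        for p in reads:
            uses[p] += 1
        readers.append(reads)
        producer[dest] = i  # update after reading, so r1 = r1 + x works

    # Pass 2: greedily chain single-use values into strands.
    strand_of: List[int] = []
    tail: Dict[int, int] = {}  # strand id -> last instruction index
    next_id = 0
    for i, reads in enumerate(readers):
        chosen: Optional[int] = None
        for p in reads:
            s = strand_of[p]
            if tail[s] == p and uses[p] == 1:
                chosen = s
                break
        if chosen is None:
            chosen, next_id = next_id, next_id + 1
        strand_of.append(chosen)
        tail[chosen] = i
    return strand_of
```

Values with several readers start new strands in this sketch; a real accumulator-oriented design must communicate such values some other way, for example through ordinary registers.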
Managing bounded code caches in dynamic binary optimization systems
ACM Transactions on Architecture and Code Optimization
Cited by 13 (4 self)
Dynamic binary optimizers store altered copies of original program instructions in software-managed code caches in order to maximize reuse of transformed code. Code caches store code blocks that may vary in size, reference other code blocks, and carry a high replacement overhead. These unique constraints reduce the effectiveness of conventional cache management policies. Our work directly addresses these unique constraints and presents several contributions to the code-cache management problem. First, we show that evicting more than the minimum number of code blocks from the code cache results in less run-time overhead than the existing alternatives. Such granular evictions reduce overall execution time, as the fixed costs of invoking the eviction mechanism are amortized across multiple cache insertions. Second, a study of the ideal lifetimes of dynamically generated code blocks illustrates the benefit of a replacement algorithm based on a generational heuristic. We describe and evaluate a generational approach to code cache management that makes it easy to identify long-lived code blocks and simultaneously avoid any fragmentation caused by the eviction of short-lived blocks. Finally, we present results from an implementation of our generational approach in the DynamoRIO framework and illustrate that, as dynamic optimization systems become more prevalent, effective code-cache management policies will be essential for reliable, scalable performance of modern applications.
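The amortization argument can be made concrete with a toy cost model. Assume (hypothetically) that each invocation of the eviction mechanism has a fixed cost and each evicted block adds a per-block cost; evicting k blocks per invocation then costs ceil(N/k) × (fixed + k × per-block) over N insertions into a full cache of equal-size blocks. The model deliberately omits the extra misses caused by evicting blocks that are still needed, which is what keeps very coarse evictions from winning outright.

```python
import math


def eviction_overhead(insertions: int, blocks_per_eviction: int,
                      fixed_cost: float, per_block_cost: float) -> float:
    """Total eviction cost for `insertions` insertions into a full cache.

    Hypothetical cost model: every invocation of the eviction mechanism
    pays `fixed_cost` and frees `blocks_per_eviction` equal-size blocks,
    each of which adds `per_block_cost`. Misses caused by evicting live
    blocks are not modeled.
    """
    invocations = math.ceil(insertions / blocks_per_eviction)
    return invocations * (fixed_cost + blocks_per_eviction * per_block_cost)
```

With a fixed cost of 500 and a per-block cost of 10 (arbitrary units), 1000 insertions cost 510000 when evicting one block at a time but 72500 when evicting eight per invocation.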
Adaptive Code Unloading for Resource-Constrained JVMs
In ACM Conference on Languages, Compilers, and Tools (LCTES), 2004
Cited by 7 (1 self)
Compile-only JVMs for resource-constrained embedded systems have the potential to use device resources more efficiently than interpreter-only systems, since compilers can produce significantly higher-quality code and that code can be stored and reused for future invocations. However, this additional storage requirement for reusing native code bodies introduces memory overhead not imposed in interpreter-based systems. In this paper, we present...
Planning for code buffer management in distributed virtual execution environments
In Conference on Virtual Execution Environments, 2005
Cited by 7 (4 self)
Virtual execution environments have become increasingly useful in system implementation, with dynamic translation techniques being an important component for performance-critical systems. Many devices have exceptionally tight performance and memory constraints (e.g., smart cards and sensors in distributed systems), which require effective resource management. One approach to managing code memory is to download code partitions on demand from a server and to cache the partitions in the resource-constrained device (client). However, due to the high cost of downloading code and re-translation, it is critical to intelligently manage the code buffer to minimize the overhead of code buffer misses. Yet intelligent buffer management on the tightly constrained client can be too expensive. In this paper, we propose to move code buffer management to the server, where sophisticated schemes can be employed. We describe two schemes that use profiling information to direct the client in caching code partitions. One scheme is designed for workloads with stable run-time behavior, while the other adapts its decisions for workloads with unstable behavior. We evaluate and compare our schemes and show that they perform well compared to other approaches, with the adaptive scheme having the best performance overall.
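To illustrate moving buffer decisions to the server, the sketch below has the server replay a profiled request sequence and emit, for every client miss, which partition to download and which to evict. The eviction rule is Belady's farthest-next-use policy, swapped in here as a simple stand-in for a scheme aimed at stable workloads; it is not claimed to be either of the paper's two schemes. Partition names and the `slots` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def plan_buffer(profile: Sequence[str],
                slots: int) -> List[Tuple[str, Optional[str]]]:
    """Server-side plan: for each client miss, what to download and evict.

    Uses Belady's farthest-next-use rule on a profiled request sequence.
    Returns (partition_to_download, partition_to_evict_or_None) per miss.
    """
    buffer: List[str] = []  # partitions resident on the client
    plan: List[Tuple[str, Optional[str]]] = []
    for i, part in enumerate(profile):
        if part in buffer:
            continue  # client hit: no directive needed
        victim: Optional[str] = None
        if len(buffer) >= slots:
            future = list(profile[i + 1:])
            # Evict the resident partition whose next use is farthest away
            # (never-used-again partitions rank farthest of all).
            victim = max(buffer, key=lambda p: future.index(p)
                         if p in future else len(future))
            buffer.remove(victim)
        buffer.append(part)
        plan.append((part, victim))
    return plan
```

The client then only downloads and drops partitions as directed, with no replacement logic of its own; an adaptive variant would re-plan when the observed requests diverge from the profile.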
Compact binaries with code compression in a software dynamic translator
Design Automation and Test in Europe Conference, 2004
Cited by 5 (2 self)
Embedded software is becoming more flexible and adaptable, which presents new challenges for management of highly constrained system resources. Software dynamic translation (SDT) has been used to enable software malleability at the instruction level for dynamic code optimizers, security checkers, and binary translators. This paper studies the feasibility of using SDT to manage program code storage in embedded systems. We explore to what extent code compression can be incorporated in a software infrastructure to reduce program storage requirements, while minimally impacting run-time performance and memory resources. We describe two approaches for code compression, called full and partial image compression, and evaluate their compression ratios and performance in a software dynamic translation system. We demonstrate that code decompression is indeed feasible in an SDT.
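The difference between the two approaches can be sketched with Python's standard `zlib` module. Assumptions: code is a hypothetical dict of block address to raw bytes, and `fetch` stands in for the hook a translator would call on a cache miss. Full image compression stores one stream and must decompress everything before use; partial compression stores one stream per block, paying per-stream header overhead and losing cross-block redundancy, but decompresses only the blocks actually executed.

```python
import zlib
from typing import Dict


def full_image(blocks: Dict[int, bytes]) -> bytes:
    """Full image compression: the whole code image as one zlib stream."""
    image = b"".join(blocks[addr] for addr in sorted(blocks))
    return zlib.compress(image)


class PartialImage:
    """Partial image compression: each block is compressed separately,
    so the translator decompresses only the blocks it actually fetches."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.compressed = {a: zlib.compress(b) for a, b in blocks.items()}
        self.decompressed = 0  # number of on-demand decompressions

    def fetch(self, addr: int) -> bytes:
        # Hypothetical hook a dynamic translator would call on a cache miss.
        self.decompressed += 1
        return zlib.decompress(self.compressed[addr])

    def stored_size(self) -> int:
        # Total bytes kept in storage across all per-block streams.
        return sum(len(c) for c in self.compressed.values())
```

Comparing `len(full_image(blocks))` with `PartialImage(blocks).stored_size()` on a real code image would show how much compression ratio the on-demand flexibility costs.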
SoftCache: A Technique for Power and Area Reduction in Embedded Systems
In Fourth Workshop on Feedback-Directed and Dynamic Optimization (FDDO), 2003
Cited by 4 (2 self)
Explicitly software-managed cache systems are postulated as a solution for power considerations in computing devices. The savings expected in a SoftCache lie in the removal of tag storage, associativity logic, comparators, and other hardware dedicated to memory hierarchies. The penalty lies in high cache-miss cost and the additional instructions required to implement a cache model. In this paper, we characterize SoftCaches by placing them in the overall computing landscape, analyzing the energy and space tradeoffs. We present results that indicate a SoftCache saves power and space over hardware caches. Based on the TSMC 0.25 µm process from MOSIS, we use schematic and layout representations of hardware and SoftCache models for comparison. Accounting for additional instructions executed and simplification of logic, we examine high SoftCache miss cost in relation to the overall system. For a 256KB "mode" change every 1.45 hours, the SoftCache exhibits 1% application slowdown for energy savings of 30% or more in a low-power device such as the SA-110 microprocessor used in PocketPC platforms.

