Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems
- Int’l. Symp. on Code Generation and Optimization
, 2007
Cited by 8 (3 self)
Software Dynamic Translation (SDT) systems are used for program instrumentation, dynamic optimization, security, intrusion detection, and many other uses. As noted by many researchers, a major source of SDT overhead is the execution of code which is needed to translate an indirect branch’s target address into the address of the translated destination block. This paper discusses the sources of indirect branch (IB) overhead in SDT systems and evaluates several techniques for overhead reduction. Measurements using SPEC CPU2000 show that the appropriate choice and configuration of IB translation mechanisms can significantly reduce the IB handling overhead. In addition, cross-architecture evaluation of IB handling mechanisms reveals that the most efficient implementation and configuration can be highly dependent on the implementation of the
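The indirect-branch translation step described in this abstract can be pictured with a minimal sketch. It assumes a plain dictionary as the lookup structure and a hypothetical `toy_translate` slow path; neither is taken from the paper, which evaluates several mechanisms that this listing does not detail.

```python
from typing import Callable, Dict


class IndirectBranchTable:
    """Maps application (source) addresses to translated code-cache addresses."""

    def __init__(self, translate: Callable[[int], int]) -> None:
        self._translate = translate      # hypothetical slow-path translator
        self._table: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def resolve(self, source_target: int) -> int:
        """Return the translated address for an indirect branch target."""
        cached = self._table.get(source_target)
        if cached is not None:
            self.hits += 1               # fast path: target already translated
            return cached
        self.misses += 1                 # slow path: translate, then remember
        translated = self._translate(source_target)
        self._table[source_target] = translated
        return translated


def toy_translate(source_addr: int) -> int:
    # Assumption for illustration only: the translated block sits at a fixed offset.
    return 0x7000_0000 + source_addr


ibt = IndirectBranchTable(toy_translate)
targets = [0x400100, 0x400200, 0x400100, 0x400100]
resolved = [ibt.resolve(t) for t in targets]
```

Every lookup, hit or miss, is work the untranslated program would not have done, which is the overhead the abstract refers to.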
Tdb: a source-level debugger for dynamically translated programs
- ACM Conf. on Automated and Analysis-Driven Debugging
, 2005
Cited by 7 (4 self)
Debugging techniques have evolved over the years in response to changes in programming languages, implementation techniques, and user needs. A new type of implementation vehicle for software has emerged that, once again, requires new debugging techniques. Software dynamic translation (SDT) has received much attention due to compelling applications of the technology, including software security checking, binary translation, and dynamic optimization. Using SDT, program code changes dynamically, and thus, debugging techniques developed for statically generated code cannot be used to debug these applications. In this paper, we describe a new debug architecture for applications executing with SDT systems. The architecture provides features that create the illusion that the source program is being debugged, while allowing the SDT system to modify the executing code. We incorporated this
A Dynamic Binary Instrumentation Engine for the ARM Architecture
, 2006
Cited by 4 (1 self)
Dynamic binary instrumentation (DBI) is a powerful technique for analyzing the runtime behavior of software. While numerous DBI frameworks have been developed for general-purpose architectures, work on DBI frameworks for embedded architectures has been fairly limited. In this paper, we describe the design, implementation, and applications of the ARM version of Pin, a dynamic instrumentation system from Intel. In particular, we highlight the design decisions that are geared toward the space and processing limitations of embedded systems. Pin for ARM is publicly available and is shipped with dozens of sample plug-in instrumentation tools. It has been downloaded over 500 times since its release.
Improving the Performance of Trace-based Systems by False Loop Filtering
- In Proceedings of Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems
, 2011
Cited by 4 (3 self)
Trace-based compilation is a promising technique for language compilers and binary translators. It offers the potential to expand the compilation scopes that have traditionally been limited by method boundaries. Detecting repeating cyclic execution paths and capturing the detected repetitions into traces is a key requirement for trace selection algorithms to achieve good optimization and performance with small amounts of code. One important class of repetition detection is cyclic-path-based repetition detection, where a cyclic execution path (a path that starts and ends at the same instruction address) is detected as a repeating cyclic execution path. However, we found many cyclic paths that are not repeating cyclic execution paths, which we call false loops. A common class of false loops occurs when a method is invoked from multiple callsites.
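The definition of a cyclic path given in this abstract, and the way a method called from two callsites produces a false loop, can be illustrated with a small sketch. The function name, the addresses, and the two example paths are hypothetical, and the sketch only detects cyclic paths; the filtering technique itself is not described in this listing and is not implemented here.

```python
from typing import Dict, List, Optional, Tuple


def first_cyclic_path(addresses: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the first path that starts and ends
    at the same instruction address, or None if no address repeats."""
    last_seen: Dict[int, int] = {}
    for i, addr in enumerate(addresses):
        if addr in last_seen:
            return last_seen[addr], i
        last_seen[addr] = i
    return None


# A genuine loop: the same three instructions execute again and again.
real_loop = [0x300, 0x304, 0x308, 0x300, 0x304, 0x308, 0x300]

# A method at 0x500 invoked from two callsites (0x100 and 0x200).
# The method entry repeats, but execution returns to a different place
# each time, so the path between the two entries does not repeat.
two_callsites = [0x100, 0x500, 0x504, 0x104, 0x200, 0x500, 0x504, 0x204]

loop_span = first_cyclic_path(real_loop)
false_span = first_cyclic_path(two_callsites)
false_path = two_callsites[false_span[0]:false_span[1] + 1]
```

Both paths satisfy the start-equals-end test, so a detector based on that test alone cannot tell them apart.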
Reducing pressure in bounded DBT code caches
- In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
, 2008
Cited by 3 (2 self)
Dynamic binary translators (DBT) have recently attracted much attention for embedded systems. The effective implementation of DBT in these systems is challenging due to tight constraints on memory and performance. A DBT uses a software-managed code cache to hold blocks of translated code. To minimize overhead, the code cache is usually large so blocks are translated once and never discarded. However, an embedded system may lack the resources for a large code cache. This constraint leads to significant slowdowns due to the retranslation of blocks prematurely discarded from a small code cache. This paper addresses the problem and shows how to impose a tight size bound on the code cache without performance loss. We show that about 70% of the code cache is consumed by instructions that the DBT introduces for its own purposes. Based on this observation, we propose novel techniques that reduce the amount of space required by DBT-injected code, leaving more room for actual application code and improving the miss ratio. We experimentally demonstrate that a bounded code cache can have performance on par with an unbounded one.
Categories and Subject Descriptors C.3 [Computer Systems Organization]: Special-purpose and application-based systems—Real-time and embedded systems; D.3.4 [Programming Languages]: Processors—Code generation, Compilers,
Generating Low-Overhead Dynamic Binary Translators
- Mathias Payer
, 2010
Cited by 3 (2 self)
Dynamic (on the fly) binary translation is an important part of many software systems. In this paper we discuss how to combine efficient translation with the generation of efficient code, while providing a high-level table-driven user interface that simplifies the generation of the binary translator (BT). The translation actions of the BT are specified in high-level abstractions that are compiled into translation tables; these tables control the runtime program translation. This table generator allows a compact description of changes in the translated code. We use fastBT, a table-based dynamic binary translator that uses a code cache and various optimizations for indirect control transfers to illustrate the design tradeoffs in binary translators. We present an analysis of the most challenging sources of overhead and describe optimizations to further reduce these penalties. Keys to the good performance are a configurable inlining mechanism and adaptive self-modifying optimizations for indirect control transfers.
A cross-architectural interface for code cache manipulation
- In 6th Intl. Symp. on Code Generation and Optimization
, 2006
Cited by 2 (1 self)
Software code caches help amortize the overhead of dynamic binary transformation by enabling reuse of transformed code. Since code caches contain a potentially altered copy of every instruction that executes, run-time access to a code cache can be a very powerful opportunity. Unfortunately, current research infrastructures lack the ability to model and direct code caching, and as a result, past code cache investigations have required access to the source code of the binary transformation system. This paper presents a code cache-aware interface to the Pin dynamic instrumentation system. While a program executes, our interface allows a user to inspect the code cache, receive callbacks when key events occur, and manipulate the code cache contents at will. We demonstrate the utility of this interface on four architectures (IA32, EM64T, IPF, XScale) and present several tools written using our API. These tools include a self-modifying code handler, a two-phase instrumentation analyzer, a code cache visualizer, and custom code cache replacement policies. We also show that tools written using our interface have comparable performance to direct, source-level implementations. Both our interface and sample open-source tools that utilize the interface have been incorporated into the standard distribution of the Pin dynamic instrumentation engine, which has been downloaded over 5,000 times in 18 months.
Code Lifetime-Based Memory Reduction for Virtual Execution Environments
Cited by 1 (0 self)
The need for adaptability in a rapidly expanding embedded systems market makes it important to design virtual execution environments (VEEs) specifically targeting embedded platforms. We believe the first step in this direction should be to replace the performance focus of traditional VEE design with a combined memory and performance focus, given the memory constraints on embedded systems. In this work, we present techniques that reduce the large code cache sizes of VEEs by continually eliminating dead cached code as the guest application executes. We use both a time-based heuristic and an execution count-based heuristic to predict code lifetime. When we determine that the lifetime of code has ended, we remove it from the code cache. We found that at least 20% code cache reduction can be achieved on average, without a significant performance degradation.
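One possible reading of the two lifetime heuristics named in this abstract is sketched below. The abstract does not say how either predictor works, so everything here is an assumption made for illustration: the `CodeCache` and `CachedBlock` names, the idle-time threshold, and the idea of a per-block predicted run count.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CachedBlock:
    size: int                             # bytes occupied in the code cache
    last_used: int                        # logical time of the last execution
    exec_count: int = 0
    predicted_runs: Optional[int] = None  # hypothetical lifetime prediction


class CodeCache:
    def __init__(self, idle_limit: int) -> None:
        self.idle_limit = idle_limit      # assumed threshold for the time heuristic
        self.blocks: Dict[int, CachedBlock] = {}

    def insert(self, addr: int, size: int, now: int,
               predicted_runs: Optional[int] = None) -> None:
        self.blocks[addr] = CachedBlock(size, now, 0, predicted_runs)

    def execute(self, addr: int, now: int) -> None:
        block = self.blocks[addr]
        block.exec_count += 1
        block.last_used = now

    def is_dead(self, block: CachedBlock, now: int) -> bool:
        # Time-based guess: idle for longer than idle_limit.
        idle_too_long = now - block.last_used > self.idle_limit
        # Count-based guess: the block has run as often as predicted.
        ran_out = (block.predicted_runs is not None
                   and block.exec_count >= block.predicted_runs)
        return idle_too_long or ran_out

    def evict_dead(self, now: int) -> List[int]:
        """Remove blocks presumed dead and return their addresses, sorted."""
        dead = sorted(a for a, b in self.blocks.items() if self.is_dead(b, now))
        for addr in dead:
            del self.blocks[addr]
        return dead


cache = CodeCache(idle_limit=100)
cache.insert(0x1000, size=64, now=0)                    # never executed again
cache.insert(0x2000, size=32, now=0, predicted_runs=2)  # expected to run twice
cache.insert(0x3000, size=16, now=0)
cache.execute(0x2000, now=190)
cache.execute(0x2000, now=195)
cache.execute(0x3000, now=150)
evicted = cache.evict_dead(now=200)
remaining_bytes = sum(b.size for b in cache.blocks.values())
```

In this toy run the first block is removed by the time rule, the second by the count rule, and the third survives because it was used recently and carries no prediction.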
Heterogeneous Code Cache: Using Scratchpad and Main Memory in Dynamic Binary Translators
Cited by 1 (1 self)
Dynamic binary translation (DBT) can be used to address important issues in embedded systems. DBT systems store translated code in a software-managed code cache. Unlike general-purpose systems, embedded systems often have specialized memory resources, such as a fast scratchpad memory, that can be used to mitigate DBT performance overhead. This paper presents the Heterogeneous Code Cache (HCC), a code cache split among scratchpad and main memory. We explore several HCC management policies and show that, on average, an HCC outperforms a code cache allocated only to scratchpad or only to main memory.
Categories and Subject Descriptors C.3 [Computer Systems Organization]: Special-purpose and application-based systems—Real-time and embedded systems;
DBT Path Selection for Holistic Memory Efficiency and Performance
Cited by 1 (0 self)
Dynamic binary translators (DBTs) provide powerful platforms for building dynamic program monitoring and adaptation tools. DBTs, however, have high memory demands because they cache translated code and auxiliary code to a software code cache and must also maintain data structures to support the code cache. The high memory demands make it difficult for memory-constrained embedded systems to take advantage of DBT-based tools. Previous research on DBT memory management focused on the translated code and auxiliary code only. However, we found that data structures are comparable to the code cache in size. We show that the translated code size, auxiliary code size and the data structure size interact in a complex manner, depending on the path selection (trace selection and link formation) strategy. Therefore, holistic memory efficiency (comprising translated code, auxiliary code and data structures)

