Results 1 - 10 of 149
Pin: building customized program analysis tools with dynamic instrumentation
- In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2005
"... Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and eff ..."
Abstract
-
Cited by 991 (35 self)
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin’s rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application’s original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin’s versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.
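To make the programming model concrete, below is a minimal instruction-counting Pintool in the spirit of the starter examples shipped with Pin. The API calls (PIN_Init, INS_AddInstrumentFunction, INS_InsertCall) follow Pin's documented C++ API, but details vary across Pin releases, so treat this as an illustrative sketch rather than version-accurate code.

    #include <iostream>
    #include "pin.H"

    // Running count of executed instructions, updated by analysis code.
    static UINT64 icount = 0;

    // Analysis routine: called before every executed instruction.
    VOID docount() { icount++; }

    // Instrumentation routine: Pin calls this once per instruction seen,
    // and we request a call to docount() before each one.
    VOID Instruction(INS ins, VOID *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Called when the instrumented application exits.
    VOID Fini(INT32 code, VOID *v)
    {
        std::cerr << "Executed " << icount << " instructions" << std::endl;
    }

    int main(int argc, char *argv[])
    {
        if (PIN_Init(argc, argv)) return 1;        // parse Pin's command line
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();                        // never returns
        return 0;
    }

The tool is built as a shared object and launched under Pin, typically as pin -t <tool>.so -- <application>.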
The Jalapeño Dynamic Optimizing Compiler for Java
, 1999
"... The JalapeÃño Dynamic Optimizing Compiler is a key component of the JalapeÃño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the JalapeÃño Optimizing Compiler, and ..."
Abstract
-
Cited by 188 (30 self)
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.
Adaptive Optimization in the Jalapeño JVM
- In ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA)
, 2000
"... (*58()9$"2#$:0/,;58(03<10/2,>=?33@">"29 #A:0*/,B58(*C2"258/052,D3*>#$,,6-*0'/ 58@F,058*,+HG?!"*0"I"252J58K0/ ,6-*0'/ 030"6N*IO40"58DP)"58QF,058SRUT6252,D<0!2T6252,V52!8("9 "W5X3,06*9E,'Y58(*03C:0'/ X3,06 ..."
Abstract
-
Cited by 181 (14 self)
Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems
, 2006
"... In this research we propose a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees v ..."
Abstract
-
Cited by 73 (4 self)
In this research we propose a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus caches and by its significantly lower overheads in energy consumption, area and overall runtime, even with a simple allocation scheme. Scratch-pad allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption and SRAM space for tags and deliver poor real-time guarantees, just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. We propose a dynamic allocation methodology for global and stack data ...
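As a rough illustration of the per-access overhead the paper argues against, here is a hypothetical sketch (invented names, simplified direct-mapped tags) of the check a software-caching scheme would inline before each guarded load:

    #include <cstdint>

    // Hypothetical direct-mapped software cache: SRAM_LINES lines of
    // LINE_BYTES bytes, with tags kept in software. Schemes of this kind
    // inline a check like sp_load() before every load/store they guard,
    // which is exactly the runtime, code-size, and energy cost cited above.
    constexpr int LINE_BYTES = 32;
    constexpr int SRAM_LINES = 128;

    static uint8_t   sram[SRAM_LINES][LINE_BYTES]; // scratch-pad data lines
    static uintptr_t tags[SRAM_LINES];             // software-maintained tags
    static bool      valid[SRAM_LINES];            // line-valid bits

    // Stand-in for a slow DRAM access (stub so the sketch is self-contained).
    static uint8_t dram_read(uintptr_t addr) { return (uint8_t)addr; }

    uint8_t sp_load(uintptr_t addr) {
        uintptr_t line = addr / LINE_BYTES;
        int       slot = (int)(line % SRAM_LINES);
        if (!valid[slot] || tags[slot] != line) {  // miss: refill whole line
            for (int i = 0; i < LINE_BYTES; ++i)
                sram[slot][i] = dram_read(line * LINE_BYTES + i);
            tags[slot]  = line;
            valid[slot] = true;
        }
        return sram[slot][addr % LINE_BYTES];      // even hits pay the tag check
    }

The compile-time allocation the paper proposes places data directly in one bank or the other, so no such check runs on the access path.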
A Survey of Adaptive Optimization in Virtual Machines
- Proceedings of the IEEE, 93(2), 2005. Special Issue on Program Generation, Optimization, and Adaptation
, 2004
"... Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimiza ..."
Abstract
-
Cited by 59 (6 self)
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular ...
`C and tcc: A Language and Compiler for Dynamic Code Generation
- ACM Transactions on Programming Languages and Systems
, 1999
"... This paper makes the following contributions: ---It describes the `C language, and motivates the design of the language ..."
Abstract
-
Cited by 56 (3 self)
This paper makes the following contributions: it describes the `C language and motivates the design of the language ...
Defeating return-oriented rootkits with return-less kernels
- In Proceedings of the 5th European Conference on Computer Systems
"... Targeting the operating system (OS) kernels, kernel rootkits pose a formidable threat to computer systems and their users. Recent efforts have made significant progress in blocking them from injecting malicious code into the OS kernel for execution. Unfortunately, they cannot block the emerging so-c ..."
Abstract
-
Cited by 53 (5 self)
Targeting operating system (OS) kernels, kernel rootkits pose a formidable threat to computer systems and their users. Recent efforts have made significant progress in blocking them from injecting malicious code into the OS kernel for execution. Unfortunately, they cannot block the emerging so-called return-oriented rootkits (RORs). Without the need to inject their own malicious code, these rootkits can discover and chain together “return-oriented gadgets” (that consist of only legitimate kernel code) for rootkit computation. In this paper, we propose a compiler-based approach to defeat these return-oriented rootkits. Our approach recognizes the hallmark of return-oriented rootkits, i.e., the ret instruction, and accordingly aims to completely remove ret instructions from a running OS kernel. Specifically, one key technique, named return indirection, replaces the return address in a stack frame with a return index, preventing a ROR from using its own return addresses to locate and assemble return-oriented gadgets. Further, to prevent legitimate instructions that happen to contain return opcodes from being misused, we also propose two other techniques, register allocation and peephole optimization, to avoid introducing such opcodes in the first place. We have developed an LLVM-based prototype and used it to generate a return-less FreeBSD kernel. Our evaluation results indicate that the proposed approach is generic, effective, and can be implemented on commodity hardware with a low performance overhead.
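The return-indirection idea can be sketched in user-space C++ as follows. This is a hypothetical analogue of the compiler transformation, not the paper's kernel implementation: control returns through an index into a table of registered return sites, so a raw attacker-chosen address is useless.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Table of legitimate return sites; only indexes into this table can
    // transfer control back, never a raw code address.
    using ReturnSite = void (*)();
    static std::vector<ReturnSite> g_return_table;

    static void after_call_site() { std::puts("back at a registered return site"); }

    // In the real scheme the compiler rewrites every function epilogue;
    // here the callee simply hands back the index it was given.
    static std::size_t callee(std::size_t return_index) {
        std::puts("in callee");
        return return_index;
    }

    int main() {
        g_return_table.push_back(&after_call_site); // index 0: the only legal site
        std::size_t idx = callee(0);
        if (idx < g_return_table.size())   // bounds check stands in for the
            g_return_table[idx]();         // lookup a rewritten epilogue performs
        return 0;
    }

Because gadget chaining relies on stacking raw return addresses, forcing every return through such a table removes the primitive RORs depend on.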
CARS: A new code generation framework for clustered ILP processors
- In HPCA
, 2001
"... Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation and instruction scheduling. Most of these approache ..."
Abstract
-
Cited by 51 (1 self)
Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation and instruction scheduling. Most of these approaches need additional re-scheduling phases because they often do not impose finite resource constraints in all phases of code generation. These phase-ordered solutions have several drawbacks, resulting in the generation of poor performance code. Moreover, the iterative/back-tracking algorithms used in some of these schemes have large running times. In this paper we present CARS, a code generation framework for Clustered ILP processors, which combines the cluster assignment, register allocation, and instruction scheduling phases into a single code generation phase, thereby eliminating the problems associated with phase-ordered solutions. The CARS algorithm explicitly takes into account all the resource constraints at each cluster scheduling step to reduce spilling and to avoid iterative re-scheduling steps. We also present a new on-the-fly register allocation scheme developed for CARS. We describe an implementation of the proposed code generation framework and the results of a performance evaluation study using the SPEC95/2000 and MediaBench benchmarks.
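A toy sketch of the phase-combining idea (hypothetical types and cost model, not the CARS algorithm itself): a single scheduling step sees functional-unit, register, and inter-cluster communication constraints at once, so the cluster choice is never fixed blindly in an earlier phase.

    #include <climits>
    #include <vector>

    // Toy model: each cluster exposes how many functional-unit slots and
    // allocatable registers remain in the current cycle.
    struct Cluster {
        int freeUnits;
        int freeRegs;
    };

    struct Instr {
        // How many of this instruction's operands already live on cluster c
        // (stubbed out; a real framework consults its dataflow state).
        int operandsOnCluster(int /*c*/) const { return 0; }
    };

    // One combined step: pick a cluster for `ins` with unit, register, and
    // communication constraints all visible, or return -1 to wait a cycle.
    int scheduleStep(const Instr& ins, std::vector<Cluster>& clusters) {
        int best = -1, bestCost = INT_MAX;
        for (int c = 0; c < (int)clusters.size(); ++c) {
            if (clusters[c].freeUnits == 0 || clusters[c].freeRegs == 0)
                continue;                              // resource limits enforced now
            int copies = 2 - ins.operandsOnCluster(c); // inter-cluster moves needed
            if (copies < bestCost) { bestCost = copies; best = c; }
        }
        if (best >= 0) {                  // commit unit and register together,
            --clusters[best].freeUnits;   // so no later phase can invalidate
            --clusters[best].freeRegs;    // this scheduling decision
        }
        return best;
    }

Committing all resources at the moment of the decision is what lets a combined scheduler avoid the iterative re-scheduling passes of phase-ordered approaches.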
Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless
- In Proc. 2001 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications
, 2001
"... Single superclass inheritance enables simple and ecient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inecient. Thi ..."
Abstract
-
Cited by 43 (2 self)
Single superclass inheritance enables simple and efficient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inefficient. This paper argues that with proper implementation techniques, Java interfaces need not be a source of significant performance degradation. We present an efficient interface method dispatch mechanism, associating a fixed-sized interface method table (IMT) with each class that implements an interface. Interface method signatures hash to an IMT slot, with any hashing collisions handled by custom-generated conflict resolution stubs. The dispatch mechanism is efficient in both time and space. Furthermore, with static analysis and online profile data, an optimizing compiler can inline the dominant target(s) of any frequently executed interface call. Micro-benchmark results demonstrate that the expected cost of an interface method call dispatched via an IMT is comparable to the cost of a virtual method call. Experimental evaluation of a number of interface dispatch mechanisms on a suite of larger applications demonstrates that, even for applications that make only moderate use of interface methods, the choice of interface dispatching mechanism can significantly impact overall performance. Fortunately, several mechanisms provide good performance at a modest space cost.
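A hypothetical C++ rendering of IMT-style dispatch (names and the signature-id scheme are invented for illustration): the receiver's class holds a fixed-size table, the call site indexes it by a hash of the method signature, and a conflict stub disambiguates colliding signatures.

    #include <cstdio>

    // Fixed-size interface method table (IMT): one per class.
    constexpr int IMT_SIZE = 8;

    // Every entry takes the receiver plus a hidden signature id, so a
    // conflict stub can tell colliding signatures apart.
    using Target = void (*)(void* self, int sigId);

    struct Class {
        Target imt[IMT_SIZE];
    };

    // Uncontended slot: the entry is the method body itself.
    void List_size(void*, int) { std::puts("List.size()"); }

    // Contended slot: a generated stub branches on the signature id.
    void conflictStub(void*, int sigId) {
        if (sigId == 42) std::puts("Set.add()");
        else             std::puts("Set.clear()");
    }

    // An interface call site: one indexed load plus an indirect call.
    void invokeinterface(Class* cls, int sigId, void* obj) {
        cls->imt[sigId % IMT_SIZE](obj, sigId);
    }

    int main() {
        Class c{};
        c.imt[7 % IMT_SIZE]  = List_size;    // signature 7 hashes alone to slot 7
        c.imt[42 % IMT_SIZE] = conflictStub; // signatures 42 and 50 share slot 2
        invokeinterface(&c, 7, nullptr);
        invokeinterface(&c, 42, nullptr);
        return 0;
    }

In the common uncontended case the cost is one indexed load and an indirect call, which is why IMT dispatch approaches virtual-method speed.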
MaJIC: Compiling MATLAB for Speed and Responsiveness
- In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI ’02)
, 2002
"... This paper presents and evaluates techniques to improve the execution performance of MATLAB. Previous efforts con-centrated on source to source translation and batch compila-tion;MaJIC provides an interactive frontend that looks like MATLAB and compiles/optimizes code behind the scenes in real time, ..."
Abstract
-
Cited by 40 (1 self)
This paper presents and evaluates techniques to improve the execution performance of MATLAB. Previous efforts concentrated on source-to-source translation and batch compilation; MaJIC provides an interactive frontend that looks like MATLAB and compiles/optimizes code behind the scenes in real time, employing a combination of just-in-time and speculative ahead-of-time compilation. Performance results show that the proper mixture of these two techniques can yield near-zero response time as well as performance gains previously achieved only by batch compilers.
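A hypothetical sketch of how such a JIT/speculative-AOT mix can deliver near-zero response time (stand-in functions, not MaJIC's actual design): calls execute immediately through an interpreter while a background thread compiles and publishes optimized code.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    using Compiled = double (*)(double);

    // Published pointer to optimized code; null until compilation lands.
    static std::atomic<Compiled> g_compiled{nullptr};

    // Stand-ins: an interpreter and the "compiled" version of one function.
    double interpret(double x)    { return x * x; }
    double nativeSquare(double x) { return x * x; }

    // Speculative background compilation: do the work off the critical
    // path, then publish the result atomically.
    void backgroundCompile() {
        g_compiled.store(nativeSquare, std::memory_order_release);
    }

    // Every call runs immediately: interpreted at first, compiled later.
    double call(double x) {
        if (Compiled f = g_compiled.load(std::memory_order_acquire))
            return f(x);
        return interpret(x);
    }

    int main() {
        std::thread jit(backgroundCompile);
        std::printf("%f\n", call(3.0));   // may still use the interpreter
        jit.join();
        std::printf("%f\n", call(3.0));   // now uses the compiled version
        return 0;
    }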