Results 1 - 10 of 68
Pin: building customized program analysis tools with dynamic instrumentation
In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005
Cited by 416 (20 self)
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin’s rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application’s original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin’s versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging-code inspections and walk-throughs,
The Jalapeño Dynamic Optimizing Compiler for Java
1999
Cited by 159 (28 self)
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.
Adaptive Optimization in the Jalapeno JVM
In ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 2000
Cited by 149 (10 self)
`C and tcc: A Language and Compiler for Dynamic Code Generation
ACM Transactions on Programming Languages and Systems, 1999
Cited by 41 (3 self)
This paper makes the following contributions: It describes the `C language, and motivates the design of the language
CARS: A new code generation framework for clustered ILP processors
In HPCA, 2001
Cited by 40 (1 self)
Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation and instruction scheduling. Most of these approaches need additional re-scheduling phases because they often do not impose finite resource constraints in all phases of code generation. These phase-ordered solutions have several drawbacks, resulting in the generation of poor performance code. Moreover, the iterative/back-tracking algorithms used in some of these schemes have large running times. In this paper we present CARS, a code generation framework for Clustered ILP processors, which combines the cluster assignment, register allocation, and instruction scheduling phases into a single code generation phase, thereby eliminating the problems associated with phase-ordered solutions. The CARS algorithm explicitly takes into account all the resource constraints at each cluster scheduling step to reduce spilling and to avoid iterative re-scheduling steps. We also present a new on-the-fly register allocation scheme developed for CARS. We describe an implementation of the proposed code generation framework and the results of a performance evaluation study using the SPEC95/2000 and MediaBench benchmarks.
Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless
In Proc. 2001 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, 2001
Cited by 30 (1 self)
Single superclass inheritance enables simple and efficient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inefficient. This paper argues that with proper implementation techniques, Java interfaces need not be a source of significant performance degradation. We present an efficient interface method dispatch mechanism, associating a fixed-sized interface method table (IMT) with each class that implements an interface. Interface method signatures hash to an IMT slot, with any hashing collisions handled by custom-generated conflict resolution stubs. The dispatch mechanism is efficient in both time and space. Furthermore, with static analysis and online profile data, an optimizing compiler can inline the dominant target(s) of any frequently executed interface call. Micro-benchmark results demonstrate that the expected cost of an interface method call dispatched via an IMT is comparable to the cost of a virtual method call. Experimental evaluation of a number of interface dispatch mechanisms on a suite of larger applications demonstrates that, even for applications that make only moderate use of interface methods, the choice of interface dispatching mechanism can significantly impact overall performance. Fortunately, several mechanisms provide good performance at a modest space cost.
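The IMT idea in the abstract above (signatures hash to a slot in a fixed-size table, and colliding signatures share a conflict resolution stub) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's implementation: the table size, the length-based hash, and the names `IMT_SIZE`, `imt_slot`, `build_imt`, `invoke_interface`, and `Rect` are all hypothetical, and a real JVM would generate machine-code stubs rather than Python closures.

```python
IMT_SIZE = 4  # hypothetical fixed table size


def imt_slot(signature):
    """Toy deterministic hash (an assumption): signature length modulo table size."""
    return len(signature) % IMT_SIZE


def build_imt(methods):
    """Build a fixed-size IMT from a mapping of signature -> implementation."""
    buckets = {}
    for sig, fn in methods.items():
        buckets.setdefault(imt_slot(sig), {})[sig] = fn

    imt = [None] * IMT_SIZE
    for slot, entries in buckets.items():
        if len(entries) == 1:
            # No collision: the slot jumps straight to the single target.
            (fn,) = entries.values()
            imt[slot] = lambda sig, recv, *args, _fn=fn: _fn(recv, *args)
        else:
            # Collision: a conflict resolution stub picks the target by signature.
            imt[slot] = lambda sig, recv, *args, _e=entries: _e[sig](recv, *args)
    return imt


def invoke_interface(imt, signature, receiver, *args):
    """Dispatch an interface call through the receiver class's IMT."""
    return imt[imt_slot(signature)](signature, receiver, *args)


class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h

    def area(self):
        return self.w * self.h

    def name(self):
        return "rect"

    def perimeter(self):
        return 2 * (self.w + self.h)


# "area()" and "name()" have the same length, so they collide in one slot.
rect_imt = build_imt({
    "area()": Rect.area,
    "name()": Rect.name,
    "perimeter()": Rect.perimeter,
})
r = Rect(2, 3)
results = [invoke_interface(rect_imt, s, r) for s in ("area()", "name()", "perimeter()")]
```

The sketch assumes the caller only uses signatures the class actually implements, which in Java is guaranteed by type checking; the uncolliding slot does not re-check the signature.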
A Survey of Adaptive Optimization in Virtual Machines
Proceedings of the IEEE, 93(2), 2005. Special Issue on Program Generation, Optimization, and Adaptation, 2004
Cited by 26 (5 self)
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular
A model-based framework: an approach for profit-driven optimization
In Third Annual IEEE/ACM International Conference on Code Generation and Optimization, 2005
Cited by 22 (6 self)
Although optimizations have been applied for a number of years to improve the performance of software, problems that have been long-standing remain, which include knowing what optimizations to apply and how to apply them. To systematically tackle these problems, we need to understand the properties of optimizations. In our current research, we are investigating the profitability property, which is useful for determining the benefit of applying an optimization. Due to the high cost of applying optimizations and then experimentally evaluating their profitability, we use an analytic model framework for predicting the profitability of optimizations. In this paper, we target scalar optimizations, and in particular, describe framework instances for Partial Redundancy Elimination (PRE) and Loop Invariant Code Motion (LICM). We implemented the framework for both optimizations and compare profit-driven PRE and LICM with a heuristic-driven approach. Our experiments demonstrate that a model-based approach is effective and efficient in that it can accurately predict the profitability of optimizations with low overhead. By predicting the profitability using models, we can selectively apply optimizations. The model-based approach does not require tuning of parameters used in heuristic approaches and works well across different code contexts and optimizations.
Register allocation via coloring of chordal graphs
In Proceedings of APLAS’05, Asian Symposium on Programming Languages and Systems, 2005
Cited by 16 (2 self)
We present a simple algorithm for register allocation which is competitive with the iterated register coalescing algorithm of George and Appel. We base our algorithm on the observation that 95% of the methods in the Java 1.5 library have chordal interference graphs when compiled with the JoeQ compiler. A greedy algorithm can optimally color a chordal graph in time linear in the number of edges, and we can easily add powerful heuristics for spilling and coalescing. Our experiments show that the new algorithm produces better results than iterated register coalescing for settings with few registers and comparable results for settings with many registers.
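The claim that a greedy algorithm colors a chordal graph optimally can be sketched as follows. This is a minimal illustration, not the paper's allocator: it orders vertices with maximum cardinality search (a standard way to obtain an ordering under which greedy coloring of a chordal graph is optimal) and then assigns each vertex the lowest color unused by its already colored neighbors. The function names and the small interference graph are hypothetical, spilling and coalescing are omitted, and this simple version runs in quadratic rather than linear time.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    that has the most already visited neighbors."""
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for n in graph[v]:
            if n in unvisited:
                weight[n] += 1
    return order


def greedy_color(graph, order):
    """Give each vertex, in the given order, the lowest color not used
    by its already colored neighbors."""
    color = {}
    for v in order:
        used = {color[n] for n in graph[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_chordal(graph):
    """Color a chordal graph; colors stand in for registers."""
    return greedy_color(graph, mcs_order(graph))


# Hypothetical interference graph: two triangles sharing the edge b-c,
# plus a temporary e that only interferes with d. Largest clique has size 3.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
coloring = color_chordal(interference)
```

On this graph the coloring uses three colors, matching the size of the largest clique, which is the minimum possible. The exact color given to each temporary depends on tie-breaking in the search order.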
Dynamic Optimization through the use of Automatic Runtime Specialization
1999
Cited by 14 (2 self)
Profile-driven optimizations and dynamic optimization through specialization have taken optimizations to a new level. By using actual runtime data, optimizers can generate code that is specially tuned for the task at hand. However, most existing compilers that perform these optimizations require separate test runs to gather profile information, and/or user annotations in the code. In this thesis, I describe runtime optimizations that a dynamic compiler can perform automatically --- without user annotations --- by utilizing realtime performance data. I describe the implementation of the dynamic optimizations in the framework of a Java Virtual Machine and give performance results.

