Results 1 - 10 of 11
Characterizing the Memory Behavior of Java Workloads: A Structured View and Opportunities for Optimizations
, 2000
Cited by 54 (3 self)
This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-the-art JVM, and provides structured information about these workloads to help guide systems' design. We begin by characterizing the inherent memory behavior of the benchmarks, such as information on the breakup of heap accesses among different categories and on the hotness of references to fields and methods. We then provide detailed information about misses in the data TLB and caches, including the distribution of misses over different kinds of accesses and over different methods. In the process, we make interesting discoveries about TLB behavior and limitations of data prefetching schemes discussed in the literature in dealing with pointer-intensive Java codes. Throughout this paper, we develop a set of recommendations to computer architects and compiler writers on how to optimize computer systems and system software to run Java programs more efficiently. This paper also makes the first attempt to compare the characteristics of SPECjvm98 to those of a server-oriented benchmark, pBOB, and explain why the current set of SPECjvm98 benchmarks may not be adequate for a comprehensive and objective evaluation of JVMs and just-in-time (JIT) compilers. We discover that the fraction of accesses to array elements is quite significant, demonstrate that the number of "hot spots" in the benchmarks is small, and show that field reordering cannot yield significant performance gains. We also show that even a fairly large L2 data cache is not effective for many Java benchmarks. We observe that instructions used to prefetch data into the L2 data cache are often squashed because of high TLB miss ...
A Comparative Study of Static and Dynamic Heuristics for Inlining
, 2000
Cited by 21 (7 self)
In this paper, we present a comparative study of static and dynamic heuristics for inlining. We introduce inlining plans as a formal representation for nested inlining decisions made by an inlining heuristic. We use a well-known approximation algorithm for the knapsack problem as a common "meta-algorithm" for the static and dynamic inlining heuristics studied in this paper. We present performance results for an implementation of these inlining heuristics in the Jalapeño dynamic optimizing compiler for Java. Our performance results show that the inlining heuristics studied in this paper can lead to significant speedups in execution time (up to 1.68×) even with modest limits on code size expansion (at most 10%).
Dynamic Optimization through the use of Automatic Runtime Specialization
, 1999
Cited by 14 (2 self)
Profile-driven optimizations and dynamic optimization through specialization have taken optimizations to a new level. By using actual runtime data, optimizers can generate code that is specially tuned for the task at hand. However, most existing compilers that perform these optimizations require separate test runs to gather profile information, and/or user annotations in the code. In this thesis, I describe runtime optimizations that a dynamic compiler can perform automatically --- without user annotations --- by utilizing realtime performance data. I describe the implementation of the dynamic optimizations in the framework of a Java Virtual Machine and give performance results.
A Heuristic Search Algorithm Based on Unified Transformation Framework
- In ICPPW ’05: Proceedings of the 2005 International Conference on Parallel Processing Workshops (ICPPW’05)
, 2005
Cited by 10 (2 self)
Modern compilers have limited ability to exploit the performance improvement potential of complex transformation compositions. This is due to the ad-hoc nature of different transformations. Various frameworks have been proposed to provide a unified representation of different transformations, among them is Pugh's Unified Transformation Framework (UTF)[10]. It presents a unified and systematic representation of iteration reordering transformations and their arbitrary combination, which results in a large and complex optimisation space for a compiler to explore. This paper presents a heuristic search algorithm capable of efficiently locating good program optimisations within such a space. Preliminary experimental results on Java show that it can achieve an average speedup of 1.14 on Linux+Celeron and 1.10 on Windows+PentiumPro, and more than 75% of the maximum performance available can be obtained within 20 evaluations or less.
Adaptive Java Optimisation Using Instance-Based Learning
, 2004
Cited by 9 (1 self)
This paper describes a portable, machine learning-based approach to Java optimisation. This approach uses an instance-based learning scheme to select good transformations drawn from Pugh's Unified Transformation Framework[11]. This approach was implemented and applied to a number of numerical Java benchmarks on two platforms. Using this scheme, we are able to gain over 70% of the performance improvement found when using an exhaustive iterative search of the best compiler optimisations. Thus we have a scheme that gives a high level of portable performance without any excessive compilations.
Systematic Search within an Optimisation Space Based on Unified Transformation Framework
, 2006
Cited by 4 (0 self)
Modern compilers have limited ability to exploit the performance improvement potential of complex transformation compositions. This is due to the ad-hoc nature of different transformations. Various frameworks have been proposed to provide a unified representation of different transformations, among them is Pugh's Unified Transformation Framework (UTF) (Kelly and Pugh (1993)). It presents a unified and systematic representation of iteration reordering transformations and their arbitrary combination, which results in a large and complex optimisation space for a compiler to explore. This paper presents a heuristic search algorithm capable of efficiently locating good program optimisations within such a space. Preliminary experimental results on Java show that it can achieve an average speedup of 1.14 on Linux+Celeron and 1.10 on Windows+PentiumPro, and more than 75% of the maximum performance available can be obtained within 20 evaluations or less.
Compile Time Elimination of Null- and Bounds-Checks
- In 9th Workshop on Compilers for Parallel Computers
, 2001
Cited by 3 (3 self)
SafeTSA is a new type safe intermediate representation for mobile code based on static single assignment form. We developed SafeTSA as an alternative to the Java Virtual Machine. Programs in
A Type-Safe Mobile-Code Representation Aimed at Supporting Dynamic Optimization At The Target Site
- 2001 ACM Sigplan Conference on Programming Language Design and Implementation (PLDI 2001). Snowbird
Cited by 1 (0 self)
We introduce SafeTSA, a type-safe mobile code representation based on static single assignment form. We are developing SafeTSA as an alternative to the Java Virtual Machine, over which it has several advantages: (1) SafeTSA is better suited as input to optimizing dynamic code generators and allows CSE to be performed at the code producer's site. (2) SafeTSA provides incorruptible referential integrity and uses "type separation" to achieve intrinsic type safety. These properties reduce the code verification effort at the code consumer's site considerably.
Increasing Java Performance in Memory-Constrained Environments Using Explicit Memory Deallocation
- International Workshop on Mobility Aware Computing
, 2005
Cited by 1 (1 self)
As more and more powerful Java implementations begin to arrive on mobile devices, memory footprint problems are again encountered.
Deferred Gratification: Engineering for High Performance Garbage Collection from the Get Go
Cited by 1 (1 self)
Implementing a new programming language system is a daunting task. A common trap is to punt on the design and engineering of exact garbage collection and instead opt for reference counting or conservative garbage collection (GC). For example, AppleScript™, Perl, Python, and PHP implementers chose reference counting (RC) and Ruby chose conservative GC. Although easier to get working, reference counting has terrible performance and conservative GC is inflexible and performs poorly when allocation rates are high. However, high performance GC is central to performance for managed languages and only becoming more critical due to relatively lower memory bandwidth and higher memory latency of modern architectures. Unfortunately, retrofitting support for high performance collectors later is a formidable software engineering task due to their exact nature. Whether they realize it or not, implementers

