Results 1 - 10
of
20
Characterizing the Memory Behavior of Java Workloads: A Structured View and Opportunities for Optimizations
, 2000
"... This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-theart JVM, and provides structured information about these workloads to help guide systems' design. We be ..."
Abstract
-
Cited by 54 (3 self)
- Add to MetaCart
This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-theart JVM, and provides structured information about these workloads to help guide systems' design. We begin by characterizing the inherent memory behavior of the benchmarks, such as information on the breakup of heap accesses among different categories and on the hotness of references to fields and methods. We then provide detailed information about misses in the data TLB and caches, including the distribution of misses over different kinds of accesses and over different methods. In the process, we make interesting discoveries about TLB behavior and limitations of data prefetching schemes discussed in the literature in dealing with pointer-intensive Java codes. Throughout this paper, we develop a set of recommendations to computer architects and compiler writers on how to optimize computer systems and system software to run Java programs more efficiently. This paper also makes the first attempt to compare the characteristics of SPECjvm98 to those of a server-oriented benchmark, pBOB, and explain why the current set of SPECjvm98 benchmarks may not be adequate for a comprehensive and objective evaluation of JVMs and just-in-time (JIT) compilers. We discover that the fraction of accesses to array elements is quite significant, demonstrate that the number of "hot spots" in the benchmarks is small, and show that field reordering cannot yield significant performance gains. We also show that even a fairly large L2 data cache is not effective for many Java benchmarks. We observe that instructions used to prefetch data into the L2 data cache are often squashed because of high TLB miss ...
Practical, transparent operating system support for superpages
- SIGOPS Oper. Syst. Rev
, 2002
"... Most general-purpose processors provide support for memory pages of large sizes, called superpages. Superpages enable each entry in the translation lookaside buffer (TLB) to map a large physical memory region into a virtual address space. This dramatically increases TLB coverage, reduces TLB misses, ..."
Abstract
-
Cited by 32 (3 self)
- Add to MetaCart
Most general-purpose processors provide support for memory pages of large sizes, called superpages. Superpages enable each entry in the translation lookaside buffer (TLB) to map a large physical memory region into a virtual address space. This dramatically increases TLB coverage, reduces TLB misses, and promises performance improvements for many applications. However, supporting superpages poses several challenges to the operating system, in terms of superpage allocation and promotion tradeoffs, fragmentation control, etc. We analyze these issues, and propose the design of an effective superpage management system. We implement it in FreeBSD on the Alpha CPU, and evaluate it on real workloads and benchmarks. We obtain substantial performance benefits, often exceeding 30%; these benefits are sustained even under stressful workload scenarios. 1
Controlling garbage collection and heap growth to reduce execution time of Java applications
- In ACM Conference on ObjectOriented Programming, Systems, Languages, and Applications (OOPSLA’01
, 2001
"... ABSTRACT In systems that support garbage collection, a tension exists between collecting garbage too frequently and not collecting garbage frequently enough. Garbage collection that occurs too frequently may introduce unnecessary overheads at the risk of not collecting much garbage during each cycle ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
ABSTRACT In systems that support garbage collection, a tension exists between collecting garbage too frequently and not collecting garbage frequently enough. Garbage collection that occurs too frequently may introduce unnecessary overheads at the risk of not collecting much garbage during each cycle. On the other hand, collecting garbage too infrequently can result in applications that execute with a large amount of virtual memory (i.e., with a large footprint) and suffer from increased execution times due to paging. In this paper, we use a large collection of JavaTMapplications and the highly tuned and widely used Boehm-DemersWeiser (BDW) conservative mark-and-sweep garbage collector to experimentally examine the extent to which the frequency of garbage collection impacts an application's execution time, footprint, and pause times. We use these results to devise some guidelines for controlling garbage collection and heap growth in a conservative garbage collector in order to minimize application execution times. Then we describe new strategies for controlling garbage collection and heap growth that impact not only the frequency with which garbage collection occurs but also the points at which garbage collection occurs. Experimental results demonstrate that, when compared with the existing approach used in the standard BDW collector, our new strategy can significantly reduce application execution times. Our goal is to obtain a better understanding of how to control garbage collection and heap growth for an individual application executing in isolation. These results can be applied in a number of high-performance computing and server environments, in addition to some single-user environments. This work should also provide insights into how
Dynamic SimpleScalar: Simulating Java Virtual Machines
, 2003
"... Current user-mode machine simulators typically do not support simulation of dynamic compilation, threads, or garbage collection, all of which Java Virtual Machines (JVMs) require. In this paper, we describe, evaluate, and validate Dynamic SimpleScalar (DSS). DSS is a tool that simulates Java program ..."
Abstract
-
Cited by 25 (8 self)
- Add to MetaCart
Current user-mode machine simulators typically do not support simulation of dynamic compilation, threads, or garbage collection, all of which Java Virtual Machines (JVMs) require. In this paper, we describe, evaluate, and validate Dynamic SimpleScalar (DSS). DSS is a tool that simulates Java programs running on a JVM, using just-in-time compilation, executing on a simulated multi-way issue, out-of-order execution superscalar processor with a sophisticated memory system. We describe the implementation of the minimal support necessary for simulating a JVM in SimpleScalar, including signals, thread scheduling, synchronization, and dynamic code generation, all required by a JVM. We validate our simulator using IBM Research's Jikes RVM, a state-of-the-art JVM that runs Submitting to the First Annual IEEE/ACM International Symposium On Code Generation and Optimization.
Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times
- In Proc. of the 17 th ACM conference on Objectoriented programming, systems, languages, and applications (OOPSLA 2002
, 2002
"... Growing gap between processor and memory speeds Need for optimization strategies that improve data locality and thereby improve memory behavior Major challenge: techniques suitable for pointerintensive applications Java: pointer-intensive applications with dynamic memory allocation General idea 1 st ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Growing gap between processor and memory speeds Need for optimization strategies that improve data locality and thereby improve memory behavior Major challenge: techniques suitable for pointerintensive applications Java: pointer-intensive applications with dynamic memory allocation General idea 1 st idea: Co-locating objects together at allocation time 2 nd idea: Memory-friendly traversal algorithm during garbage collection two natural points during program execution when the placement of data can be improvedCreating Locality at Allocation Time (1) Prolific types
Bosschere. 64-bit versus 32-bit virtual machines for Java
- Software: Practice and Experience
"... The Java language is popular because of its platform independence, making it useful in a lot of technologies ranging from embedded devices to high-performance systems. The platform-independent property of Java, which is visible at the Java bytecode level, is only made possible thanks to the availabi ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
The Java language is popular because of its platform independence, making it useful in a lot of technologies ranging from embedded devices to high-performance systems. The platform-independent property of Java, which is visible at the Java bytecode level, is only made possible thanks to the availability of a Virtual Machine (VM), which needs to be designed specifically for each underlying hardware platform. More specifically, the same Java bytecode should run properly on a 32-bit or a 64-bit VM. In this paper, we compare the behavioral characteristics of 32-bit and 64-bit VMs using a large set of Java benchmarks. This is done using the Jikes Research VM as well as the IBM JDK 1.4.0 production VM on a PowerPC-based IBM machine. By running the PowerPC machine in both 32-bit and 64-bit mode we are able to compare 32-bit and 64-bit VMs. We conclude that the space an object takes in the heap in 64-bit mode is 39.3% larger on average than in 32-bit mode. We identify three reasons for this: (i) the larger pointer size, (ii) the increased header and (iii) the increased alignment. The minimally required heap size is 51.1 % larger on average in 64-bit than in 32-bit mode. From our experimental setup using hardware performance monitors, we observe that 64-bit computing typically results in a significantly larger number of data cache misses at all levels of the memory hierarchy. In addition, we observe that when a sufficiently large heap is available, the IBM JDK 1.4.0 VM is 1.7 % slower on average in 64-bit mode than in 32-bit mode. Copyright c ○ 2005
Analyzing the Performance of Memory Management in RTSJ
- In The 5th IEEE International Symposim on Object-oriented Realtime Distributed Computing (Crystal
, 2002
"... The memory models used in the Real-Time Specification for Java (RTSJ) can incur high amounts of overhead; It is possible to reduce this overhead by taking advantages of hardware features. This paper provides an indepth analytical investigation of the overhead of write barriers in RTSJ VMs, and descr ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
The memory models used in the Real-Time Specification for Java (RTSJ) can incur high amounts of overhead; It is possible to reduce this overhead by taking advantages of hardware features. This paper provides an indepth analytical investigation of the overhead of write barriers in RTSJ VMs, and describes and analyzes some solutions to reduce the overhead of write barriers.
Multiple Page Size Support in the Linux Kernel
- In Proceedings of Ottawa Linux Symposium
, 2002
"... The Linux kernel currently supports a single user space page size, usually the minimum dictated by the architecture. This paper describes the ongoing modifications to the Linux kernel to allow applications to vary the size of pages used to map their address spaces and to reap the performance benefit ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
The Linux kernel currently supports a single user space page size, usually the minimum dictated by the architecture. This paper describes the ongoing modifications to the Linux kernel to allow applications to vary the size of pages used to map their address spaces and to reap the performance benefits associated with the use of large pages.
Cache Performance in Java Virtual Machines: A Study of Constituent Phases
- In IEEE International Workshop on Workload Characterization
, 2002
"... This paper studies the level 1 cache performance of Java programs by analyzing memory reference traces of the SPECjvm98 applications executed by the Latte Java Virtual Machine. We study in detail Java programs' cache performance of different access types in three JVM phases, under two execution mode ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
This paper studies the level 1 cache performance of Java programs by analyzing memory reference traces of the SPECjvm98 applications executed by the Latte Java Virtual Machine. We study in detail Java programs' cache performance of different access types in three JVM phases, under two execution modes, using three cache configurations and two application data sets. We observe that the poor data cache performance in the JIT execution mode is caused by code installation, when the data write miss rate in the execution engine can be as high as 70%. In addition, code installation also deteriorates instruction cache performance during execution of translated code. High cache miss rate in garbage collection is mainly caused by large working set and pointer chasing of the garbage collector. A larger data cache works better on eliminating data cache read misses than write misses, and is more efficient on improving cache performance in the execution engine than in the garbage collection. As application data set increases in the JIT execution mode, instruction cache and data cache write miss rates of the execution engine decrease, while data cache read miss rate of the execution engine increases. On the other hand, impact of varying data set on cache performance is not as pronounced in the interpreted mode as in the JIT mode.
Comparison of Memory System Behavior in Java and Non-Java Commercial Workloads
- In Proceedings of the Workshop on Computer Architecture Evaluation using Commercial Workloads
, 2002
"... In this paper, we compare the memory system behavior of Perl and Java versions of SPECweb99. We find that the memory behaviors of these two versions of SPECweb99 are very different. The Java version incurs 213% more cache misses than the Perl version for a 1 MB second level cache and 179% more misse ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
In this paper, we compare the memory system behavior of Perl and Java versions of SPECweb99. We find that the memory behaviors of these two versions of SPECweb99 are very different. The Java version incurs 213% more cache misses than the Perl version for a 1 MB second level cache and 179% more misses for a 16 MB cache. Much of this increase is due to true sharing, which is 222% to 415% higher in the Java workload than in Perl. We find that for 1 MB caches, the exclusive state is effective, since it is able to remove 24% and 61% of the write upgrades for Perl and Java respectively. However, for 8 MB caches, the exclusive state only removes 1% to 5% of the upgrades. Also, it is important to have an efficient mechanism for cache to cache transfers, since 71% to 87% of the second level cache misses hit a modified line in another processor's cache when using 8 MB second level caches. Our early results show that processor consistency can improve the performance of Perl and Java SPECweb99 by 14% and 22% over sequential consistency. Surprisingly, we find that removing all of the serializations in processor consistency degrades the performance of the Java workload by 11%.

