Results 1 - 10
of
269
The Jalapeño Dynamic Optimizing Compiler for Java
, 1999
"... The JalapeÃño Dynamic Optimizing Compiler is a key component of the JalapeÃño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the JalapeÃño Optimizing Compiler, and ..."
Abstract
-
Cited by 159 (28 self)
- Add to MetaCart
The JalapeÃño Dynamic Optimizing Compiler is a key component of the JalapeÃño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the JalapeÃño Optimizing Compiler, and the implementation results that we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.
Implementing Jalapeno in Java
, 1999
"... Jalape~no is a virtual machine for Java TM servers written in Java. A running Java program involves four layers of functionality: the user code, the virtual-machine, the operating system, and the hardware. By drawing the Java / non-Java boundary below the virtual machine rather than above it, Jal ..."
Abstract
-
Cited by 81 (4 self)
- Add to MetaCart
Jalape~no is a virtual machine for Java TM servers written in Java. A running Java program involves four layers of functionality: the user code, the virtual-machine, the operating system, and the hardware. By drawing the Java / non-Java boundary below the virtual machine rather than above it, Jalape~no reduces the boundary-crossing overhead and opens up more opportunities for optimization. To get Jalape~no started, a boot image of a working Jalape ~no virtual machine is concocted and written to a file. Later, this file can be loaded into memory and executed. Because the boot image consists entirely of Java objects, it can be concocted by a Java program that runs in any JVM. This program uses reflection to convert the boot image into Jalape~no's object format. A special Magic class allows unsafe casts and direct access to the hardware. Methods of this class are recognized by Jalape~no's three compilers, which ignore their bytecodes and emit special-purpose machine code. User code w...
Oil and Water? High Performance Garbage Collection in Java with MMTk
- In ICSE 2004, 26th International Conference on Software Engineering
, 2004
"... Increasingly popular languages such as Java and C # require efficient garbage collection. This paper presents the design, implementation, and evaluation of MMTk, a Memory Management Toolkit for and in Java. MMTk is an efficient, composable, extensible, and portable framework for building garbage col ..."
Abstract
-
Cited by 81 (18 self)
- Add to MetaCart
Increasingly popular languages such as Java and C # require efficient garbage collection. This paper presents the design, implementation, and evaluation of MMTk, a Memory Management Toolkit for and in Java. MMTk is an efficient, composable, extensible, and portable framework for building garbage collectors. MMTk uses design patterns and compiler cooperation to combine modularity and efficiency. The resulting system is more robust, easier to maintain, and has fewer defects than monolithic collectors. Experimental comparisons with monolithic Java and C implementations reveal MMTk has significant performance advantages as well. Performance critical system software typically uses monolithic C at the expense of flexibility. Our results refute common wisdom that only this approach attains efficiency, and suggest that performance critical software can embrace modular design and high-level languages. 1
Scheduling Garbage Collection in Embedded Systems
, 1998
"... The complexity of systems for automatic control and other safety-critical applications grows rapidly. Computer software represents an increasing part of the complexity. As larger systems are developed, we need to find scalable techniques to manage the complexity in order to guarantee high product qu ..."
Abstract
-
Cited by 67 (0 self)
- Add to MetaCart
The complexity of systems for automatic control and other safety-critical applications grows rapidly. Computer software represents an increasing part of the complexity. As larger systems are developed, we need to find scalable techniques to manage the complexity in order to guarantee high product quality. Memory management is a key quality factor for these systems. Automatic memory management, or garbage collection, is a technique that significantly reduces the complex problem of correct memory management. The risk of software errors decreases and development time is reduced. Garbage collection techniques suitable for interactive and soft real-time systems exist, but few approaches are suitable for systems with hard real-time requirements, such as control systems (embedded systems). One part of the problem is solved by incremental garbage collection algorithms, which have been presented before. We focus on the scheduling problem which forms the second part of the problem, i.e. how the work of a garbage collector should be scheduled in order
Concurrent Programming Without Locks
, 2004
"... Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: it is hard to design scalable locking strategies because locks can harbour problems such as priority inversion, deadlock and convoying. Furt ..."
Abstract
-
Cited by 64 (3 self)
- Add to MetaCart
Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: it is hard to design scalable locking strategies because locks can harbour problems such as priority inversion, deadlock and convoying. Furthermore, scalable lock-based systems are not readily composable when building compound operations. In looking for solutions to these problems, interest has developed in nonblocking systems which have promised scalability and robustness by eschewing mutual exclusion while still ensuring safety. However, existing techniques for building non-blocking systems are rarely suitable for practical use, imposing substantial storage overheads, serialising non-conflicting operations, or requiring instructions not readily available on today’s CPUs. In this paper we present three APIs which make it easier to develop non-blocking implementations of arbitrary data structures. The first API is a multi-word compare-and-swap operation (MCAS) which atomically updates a set of memory locations. This can be used to advance a data structure from one consistent state to another. The second API is a word-based software transactional memory (WSTM) which can allow sequential code to be re-used more directly than with MCAS and which provides better scalability when locations are being read rather than being
Generational Stack Collection and Profile-Driven Pretenuring
, 1998
"... This paper presents two techniques for improving garbage collection performance: generational stack collection and profile-driven pretenuring. The first is applicable to stackbased implementations of functional languages while the second is useful for any generational collector. We have implemented ..."
Abstract
-
Cited by 60 (3 self)
- Add to MetaCart
This paper presents two techniques for improving garbage collection performance: generational stack collection and profile-driven pretenuring. The first is applicable to stackbased implementations of functional languages while the second is useful for any generational collector. We have implemented both techniques in a generational collector used by the TIL compiler (Tarditi, Morrisett, Cheng, Stone, Harper, and Lee 1996), and have observed decreases in garbage collection times of as much as 70% and 30%, respectively. Functional languages encourage the use of recursion which can lead to a long chain of activation records. When a collection occurs, these activation records must be scanned for roots. We show that scanning many activation records can take so long as to become the dominant cost of garbage collection. However, most deep stacks unwind very infrequently, so most of the root information obtained from the stack remains unchanged across successive garbage collections. Generatio...
Beltway: Getting Around Garbage Collection Gridlock
- PLDI'02
, 2002
"... We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or more increments on queues called belts, collects belts in ..."
Abstract
-
Cited by 59 (16 self)
- Add to MetaCart
We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or more increments on queues called belts, collects belts independently, and collects increments on a belt in first-in-first-out order. We show that Beltway configurations, selected by command line options, act and perform the same as semi-space, generational, and older-first collectors, and encompass all previous copying collectors of which we are aware.
The Jrpm System for Dynamically Parallelizing Java Programs
- In Proceedings of the 30th International Symposium on Computer Architecture
, 2003
"... We describe the Java runtime parallelizing machine (Jrpm), a complete system for parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor (CMP) with thread-level speculation (TLS) support. CMPs have low sharing and communication costs relative to traditional multt)roce ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
We describe the Java runtime parallelizing machine (Jrpm), a complete system for parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor (CMP) with thread-level speculation (TLS) support. CMPs have low sharing and communication costs relative to traditional multt)rocessors, and thread-level speculation (TLS) simplifies program parallelization by allowing us to parallelize optimistically without violating correct sequential program behavior. Using a Java virtual machine with dynamic compilation support coupled with a hardware profiler, speculative buffer requirements and inter-thread dependencies of prospective speculative thread loops (STLs) are analyzed in real-time to identi the best loops to parallelize. Once sufficient data has been collected to make a reasonable decision, selected loops are dynamically recompiled to run in parallel Experimental results demonstrate that Jrpm can exploit thread-level parallelism with minimal effort from the programmer. On four processors, we achieved speedups of 3 to 4 for floating point applications, to 3 on multimedia applications, and between 1.5 and .5 on integer applications. Performance was achieved by automatic selection of thread decompositions by the hardware profiler, intra-procedural optimizations on code compiled dynamically into speculative threads, and some minor programmer transformations for exposing parallelism that cannot be performed automatically.
Age-Based Garbage Collection
- In Proceedings of SIGPLAN 1999 Conference on Object-Oriented Programming, Languages, & Applications
, 1999
"... Modern generational garbage collectors look for garbage among the young objects, because they have high mortality; however, these objects include the very youngest objects, which clearly are still live. We introduce new garbage collection algorithms, called age-based, some of which postpone consider ..."
Abstract
-
Cited by 45 (13 self)
- Add to MetaCart
Modern generational garbage collectors look for garbage among the young objects, because they have high mortality; however, these objects include the very youngest objects, which clearly are still live. We introduce new garbage collection algorithms, called age-based, some of which postpone consideration of the youngest objects. Collecting less than the whole heap requires write barrier mechanisms to track pointers into the collected region. We describe here a new, efficient write barrier implementation that works for age-based and traditional generational collectors. To compare several collectors, their configurations, and program behavior, we use an accurate simulator that models all heap objects and the pointers among them, but does not model cache or other memory effects. For object-oriented languages, our results demonstrate that an older-first collector, which collects older objects before the youngest ones, copies on average much less data than generational collectors. Our resul...

