Results 1 - 5 of 5
Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance
, 2008
Abstract - Cited by 19 (9 self)
Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors (semi-space, mark-sweep, and mark-compact) each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
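The block-and-line idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the constants `LINES_PER_BLOCK` and `LINE_SIZE`, the `Block` class, and its method names are all assumptions chosen for illustration, and copying (opportunistic defragmentation) is left out entirely. The sketch only shows how marking at line granularity yields either a wholly free block or runs of free lines that an allocator could reuse.

```python
from dataclasses import dataclass, field

LINES_PER_BLOCK = 8   # hypothetical value, kept small for illustration
LINE_SIZE = 128       # hypothetical bytes per line


@dataclass
class Block:
    # One mark flag per line; True means some live object touches the line.
    line_marked: list = field(default_factory=lambda: [False] * LINES_PER_BLOCK)

    def mark_object(self, offset: int, size: int) -> None:
        """Mark every line overlapped by a live object at [offset, offset + size)."""
        first = offset // LINE_SIZE
        last = (offset + size - 1) // LINE_SIZE
        for i in range(first, last + 1):
            self.line_marked[i] = True

    def is_free(self) -> bool:
        """A block with no marked line can be reclaimed at coarse block grain."""
        return not any(self.line_marked)

    def holes(self):
        """Return (start_line, line_count) for each run of unmarked lines."""
        runs, start = [], None
        for i, marked in enumerate(self.line_marked):
            if not marked and start is None:
                start = i
            elif marked and start is not None:
                runs.append((start, i - start))
                start = None
        if start is not None:
            runs.append((start, LINES_PER_BLOCK - start))
        return runs
```

For example, marking a 200-byte object at offset 0 and a 64-byte object at offset 640 leaves two holes, lines 2 to 4 and lines 6 to 7, into which new objects could be bump-allocated.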
The Mapping Collector: Virtual Memory Support for Generational, Parallel, and Concurrent Compaction
, 2008
Abstract - Cited by 4 (3 self)
Parallel and concurrent garbage collectors are increasingly employed by managed runtime environments (MREs) to maintain scalability, as multi-core architectures and multi-threaded applications become pervasive. Moreover, state-of-the-art MREs commonly implement compaction to eliminate heap fragmentation and enable fast linear object allocation. Our empirical analysis of object demographics reveals that unreachable objects in the heap tend to form clusters large enough to be effectively managed at the granularity of virtual memory pages. Even though processes can manipulate the mapping of the virtual address space through the standard operating system (OS) interface on most platforms, extant parallel/concurrent compactors do not do so to exploit this clustering behavior and instead achieve compaction by performing relatively expensive object moving and pointer adjustment. We introduce the Mapping Collector (MC), which leverages virtual memory operations to reclaim and consolidate free space without moving objects and updating pointers. MC is a nearly single-phase compactor that is simpler and more efficient than previously reported compactors that comprise two to four phases. Through effective MRE-OS coordination, MC maintains the simplicity of a non-moving collector while providing efficient parallel and concurrent compaction. We implement both stop-the-world and concurrent MC in a generational garbage collection framework within the open-source HotSpot Java Virtual Machine. Our experimental evaluation using a multiprocessor indicates that MC significantly increases throughput and scalability as well as reduces pause times, relative to state-of-the-art parallel and concurrent compactors.
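The observation that dead objects cluster at page granularity can be made concrete with a small sketch. This is not the Mapping Collector itself: the function name `fully_dead_pages`, the `PAGE_SIZE` value, and the representation of live objects as `(address, size)` pairs are assumptions for illustration. The sketch only identifies pages that contain no live byte, which are the candidates a collector could hand back or remap through virtual memory operations instead of moving objects.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def fully_dead_pages(heap_size, live_objects):
    """Return indices of pages that hold no part of any live object.

    live_objects is an iterable of (address, size) pairs, with addresses
    given as byte offsets from the start of the heap.
    """
    num_pages = heap_size // PAGE_SIZE
    live = [False] * num_pages
    for addr, size in live_objects:
        first = addr // PAGE_SIZE
        last = (addr + size - 1) // PAGE_SIZE
        # Clamp to the heap so a malformed entry cannot index out of range.
        for page in range(first, min(last, num_pages - 1) + 1):
            live[page] = True
    return [page for page in range(num_pages) if not live[page]]
```

With an eight-page heap and live objects at `(0, 100)`, `(4000, 200)`, and `(20480, 4096)`, pages 0, 1, and 5 stay live and the remaining pages are reported as fully dead.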
A New Approach to Parallelising Tracing Algorithms
Abstract - Cited by 2 (0 self)
Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection and marshalling. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal requires no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems, and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.
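The memory-oriented scheme described in this abstract can be sketched as a sequential simulation. This is an illustration under stated assumptions, not the paper's algorithm: the function `trace`, the `segment_of` callback, and the round-robin scheduling of segments are hypothetical, and real forwarding queues would be single-reader single-writer structures shared between processors. The point shown is that a node is only ever visited by the owner of its segment, so the visited check needs no lock.

```python
from collections import deque


def trace(graph, roots, segment_of, num_segments):
    """Simulate segment-owned worklists and return the set of reachable nodes."""
    worklists = [deque() for _ in range(num_segments)]
    # forward[i][j] carries nodes found while processing segment i
    # that belong to segment j (one writer i, one reader j).
    forward = [[deque() for _ in range(num_segments)]
               for _ in range(num_segments)]
    visited = set()
    for root in roots:
        worklists[segment_of(root)].append(root)

    progress = True
    while progress:
        progress = False
        for seg in range(num_segments):
            # The owner of seg drains every queue addressed to it.
            for src in range(num_segments):
                inbound = forward[src][seg]
                while inbound:
                    worklists[seg].append(inbound.popleft())
            worklist = worklists[seg]
            while worklist:
                progress = True
                node = worklist.pop()
                if node in visited:
                    continue  # only the segment owner performs this check
                visited.add(node)
                for child in graph.get(node, ()):
                    dst = segment_of(child)
                    if dst == seg:
                        worklist.append(child)
                    else:
                        forward[seg][dst].append(child)
    return visited
```

The loop ends after a full round in which no worklist held any node, which also implies that every forwarding queue is empty.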
A Lock-Free, Concurrent, and Incremental Stack Scanning for Garbage Collectors
Abstract
Two major efficiency parameters for garbage collectors are the throughput overheads and the pause times that they introduce. Highly responsive systems need to use collectors with the shortest possible pause times. Pause lengths have decreased significantly over the years, especially through the use of concurrent garbage collectors. For modern concurrent collectors, the longest pause is typically created by the need to atomically scan the runtime stack. All practical concurrent collectors that we are aware of must obtain a snapshot of the pointers on each thread’s runtime stack, in order to reclaim objects correctly. To further reduce the length of the collector pauses, incremental stack scans were proposed. However, previous such methods employ locks to stop the mutator from accessing a stack frame while it is being scanned. Thus, these methods introduce potentially long and unpredictable pauses for a mutator thread. In this work we propose the first concurrent, incremental, and lock-free stack scanning for garbage collectors, allowing high responsiveness and support for programs that employ fine-grained synchronization to avoid locks. Our solution can be employed by all concurrent collectors that we are aware of; it is lock-free, it imposes a negligible overhead on the program execution, and it supports the special in-stack references existing in languages like C#.
Adaptive Optimization of the Sun Java™ Real-Time System Garbage Collector
Sun Microsystems Laboratories, 2009
Abstract
Garbage collection (GC) is one of the largest sources of unpredictability in Java™ applications, and a real-time virtual machine must use garbage collection algorithms that minimize delays to real-time threads and at the same time maximize the overall application’s throughput. In order to achieve the optimal tradeoff between these conflicting objectives, the GC cycle (which needs to take place periodically in order to free the memory no longer used by the application) needs to be triggered at the optimal time: if it is triggered too soon then the application’s throughput will decrease unnecessarily, while if it is triggered too late then the application can run out of free memory and block real-time threads unnecessarily. Starting with Sun Java Real-Time System 2.0 (Java RTS), a new real-time garbage collector (RTGC) is available. One of the key RTGC parameters is the StartupMemoryThreshold, which determines how low the free memory in the system can fall before a garbage collection is triggered. This paper presents a framework for dynamically adapting the StartupMemoryThreshold for achieving the optimal balance between the application’s throughput and pause time, which was integrated into the beta release of Java RTS 2.2. An experimental evaluation of this framework using the SPECjbb2005 benchmark confirmed its effectiveness. This framework can be used in conjunction with any concurrent or time-based incremental garbage collector.
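The too-soon versus too-late tradeoff in this abstract can be illustrated with a simple feedback rule. The abstract does not describe how the paper's framework adapts the threshold, so the sketch below is a swapped-in heuristic, not that framework: the function `adapt_threshold`, its parameters, and all default values are hypothetical. The threshold is expressed as a fraction of the heap; a higher value means collection starts while more memory is still free.

```python
def adapt_threshold(threshold, min_free_during_gc, heap_size,
                    safety_margin=0.05, step=0.1,
                    lower_bound=0.05, upper_bound=0.9):
    """Return an updated free-memory threshold after one GC cycle.

    min_free_during_gc is the lowest amount of free memory observed while
    the cycle ran, in the same units as heap_size. All parameter names and
    defaults are illustrative assumptions.
    """
    min_free_fraction = min_free_during_gc / heap_size
    if min_free_fraction < safety_margin:
        # The cycle nearly exhausted memory: start earlier next time.
        threshold *= 1 + step
    elif min_free_fraction > 2 * safety_margin:
        # Plenty of memory was left: start later to recover throughput.
        threshold *= 1 - step
    # Otherwise the cycle finished inside the target band; keep the value.
    return max(lower_bound, min(upper_bound, threshold))
```

Calling the function once per completed cycle nudges the trigger point up after a near-exhaustion and down after a cycle that finished with ample headroom, within fixed bounds.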

