The Mapping Collector: Virtual Memory Support for Generational, Parallel, and Concurrent Compaction
, 2008
Cited by 4 (3 self)
Parallel and concurrent garbage collectors are increasingly employed by managed runtime environments (MREs) to maintain scalability, as multi-core architectures and multi-threaded applications become pervasive. Moreover, state-of-the-art MREs commonly implement compaction to eliminate heap fragmentation and enable fast linear object allocation. Our empirical analysis of object demographics reveals that unreachable objects in the heap tend to form clusters large enough to be effectively managed at the granularity of virtual memory pages. Even though processes can manipulate the mapping of the virtual address space through the standard operating system (OS) interface on most platforms, extant parallel/concurrent compactors do not do so to exploit this clustering behavior and instead achieve compaction by performing relatively expensive object moving and pointer adjustment. We introduce the Mapping Collector (MC), which leverages virtual memory operations to reclaim and consolidate free space without moving objects or updating pointers. MC is a nearly single-phase compactor that is simpler and more efficient than previously reported compactors that comprise two to four phases. Through effective MRE-OS coordination, MC maintains the simplicity of a non-moving collector while providing efficient parallel and concurrent compaction. We implement both stop-the-world and concurrent MC in a generational garbage collection framework within the open-source HotSpot Java Virtual Machine. Our experimental evaluation using a multiprocessor indicates that MC significantly increases throughput and scalability and reduces pause times relative to state-of-the-art parallel and concurrent compactors.
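The page-granularity observation in this abstract can be illustrated with a small sketch. This is not the Mapping Collector's algorithm, which the abstract does not spell out; it only shows, under assumed inputs, how one might identify pages that hold no live data and are therefore candidates for reclamation through virtual memory operations. The function name `fully_dead_pages`, the 4096-byte page size, and the `(start, size)` object representation are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def fully_dead_pages(heap_size, live_objects, page_size=PAGE_SIZE):
    """Return indices of pages that contain no byte of any live object.

    live_objects is an iterable of (start_address, size_in_bytes) pairs,
    with addresses given as offsets from the start of the heap.
    """
    num_pages = (heap_size + page_size - 1) // page_size
    touched = [False] * num_pages
    for start, size in live_objects:
        if size <= 0:
            continue
        first = start // page_size
        last = (start + size - 1) // page_size
        # Mark every page the live object overlaps, clamped to the heap.
        for page in range(first, min(last, num_pages - 1) + 1):
            touched[page] = True
    return [page for page, live in enumerate(touched) if not live]


# A 5-page heap with live data on pages 0, 1, and 4.
live = [(0, 100), (4000, 200), (16384, 8)]
print(fully_dead_pages(5 * PAGE_SIZE, live))  # prints [2, 3]
```

A real collector would act on such pages through the OS interface rather than returning a list; which calls it uses and how it consolidates the freed space are outside what this sketch models.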
Redline: First class support for interactivity in commodity operating systems
- In Proc. of the OSDI
, 2008
Cited by 3 (0 self)
While modern workloads are increasingly interactive and resource-intensive (e.g., graphical user interfaces, browsers, and multimedia players), current operating systems have not kept up. These operating systems, which evolved from core designs that date to the 1970s and 1980s, provide good support for batch and command-line applications, but their ad hoc attempts to handle interactive workloads are poor. Their best-effort, priority-based schedulers provide no bounds on delays, and their resource managers (e.g., memory managers and disk I/O schedulers) are mostly oblivious to response time requirements. Pressure on any one of these resources can significantly degrade application responsiveness. We present Redline, a system that brings first-class support for interactive applications to commodity operating systems. Redline works with unaltered applications and standard APIs. It uses lightweight specifications to orchestrate memory and disk I/O management so that they serve the needs of interactive applications. Unlike real-time systems that treat specifications as strict requirements and thus pessimistically limit system utilization, Redline dynamically adapts to recent load, maximizing responsiveness and system utilization. We show that Redline delivers responsiveness to interactive applications even in the face of extreme workloads including fork bombs, memory bombs, and bursty, large disk I/O requests, reducing application pauses by up to two orders of magnitude.
Adaptive, Application-Specific Garbage Collection
, 2003
Cited by 3 (0 self)
In this paper, we describe a novel execution environment that can dynamically switch between garbage collection systems. As such, it enables selection of the most appropriate allocator and collector for a given application and underlying resource availability. Our system is novel in that it is able to switch between a wide range of diverse collection systems. It uses program annotations to guide selection of the collection system. In addition, it can automatically identify when to switch collectors when program execution behavior warrants it, i.e., it is adaptive. Our system introduces little overhead and accurately identifies the best collector for a wide range of benchmarks and heap sizes.
A New Approach to Parallelising Tracing Algorithms
Cited by 2 (0 self)
Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection and marshalling. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At any given time, at most one processor owns a given worklist. By arranging separate single-reader, single-writer forwarding queues to pass nodes from processor i to processor j, we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems, and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.
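The memory-oriented partitioning described in this abstract can be sketched as a sequential simulation. This is an illustration under stated assumptions, not the paper's implementation: node identities are treated as integer addresses, the segment of a node is its address divided by a hypothetical `segment_size`, and a single loop stands in for the processors that would each own one segment at a time. The per-pair forwarding queues are collapsed into a direct append onto the target segment's worklist.

```python
from collections import deque


def trace_by_segment(graph, roots, segment_size):
    """Mark all nodes reachable from roots, using one worklist per segment.

    graph maps an integer node address to a list of child addresses.
    """
    def segment_of(addr):
        return addr // segment_size

    worklists = {}   # segment id -> deque of nodes located in that segment
    visited = set()  # only a segment's owner marks nodes in that segment

    def forward(node):
        # Stand-in for a forwarding queue: hand the node to its segment.
        worklists.setdefault(segment_of(node), deque()).append(node)

    for root in roots:
        forward(root)

    # Sequential stand-in for the parallel schedule: repeatedly pick the
    # non-empty segments and drain each one as its single owner would.
    while True:
        pending = [seg for seg, wl in worklists.items() if wl]
        if not pending:
            break
        for seg in pending:
            wl = worklists[seg]
            while wl:
                node = wl.popleft()
                if node in visited:
                    continue  # duplicates are dropped by the owner
                visited.add(node)
                for child in graph.get(node, []):
                    forward(child)
    return visited


graph = {0: [1, 10], 1: [0], 10: [25], 25: [], 30: [0]}
print(sorted(trace_by_segment(graph, [0], segment_size=8)))  # prints [0, 1, 10, 25]
```

Because a node is only ever marked while its own segment is being drained, the visited check never races with another owner in this model, which is the property the abstract attributes to the lock-free mainline code.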
Dynamic Prediction of Collection Yield for Managed Runtimes
Cited by 2 (1 self)
The growth in complexity of modern systems makes it increasingly difficult to extract high performance. The software stacks for such systems typically consist of multiple layers and include managed runtime environments (MREs). In this paper, we investigate techniques to improve cooperation between these layers and the hardware to increase the efficacy of automatic memory management in MREs. General-purpose MREs commonly implement parallel and/or concurrent garbage collection and employ compaction to eliminate heap fragmentation. Moreover, most systems trigger collection based on the amount of heap a program uses. Our analysis shows that in many cases this strategy leads to ineffective collections that are unable to reclaim sufficient space to justify the incurred cost. To avoid such collections, we exploit the observation that dead objects tend to cluster together and form large, never-referenced regions in the address space that correlate well with virtual pages that have not recently been referenced by the application. We leverage this correlation to design a new, simple, and lightweight yield predictor that estimates the amount of reclaimable space in the heap using hardware page reference bits. Our predictor allows MREs to avoid low-yield collections and thereby improve resource management. We integrate this predictor into three state-of-the-art parallel compactors, implemented in the HotSpot JVM, that represent distinct canonical heap layouts. Our empirical evaluation, based on standard Java benchmarks and open-source applications, indicates that inexpensive and accurate yield prediction can improve performance significantly.
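The idea of estimating yield from page reference bits can be sketched as follows. The abstract does not give the predictor's formula, so this is only one plausible reading under loud assumptions: the reference bits are supplied as a list of booleans (how they are read from the hardware or OS is not modeled), every unreferenced page is counted as fully reclaimable, and the 25% threshold and the names `predict_yield_bytes` and `should_collect` are hypothetical.

```python
def predict_yield_bytes(reference_bits, page_size=4096):
    """Estimate reclaimable bytes as the space in pages not recently referenced."""
    unreferenced = sum(1 for bit in reference_bits if not bit)
    return unreferenced * page_size


def should_collect(reference_bits, min_yield_fraction=0.25, page_size=4096):
    """Return True when the predicted yield is at least the given heap fraction."""
    if not reference_bits:
        return False
    heap_bytes = len(reference_bits) * page_size
    predicted = predict_yield_bytes(reference_bits, page_size)
    return predicted >= min_yield_fraction * heap_bytes


# Half of a four-page heap is unreferenced, so a collection looks worthwhile.
print(should_collect([True, False, False, True]))  # prints True
```

A runtime using such a check would consult it when the usual heap-occupancy trigger fires and defer the collection when the predicted yield is low; what it does instead (for example, growing the heap) is a policy choice this sketch leaves open.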
2006c. Waste not, want not: Adaptive garbage collection in a shared environment
Limiting the amount of memory available to a program can hamstring its performance; however, in a garbage-collected environment, allowing too large a heap can also be detrimental. Because garbage collection will occasionally access the entire heap, having a significant amount of virtual memory becomes expensive. Determining the appropriate size for a program’s heap is not only important, but difficult in light of the various virtual machines, operating systems, and levels of multi-programming with which the program may be run. We present a model for program memory usage with which we can show how effective multi-programming is likely to be. In addition, we present an automated system for adding control at the program level that allows runtime adaptation of a program’s heap size. The process is fully automatic and requires no extra coding on the part of programmers. We discuss two adaptive schemes: the first acts independently and, while performing competitively, behaves politely in a multi-programmed environment. The second scheme explicitly cooperates when multiple instances are running. Both schemes are evaluated in terms of their response time, throughput, and fairness.
Bounded Frame, Cycle and Large Object Handling in Generational Older-First Garbage Collection
, 2007
Over the years, research has been done on several techniques related to garbage collection. Many key insights for copying-based generational garbage collection techniques have been revealed. Yet, there is still room for improvement. In this thesis, we introduce various new techniques and algorithms to improve garbage collection. In particular, we introduce the bounded frame marking technique for tracking pointers. This technique allows for efficient computation of the root set. It reuses concepts from two existing techniques, card marking and remembered sets, and uses a bidirectional object layout to improve them by regulating space overhead and reducing the pointer scanning workload. We also present an algorithm to recursively mark reachable objects without using a stack (eliminating the usual space overhead). We adapt this algorithm to implement a depth-first copying collector and increase heap locality. We improve the older-first garbage collection algorithm and its generational variant by adding a mark phase that guarantees the collection of all garbage, including cyclic structures spanning many windows. Finally, we introduce a technique to deal with large objects. In order to test our ideas, we have designed and implemented a portable and extensible garbage collection framework within the SableVM open source Java virtual machine. In it, we have implemented semi-space, older-first, and generational copying garbage collection algorithms. Our experiments show that the bounded frame technique yields competitive performance on many benchmarks. They also show that, for most benchmarks, our depth-first traversal algorithm improves locality and thus increases performance. Our overall performance measurements show that, using our techniques, a garbage collector can deliver competitive performance and surpass existing collectors on various benchmarks.
A Page Fault Equation for Dynamic Heap Sizing (A shorter, 6-page version will appear in WOSP/SIPEW 2010)
For garbage-collected applications, dynamically-allocated objects are contained in a heap. Programmer productivity improves significantly if there is a garbage collector to automatically de-allocate objects that are no longer needed by the applications. However, there is a run-time performance overhead in garbage collection, and this cost is sensitive to heap size H: a smaller H will trigger more collections, but a large H can cause page faults, as when H exceeds the size M of main memory allocated to the application. This paper presents a Heap Sizing Rule for how H should vary with M. The Rule can help an application trade fewer page faults for more garbage collection, thus reducing execution time. It is based on a heap-aware Page Fault Equation that models how the number of page faults depends on H and M. Experiments show that this rule outperforms the default policy used by JikesRVM’s heap size manager. Specifically, the number of faults and the execution time are reduced for both static and dynamically changing M.

