Garbage Collection without Paging
, 2005
Abstract
Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application's working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes. We present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions. Using summary information ("bookmarks") recorded from evicted pages, the collector can perform in-memory full-heap collections. In the absence of memory pressure, the bookmarking collector matches the throughput of the best collector we tested while running in smaller heaps. In the face of memory pressure, it improves throughput by up to a factor of five and reduces pause times by up to a factor of 45 over the next best collector. Compared to a collector that consistently provides high throughput (generational mark-sweep), the bookmarking collector reduces pause times by up to 218x and improves throughput by up to 41x. Bookmarking collection thus provides greater utilization of available physical memory than other collectors while matching or exceeding their throughput.
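The abstract does not spell out the algorithm. The Python sketch below is a simplified, hypothetical model of the bookmarking idea. It assumes a toy heap in which each object lives on a numbered page and pointers are object ids. When a page is evicted, every object it points to gets a bookmark. A full-heap collection then treats resident bookmarked objects as extra roots, never follows pointers into evicted pages, and conservatively retains objects on evicted pages. `Heap`, `evict_page`, and `collect` are illustrative names, not the paper's implementation. Details such as clearing bookmarks when a page is reloaded are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    oid: int
    page: int
    refs: list = field(default_factory=list)  # ids of objects this object points to
    bookmarked: bool = False  # set when an evicted page holds a pointer to this object


class Heap:
    """Toy heap model (hypothetical) for illustrating bookmarking collection."""

    def __init__(self):
        self.objs = {}
        self.evicted = set()  # page numbers currently not in physical memory

    def add(self, oid, page, refs=()):
        self.objs[oid] = Obj(oid, page, list(refs))

    def resident(self, oid):
        return self.objs[oid].page not in self.evicted

    def evict_page(self, page):
        # Before the page leaves memory, summarize its outgoing pointers as bookmarks.
        for obj in self.objs.values():
            if obj.page == page:
                for target in obj.refs:
                    self.objs[target].bookmarked = True
        self.evicted.add(page)

    def collect(self, roots):
        # Resident roots plus resident bookmarked objects seed the mark phase.
        work = [oid for oid in roots if self.resident(oid)]
        work += [o.oid for o in self.objs.values() if o.bookmarked and self.resident(o.oid)]
        marked = set()
        while work:
            oid = work.pop()
            if oid in marked:
                continue
            marked.add(oid)
            # Only follow pointers to resident objects, so no evicted page is touched.
            work.extend(t for t in self.objs[oid].refs if self.resident(t))
        # Objects on evicted pages are conservatively treated as live.
        live = marked | {oid for oid in self.objs if not self.resident(oid)}
        for oid in list(self.objs):
            if oid not in live:
                del self.objs[oid]  # sweep unreachable resident objects
        return live
```

Suppose object 3 sits on an evicted page and points to object 4. Then `collect(roots=[1])` keeps object 4 alive through its bookmark, even though no resident object points to it.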
A Locality-Improving Dynamic Memory Allocator
- MSP 2005
, 2005
Abstract
Because most application data is dynamically allocated, the memory manager plays a crucial role in application performance by determining the spatial locality of heap objects. Previous general-purpose allocators have focused on reducing fragmentation, while most locality-improving allocators have either focused on improving the locality of the allocator (not the application) or required information supplied by the programmer or obtained by profiling. We present a high-performance memory allocator that builds on previous allocator designs to achieve low fragmentation while transparently improving application locality. Our allocator, called Vam, improves page-level locality by managing the heap in page-sized chunks and aggressively giving up free pages to the virtual memory manager. Vam improves cache-level locality by eliminating object headers, using fine-grained size classes, and allocating objects with a reap-based algorithm. Over a range of large-footprint benchmarks, Vam improves application performance by an average of 4% to 8% versus the Lea (Linux) and FreeBSD allocators. When memory is scarce, Vam improves application performance by up to 2x compared to the FreeBSD allocator, and by over 10x compared to the Lea allocator. We show that synergy between Vam's layout algorithms and the Linux swap clustering algorithm increases its swap prefetchability, further improving its performance when paging.
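The abstract only summarizes Vam's design. The Python sketch below is a hypothetical illustration of three of the ideas it names, under assumed parameters: 4096-byte pages and a size class every 8 bytes.

- Each page holds objects of a single size class, so an object's size can be recovered from its address alone, with no per-object header.
- Allocation within a page is reap-style: freed slots are reused first, and otherwise a bump pointer advances.
- A page whose objects have all been freed is released immediately.

`Page`, `Allocator`, and `released` are illustrative names. Addresses are plain integers, not real memory.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
GRANULARITY = 8   # assumed size-class spacing: one class per 8 bytes


def size_class(n):
    """Round a request up to the next multiple of GRANULARITY (fine-grained classes)."""
    n = max(n, 1)
    return (n + GRANULARITY - 1) // GRANULARITY * GRANULARITY


class Page:
    """One page-sized chunk holding objects of a single size class."""

    def __init__(self, base, cls):
        self.base, self.cls = base, cls
        self.bump = 0    # offset of the first never-allocated slot
        self.free = []   # addresses of freed slots, reused before bumping
        self.live = 0

    def alloc(self):
        if self.free:
            addr = self.free.pop()
        elif self.bump + self.cls <= PAGE_SIZE:
            addr = self.base + self.bump
            self.bump += self.cls
        else:
            return None  # page is full
        self.live += 1
        return addr


class Allocator:
    def __init__(self):
        self.next_base = 0
        self.pages = {}     # page base -> Page; per-page metadata replaces object headers
        self.by_class = {}  # size class -> pages serving that class
        self.released = []  # page bases handed back to the virtual memory manager

    def malloc(self, n):
        cls = size_class(n)
        if cls > PAGE_SIZE:
            raise ValueError("large objects are not modeled in this sketch")
        for page in self.by_class.get(cls, []):
            addr = page.alloc()
            if addr is not None:
                return addr
        page = Page(self.next_base, cls)
        self.next_base += PAGE_SIZE
        self.pages[page.base] = page
        self.by_class.setdefault(cls, []).append(page)
        return page.alloc()

    def free(self, addr):
        # The owning page (and thus the size class) is found from the address alone.
        page = self.pages[addr - addr % PAGE_SIZE]
        page.free.append(addr)
        page.live -= 1
        if page.live == 0:
            # Aggressively give the empty page back; a real allocator would call
            # something like madvise here, this model only records it.
            del self.pages[page.base]
            self.by_class[page.cls].remove(page)
            self.released.append(page.base)
```

Segregating size classes by page trades some internal fragmentation for header-free objects. Neighboring objects of the same size then pack densely, which is the locality effect the abstract describes.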
Program-level adaptive memory management
- In Proceedings of the International Symposium on Memory Management
, 2006
Abstract
The performance of most applications depends on the amount of available memory. In a traditional application, which has a fixed working set size, increasing memory is beneficial only until the application’s working set fits in memory. In the presence of garbage collection, this relationship becomes more complex. While increasing the size of the program’s heap reduces the frequency of collections, collecting a heap with memory paged to the backing store is very expensive. We first demonstrate the presence of an optimal heap size for a number of applications running on a machine with a specific configuration. We then introduce a scheme that adaptively finds this heap size. In this scheme, we track the memory usage and number of page faults at a program’s phase boundaries. Using this information, the system selects the soft heap size. Because it adapts dynamically, our scheme is independent of the underlying main memory size, code optimizations, and garbage collection algorithm. We present several experiments on real applications to show the effectiveness of our approach. Our results show that program-level heap control provides an overall speedup of up to a factor of 7.8 versus using the best possible fixed heap size controlled by the virtual machine on identical garbage collectors.
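The abstract does not say how the soft heap size is chosen from these measurements. As an illustration only, the Python sketch below assumes a simple multiplicative policy:

- Shrink the soft heap limit whenever the last phase incurred page faults.
- Grow it when the heap ran nearly full without paging.

The growth and shrink factors, the 90% occupancy trigger, and the 16 MB floor are invented parameters. This is not the authors' algorithm. It only shows where phase-boundary measurements would feed a heap-size decision.

```python
class SoftHeapController:
    """Hypothetical policy: adjust a soft heap limit at program phase boundaries."""

    def __init__(self, initial_mb, min_mb=16.0, grow=1.25, shrink=0.8, fault_threshold=0):
        self.soft_heap_mb = float(initial_mb)
        self.min_mb = min_mb
        self.grow, self.shrink = grow, shrink
        self.fault_threshold = fault_threshold

    def on_phase_boundary(self, heap_used_mb, page_faults):
        """Called with the heap usage and page faults observed during the last phase."""
        if page_faults > self.fault_threshold:
            # Paging makes collection very expensive: back off the heap size.
            self.soft_heap_mb = max(self.min_mb, self.soft_heap_mb * self.shrink)
        elif heap_used_mb > 0.9 * self.soft_heap_mb:
            # No paging and the heap is nearly full: grow it to collect less often.
            self.soft_heap_mb *= self.grow
        return self.soft_heap_mb
```

This captures the tension the abstract describes. A larger heap means fewer collections, but once collection causes paging, that benefit is lost, so the controller backs off.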
Stopless: A real-time garbage collector for modern platforms
- In International Symposium on Memory Management (ISMM)
, 2007
Abstract
We present STOPLESS: a concurrent real-time garbage collector suitable for modern multiprocessors running parallel multithreaded applications. Creating a garbage-collected environment that supports real-time on modern platforms is notoriously hard, especially if real-time implies lock-freedom. Known real-time collectors either restrict their real-time guarantees to uniprocessors, rely on special hardware, or simply give up supporting atomic operations (which are crucial for lock-free software). STOPLESS is the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation by compaction, and supporting modern parallel platforms. STOPLESS is suitable for modern languages such as C# or Java. It was implemented on top of the Bartok compiler and runtime for C#, and measurements demonstrate high responsiveness (a factor of 100 better than previously published systems), virtually no pause times, good mutator utilization, and acceptable overheads.
Flexible Task Graphs: A Unified Restricted Thread Programming Model for Java
, 2008
Abstract
The disadvantages of unconstrained shared-memory multi-threading in Java, especially with regard to latency and determinism in real-time systems, have given rise to a variety of language extensions that place restrictions on how threads allocate, share, and communicate memory, leading to order-of-magnitude reductions in latency and jitter. However, each model makes different trade-offs with respect to expressiveness, efficiency, enforcement, and latency, and no one model is best for all applications. In this paper we present Flexible Task Graphs (Flexotasks), a single system that allows different isolation policies and mechanisms to be combined in an orthogonal manner, subsuming four previously proposed models as well as making it possible to use new combinations best suited to the needs of particular applications. We evaluate our implementation on top of the IBM WebSphere Real Time Java virtual machine using both a microbenchmark and a 30 KLOC avionics collision detector. We show that Flexotasks are capable of executing periodic threads at 10 kHz with a standard deviation of 1.2 µs and achieve significantly better performance than RTSJ’s scoped memory constructs while remaining impervious to interference from global garbage collection.
Memory Management for Real-time Java: State of the Art
Abstract
The Real-time Specification for Java extends the Java platform
Non-blocking Real-Time Garbage Collection
Abstract
A real-time garbage collector has to fulfill two basic properties: ensure that programs with bounded allocation rates do not run out of memory, and provide short blocking times. Even for incremental garbage collectors, two major sources of blocking exist, namely root scanning and heap compaction. Finding the root nodes of an object graph is an integral part of tracing garbage collectors and cannot be circumvented. Heap compaction is necessary to avoid potentially unbounded heap fragmentation, which in turn would lead to unacceptably high memory consumption. In this paper, we propose solutions to both issues. Thread stacks are local to a thread, so root scanning only needs to be atomic with respect to the thread whose stack is scanned. This fact can be exploited either by blocking only the thread whose stack is scanned or by delegating the responsibility for root scanning to the application threads. The latter solution eliminates blocking due to root scanning completely. The impact of this solution on the execution time of a garbage collector is shown for two different variants of such a root scanning algorithm. During heap compaction, objects are copied. Copying is usually performed atomically to avoid interference with application threads, which could render the state of an object inconsistent. Copying large objects, especially large arrays, introduces long blocking times that are unacceptable for real-time systems. In this paper, an interruptible copy unit is presented that implements non-blocking object copying. The unit can be interrupted after a single word move. We evaluate a real-time garbage collector that uses the proposed techniques on a Java processor. With this garbage collector, it is possible to run high-priority hard real-time tasks at 10 kHz in parallel with the garbage collection task on a 100 MHz system.
Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems; D.3.4 [Programming Languages]: Processors—Memory management (garbage collection)
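The paper's copy unit is hardware on a Java processor. The Python sketch below is only a software model of the same idea, under assumptions the abstract does not state:

- Memory is a flat list of words.
- Objects are accessed through a handle table.
- The application reads and writes fields through the copy unit.

Because the unit records how many words it has moved, accesses to already-copied words are redirected to the destination and all other accesses go to the source. The copy can therefore be interrupted after every single word without the application ever seeing an inconsistent object. `CopyUnit`, `step`, and the handle table are illustrative names.

```python
class CopyUnit:
    """Software model of an interruptible object copy (hypothetical sketch).

    Objects are reached through handles (handle -> base address), so the copy
    completes by updating a single handle entry.
    """

    def __init__(self, memory, handles):
        self.mem = memory
        self.handles = handles
        self.active = None  # handle of the object being moved, if any

    def start(self, handle, dst, length):
        self.active, self.src, self.dst = handle, self.handles[handle], dst
        self.length, self.done = length, 0
        if length == 0:
            self._finish()

    def _finish(self):
        self.handles[self.active] = self.dst  # object now lives at dst
        self.active = None

    def step(self):
        """Move one word; the copy may be interrupted between any two steps."""
        if self.active is None:
            return True
        self.mem[self.dst + self.done] = self.mem[self.src + self.done]
        self.done += 1
        if self.done == self.length:
            self._finish()
            return True
        return False

    def _addr(self, handle, idx):
        # Words already moved are accessed at the destination, the rest at the source.
        if handle == self.active and idx < self.done:
            return self.dst + idx
        return self.handles[handle] + idx

    def read(self, handle, idx):
        return self.mem[self._addr(handle, idx)]

    def write(self, handle, idx, value):
        self.mem[self._addr(handle, idx)] = value
```

A write to a word that has not yet been moved lands in the source and is carried over by a later `step`. A write to a word that has already been moved goes straight to the destination. Either way, the finished copy reflects every update.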
High-level Programming of Embedded Hard Real-Time Devices
Abstract
While managed languages such as C# and Java have become quite popular in enterprise computing, they are still considered unsuitable for hard real-time systems. In particular, the presence of garbage collection has been a sore point for their acceptance for low-level system programming tasks. Real-time extensions to these languages have the dubious distinction of simultaneously eschewing the benefits of high-level programming and failing to offer competitive performance. The goal of our research is to explore the limitations of high-level managed languages for real-time systems programming. To this end, we target a real-world embedded platform, the LEON3 architecture running the RTEMS real-time operating system, and demonstrate the feasibility of writing garbage-collected code in critical parts of embedded systems. We show that Java with a concurrent, real-time garbage collector can have throughput close to that of C programs, coming within 10% in the worst observed case on a realistic benchmark. We provide a detailed breakdown of the costs of Java features and their execution times, and compare against real-time and throughput-optimized commercial Java virtual machines.
Waste not, want not: Adaptive garbage collection in a shared environment
, 2006
Abstract
Limiting the amount of memory available to a program can hamstring its performance; however, in a garbage-collected environment, allowing too large a heap can also be detrimental. Because garbage collection will occasionally access the entire heap, a heap that extends significantly into virtual memory becomes expensive. Determining the appropriate size for a program’s heap is not only important but also difficult, given the variety of virtual machines, operating systems, and levels of multi-programming under which the program may run. We present a model of program memory usage that shows how effective multi-programming is likely to be. In addition, we present an automated system that adds control at the program level, allowing runtime adaptation of a program’s heap size. The process is fully automatic and requires no extra coding on the part of programmers. We discuss two adaptive schemes. The first acts independently and, while performing competitively, behaves politely in a multi-programmed environment. The second scheme explicitly cooperates when multiple instances are running. Both schemes are evaluated in terms of their response time, throughput, and fairness.
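The abstract does not describe how the cooperative scheme divides memory among instances. Purely to illustrate a fairness-oriented split, the Python sketch below applies max-min fair sharing to hypothetical per-instance heap demands. Max-min fair sharing is a standard technique swapped in here, not taken from the paper. Instances that need less than an equal share get what they need, and the remainder is divided evenly among the rest. `fair_heap_limits` and its inputs are illustrative.

```python
def fair_heap_limits(demands_mb, available_mb):
    """Max-min fair split of available memory among cooperating instances.

    demands_mb maps an instance name to the heap size (MB) it would like.
    """
    limits = {}
    budget = float(available_mb)
    # Serve the smallest demands first so leftover memory flows to larger ones.
    ordered = sorted(demands_mb.items(), key=lambda kv: kv[1])
    for i, (name, demand) in enumerate(ordered):
        share = budget / (len(ordered) - i)  # equal split of what is left
        limits[name] = min(demand, share)
        budget -= limits[name]
    return limits
```

For instance, suppose three instances want 100, 400, and 400 MB and only 600 MB is available. They would receive 100, 250, and 250 MB respectively.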
Quantifying and Improving the Performance of Garbage Collection
, 2006
Abstract
To Sarah, for reminding me of everything I can do, and to Shoshanna, for inspiring me to do more.
ACKNOWLEDGMENTS
I am most grateful to my advisor, Emery Berger, for everything he has done throughout this thesis. I appreciate his guidance, suggestions, and inspiration. I feel especially fortunate for the patience he has shown with me through all the twists and turns my life took while getting through this dissertation. I must also thank Eliot Moss and Kathryn McKinley for their leadership and support. I will be forever grateful that they took a chance on a student with a less-than-stellar academic record and provided me with a fertile, inspiring research environment. They are both very knowledgeable, and I benefited from our discussions in a myriad of ways. They have also served as members of my committee, and I appreciate their helpful comments and suggestions. Thanks also to Scott Kaplan, another member of my committee, for his advice and feedback.

