Cache-Conscious Structure Layout
, 1999
Cited by 164 (8 self)
Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs. This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and, consequently, their performance. It explores two placement techniques, clustering and coloring, that improve cache performance by increasing a pointer structure’s spatial and temporal locality and by reducing cache conflicts. To reduce the cost of applying these techniques, this paper discusses two strategies, cache-conscious reorganization and cache-conscious allocation, and describes two semi-automatic tools, ccmorph and ccmalloc, that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits, in most cases significantly outperforming state-of-the-art prefetching.
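The clustering idea can be illustrated with a small model. This sketch is not ccmorph. It assumes a complete binary tree stored with heap indices, so node i has children 2i+1 and 2i+2. It also assumes a hypothetical block capacity of `nodes_per_block` nodes and packs each small subtree into one block. The code then counts how many distinct cache blocks a root-to-leaf search touches, and compares that with a plain breadth-first (heap-order) layout. Coloring is not modeled, and both function names are illustrative.

```python
from collections import deque


def clustered_layout(n, nodes_per_block):
    """Memory order for an n-node heap-indexed binary tree, packing small subtrees into blocks.

    Illustrative sketch of subtree clustering; None marks padding slots.
    """
    order = []
    roots = deque([0])
    while roots:
        root = roots.popleft()
        block, frontier = [], deque([root])
        # Grow a breadth-first subtree from `root` until the block is full.
        while frontier and len(block) < nodes_per_block:
            node = frontier.popleft()
            block.append(node)
            for child in (2 * node + 1, 2 * node + 2):
                if child < n:
                    frontier.append(child)
        order.extend(block)
        # Pad so the next subtree starts on a fresh block boundary.
        order.extend([None] * (nodes_per_block - len(block)))
        # Children that did not fit become roots of later blocks.
        roots.extend(frontier)
    return order


def blocks_on_path(order, nodes_per_block, leaf):
    """Number of distinct blocks touched on the path from the root (index 0) to `leaf`."""
    position = {node: i for i, node in enumerate(order) if node is not None}
    blocks = set()
    node = leaf
    while True:
        blocks.add(position[node] // nodes_per_block)
        if node == 0:
            return len(blocks)
        node = (node - 1) // 2


n, per_block = 63, 3  # six-level tree, three nodes per (hypothetical) cache block
print(blocks_on_path(list(range(n)), per_block, 31))                  # heap order: 5
print(blocks_on_path(clustered_layout(n, per_block), per_block, 31))  # clustered: 3
```

With three nodes per block, each clustered block holds a parent and both of its children. A search therefore advances two tree levels per block fetched instead of roughly one.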
Using generational garbage collection to implement cache-conscious data placement
- In Proceedings of the International Symposium on Memory Management
, 1998
Cited by 90 (11 self)
The cost of accessing main memory is increasing. Machine designers have tried to mitigate the consequences of the processor and memory technology trends underlying this increasing gap with a variety of techniques to reduce or tolerate memory latency. These techniques, unfortunately, are only occasionally successful for pointer-manipulating programs. Recent research has demonstrated the value of a complementary approach, in which pointer-based data structures are reorganized to improve cache locality. This paper studies a technique for using a generational garbage collector to reorganize data
Improving the Cache Locality of Memory Allocation
, 1993
Cited by 69 (8 self)
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with the details of memory allocators; most assume that the memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance by increasing both paging and cache miss rates. While increased paging can be debilitating on any architecture, cache miss rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benef...
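Much of the locality cost of sequential-fit allocation comes from the search itself. Each request walks the free list, and every free block inspected along the way is another memory reference, often to a distant part of the heap. The toy first-fit allocator below counts the blocks it inspects. The list-of-pairs free-list representation and the function name are assumptions for illustration. This is not the paper's trace-driven simulator, and coalescing on free is omitted.

```python
def first_fit(free_list, size):
    """Sequential (first-fit) allocation over an address-ordered free list.

    free_list: list of [address, length] pairs, updated in place.
    Returns (address, blocks_inspected); address is None if no block fits.
    """
    for i, (address, length) in enumerate(free_list):
        # Reading each free block's header is a separate memory reference.
        if length >= size:
            if length == size:
                del free_list[i]  # exact fit: unlink the whole block
            else:
                # Allocate the front of the block and keep the remainder free.
                free_list[i] = [address + size, length - size]
            return address, i + 1
    return None, len(free_list)


free_list = [[0, 8], [100, 16], [200, 64]]
print(first_fit(free_list, 32))  # (200, 3): two too-small blocks inspected first
print(free_list)                 # [[0, 8], [100, 16], [232, 32]]
```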
Garbage Collection and DSM Consistency
, 1994
Cited by 40 (17 self)
This paper presents the design of a copying garbage collector for persistent distributed shared objects in a loosely coupled network with weakly consistent distributed shared memory (DSM). The main goal of the design for this garbage collector is to minimize the communication overhead due to collection between nodes of the system, and to avoid any interference with the DSM memory consistency protocol. Our design is based on the observation that, in a weakly consistent DSM system, the memory consistency requirements of the garbage collector are less strict than those of the applications. Thus, the garbage collector reclaims objects independently of other copies of the same objects without interfering with the DSM consistency protocol. Furthermore, our design does not require reliable communication support, and is capable of reclaiming distributed cycles of dead objects. 1 Introduction Garbage collection (GC) is a fundamental component for supporting persistent objects in distributed s...
Garbage Collection without Paging
, 2005
Cited by 29 (7 self)
Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application's working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes. We present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions. Using summary information ("bookmarks") recorded from evicted pages, the collector can perform in-memory full-heap collections. In the absence of memory pressure, the bookmarking collector matches the throughput of the best collector we tested while running in smaller heaps. In the face of memory pressure, it improves throughput by up to a factor of five and reduces pause times by up to a factor of 45 over the next best collector. Compared to a collector that consistently provides high throughput (generational mark-sweep), the bookmarking collector reduces pause times by up to 218x and improves throughput by up to 41x. Bookmarking collection thus provides greater utilization of available physical memory than other collectors while matching or exceeding their throughput.
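The following is a toy model of the bookmarking idea under simplifying assumptions. The heap is a dictionary of object references, pages are integer ids, and every object on an evicted page is conservatively kept alive. The model only shows why a bookmark set recorded at eviction time lets a full-heap mark finish without touching non-resident pages. The function names are hypothetical. The paper's collector also cooperates with the virtual memory manager on eviction decisions and is considerably more involved.

```python
def evict_page(heap, page_of, page, bookmarks):
    """Scan a page once, just before it is evicted, and bookmark every object it points to."""
    for obj, refs in heap.items():
        if page_of[obj] == page:
            bookmarks.update(refs)


def collect(heap, page_of, roots, evicted, bookmarks):
    """Mark live objects without touching evicted pages.

    Objects on evicted pages are conservatively treated as live; bookmarked
    objects act as extra roots standing in for the pointers on those pages.
    Returns (live, scanned): the live set and the objects whose fields were read.
    """
    live, scanned = set(), []
    stack = list(roots) + list(bookmarks)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        if page_of[obj] in evicted:
            continue  # do not fault the page back in just to trace it
        scanned.append(obj)
        stack.extend(heap[obj])
    live.update(obj for obj in heap if page_of[obj] in evicted)
    return live, scanned


heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": [], "f": []}
page_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0, "f": 0}
bookmarks = set()
evict_page(heap, page_of, 1, bookmarks)  # pointer d -> e becomes a bookmark on e
live, scanned = collect(heap, page_of, ["a"], {1}, bookmarks)
print(sorted(live))  # ['a', 'b', 'c', 'd', 'e']; only 'f' is garbage
```

Object `e` is reachable only through `d`, which sits on the evicted page. Without the bookmark, this mark phase would have to either fault page 1 back in or wrongly reclaim `e`.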
Profile-guided proactive garbage collection for locality optimization
- In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2006
Cited by 13 (0 self)
Many applications written in garbage-collected languages have large dynamic working sets and poor data locality. We present a new system for continuously improving program data locality at run time with low overhead. Our system proactively reorganizes the heap by leveraging the garbage collector and uses profile information collected through a low-overhead mechanism to guide the reorganization at run time. The key contributions are: making a case that garbage collection should be viewed as a proactive technique for improving data locality, by triggering garbage collection for locality optimization independently of normal garbage collection for space; combining page and cache locality optimization in the same system; and demonstrating that sampling provides sufficiently detailed data access information to guide both page and cache locality optimization with low runtime overhead. We present experimental results obtained by modifying a commercial, state-of-the-art garbage collector to support our claims. Independently triggering garbage collection for locality optimization significantly improved optimization benefits. Combining page and cache locality optimizations in the same system provided larger average execution time improvements (17%) than either alone (page 8%, cache 7%). Finally, using sampling limited profiling overhead to less than 3%, on average. Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – code generation, memory management (garbage collectors), optimization, run-time
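One way to picture profile-guided reorganization is in two steps. First, count how often each object appears in a sample of accesses. Then have the collector copy hot objects next to each other so they share pages and cache blocks. The sketch below assumes objects are plain integer ids and uses a single hotness ordering. Both function names are hypothetical. It is far simpler than the paper's system, which combines page- and cache-level optimization and decides when to trigger locality-driven collections.

```python
from collections import Counter


def hot_first_layout(objects, samples):
    """Reorder objects so the most frequently sampled ones are packed together.

    objects: objects in their current heap order.
    samples: object ids observed by a (hypothetical) sampling profiler.
    Ties keep their current relative order because sorted() is stable.
    """
    counts = Counter(samples)
    return sorted(objects, key=lambda obj: -counts[obj])


def pages_touched(layout, accessed, objects_per_page):
    """Distinct pages holding the accessed objects under a given layout."""
    position = {obj: i for i, obj in enumerate(layout)}
    return len({position[obj] // objects_per_page for obj in accessed})


objects = list(range(8))
samples = [5, 5, 5, 2, 2, 7]  # hot objects: 5, 2, 7
layout = hot_first_layout(objects, samples)
print(layout)                                # [5, 2, 7, 0, 1, 3, 4, 6]
print(pages_touched(objects, {2, 5, 7}, 4))  # 2 pages before reorganization
print(pages_touched(layout, {2, 5, 7}, 4))   # 1 page after
```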
Scalable Real-time Parallel Garbage Collection for Symmetric Multiprocessors
, 2001
"... model for garbage collection. ..."
Locality of Reference, Patterns in Program Behavior, Memory Management, and Memory Hierarchies
Cited by 2 (0 self)
Locality of reference is crucial to the performance of modern computers, but is actually poorly understood. In this paper, we survey issues in locality and memory hierarchy design, attempting to bring together what is known, correct common misconceptions, and clarify what is not known. We present a unified approach to locality, based on the concept of timescale relativity, which simply says that some patterns in program behavior are relevant to issues of caching, and others are not, and that the difference depends crucially on the timescale relevant to a particular cache. Memory hierarchies use a kind of online, adaptive algorithm to control caching; such algorithms cannot be studied properly without some understanding of the regularities in the "data" (program behavior) they must process. We attempt a vertical unification, showing that locality of reference results from regularities in the structure of programs, and from regularities in how memory allocators map program objects ont...
Potential Interdependencies Between Caches, TLBs and Memory Management Schemes
- Center for Information Technology, Sankt Augustin
, 1995
Cited by 1 (1 self)
Dynamic memory management usually stresses the randomness of data memory usage; the variables of a dynamic cache working set are to some degree distributed stochastically in the virtual or physical address space. This interferes with cache and TLB architectures, since most current designs are highly sensitive to access patterns. In the above-mentioned stochastically distributed case, the true capacity is (a) far below the cache or TLB size and (b) differs widely from processor to processor. As a consequence, dynamic memory management schemes may substantially influence cache/TLB hit rates and thus overall program performance. After a brief presentation of basic cache and TLB architectures, an analytical model for evaluating their true capacities is developed and applied to various architectures. Some industrial processors are evaluated in the same way, and potential implications for memory management techniques are discussed. Furthermore, a new architecture for caches and TLBs is presented which improves their true capacity and reduces their dependence on usage patterns.
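A standard balls-into-bins argument gives one simple way to see why random placement wastes capacity. This is not necessarily the analytical model developed in the paper. Suppose k blocks land uniformly at random in a direct-mapped cache with n sets. The expected number of distinct sets they occupy is then n(1 - (1 - 1/n)^k). For k = n this approaches (1 - 1/e)n, roughly 63% of the nominal size. The sketch below checks that estimate with a small simulation. Set associativity and the TLB case are not modeled, and the function names are illustrative.

```python
import random


def expected_occupied_sets(num_sets, num_blocks):
    """Expected number of distinct sets hit when num_blocks blocks map uniformly
    at random onto num_sets sets (balls-into-bins estimate)."""
    return num_sets * (1 - (1 - 1 / num_sets) ** num_blocks)


def simulated_occupied_sets(num_sets, num_blocks, trials=500, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(num_sets) for _ in range(num_blocks)})
    return total / trials


sets = 256  # hypothetical direct-mapped cache with 256 lines
print(expected_occupied_sets(sets, sets) / sets)   # roughly 0.633 (tends to 1 - 1/e)
print(simulated_occupied_sets(sets, sets) / sets)  # close to the analytic value
```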
Custom Object Layout for Garbage-Collected Languages
Cited by 1 (0 self)
Modern architectures require data locality to achieve performance. However,

