Results 1 -
6 of
6
The Measured Cost of Conservative Garbage Collection
- Software Practice and Experience
, 1993
"... this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators, widelyused general-purpose allocators, and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little ef ..."
Abstract
-
Cited by 76 (6 self)
- Add to MetaCart
this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators, widelyused general-purpose allocators, and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little effect on program performance. I also find that the true cost of conservative garbage collection is not the CPU overhead, but the memory system overhead of the algorithm. I conclude that conservative garbage collection is a promising alternative to explicit storage management and that the performance of conservative collection is likely to improve in the future. C programmers should now seriously consider using conservative garbage collection instead of explicitly calling free in programs they write
Improving the Cache Locality of Memory Allocation
, 1993
"... The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality ..."
Abstract
-
Cited by 69 (8 self)
- Add to MetaCart
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance, both by increasing paging and cache miss rates. While increased paging can be debilitating on any architecture, cache misses rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benef...
Parallel data mining for association rules on shared-memory multiprocessors
- In Proc. Supercomputing’96
, 1996
"... Abstract. In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a signific ..."
Abstract
-
Cited by 62 (19 self)
- Add to MetaCart
Abstract. In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speed-up for the parallel algorithm. A lot of data-mining tasks (e.g. association rules, sequential patterns) use complex pointer-based data structures (e.g. hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases locality and false sharing sensitive memory placement of these structures can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery, and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains.
CustoMalloc: Efficient Synthesized Memory Allocators
- SOFTWARE—PRACTICE AND EXPERIENCE
, 1993
"... ... In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster and more space efficient than the Berkeley UNIX allocator. Constructing a custom allocator requi ..."
Abstract
-
Cited by 38 (8 self)
- Add to MetaCart
... In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster and more space efficient than the Berkeley UNIX allocator. Constructing a custom allocator requires little programmer effort, usually taking only a few minutes. Experience has shown that the synthesized allocators are not overly sensitive to properties of input sets and the resulting allocators are superior even to domain-specific allocators designed by programmers. Measurements show that synthesized allocators are from two to ten times faster than widelyused allocators
Evaluating Models of Memory Allocation
- ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION
, 1992
"... Because dynamic memory management is an important part of a large class of computer programs, high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. We evaluate and compare models of the memory allocation behavior in actual programs an ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
Because dynamic memory management is an important part of a large class of computer programs, high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. We evaluate and compare models of the memory allocation behavior in actual programs and investigate how these models can be used to explore the performance of memory management algorithms. These models, if accurate enough, provide an attractive alternative to algorithm evaluation based on trace-driven simulation using actual traces. We explore a range of models of increasing complexity including models that have been used by other researchers. Based on our analysis, we draw three important conclusions. First, a very simple model, which generates a uniform distribution around the mean of observed values, is often quite accurate. Second, two new models we propose show greater accuracy than those previously described in the literature. Finally, none of the models investigated ap...
Custom Memory Placement for Parallel Data Mining
, 1997
"... A lot of data mining tasks, such as Associations, Sequences, and Classification, use complex pointer-based data structures that typically suffer from sub-optimal locality. In the multiprocessor case shared access to these data structures may also result in false-sharing. Most of the optimization tec ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
A lot of data mining tasks, such as Associations, Sequences, and Classification, use complex pointer-based data structures that typically suffer from sub-optimal locality. In the multiprocessor case shared access to these data structures may also result in false-sharing. Most of the optimization techniques for enhancing locality and reducing false sharing have been proposed in the context of numeric applications involving array-based data structures, and are not applicable for dynamic data structures due to dynamic memory allocation from the heap with arbitrary addresses. Within the context of data mining it is commonly observed that the building phase of these large recursive data structures, such as hash trees and decision trees, is random and independent from the access phase which is usually ordered and typically dominates the computation time. In such cases locality and false sharing sensitive memory placement of these structures can enhance performance significantly. We evaluate ...

