Dynamic storage allocation: A survey and critical review
1995
Cited by 241 (6 self)
Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their design and evaluation. We then chronologically survey most of the literature on allocators between 1961 and 1995. (Scores of papers are discussed, in varying detail, and over 150 references are given.) We argue that allocator designs have been unduly restricted by an emphasis on mechanism, rather than policy, while the latter is more important; higher-level strategic issues are still more important, but have not been given much attention. Most theoretical analyses and empirical allocator evaluations to date have relied on very strong assumptions of randomness and independence, but real program behavior exhibits important regularities that must be exploited if allocators are to perform well in practice.
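The survey's distinction between mechanism and policy can be made concrete with a toy example. The sketch below assumes a hypothetical heap modeled as a list of free (offset, size) blocks: the block-splitting mechanism in `allocate` is shared, while first fit and best fit are interchangeable placement policies passed in as parameters. The names `FreeBlock`, `first_fit`, `best_fit`, and `allocate` are illustrative only and are not taken from the survey.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A free block in a hypothetical toy heap: [offset, offset + size).
struct FreeBlock {
    std::size_t offset;
    std::size_t size;
};

// Policy 1: first fit. Pick the first block that is large enough.
std::optional<std::size_t> first_fit(const std::vector<FreeBlock>& free_list, std::size_t request) {
    for (std::size_t i = 0; i < free_list.size(); ++i) {
        if (free_list[i].size >= request) return i;
    }
    return std::nullopt;
}

// Policy 2: best fit. Pick the smallest block that is large enough.
std::optional<std::size_t> best_fit(const std::vector<FreeBlock>& free_list, std::size_t request) {
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < free_list.size(); ++i) {
        if (free_list[i].size >= request &&
            (!best || free_list[i].size < free_list[*best].size)) {
            best = i;
        }
    }
    return best;
}

// Shared mechanism: carve `request` bytes off the front of the block chosen
// by the policy, removing the block if it is used up. Returns the offset of
// the allocated region, or nullopt if no block fits.
template <typename Policy>
std::optional<std::size_t> allocate(std::vector<FreeBlock>& free_list, std::size_t request, Policy policy) {
    std::optional<std::size_t> idx = policy(free_list, request);
    if (!idx) return std::nullopt;
    FreeBlock& block = free_list[*idx];
    std::size_t result = block.offset;
    block.offset += request;
    block.size -= request;
    if (block.size == 0) {
        free_list.erase(free_list.begin() + static_cast<std::ptrdiff_t>(*idx));
    }
    return result;
}
```

With free blocks of 32, 8, and 16 bytes, an 8-byte request is split off the 32-byte block under first fit but consumes the exact-size 8-byte block under best fit; the mechanism is identical, only the policy differs.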
The Slab Allocator: An Object-Caching Kernel Memory Allocator
USENIX Summer Technical Conference, 1994
Cited by 115 (3 self)
This paper presents a comprehensive design overview of the SunOS 5.4 kernel memory allocator. This allocator is based on a set of object-caching primitives that reduce the cost of allocating complex objects by retaining their state between uses. These same primitives prove equally effective for managing stateless memory (e.g., data pages and temporary buffers) because they are space-efficient and fast. The allocator’s object caches respond dynamically to global memory pressure, and employ an object-coloring scheme that improves the system's overall cache utilization and bus balance. The allocator also has several statistical and debugging features that can detect a wide range of problems throughout the system.
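The core idea of object caching, retaining an object's constructed state between uses, can be sketched in a few lines. This is a minimal user-space illustration assuming a hypothetical `ObjectCache` class with caller-supplied constructor and destructor callbacks; it is not the SunOS kernel interface and omits slabs, coloring, and reclaim under memory pressure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical object cache in the spirit of slab-style object caching:
// an object is constructed once, returned to the cache still in its
// constructed state, and reused without re-running the constructor.
template <typename T>
class ObjectCache {
public:
    using Callback = std::function<void(T&)>;

    ObjectCache(Callback ctor, Callback dtor)
        : ctor_(std::move(ctor)), dtor_(std::move(dtor)) {}
    ObjectCache(const ObjectCache&) = delete;
    ObjectCache& operator=(const ObjectCache&) = delete;

    // Cached objects are destroyed only when the cache itself goes away.
    // Objects still held by callers remain the callers' responsibility.
    ~ObjectCache() {
        for (T* obj : free_) {
            dtor_(*obj);
            delete obj;
        }
    }

    T* alloc() {
        if (!free_.empty()) {   // fast path: reuse an already-constructed object
            T* obj = free_.back();
            free_.pop_back();
            return obj;
        }
        T* obj = new T();       // slow path: fresh memory...
        ctor_(*obj);            // ...initialized exactly once
        ++constructed_;
        return obj;
    }

    // Callers must hand objects back in their constructed state.
    void release(T* obj) { free_.push_back(obj); }

    std::size_t constructed() const { return constructed_; }

private:
    Callback ctor_;
    Callback dtor_;
    std::vector<T*> free_;
    std::size_t constructed_ = 0;
};
```

The design choice mirrored here is that the constructor cost is paid once per object rather than once per allocation, which is what makes caching pay off for objects with expensive initialization.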
Quantifying behavioral differences between C and C++ programs
Journal of Programming Languages, 1994
Cited by 91 (15 self)
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research have been applied in an effort to make C programs execute faster. In many application domains, the C++ language is replacing C as the programming language of choice. In this paper, we measure the empirical behavior of a group of significant C and C++ programs and attempt to identify and quantify behavioral differences between them. Our goal is to determine whether optimization technology that has been successful for C programs will also be successful for C++ programs. We also identify behavioral characteristics of C++ programs that suggest optimizations that should be applied to those programs. Our results show that C++ programs exhibit behavior that is significantly different from that of C programs. These results should be of interest to compiler writers and architecture designers who are designing systems to execute object-oriented programs.
Automatic pool allocation: improving performance by controlling data structure layout in the heap
In Proceedings of PLDI, 2005
Cited by 82 (9 self)
This paper describes Automatic Pool Allocation, a transformation framework that segregates distinct instances of heap-based data structures into separate memory pools and allows heuristics to be used to partially control the internal layout of those data structures. The primary goal of this work is performance improvement, not automatic memory management, and the paper makes several new contributions. The key contribution is a new compiler algorithm for partitioning heap objects in imperative programs based on a context-sensitive pointer analysis, including a novel strategy for correct handling of indirect (and potentially unsafe) function calls. The transformation does not require type-safe programs and works for the full generality of C and C++. Second, the paper describes several optimizations that exploit data structure partitioning to further improve program performance. Third, the paper evaluates how memory hierarchy behavior and overall program performance are impacted by the new transformations. Using a number of benchmarks and a few applications, we find that compilation times are extremely low, and overall running times for heap-intensive programs speed up by 10-25% in many cases, about 2x in two cases, and more than 10x in two small benchmarks. Overall, we believe this work provides a new framework for optimizing pointer-intensive programs by segregating and controlling the layout of heap-based data structures.
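Automatic Pool Allocation is a compiler transformation, so the sketch below only shows, written by hand, the kind of runtime behavior it aims for: every node of one list instance comes from that instance's own pool, so nodes sit next to each other in memory and the whole pool is released at once. The `Pool` class, its chunk size, and `push_front` are hypothetical and are not the paper's runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical per-data-structure pool: a bump allocator over malloc'd chunks.
class Pool {
public:
    explicit Pool(std::size_t chunk_bytes = 4096) : chunk_bytes_(chunk_bytes) {}
    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;
    ~Pool() {                                   // destroy the pool: free every chunk at once
        for (void* chunk : chunks_) std::free(chunk);
    }

    void* alloc(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;   // keep every object aligned
        if (chunks_.empty() || used_ + bytes > cap_) {
            cap_ = bytes > chunk_bytes_ ? bytes : chunk_bytes_;
            void* chunk = std::malloc(cap_);
            if (chunk == nullptr) throw std::bad_alloc();
            chunks_.push_back(chunk);
            used_ = 0;
        }
        void* p = static_cast<char*>(chunks_.back()) + used_;
        used_ += bytes;                         // consecutive objects are adjacent
        return p;
    }

private:
    std::size_t chunk_bytes_;
    std::size_t cap_ = 0;                       // capacity of the current chunk
    std::size_t used_ = 0;                      // bytes used in the current chunk
    std::vector<void*> chunks_;
};

struct Node {
    int value;
    Node* next;
};

// Node is trivially destructible, so nodes need no individual cleanup:
// they disappear when their pool is destroyed.
Node* push_front(Pool& pool, Node* head, int value) {
    return new (pool.alloc(sizeof(Node))) Node{value, head};
}
```

Note that individual nodes are never freed here; a real pool runtime may also support per-object frees, which this sketch omits.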
The Measured Cost of Conservative Garbage Collection
Software Practice and Experience, 1993
Cited by 80 (5 self)
In this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators, widely used general-purpose allocators, and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little effect on program performance. I also find that the true cost of conservative garbage collection is not the CPU overhead, but the memory system overhead of the algorithm. I conclude that conservative garbage collection is a promising alternative to explicit storage management and that the performance of conservative collection is likely to improve in the future. C programmers should now seriously consider using conservative garbage collection instead of explicitly calling free in the programs they write.
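What makes a collector "conservative" is that it lacks type information, so any word that looks like an address inside a heap object is treated as a pointer. The toy mark phase below illustrates that rule under stated assumptions: a hypothetical address space in which each object is keyed by its base address and holds raw words. It is not the collector evaluated in the paper, and it omits sweeping and real stack scanning.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy heap object occupying [base, base + size) in a hypothetical address space.
struct Object {
    std::uintptr_t size;
    std::vector<std::uintptr_t> words;  // raw contents, no type information
    bool marked = false;
};

using Heap = std::map<std::uintptr_t, Object>;  // keyed by base address

// Map an arbitrary word to the object it could point into (interior pointers
// count), or nullptr if it falls outside every object.
Object* find_object(Heap& heap, std::uintptr_t word) {
    auto it = heap.upper_bound(word);   // first object with base > word
    if (it == heap.begin()) return nullptr;
    --it;                               // last object with base <= word
    return word < it->first + it->second.size ? &it->second : nullptr;
}

// Mark everything reachable from the roots, treating every word that looks
// like a heap address as a pointer, even if it is really an integer.
void mark(Heap& heap, const std::vector<std::uintptr_t>& roots) {
    std::vector<std::uintptr_t> pending(roots);
    while (!pending.empty()) {
        std::uintptr_t word = pending.back();
        pending.pop_back();
        Object* obj = find_object(heap, word);
        if (obj == nullptr || obj->marked) continue;
        obj->marked = true;
        pending.insert(pending.end(), obj->words.begin(), obj->words.end());
    }
}
```

An integer that happens to fall inside an object keeps that object alive; this false retention, together with scanning all of memory that might contain pointers, is part of the memory-system cost the abstract refers to.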
Using Lifetime Predictors to Improve Memory Allocation Performance
1993
Cited by 77 (6 self)
Dynamic storage allocation is used heavily in many application areas, including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a great fraction of all bytes allocated are short-lived (at least 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42-99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
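One way to picture profile-based lifetime prediction is a table built during a training run and consulted at allocation time. The sketch below assumes, as a simplification, that the prediction key is an allocation-site label plus the request size and that lifetime is measured in some abstract unit (for example, bytes allocated in between); `LifetimePredictor` and the site strings are hypothetical, not the paper's algorithm verbatim.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical lifetime predictor keyed by (allocation site, request size).
// Training records the longest lifetime observed at each key; at run time a
// key is predicted short-lived only if every observed object died young, so
// its objects can be steered into a separate arena.
class LifetimePredictor {
public:
    explicit LifetimePredictor(std::size_t threshold) : threshold_(threshold) {}

    // Training: record one object's observed lifetime.
    void record(const std::string& site, std::size_t size, std::size_t lifetime) {
        std::size_t& longest = longest_[{site, size}];
        if (lifetime > longest) longest = lifetime;
    }

    // Prediction: unseen keys are conservatively treated as long-lived.
    bool predict_short_lived(const std::string& site, std::size_t size) const {
        auto it = longest_.find({site, size});
        return it != longest_.end() && it->second <= threshold_;
    }

private:
    std::size_t threshold_;
    std::map<std::pair<std::string, std::size_t>, std::size_t> longest_;
};
```

Predicting conservatively (a single long-lived object disqualifies a key) trades coverage for accuracy, since a mispredicted long-lived object in a short-lived arena can keep the whole arena from being reclaimed.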
Improving the Cache Locality of Memory Allocation
1993
Cited by 77 (8 self)
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with the details of memory allocators; most assume that the memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms, based on trace-driven simulation of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance by increasing both paging and cache miss rates. While increased paging can be debilitating on any architecture, cache miss rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benef...
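Coalescing, mentioned at the end of the abstract, means merging a freed block with free neighbors that touch it in memory. The sketch below shows immediate coalescing on an address-ordered free list, assuming a hypothetical toy heap of (offset, size) blocks; `Block` and `free_and_coalesce` are illustrative names. It shows the bookkeeping involved but does not model the reference-locality effects the paper measures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A free block [offset, offset + size) in a hypothetical toy heap.
struct Block {
    std::size_t offset;
    std::size_t size;
};

// Return a block to an address-ordered free list and merge it with any free
// neighbor that is adjacent on either side (immediate coalescing).
void free_and_coalesce(std::vector<Block>& free_list, Block b) {
    auto it = std::lower_bound(free_list.begin(), free_list.end(), b,
        [](const Block& x, const Block& y) { return x.offset < y.offset; });
    it = free_list.insert(it, b);

    // Merge with the successor if they touch.
    auto next = it + 1;
    if (next != free_list.end() && it->offset + it->size == next->offset) {
        it->size += next->size;
        free_list.erase(next);          // `it` stays valid: it precedes `next`
    }
    // Merge with the predecessor if they touch.
    if (it != free_list.begin()) {
        auto prev = it - 1;
        if (prev->offset + prev->size == it->offset) {
            prev->size += it->size;
            free_list.erase(it);
        }
    }
}
```

Freeing the 32-byte block at offset 16 between free blocks at [0, 16) and [48, 64) collapses all three into a single 64-byte block.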
Composing High-Performance Memory Allocators
In Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2001
Cited by 73 (22 self)
Current general-purpose memory allocators do not provide sufficient speed or flexibility for modern high-performance applications. Highly tuned general-purpose allocators have per-operation costs of around one hundred cycles, while the cost of an operation in a custom memory allocator can be just a handful of cycles. To achieve high performance, programmers often write custom memory allocators from scratch, a difficult and error-prone process. In this paper, we present a flexible and efficient infrastructure for building memory allocators that is based on C++ templates and inheritance. This novel approach allows programmers to build custom and general-purpose allocators as "heap layers" that can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies allocator construction and results in allocators that either match or improve the performance of heavily tuned allocators written in C, including the Kingsley allocator and the GNU obstack library. We further show that this infrastructure can be used to rapidly build a general-purpose allocator whose performance is comparable to that of the Lea allocator, one of the best uniprocessor allocators available. We thus demonstrate a clean, easy-to-use allocator interface that seamlessly combines the power and efficiency of any number of general and custom allocators within a single application.
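The "heap layers" idea relies on C++ mixins: each layer is a class template that inherits from the layer beneath it and calls down with `SuperHeap::`, so the composition is fixed at compile time and calls can be inlined. The sketch below illustrates that style only; the class names and the 32-byte size are chosen for illustration, and it is not code from the paper's library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Bottom layer: gets memory from the C library.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p) { std::free(p); }
};

// Mixin layer: counts requests that reach the layer beneath it.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
    std::size_t mallocs = 0;
    void* malloc(std::size_t sz) {
        ++mallocs;
        return SuperHeap::malloc(sz);
    }
    void free(void* p) { SuperHeap::free(p); }
};

// Mixin layer: caches freed fixed-size objects on a free list for reuse.
// For brevity it assumes every request fits in ObjectSize bytes.
template <class SuperHeap, std::size_t ObjectSize>
class FreelistHeap : public SuperHeap {
    static_assert(ObjectSize >= sizeof(void*), "a free object must hold a link");
    struct FreeObject { FreeObject* next; };
    FreeObject* head_ = nullptr;

public:
    FreelistHeap() = default;
    FreelistHeap(const FreelistHeap&) = delete;
    FreelistHeap& operator=(const FreelistHeap&) = delete;
    ~FreelistHeap() {                 // hand cached objects back to the layer below
        while (head_ != nullptr) {
            FreeObject* next = head_->next;
            SuperHeap::free(head_);
            head_ = next;
        }
    }

    void* malloc(std::size_t sz) {
        assert(sz <= ObjectSize);
        if (head_ != nullptr) {       // fast path: pop a cached object
            FreeObject* obj = head_;
            head_ = obj->next;
            return obj;
        }
        return SuperHeap::malloc(ObjectSize);
    }

    void free(void* p) {              // keep the object instead of releasing it
        head_ = new (p) FreeObject{head_};
    }
};

// Composition by template instantiation: free list -> counter -> C library.
using ComposedHeap = FreelistHeap<CountingHeap<MallocHeap>, 32>;
```

Swapping, reordering, or removing a layer only changes the `using` line, which is the kind of flexibility the abstract attributes to the approach.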
Vmalloc: A General and Efficient Memory Allocator
1996
Cited by 52 (7 self)
Dynamic memory allocation is an integral part of programming. Programs in C and C++ (via constructors and destructors) routinely allocate memory using the familiar ANSI C standard malloc interface, established around 1979 by Doug McIlroy. Malloc manipulates heap memory using the functions malloc(s) to allocate a block of size s, free(b) to free a previously allocated block b, and realloc(b,s) to resize a block b to size s. No optimal solution to dynamic memory allocation exists [1, 2, 3], so over the years many malloc implementations have been proposed with different tradeoffs in time and space efficiency. A 1985 study by David Korn and Phong Vo presented and compared 11 malloc versions. Only a few of these survived the test of time. The first widely used malloc was written by McIlroy and became part of many Bell Labs Research and System V versions of the UNIX system. This malloc is based on a first-fit strategy and can be very slow with large memories. C. King
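The three calls in the interface described above can be exercised from C++ through `<cstdlib>`. The hypothetical helper `append` below shows the usual safe `realloc` idiom: the result goes into a new variable first, because on failure `realloc` returns a null pointer and leaves the original block allocated and unchanged. It also relies on `realloc(nullptr, n)` behaving like `malloc(n)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: append `text` to a NUL-terminated heap string, growing
// it with realloc. Returns the (possibly moved) block, or nullptr on failure,
// in which case `s` is still valid and must still be freed by the caller.
char* append(char* s, const char* text) {
    std::size_t old_len = (s != nullptr) ? std::strlen(s) : 0;
    std::size_t add_len = std::strlen(text);
    // realloc(nullptr, n) behaves like malloc(n).
    char* grown = static_cast<char*>(std::realloc(s, old_len + add_len + 1));
    if (grown == nullptr) return nullptr;            // original block untouched
    std::memcpy(grown + old_len, text, add_len + 1); // copy including the NUL
    return grown;
}
```

A caller keeps its old pointer until `append` succeeds and releases the final block with `std::free`.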
Cache-conscious frequent pattern mining on a modern processor
In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2005
Cited by 38 (6 self)
In this paper, we examine the performance of frequent pattern mining algorithms on a modern processor. A detailed performance study reveals that even the best frequent pattern mining implementations, with highly efficient memory managers, still grossly under-utilize a modern processor. The primary performance bottlenecks are poor data locality and low instruction-level parallelism (ILP). We propose a cache-conscious prefix tree to address this problem. The resulting tree improves spatial locality and also enhances the benefits of hardware cache line prefetching. Furthermore, the design of this data structure allows the use of a novel tiling strategy to improve temporal locality. The result is an overall speedup of up to 3.2x compared with state-of-the-art implementations. We then show how these algorithms can be improved further by realizing a non-naive thread-based decomposition that targets simultaneous multithreading processors. A key aspect of this decomposition is to ensure cache reuse between threads that are co-scheduled at a fine granularity. This optimization affords an additional speedup of 50%, resulting in an overall speedup of up to 4.8x. To