Cache-Conscious Structure Layout
, 1999
Cited by 164 (8 self)
Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs. This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and, consequently, their performance. It explores two placement techniques, clustering and coloring, that improve cache performance by increasing a pointer structure’s spatial and temporal locality, and by reducing cache conflicts. To reduce the cost of applying these techniques, this paper discusses two strategies, cache-conscious reorganization and cache-conscious allocation, and describes two semi-automatic tools, ccmorph and ccmalloc, that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits, in most cases significantly outperforming state-of-the-art prefetching.
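The clustering idea in the abstract above can be illustrated with a small sketch. It is not ccmorph itself: the `Node` class, the `cluster` function, and the block size `k` are hypothetical names chosen for illustration, and coloring is not shown. The sketch only groups a binary tree's nodes so that each group is the top of a subtree and could, under the assumption that `k` nodes fit in one cache block, be laid out contiguously.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node used only for this illustration."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def cluster(root, k):
    """Group node keys into blocks of at most k nodes.

    Each block is filled breadth-first from a block root, so it holds the
    top levels of one subtree. Nodes that do not fit become roots of
    later blocks. k stands in for "nodes per cache block" (an assumption).
    """
    blocks = []
    pending = deque([root] if root is not None else [])
    while pending:
        block_root = pending.popleft()
        block, frontier = [], deque([block_root])
        while frontier and len(block) < k:
            node = frontier.popleft()
            block.append(node.key)
            for child in (node.left, node.right):
                if child is not None:
                    frontier.append(child)
        blocks.append(block)
        pending.extend(frontier)  # leftover nodes start their own blocks
    return blocks
```

For a complete 15-node tree numbered in heap order and `k = 3`, the sketch yields the blocks `[1, 2, 3]`, `[4, 8, 9]`, `[5, 10, 11]`, `[6, 12, 13]`, `[7, 14, 15]`, so a root-to-leaf search touches two blocks instead of four separately placed nodes.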
Distributed Paging for General Networks
, 1996
Cited by 55 (5 self)
Distributed paging [BFR92, ABF93b, AK95] deals with the dynamic allocation of copies of files in a distributed network so as to minimize the total communication cost over a sequence of read and write requests. Most previous work deals with the file allocation problem [BS89, West91, CLRW93, ABF93a, WY93, Koga93, AK94, LRWY94], where infinite nodal memory capacity is assumed. In contrast, the distributed paging problem makes the more realistic assumption that nodal memory capacity is limited. Former work on distributed paging deals with the problem only in the case of a uniform network topology. This paper gives the first distributed paging algorithm for general networks. The algorithm is competitive in storage and communication. The competitive ratios are poly-logarithmic in the total number of network nodes and the diameter of the network. Johns Hopkins University and Lab. for Computer Science, MIT. Supported by Air Force Contract TNDGAFOSR-86-0078, ARO contract DAAL03-86-K-0171, NSF contract 9114440-CCR, DARPA contract N00014J -92-1799, and a special grant from IBM. E-mail: baruch@theory.lcs.mit.edu. Department of Computer Science, School of Mathematics, Tel-Aviv University, Tel-Aviv 69978, Israel. Supported by a grant from the Israeli Academy of Sciences. E-mail: yairb@math.tau.ac.il, fiat@math.tau.ac.il
A Tractable Scheme Implementation
- Lisp and Symbolic Computation
Cited by 54 (4 self)
Scheme 48 is an implementation of the Scheme programming language constructed with tractability and reliability as its primary design goals. It has the structural properties of large, compiler-based Lisp implementations: it is written entirely in Scheme, is bootstrapped via its compiler, and provides numerous language extensions. It controls the complexity that ordinarily attends such large Lisp implementations through clear articulation of internal modularity and by the exclusion of features, optimizations, and generalizations that are of only marginal value. 1. Introduction Scheme 48 is an implementation of the Scheme programming language constructed with tractability and reliability as its primary design goals. By tractability we mean the ease with which the system can be understood and changed. Although Lisp dialects, including Scheme, are relatively simple languages, implementation tractability is often threatened by the demands of providing high performance and extended funct...
A Language-Independent Garbage Collector Toolkit
, 1991
Cited by 49 (14 self)
We describe a memory management toolkit for language implementors. It offers efficient and flexible generation scavenging garbage collection. In addition to providing a core of language-independent algorithms and data structures, the toolkit includes auxiliary components that ease implementation of garbage collection for programming languages. We have detailed designs for Smalltalk and Modula-3 and are confident the toolkit can be used with a wide variety of languages. The toolkit approach is itself novel, and our design includes a number of additional innovations in flexibility, efficiency, accuracy, and cooperation between the compiler and the collector. This project is supported by National Science Foundation Grant CCR-8658074, and by Digital Equipment Corporation, GTE Laboratories, and Apple Computer. 1 Introduction As part of an ongoing effort to implement Persistent Smalltalk and Persistent Modula-3, we have designed a high performance garbage collector toolkit that can be us...
Vectorized Garbage Collection
- Topics in Advanced Language Implementation
, 1990
Cited by 46 (1 self)
Garbage collection can be done in vector mode on supercomputers like the Cray-2 and the Cyber 205. Both copying collection and mark-and-sweep can be expressed as breadth-first searches in which the "queue" can be processed in parallel. We have designed a copying garbage collector whose inner loop works entirely in vector mode. We give performance measurements of the algorithm as implemented for Lisp CONS cells on the Cyber 205. Vector-mode garbage collection performs up to 9 times faster than scalar-mode collection, a worthwhile improvement. 1. Automatic garbage collection on vector supercomputers Languages like Lisp with dynamic storage allocation and automatic garbage collection are increasingly being used on vector supercomputers. Implementations of Lisp have been done for Cray supercomputers [1], and fully supported supercomputer Lisp environments will soon be available (e.g. Common Lisp provided by Cray Research and Franz, Inc.) [2]. This is a natural development. Languages ...
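The observation that copying collection is a breadth-first search can be sketched as follows. This is a plain scalar, Cheney-style collector and not the vectorized algorithm of the paper; the cell representation (tuples of indices or `None`) and the name `cheney_copy` are assumptions made for illustration. The to-space list doubles as the BFS queue: everything between `scan` and the end of the list is copied but not yet scanned, and that region is what a vector machine could, in principle, process in bulk.

```python
def cheney_copy(from_space, roots):
    """Copy all cells reachable from `roots` into a fresh to-space.

    from_space: list of CONS-like cells, each a (car, cdr) tuple whose
                fields are either an index into from_space or None.
    roots:      list of from_space indices.
    Returns (to_space, new_roots) with every pointer rewritten.
    """
    to_space = []   # also serves as the breadth-first queue
    forward = {}    # forwarding table: old index -> new index

    def evacuate(ptr):
        if ptr is None:
            return None
        if ptr not in forward:
            forward[ptr] = len(to_space)
            to_space.append(from_space[ptr])  # fields still hold old pointers
        return forward[ptr]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):               # queue is to_space[scan:]
        car, cdr = to_space[scan]
        to_space[scan] = (evacuate(car), evacuate(cdr))
        scan += 1
    return to_space, new_roots
```

Cells that are never evacuated are garbage and simply do not appear in the returned to-space.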
Age-Based Garbage Collection
- In Proceedings of SIGPLAN 1999 Conference on Object-Oriented Programming, Languages, & Applications
, 1999
Cited by 45 (13 self)
Modern generational garbage collectors look for garbage among the young objects, because they have high mortality; however, these objects include the very youngest objects, which clearly are still live. We introduce new garbage collection algorithms, called age-based, some of which postpone consideration of the youngest objects. Collecting less than the whole heap requires write barrier mechanisms to track pointers into the collected region. We describe here a new, efficient write barrier implementation that works for age-based and traditional generational collectors. To compare several collectors, their configurations, and program behavior, we use an accurate simulator that models all heap objects and the pointers among them, but does not model cache or other memory effects. For object-oriented languages, our results demonstrate that an older-first collector, which collects older objects before the youngest ones, copies on average much less data than generational collectors. Our resul...
A Comparative Performance Evaluation of Write Barrier Implementations
, 1992
Cited by 41 (11 self)
Generational garbage collectors are able to achieve very small pause times by concentrating on the youngest (most recently allocated) objects when collecting, since objects have been observed to die young in many systems. Generational collectors must keep track of all pointers from older to younger generations, by "monitoring" all stores into the heap. This write barrier has been implemented in a number of ways, varying essentially in the granularity of the information observed and stored. Here we examine a range of write barrier implementations and evaluate their relative performance within a generation scavenging garbage collector for Smalltalk. 1 Introduction Generational collectors achieve short collection pause times partly because they separate heap-allocated objects into two or more generations and do not process all generations during each collection. Empirical studies have shown that in many programs most objects die young, so separating objects by age and focusing collecti...
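A write barrier of the kind described above can be sketched in a few lines. This is one illustrative variant, an object-granularity remembered set, and not any of the specific implementations the paper evaluates; the `Obj` and `Heap` classes and the integer generation numbering (larger means older) are assumptions.

```python
class Obj:
    """Hypothetical heap object with a generation number (larger = older)."""

    def __init__(self, generation):
        self.generation = generation
        self.fields = {}


class Heap:
    def __init__(self):
        # Older objects that may hold pointers into a younger generation.
        self.remembered = set()

    def store(self, obj, name, value):
        """Perform obj.fields[name] = value through the write barrier."""
        obj.fields[name] = value
        # Record only older-to-younger pointers; the collector treats the
        # remembered objects as extra roots for a young-generation collection.
        if isinstance(value, Obj) and obj.generation > value.generation:
            self.remembered.add(obj)
```

Coarser schemes record a card or page instead of the object, trading cheaper stores for more scanning at collection time, which is the granularity trade-off the abstract refers to.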
Incremental collection of mature objects
- In Proceedings of the International Workshop on Memory Management
, 1992
Cited by 33 (6 self)
We present a garbage collection algorithm that extends generational scavenging to collect large older generations (mature objects) non-disruptively. The algorithm’s approach is to process bounded-size pieces of mature object space at each collection; the subtleties lie in guaranteeing that it eventually collects any and all garbage. The algorithm does not assume any special hardware or operating system support, e.g., for forwarding pointers or protection traps. The algorithm copies objects, so it naturally supports compaction and reclustering.
Automatic Pool Allocation for Disjoint Data Structures
, 2002
Cited by 20 (8 self)
This paper presents an analysis technique and a novel program transformation that can enable powerful optimizations for entire linked data structures. The fully automatic transformation converts ordinary programs to use pool (aka region) allocation for heap-based data structures. The transformation relies on an efficient link-time interprocedural analysis to identify disjoint data structures in the program, to check whether these data structures are accessed in a type-safe manner, and to construct a Disjoint Data Structure Graph that describes the connectivity pattern within such structures. We present preliminary experimental results showing that the data structure analysis and pool allocation are effective for a set of pointer intensive programs in the Olden benchmark suite. To illustrate the optimizations that can be enabled by these techniques, we describe a novel pointer compression transformation and briefly discuss several other optimization possibilities for linked data structures.
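The combination of pool allocation and pointer compression mentioned above can be pictured with a hand-written sketch. The paper's transformation is automatic and compiler-driven; the `Pool` class and `build_list` helper below are hypothetical and only show the resulting shape: each disjoint data structure lives in its own pool, links are small indices into that pool rather than full pointers, and the whole structure is released at once.

```python
class Pool:
    """Hypothetical per-data-structure pool; 'pointers' are slot indices."""

    def __init__(self):
        self.slots = []

    def alloc(self, value, next_index=None):
        # Store the node in the pool and return its index, which is the
        # compressed stand-in for a pointer to the node.
        self.slots.append((value, next_index))
        return len(self.slots) - 1

    def walk(self, head):
        """Yield the values of the linked list starting at index `head`."""
        while head is not None:
            value, head = self.slots[head]
            yield value

    def destroy(self):
        # Free every node of the data structure in one step.
        self.slots.clear()


def build_list(pool, values):
    """Build a singly linked list inside `pool`; return the head index."""
    head = None
    for value in reversed(values):
        head = pool.alloc(value, head)
    return head
```

Because indices are only meaningful relative to one pool, this layout depends on the structures being disjoint, which is the property the paper's analysis is said to establish.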
Dynamic Clustering in an Object-Oriented Distributed System
- In OOPSLA Workshop on Objects in Large Distributed Systems (OLDS-2)
, 1987
Cited by 4 (1 self)
In a large object-oriented (O-O) distributed system, object grouping is crucial in order to optimize communications between objects and disk I/O transfers. In this paper, we present a general-purpose and scalable object clustering method which is integrated with garbage collection and load balancing processing. We propose a mixed dynamic and programmer-driven approach. 1 Introduction The evolution of distributed applications is characterized by a growing number of nodes and (possibly persistent) objects, due to an increasing number of users and to code reuse. As a result, object clustering is important for performance: to co-locate objects that communicate often, and to optimize disk I/O. Moreover, object clustering improves not only paging performance but also memory usage, efficiency of garbage collection, and load balancing. In addition, we adopt the following goals: to provide transparent, general-purpose, multi-language and scalable solutions. We discard application-specific solutions ...

