From patterns to frameworks to parallel programs
- UNIVERSITY OF ALBERTA
, 2002
Cited by 42 (10 self)
This dissertation shows a new approach to writing object-oriented parallel programs based on design patterns, frameworks, and multiple layers of abstraction.
Neko: A single environment to simulate and prototype distributed algorithms
- In Proc. of the 15th Int’l Conf. on Information Networking (ICOIN-15)
, 2002
Cited by 42 (16 self)
Designing, tuning, and analyzing the performance of distributed algorithms and protocols are complex tasks. A major factor that contributes to this complexity is the fact that there is no single environment to support all phases of the development of a distributed algorithm. This paper presents *Neko*, an easy-to-use Java platform that provides a uniform and extensible environment for various phases of algorithm design and performance evaluation: prototyping, tuning, simulation, deployment, etc.
Controlling garbage collection and heap growth to reduce execution time of Java applications
- In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’01)
, 2001
Cited by 26 (0 self)
In systems that support garbage collection, a tension exists between collecting garbage too frequently and not collecting garbage frequently enough. Garbage collection that occurs too frequently may introduce unnecessary overheads at the risk of not collecting much garbage during each cycle. On the other hand, collecting garbage too infrequently can result in applications that execute with a large amount of virtual memory (i.e., with a large footprint) and suffer from increased execution times due to paging. In this paper, we use a large collection of Java™ applications and the highly tuned and widely used Boehm-Demers-Weiser (BDW) conservative mark-and-sweep garbage collector to experimentally examine the extent to which the frequency of garbage collection impacts an application's execution time, footprint, and pause times. We use these results to devise some guidelines for controlling garbage collection and heap growth in a conservative garbage collector in order to minimize application execution times. Then we describe new strategies for controlling garbage collection and heap growth that impact not only the frequency with which garbage collection occurs but also the points at which garbage collection occurs. Experimental results demonstrate that, when compared with the existing approach used in the standard BDW collector, our new strategy can significantly reduce application execution times. Our goal is to obtain a better understanding of how to control garbage collection and heap growth for an individual application executing in isolation. These results can be applied in a number of high-performance computing and server environments, in addition to some single-user environments. This work should also provide insights into how
A Survey of Adaptive Optimization in Virtual Machines
- PROCEEDINGS OF THE IEEE, 93(2), 2005. SPECIAL ISSUE ON PROGRAM GENERATION, OPTIMIZATION, AND ADAPTATION
, 2004
Cited by 26 (5 self)
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular
An algorithm for parallel incremental compaction
- In Proceedings of the Third International Symposium on Memory Management
, 2002
Cited by 15 (3 self)
Garbage collectors of the mark-sweep family may suffer from memory fragmentation and require the use of compaction. Known compaction methods are expensive and work while program activity is stopped, so that compaction is often a major contributor to garbage collection pause times. We present a parallel incremental compaction algorithm that reduces pause times by working in parallel and evacuating a part of the heap when the program threads are stopped for garbage collection. Our algorithm works with collectors based on mark-sweep, including mostly concurrent collectors. We have implemented a prototype of our algorithm as part of the garbage collector in the IBM JVM. Measurements of our prototype show that even with the most simple-minded policies, e.g., for choosing the area to evacuate, parallel incremental compaction can successfully reduce maximum garbage collection pause times with a minimal performance penalty.
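The idea of evacuating only part of the heap during a pause can be illustrated with a toy model. This is a minimal single-threaded sketch and not the algorithm from the paper: the heap is a hypothetical Python list in which `None` marks a free slot, `refs` is a hypothetical table of root references, and the function name `evacuate_area` is invented for illustration.

```python
def evacuate_area(heap, refs, start, end):
    """Move live objects out of heap[start:end] into free slots elsewhere.

    heap: list where None marks a free slot (toy model, an assumption).
    refs: dict mapping a root name to a heap index.
    Returns the forwarding map {old_index: new_index}.
    """
    # Free slots outside the chosen area are the only valid targets.
    free_outside = [i for i, obj in enumerate(heap)
                    if obj is None and not (start <= i < end)]
    forwarding = {}
    for old in range(start, end):
        if heap[old] is None:
            continue
        if not free_outside:
            break  # no room left; remaining objects stay where they are
        new = free_outside.pop(0)
        heap[new] = heap[old]
        heap[old] = None
        forwarding[old] = new
    # Fix up references that pointed into the evacuated area.
    for name, idx in refs.items():
        if idx in forwarding:
            refs[name] = forwarding[idx]
    return forwarding


heap = ["a", None, "b", None, None, "c", None, "d"]
refs = {"r1": 0, "r2": 2, "r3": 5, "r4": 7}
moved = evacuate_area(heap, refs, 4, 8)
print(heap)   # ['a', 'c', 'b', 'd', None, None, None, None]
print(moved)  # {5: 1, 7: 3}
```

After the call, slots 4 to 7 form one contiguous free region, which is the fragmentation benefit the abstract attributes to compaction. A real collector would also have to update references held inside heap objects and coordinate several threads, both of which this sketch leaves out.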
Architecture of the PEVM: a high-performance orthogonally persistent java virtual machine
- In Proc. of the 9th Workshop on Persistent Object Systems (POS9)
, 2000
Cited by 10 (0 self)
This paper outlines the design and implementation of the PEVM, a new scalable, high-performance implementation of orthogonal persistence for the Java platform (OPJ). The PEVM is based on the Sun Microsystems Laboratories Virtual Machine for Research, which features an optimizing Just-In-Time compiler, exact generational garbage collection, and fast thread synchronization. The PEVM also uses a new, scalable persistent object store designed to manage 80GB of objects. It is approximately ten times faster than previous OPJ implementations and can run significantly larger programs. Despite its greater speed and scalability, the PEVM's implementation is much simpler (e.g., just 43% of the VM source patches needed by our previous OPJ implementation). This is largely due to the pointer swizzling strategy we chose, the ResearchVM's exact memory management, and simple but effective mechanisms. For example, we implement some key data structures in the Java programming language since this automatically makes them persistent.
Fast multiprocessor memory allocation and garbage collection
, 2000
Cited by 5 (1 self)
We extended our garbage collecting memory allocator to provide good performance for multi-threaded applications on multiprocessors. The basic design is similar to the approach previously pursued in [12]. However, we concentrate on issues important to more common small-scale multiprocessors, and on specific issues not reported elsewhere. We argue that a reasonable level of garbage collector scalability can be achieved with relatively minor additions to the underlying collector code. Furthermore, the scalable collector does not need to be appreciably slower on a uniprocessor. Since our collector can serve as a plug-in replacement for malloc/free, we have the opportunity to compare it to scalable malloc-free implementations, notably Hoard [3]. Somewhat surprisingly, our collector significantly outperforms Hoard in some tests, a property that is mostly shared by the garbage collecting allocator in [ETY97]. We argue that garbage collectors currently require significantly less synchronization than explicit allocators, but that it may be possible to derive significantly faster explicit allocators from this observation. Speedy access to thread-local storage is a significant issue in the design of allocators that must conform to standard calling conventions. We present empirical evidence that at least in the presence of a garbage collector, this can often be accomplished faster in a thread-independent way than through the standard thread library facilities, casting some doubt on the utility of the latter.
A parallel, incremental, mostly concurrent garbage collector for servers
- ACM Transactions on Programming Languages and Systems
Cited by 5 (1 self)
Multithreaded applications with multi-gigabyte heaps running on modern servers provide new challenges for garbage collection (GC). The challenges for “server-oriented” GC include: ensuring short pause times on a multi-gigabyte heap while minimizing throughput penalty, good scaling on multiprocessor hardware, and keeping the number of expensive multi-cycle fence instructions required by weak ordering to a minimum. We designed and implemented a collector facing these demands, building on the mostly concurrent garbage collector proposed by Boehm et al. Our collector incorporates new ideas into the original collector. We make it parallel and incremental; we employ concurrent low-priority background GC threads to take advantage of processor idle time; we propose novel algorithmic improvements to the basic mostly concurrent algorithm, improving its efficiency and shortening its pause times; and finally, we use advanced techniques, such as a low-overhead work packet mechanism, to enable full parallelism among the incremental and concurrent collecting threads and ensure load balancing. We compared the new collector to the mature, well-optimized, parallel, stop-the-world mark-sweep collector already in the IBM JVM. When allowed to run aggressively, using 72% of the CPU during a short concurrent phase, our collector prototype reduces the maximum pause time from 161ms to 46ms while only losing 11.5% throughput when running the SPECjbb2000 benchmark on a 600 MB heap on an 8-way 1.1 GHz PowerPC multiprocessor. When the collector is limited to a non-intrusive operation using only 29% of the CPU, the maximum pause time obtained is 79ms and the loss in throughput is 15.4%.
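The pause-time figures quoted above can be turned into relative reductions with a short calculation. This sketch assumes the 161ms stop-the-world maximum pause is the baseline for both configurations, which the abstract implies but does not state for the non-intrusive run; the helper name `pause_reduction` is hypothetical.

```python
def pause_reduction(baseline_ms, new_ms):
    """Fraction by which the maximum pause shrinks relative to the baseline."""
    return (baseline_ms - new_ms) / baseline_ms


# Figures taken from the abstract; the shared 161 ms baseline is an assumption.
aggressive = pause_reduction(161, 46)      # 72% CPU, 11.5% throughput loss
non_intrusive = pause_reduction(161, 79)   # 29% CPU, 15.4% throughput loss

print(f"{aggressive:.1%}")     # 71.4%
print(f"{non_intrusive:.1%}")  # 50.9%
```

Under that assumption, the aggressive setting cuts the maximum pause by roughly 71% and the non-intrusive setting by roughly 51%.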
Mining molecular datasets on symmetric multiprocessor systems
- Proceedings of the 2006 IEEE International Conference on Systems, Man and Cybernetics, 2006, IEEE
, 2006
Cited by 2 (2 self)
Although about a dozen sophisticated algorithms for mining frequent subgraphs have been proposed in recent years, it still takes too long to search big databases with 100,000 graphs and more. Even the currently fastest algorithms like gSpan, FFSM, Gaston, or MoFa need hours to complete their tasks. This paper presents thread-based parallel versions of MoFa [5] and gSpan [26] that achieve speedups up to 11 on a shared-memory SMP system using 12 processors. We discuss the design space of the parallelization, the results, and the obstacles that are caused by the irregular search space and by the current state of Java technology.
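The reported speedup of up to 11 on 12 processors can be put in context with two standard metrics. Neither appears in the abstract: parallel efficiency (speedup divided by processor count) and the Karp-Flatt experimentally determined serial fraction are added here only as an illustration, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Speedup per processor; 1.0 would be perfect linear scaling."""
    return speedup / processors


def karp_flatt(speedup, processors):
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1 / speedup - 1 / processors) / (1 - 1 / processors)


# Best case quoted in the abstract: speedup 11 on 12 processors.
print(f"{parallel_efficiency(11, 12):.1%}")  # 91.7%
print(f"{karp_flatt(11, 12):.2%}")           # 0.83%
```

A speedup of 11 on 12 processors therefore corresponds to about 92% efficiency, or an implied serial fraction just under 1%, for the best case the authors report.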
Task-Aware Garbage Collection in a Multi-Tasking Virtual Machine
Cited by 2 (1 self)
A multi-tasking virtual machine (MVM) executes multiple programs in isolation, within a single operating system process. The goal of an MVM is to improve startup time, overall system throughput, and performance, by effective reuse and sharing of system resources across programs (tasks). However, multitasking also mandates a memory management system capable of offering a guarantee of isolation with respect to garbage collection costs, accounting of memory usage, and timely reclamation of heap resources upon task termination. To this end, we investigate and evaluate novel task-aware extensions to a state-of-the-art MVM garbage collector (GC). Our task-aware GC exploits the generational garbage collection hypothesis, in the context of multiple tasks, to provide performance isolation by maintaining task-private young generations. Task-aware GC facilitates concurrent per-task allocation and promotion, and minimizes synchronization and scanning overhead. In addition, we efficiently track per-task heap usage to enable GC-free reclamation upon task termination. Moreover, we couple these techniques with a light-weight synchronization mechanism that enables per-task minor collection, concurrently with allocation by other tasks. We empirically evaluate the efficiency, scalability, and throughput that our task-aware GC system enables. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Memory management (garbage collection)

