Results 1 - 10
of
39
Determining Average Program Execution Times and their Variance
, 1989
"... This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counter-based exe ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counter-based execution profile of the program. 1 Introduction It is important for a compiler to obtain estimates of execution times for subcomputations of an input program, if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the singleassignment language, Sisal, for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times in a program. This approach is based on the interval structure [ASU86] and the control dependence relation [FOW87], both of w...
Compiler-Controlled Multithreading for Lenient Parallel Languages
, 1991
"... Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads re ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses....
Design and Implementation of Code Optimizations for a Type-Directed Compiler for Standard ML
, 1996
"... Abstract The trends in software development are towards larger programs, more complex programs, and more use of programs as "component software. " These trends mean that the features of modern programming languages are becoming more important than ever before. Programming languages need to ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
Abstract The trends in software development are towards larger programs, more complex programs, and more use of programs as "component software. " These trends mean that the features of modern programming languages are becoming more important than ever before. Programming languages need to have features such as strong typing, a module system, polymorphism, automatic storage management, and higher-order functions. In short, modern programming languages are becoming more important than ever before.
Vectorized Garbage Collection
- Topics in Advanced language Implementation
, 1990
"... Garbage collection can be done in vector mode on supercomputers like the Cray-2 and the Cyber 205. Both copying collection and mark-and-sweep can be expressed as breadth-first searches in which the "queue" can be processed in parallel. We have designed a copying garbage collector whose inner loop wo ..."
Abstract
-
Cited by 46 (1 self)
- Add to MetaCart
Garbage collection can be done in vector mode on supercomputers like the Cray-2 and the Cyber 205. Both copying collection and mark-and-sweep can be expressed as breadth-first searches in which the "queue" can be processed in parallel. We have designed a copying garbage collector whose inner loop works entirely in vector mode. We give performance measurements of the algorithm as implemented for Lisp CONS cells on the Cyber 205. Vector-mode garbage collection performs up to 9 times faster than scalar-mode collection --- a worthwhile improvement. - 1. Automatic garbage collection on vector supercomputers Languages like Lisp with dynamic storage allocation and automatic garbage collection are increasingly being used on vector supercomputers. Implementations of Lisp have been done for Cray supercomputers [1], and fully supported supercomputer Lisp environments will soon be available (e.g. Common Lisp provided by Cray Research and Franz, Inc.)[2]. This is a natural development. Languages ...
The Implementation and Evaluation of Fusion and Contraction in Array Languages
, 1998
"... Array languages such as Fortran 90, HPF and ZPL have many benefits in simplifying array-based computations and expressing data parallelism. However, they can suffer large performance penalties because they introduce intermediate arrays---both at the source level and during the compilation process--- ..."
Abstract
-
Cited by 38 (9 self)
- Add to MetaCart
Array languages such as Fortran 90, HPF and ZPL have many benefits in simplifying array-based computations and expressing data parallelism. However, they can suffer large performance penalties because they introduce intermediate arrays---both at the source level and during the compilation process---which increase memory usage and pollute the cache. Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction. We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages. Experimental results show that our scheme typically yields runtime improvements of greater than 20% and sometimes up to 400%. In addition, it yields superior memory use when compared against commercial compilers and exhibits comparable memory use when compared with scalar languages. We also explore ...
Memory Subsystem Performance of Programs Using Copying Garbage Collection
- IN TWENTY-FIRST ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES
, 1994
"... Heap allocation with copying garbage collection is believed to have poor memory subsystem performance. We conducted a study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We found that many machines support heap allocation poorly. However, with t ..."
Abstract
-
Cited by 36 (2 self)
- Add to MetaCart
Heap allocation with copying garbage collection is believed to have poor memory subsystem performance. We conducted a study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We found that many machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good memory subsystem performance.
Memory Subsystem Performance of Programs with Intensive Heap Allocation
- IN 21ST ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES
, 1994
"... Heap allocation with copying garbage collection is a general storage management technique for modern programming languages. It is believed to have poor memory subsystem performance. To investigate this, we conducted an in-depth study of the memory subsystem performance of heap allocation for memory ..."
Abstract
-
Cited by 28 (8 self)
- Add to MetaCart
Heap allocation with copying garbage collection is a general storage management technique for modern programming languages. It is believed to have poor memory subsystem performance. To investigate this, we conducted an in-depth study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We studied the performance of mostly-functional Standard ML programs which made heavy use of heap allocation. We found that most machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good performance. The memory subsystem property crucial for achieving good performance was the ability to allocate and initialize a new object into the cache without a penalty. This can be achieved by having subblock placement with a subblock size of one word with a write allocate policy, along with fast page-mode writes or a write buffer. For caches with subblock placement, the data cache overhead was under 9% for a 64K of larger data cache; without subblock placement the overhead was often higher than 50%.
Callee-save Registers in Continuation-passing Style
, 1992
"... Continuation-passing style (CPS) is a good abstract representation to use for compilation and optimization: it has a clean semantics and is easily manipulated. We examine how CPS expresses the saving and restoring of registers in source-language procedure calls. In most CPS-based compilers, the cont ..."
Abstract
-
Cited by 22 (7 self)
- Add to MetaCart
Continuation-passing style (CPS) is a good abstract representation to use for compilation and optimization: it has a clean semantics and is easily manipulated. We examine how CPS expresses the saving and restoring of registers in source-language procedure calls. In most CPS-based compilers, the context of the calling procedure is saved in a "continuation closure" a single variable that is passed as an argument to the function being called. This closure is a record containing bindings of all the free variables of the continuation; that is, registers that hold values needed by the caller "after the call" are written to memory in the closure, and fetched back after the call.
A compiler controlled Threaded Abstract Machine
- Journal of Parallel and Distributed Computing
, 1993
"... ..."
Memory System Performance of Programs with Intensive Heap Allocation
- ACM Transactions on Computer Systems
, 1995
"... Heap allocation with copying garbage collection 1s a general storage management technique for programming languages. It is believed to have poor memory system performance, To investigate this, we conducted an in-depth study of the memory system performance of heap allocation for memory systems found ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
Heap allocation with copying garbage collection 1s a general storage management technique for programming languages. It is believed to have poor memory system performance, To investigate this, we conducted an in-depth study of the memory system performance of heap allocation for memory systems found on many machines. We studied the performance of mostly functional Standard ML programs which made heavy use of heap allocation. We found that most machmes support heap allocation poorly. However, with the appropriate memory system organization, heap allocation can have good performance. The memory system property crucial for achieving good performance was the ability to allocate and initialize a new object into the cache without a penalty. This can be achieved by having subblock placement with a subblock size of one word with a write-allocate pohcy, along with fast page-mode writes or a write buffer. For caches with subblock placement, the data cache overhead was under 9T0 for a 64K or larger data cache; without subblock placement the overhead was often higher than 50’?o.

