Checking system rules using system-specific, programmer-written compiler extensions
, 2000
Supporting Dynamic Data Structures on Distributed-Memory Machines
, 1995
Cited by 143 (8 self)
In this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy task creation. We intend to exploit this execution model using compiler analyses and automatic parallelization techniques. We have implemented a prototype system, which we call Olden, that runs on the Intel iPSC/860 and the Thinking Machines CM-5. We discuss our implementation and report on experiments with five benchmarks.
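As a rough illustration of the future-based parallelism the abstract mentions, the sketch below sums a binary tree by handing the left subtree to a future and computing the right subtree in the current thread. It is not Olden's mechanism: thread migration and lazy task creation are not modeled, and `Node`, `tree_sum`, and the `spawn_depth` cutoff are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_sum(node, pool, spawn_depth):
    """Sum a tree; spawn a future for the left subtree near the root only."""
    if node is None:
        return 0
    if spawn_depth > 0:
        # "Future call": the left subtree may run on another thread.
        fut = pool.submit(tree_sum, node.left, pool, spawn_depth - 1)
        right = tree_sum(node.right, pool, spawn_depth - 1)
        # "Touch": block until the future's value is available.
        return node.value + fut.result() + right
    # Below the cutoff, fall back to plain sequential recursion.
    return node.value + tree_sum(node.left, pool, 0) + tree_sum(node.right, pool, 0)


root = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
# A cutoff of 2 creates at most three futures, so four workers cannot deadlock.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = tree_sum(root, pool, 2)
```

The fixed depth cutoff stands in for the decision of when a task is worth creating; lazy task creation makes that decision at run time instead.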
Memory Management with Explicit Regions
, 1998
Cited by 115 (4 self)
Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult. 1 Introduction The two most popular memory management techniques are explicit allocation and deallocation, as in C's malloc/free, and various forms of garbage collection [Wil92]. Both have well-known advantages and disadvantages, discussed further below. A t...
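The region idea described in this abstract (every allocation names a region, and destroying the region frees everything in it at once) can be sketched as a toy bump allocator. This is an illustrative model only, not the paper's C implementation; the `Region` class and its `alloc` and `destroy` methods are hypothetical names.

```python
class Region:
    """Toy region: a bump allocator over one bytearray (illustrative only)."""

    def __init__(self, capacity: int = 1024) -> None:
        self.buf = bytearray(capacity)
        self.top = 0        # offset of the next free byte
        self.alive = True

    def alloc(self, size: int) -> int:
        """Reserve `size` bytes and return their offset within the region."""
        if not self.alive:
            raise RuntimeError("region already destroyed")
        if self.top + size > len(self.buf):
            raise MemoryError("region exhausted")
        offset = self.top
        self.top += size
        return offset

    def destroy(self) -> None:
        """Reclaim every allocation in the region in a single step."""
        self.buf = bytearray()
        self.top = 0
        self.alive = False


r = Region(64)
a = r.alloc(16)   # first object starts at offset 0
b = r.alloc(8)    # second object follows it at offset 16
r.destroy()       # both objects are released together
```

There is no per-object free: an object's lifetime is tied to its region, which is what makes bulk reclamation cheap.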
ISDL: An Instruction Set Description Language for Retargetability
, 1997
Cited by 86 (3 self)
We present the Instruction Set Description Language, ISDL, a machine description language used to describe target architectures to a retargetable compiler. A retargetable compiler is capable of compiling application code into machine code for different processors. The features and flexibility of ISDL enable the description of vastly different architectures such as an ASIP VLIW processor and a commercial DSP microprocessor. For instance, unlike other machine description languages, ISDL explicitly supports constraints which define valid operation groupings within an instruction, increasing the range of specifiable architectures. We have written a tool which, given an ISDL description of a processor, can automatically generate an assembler for it. Ongoing work includes the development of an automatic code-generator generator.
[Figure 1: A heterogeneous system-on-a-chip (DSP core, program ROM, RAM, ASIC or ASIP, peripherals)]
Language Support for Regions
- In Programming Language Design and Implementation (PLDI)
, 2001
Cited by 85 (8 self)
Region-based memory management systems structure memory by grouping objects in regions under program control. Memory is reclaimed by deleting regions, freeing all objects stored therein. Our compiler for C with regions, RC, prevents unsafe region deletions by keeping a count of references to each region. Using type annotations that make the structure of a program's regions more explicit, we reduce the overhead of reference counting from a maximum of 27% to a maximum of 11% on a suite of realistic benchmarks. We generalise these annotations in a region type system whose main novelty is the use of existentially quantified abstract regions to represent pointers to objects whose region is partially or totally unknown. A distribution of RC is available at http://www.cs.berkeley.edu/~dgay/rc.tar.gz.
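The safety check described here, keeping a count of references to each region so that an unsafe deletion can be prevented, might look like the following hand-written sketch. In RC the counting is done by the compiler; here it is explicit, and `CountedRegion`, `add_ref`, `drop_ref`, and `delete` are hypothetical names. Refusing the deletion is this sketch's choice, since the abstract does not say how RC reacts.

```python
class CountedRegion:
    """Toy region that tracks how many outside references point into it."""

    def __init__(self) -> None:
        self.refs = 0          # references held from outside the region
        self.deleted = False

    def add_ref(self) -> None:
        self.refs += 1

    def drop_ref(self) -> None:
        if self.refs == 0:
            raise ValueError("no reference to drop")
        self.refs -= 1

    def delete(self) -> bool:
        """Delete the region only if nothing outside still points into it."""
        if self.refs > 0:
            return False       # unsafe: a dangling pointer would result
        self.deleted = True
        return True


r = CountedRegion()
r.add_ref()            # some other region stores a pointer into r
first = r.delete()     # False: the pointer is still live
r.drop_ref()           # the pointer is overwritten or goes away
second = r.delete()    # True: no outside references remain
```

Every pointer write can change a count, which is the overhead the abstract's type annotations aim to reduce.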
Software Caching and Computation Migration in Olden
, 1995
Cited by 84 (0 self)
The goal of the Olden project is to build a system that provides parallelism for general purpose C programs with minimal programmer annotations. We focus on programs using dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migration can improve the performance of these programs, and provide a compile-time heuristic that selects between them for each pointer dereference. We have implemented a prototype system on the Thinking Machines CM-5. We describe our implementation and report on experiments with ten benchmarks.
Compiler Techniques for Code Compaction
, 2000
Cited by 83 (17 self)
This article explores the use of compiler techniques to accomplish code compaction to yield smaller executables. The main contribution of this article is to show that careful, aggressive, interprocedural optimization, together with procedural abstraction of repeated code fragments, can yield significantly better reductions in code size than previous approaches, which have generally focused on abstraction of repeated instruction sequences. We also show how "equivalent" code fragments can be detected and factored out using conventional compiler techniques, and without having to resort to purely linear treatments of code sequences as in suffix-tree-based approaches, thereby setting up a framework for code compaction that can be more flexible in its treatment of what code fragments are considered equivalent. Our ideas have been implemented in the form of a binary-rewriting tool that reduces the size of executables by about 30% on the average.
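For orientation, the simpler baseline that the abstract contrasts itself with, abstracting a repeated linear instruction sequence into a procedure, can be sketched as below. This is not the article's technique, which detects equivalent fragments more flexibly. The instruction strings, the fixed window length `k`, and the name `abstract_repeats` are assumptions made for this example.

```python
from collections import defaultdict


def abstract_repeats(code, k):
    """Replace the most frequent repeated k-instruction window with a call.

    Returns (new_code, procedure_body); procedure_body is None when no
    window of length k occurs at least twice without overlapping.
    """
    positions = defaultdict(list)
    for i in range(len(code) - k + 1):
        positions[tuple(code[i:i + k])].append(i)

    best, best_starts = None, []
    for window, starts in positions.items():
        chosen, next_free = [], 0
        for s in starts:                 # starts are in ascending order
            if s >= next_free:           # skip overlapping occurrences
                chosen.append(s)
                next_free = s + k
        if len(chosen) > len(best_starts):
            best, best_starts = window, chosen

    if len(best_starts) < 2:
        return list(code), None

    starts, out, i = set(best_starts), [], 0
    while i < len(code):
        if i in starts:
            out.append("call f0")        # f0 is the new shared procedure
            i += k
        else:
            out.append(code[i])
            i += 1
    return out, list(best)


code = ["load a", "add b", "store c", "mul d",
        "load a", "add b", "store c", "sub e",
        "load a", "add b", "store c"]
new_code, body = abstract_repeats(code, 3)
# new_code == ["call f0", "mul d", "call f0", "sub e", "call f0"]
# body     == ["load a", "add b", "store c"]
```

Exact textual matching of this kind misses fragments that differ only in register names or instruction order, which is the limitation the abstract points to.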
Code compression
- In Proc. Conf. on Programming Languages Design and Implementation
, 1997
Cited by 80 (11 self)
Current research in compiler optimization counts mainly CPU time and perhaps the first cache level or two. This view has been important but is becoming myopic, at least from a system-wide viewpoint, as the ratio of network and disk speeds to CPU speeds grows exponentially. For example, we have seen the CPU idle for most of the time during paging, so compressing pages can increase total performance even though the CPU must decompress or interpret the page contents. Another profile shows that many functions are called just once, so reduced paging could pay for their interpretation overhead. This paper describes:
- Measurements that show how code compression can save space and total time in some important real-world scenarios.
- A compressed executable representation that is roughly the same size as gzipped x86 programs and can be interpreted without decompression. It can also be compiled to high-quality machine code at 2.5 megabytes per second on a 120MHz Pentium processor.
- A compressed “wire” representation that must be decompressed before execution but is, for example, roughly 21% the size of SPARC code when compressing gcc.
A Real-Time Procedural Shading System for Programmable Graphics Hardware
, 2001
Cited by 75 (8 self)
Real-time graphics hardware is becoming programmable, but this programmable hardware is complex and difficult to use given current APIs. Higher-level abstractions would both increase programmer productivity and make programs more portable. However, it is challenging to raise the abstraction level while still providing high performance. We have developed a real-time procedural shading language system designed to achieve this goal. Our system is organized around multiple computation frequencies. For example, computations may be associated with vertices or with fragments/pixels. Our system’s shading language provides a unified interface that allows a single procedure to include operations from more than one computation frequency. Internally, our system virtualizes limited hardware resources to allow for arbitrarily complex computations. We map operations to graphics hardware if possible, or to the host CPU as a last resort. This mapping is performed by compiler back-end modules associated with each computation frequency. Our system can map vertex operations to either programmable vertex hardware or to the host CPU, and can map fragment operations to either programmable fragment hardware or to multipass OpenGL. By carefully designing all the components of the system, we are able to generate highly optimized code. We demonstrate our system running in real time on a variety of hardware.

