Results 1 - 10
of
99
Improvements to Graph Coloring Register Allocation
- ACM Transactions on Programming Languages and Systems
, 1994
"... This paper describes both the techniques themselves and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the regis ..."
Abstract
-
Cited by 158 (8 self)
- Add to MetaCart
This paper describes both the techniques themselves and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the register allocator on a suite of FORTRAN programs. It discusses several insights that we discovered only after repeated implementation of these allocators. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors---compi l ers , optimization General terms: Languages Additional Key Words and Phrases: Register allocation, code generation, graph coloring 1. INTRODUCTION The relationship between run-time performance and e#ective use of a machine's register set is well understood. In a compiler, the process of deciding which values to keep in registers at each point in the generated code is called register allocation. Value
Improving the Ratio of Memory Operations to Floating-Point Operations in Loops
- ACM Transactions on Programming Languages and Systems
, 1994
"... this paper we attempt to answer that question. To do so, we develop and evaluate techniques that automatically restructure program loops to achieve high performance on specific target architectures. These methods attempt to balance computation and memory accesses and seek to eliminate or reduce pipe ..."
Abstract
-
Cited by 91 (16 self)
- Add to MetaCart
this paper we attempt to answer that question. To do so, we develop and evaluate techniques that automatically restructure program loops to achieve high performance on specific target architectures. These methods attempt to balance computation and memory accesses and seek to eliminate or reduce pipeline interlock. To do this, they statically estimate the balance between memory operations and floating-point operations for each loop in a particular program and use these estimates to determine whether to apply various loop transformations. Experiments with our automatic techniques show that integer-factor speedups are possible on kernels. Additionally, the estimate of the balance between memory operations and computation, and the application of the estimate are very accurate---experiments reveal little difference between the balance achieved by our automatic system and that possible by hand optimization. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors---Compilers ;
Putting Pointer Analysis to Work
, 1998
"... This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, points-to analysis and connection analysis, to work. We demonstrate that the fundamen ..."
Abstract
-
Cited by 91 (8 self)
- Add to MetaCart
This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, points-to analysis and connection analysis, to work. We demonstrate that the fundamental problem is that one must be able to compare the memory locations read/written via pointer indirections, at different program points, and one must also be able to summarize the effect of pointer references over regions in the program. It is straightforward to compute read/write sets for indirections involving stack-directed pointers using points-to information. However, for heap-directed pointers we show that one needs to introduce the notion of anchor handles into the connection analysis and then express read/write sets to the heap with respect to these anchor handles. Based on the read/write sets we show how to extend traditional optimizations like common subexpression elimination, loop...
Interactive Multi-Pass Programmable Shading
"... Programmable shading is a common technique for production animation, but interactive programmable shading is not yet widely available. We support interactive programmable shading on virtually any 3D graphics hardware using a scene graph library on top of OpenGL. We treat the OpenGL architecture as a ..."
Abstract
-
Cited by 82 (3 self)
- Add to MetaCart
Programmable shading is a common technique for production animation, but interactive programmable shading is not yet widely available. We support interactive programmable shading on virtually any 3D graphics hardware using a scene graph library on top of OpenGL. We treat the OpenGL architecture as a general SIMD computer, and translate the high-level shading description into OpenGL rendering passes. While our system uses OpenGL, the techniques described are applicable to any retained mode interface with appropriate extension mechanisms and hardware API with provisions for recirculating data through the graphics pipeline. We present two demonstrations of the method. The first is a constrained shading language that runs on graphics hardware supporting OpenGL 1.2 with a subset of the ARB imaging extensions. We remove the shading language constraints by minimally extending OpenGL. The key extensions are color range (supporting extended range and precision data types) and pixel texture (using framebuffer values as indices into texture maps). Our second demonstration is a renderer supporting the RenderMan Interface and RenderMan Shading Language on a software implementation of this extended OpenGL. For both languages, our compiler technology can take advantage of extensions and performance characteristics unique to any particular graphics hardware.
A Shading Language on Graphics Hardware: The PixelFlow Shading System
, 1998
"... Over the years, there have been two main branches of computer graphics image-synthesis research; one focused on interactivity, the other on image quality. Procedural shading is a powerful tool, commonly used for creating high-quality images and production animation. A key aspect of most procedural s ..."
Abstract
-
Cited by 73 (10 self)
- Add to MetaCart
Over the years, there have been two main branches of computer graphics image-synthesis research; one focused on interactivity, the other on image quality. Procedural shading is a powerful tool, commonly used for creating high-quality images and production animation. A key aspect of most procedural shading is the use of a shading language, which allows a high-level description of the color and shading of each surface. However, shading languages have been beyond the capabilities of the interactive graphics hardware community. We have created a parallel graphics multicomputer, PixelFlow, that can render images at 30 frames per second using a shading language. This is the first system to be able to support a shading language in real-time. In this paper, we describe some of the techniques that make this possible. CR Categories and Subject Descriptors: D.3.2 [Language Classifications] Specialized Application Languages; I.3.1 [Computer Graphics] Hardware Architecture; I.3.3 [Computer Graphic...
Combining Analyses, Combining Optimizations
, 1995
"... This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iter ..."
Abstract
-
Cited by 67 (4 self)
- Add to MetaCart
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass to make the optimized code correct, also presented. The global code motion algorithm removes some Partially Dead Code as a side-effect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing run-time speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parse-time peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
Memory-System Design Considerations For Dynamically-Scheduled Microprocessors
, 1997
"... Memory-System Design Considerations for Dynamically-Scheduled Microprocessors Keith Istvan Farkas Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 1997 Dynamically-scheduled processors challenge hardware and software architects to develop designs ..."
Abstract
-
Cited by 66 (4 self)
- Add to MetaCart
Memory-System Design Considerations for Dynamically-Scheduled Microprocessors Keith Istvan Farkas Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 1997 Dynamically-scheduled processors challenge hardware and software architects to develop designs that balance hardware complexity and compiler technology against performance targets. This dissertation presents a first thorough look at some of the issues introduced by this hardware complexity. The focus of the investigation of these issues is the register file and the other components of the data memory system. These components are: the lockup-free data cache, the stream buffers, and the interface to the lower levels of the memory system. The investigation is based on software models. These models incorporate the features of a dynamically-scheduled processor that affect the design of the data-memory components. The models represent a balance between accuracy and generality, and ar...
Marmot: An Optimizing Compiler for Java
, 1998
"... The Marmot system is a research platform for studying the implementation of high level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generat ..."
Abstract
-
Cited by 63 (6 self)
- Add to MetaCart
The Marmot system is a research platform for studying the implementation of high level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generation, and runtime techniques with a few Java-specific features to achieve competitive performance. This paper contains a description of the Marmot system design, along with highlights of our experience applying and adapting traditional implementation techniques to Java. A detailed performance evaluation assesses both Marmot's overall performance relative to other Java and C++ implementations and the relative costs of various Java language features in Marmot-compiled code. Our experience with Marmot has demonstrated that well-known compilation techniques can produce very good performance for static Java applications---comparable or superior to other Java systems, and approaching that o...
Practical Improvements to the Construction and Destruction of Static Single Assignment Form
, 1998
"... Static single assignment (SSA) form is a program representation becoming increasingly popular for compiler-based code optimization. In this paper, we address three problems that have arisen in our use of SSA form. Two are variations to the SSA construction algorithms presented by Cytron et al. The f ..."
Abstract
-
Cited by 55 (3 self)
- Add to MetaCart
Static single assignment (SSA) form is a program representation becoming increasingly popular for compiler-based code optimization. In this paper, we address three problems that have arisen in our use of SSA form. Two are variations to the SSA construction algorithms presented by Cytron et al. The first variation is a version of...
Software pipelining showdown: Optimal vs. heuristic methods in a production compiler
- In Proc. of the ACM SIGPLAN'96 Conf. on Programming Languages Design and Implementation
, 1996
"... This paper is a scientific comparison of two code generation tech-niques with identical goals — generation of the best possible soft-ware pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of soflware pipelines pioneere ..."
Abstract
-
Cited by 53 (9 self)
- Add to MetaCart
This paper is a scientific comparison of two code generation tech-niques with identical goals — generation of the best possible soft-ware pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of soflware pipelines pioneered by Rau and Glaser [RaG181], but are otherwise quite dissimilar. One technique was developed at Silicon Graphics and is used in the MIPSpro compiler. This is the production compiler for SG1’S systems which are based on the MIPS R8000 processor [Hsu94]. It is essentially a branch-and-bound enumeration of possible sched-ules with extensive pruning. This method is heuristic becaus(s of the way it prunes and also because of the interaction between reg-ister allocation and scheduling. The second technique aims to produce optimal results by formulat-

