Results 1 - 10
of
56
The design and implementation of FFTW3
- Proceedings of the IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our cu ..."
Abstract
-
Cited by 255 (4 self)
- Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
The Landscape of Parallel Computing Research: A View from Berkeley
- TECHNICAL REPORT, UC BERKELEY
, 2006
"... All rights reserved. ..."
A language for shading and lighting calculations
- Computer Graphics (SIGGRAPH ’90 Proceedings
, 1990
"... A shading language provides a means to extend the shading and lighting formulae used by a rendering system. This paper discusses the design of a new shading language based on previous work of Cook and Perlin. This language has various types of shaders for light sources and surface reflectances, poin ..."
Abstract
-
Cited by 101 (6 self)
- Add to MetaCart
A shading language provides a means to extend the shading and lighting formulae used by a rendering system. This paper discusses the design of a new shading language based on previous work of Cook and Perlin. This language has various types of shaders for light sources and surface reflectances, point and color data types, control flow constructs that support the casting of outgoing and the integration of incident light, a clearly specified interface to the rendering system using global state variables, and a host of useful built-in functions. The design issues and their impact on the implementation are also discussed. CR Categories: 1.3.3 [Computer Graphics] Picture/Image Generation- Display algorithms; 1.3.5 [Computer Graphics]
Enhanced Code Compression for Embedded RISC Processors
, 1999
"... This paper explores compiler techniques for reducing the memory needed to load and run program executables. In embedded systems, where economic incentives to reduce both ram and rom are strong, the size of compiled code is increasingly important. Similarly, in mobile and network computing, the need ..."
Abstract
-
Cited by 89 (2 self)
- Add to MetaCart
This paper explores compiler techniques for reducing the memory needed to load and run program executables. In embedded systems, where economic incentives to reduce both ram and rom are strong, the size of compiled code is increasingly important. Similarly, in mobile and network computing, the need to transmit an executable before running it places a premium on code size. Our work focuses on reducing the size of a program's code segment, using pattern-matching techniques to identify and coalesce together repeated instruction sequences. In contrast to other methods, our framework preserves the ability to run program executables directly, without an intervening decompression stage. Our compression framework is integrated into an industrial-strength optimizing compiler, which allows us to explore the interaction between code compression and classical code optimization techniques, and requires that we contend with the difficulties of compressing previously optimized code. The specific contributions in this paper include a comprehensive experimental evaluation of code compression for a Risc-like architecture, a more powerful pattern-matching scheme for improved identification of repeated code fragments, and a new form of profile-driven code compression that reduces the speed penalty arising from compression.
Adaptive Optimizing Compilers for the 21st Century
- Journal of Supercomputing
, 2001
"... Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence ..."
Abstract
-
Cited by 70 (8 self)
- Add to MetaCart
Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence that minimizes an explicit, external objective function. The result is a compiler framework that adapts its behavior to the application being compiled, to the pool of available transformations, to the objective function, and to the target machine.
Finding effective optimization phase sequences
- In Proceedings of the 2003 ACM SIGPLAN conference on Language, Compiler, and Tool for Embedded Systems
, 2003
"... It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumptio ..."
Abstract
-
Cited by 45 (11 self)
- Add to MetaCart
It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumption. Given that many embedded application developers are willing to spend time tuning an application, we believe a viable approach is to allow the developer to steer the process of optimizing a function. In this paper, we describe support in VISTA, an interactive compilation system, for finding effective sequences of optimization phases. VISTA provides the user with dynamic and static performance information that can be used during an interactive compilation session to gauge the progress of improving the code. In addition, VISTA provides support for automatically using performance information to select the best optimization sequence among several attempted. One such feature is the use of a genetic algorithm to search for the most efficient sequence based on specified fitness criteria. We hav e included a number of experimental results that evaluate the effectiveness of using a genetic algorithm in VISTA to find effective optimization phase sequences.
Eliminating Branches using a Superoptimizer and the GNU C Compiler
, 1992
"... this paper uses the RS/6000 for all its examples, the techniques described here are applicable to most machines ..."
Abstract
-
Cited by 37 (0 self)
- Add to MetaCart
this paper uses the RS/6000 for all its examples, the techniques described here are applicable to most machines
Executing Formal Specifications need not be Harmful
- Software Engineering Journal
, 1996
"... We review the various arguments which have been advanced for and against the use of executable specifications. Examples are given of the problems which may arise in applying this technique and of the benefits which may accrue. A case study is reported in which execution is used to validate the p ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
We review the various arguments which have been advanced for and against the use of executable specifications. Examples are given of the problems which may arise in applying this technique and of the benefits which may accrue. A case study is reported in which execution is used to validate the published specification of a commercially available package. We conclude that there are circumstances when executable specifications can be of high value but that execution must be used together with, and as a supplement to, other methods of validating specifications such as inspection and proof. 1 Introduction Formal specifications have been accepted as having value in a number of areas, including critical systems. A specification that does not correctly capture requirements, however, is of dubious benefit. Validating a specification, whether formal or informal, is known to be difficult. With a formal specification there are a number of techniques available for validation, including r...
Automatic Design of Computer Instruction Sets
, 1993
"... This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small c ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...

