Results 1  10
of
183
The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our cu ..."
Abstract

Cited by 396 (6 self)
 Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for realdata DFTs of prime size, a new way of implementing DFTs by means of machinespecific singleinstruction, multipledata (SIMD) instructions, and how a specialpurpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
Automated empirical optimizations of software and the ATLAS project
 Parallel Computing
, 2001
"... This paper describes the automatically tuned linear algebra software �ATLAS) project, as well as the fundamental principles that underly it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term automated empirical optimization of software ..."
Abstract

Cited by 290 (35 self)
 Add to MetaCart
This paper describes the automatically tuned linear algebra software �ATLAS) project, as well as the fundamental principles that underly it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term automated empirical optimization of software �AEOS); this style of library management has been created in order to allow software to keep pace with the incredible rate of hardware advancement inherent in Moore's Law. ATLAS is the application of this new paradigm to linear algebra software, with the present emphasis on the basic linear algebra subprograms �BLAS), a widely used, performancecritical,
The Landscape of Parallel Computing Research: A View from Berkeley
 TECHNICAL REPORT, UC BERKELEY
, 2006
"... All rights reserved. ..."
OSKI: A library of automatically tuned sparse matrix kernels
 Institute of Physics Publishing
, 2005
"... kernels ..."
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
, 2000
"... Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction level parallelism, respectively. However, these transformations are not independent and each can adversely affect the goal of the other. Furthermore, the best combination will vary drama ..."
Abstract

Cited by 91 (9 self)
 Add to MetaCart
Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction level parallelism, respectively. However, these transformations are not independent and each can adversely affect the goal of the other. Furthermore, the best combination will vary dramatically from one processor to the next. In this paper, we therefore address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies based on genetic algorithms, random sampling and simulated annealing. We compare the levels of optimization obtained by iterative compilation to several wellknown static techniques and show that we outperform each of them on a range of benchmarks across a variety of ar...
SPL: A Language and Compiler for DSP Algorithms
, 2001
"... We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into ecient C or Fortran programs. The formulas are represented in a language that we call SPL, an acronym from Signal Processing Language. The compiler is a component of the SPI ..."
Abstract

Cited by 82 (11 self)
 Add to MetaCart
We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into ecient C or Fortran programs. The formulas are represented in a language that we call SPL, an acronym from Signal Processing Language. The compiler is a component of the SPIRAL system which makes use of formula transformations and intelligent search strategies to automatically generate optimized digital signal processing (DSP) libraries. After a discussion of the translation and optimization techniques implemented in the compiler, we use SPL formulations of the fast Fourier transform (FFT) to evaluate the compiler. Our results show that SPIRAL, which can be used to implement many classes of algorithms, produces programs that perform as well as \hardwired" systems like FFTW.
Self adapting linear algebra algorithms and software
, 2004
"... One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe approaches for obtaining tuned highperformance kernels, and for automatically choosing suitable algorithms. S ..."
Abstract

Cited by 81 (21 self)
 Add to MetaCart
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe approaches for obtaining tuned highperformance kernels, and for automatically choosing suitable algorithms. Specifically, we describe the generation of dense and sparse blas kernels, and the selection of linear solver algorithms. However, the ideas presented here extend beyond these areas, which can be considered proof of concept.
Nonlinear Array Layouts for Hierarchical Memory Systems
, 1999
"... Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, nonprogrammable array attribute. ..."
Abstract

Cited by 72 (5 self)
 Add to MetaCart
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, nonprogrammable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by reordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (25% of total running time) and high performance benefits (reducing execution time by factors of 1.12.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursionbased control structures may be needed to fully exploit their potential.
SPIRAL: A Generator for PlatformAdapted Libraries of Signal Processing Algorithms
 Journal of High Performance Computing and Applications
, 2004
"... SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be reoptimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical fra ..."
Abstract

Cited by 71 (20 self)
 Add to MetaCart
SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be reoptimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations
Active Libraries: Rethinking the roles of compilers and libraries
 In Proceedings of the SIAM Workshop on Object Oriented Methods for Interoperable Scientific and Engineering Computing (OO’98
, 1998
"... We describe Active Libraries, which take an active role in compilation. Unlike traditional libraries which are passive collections of functions and objects, Active Libraries may generate components, specialize algorithms, optimize code, configure and tune themselves for a target machine, and describ ..."
Abstract

Cited by 69 (2 self)
 Add to MetaCart
We describe Active Libraries, which take an active role in compilation. Unlike traditional libraries which are passive collections of functions and objects, Active Libraries may generate components, specialize algorithms, optimize code, configure and tune themselves for a target machine, and describe themselves to tools (such as profilers and debuggers) in an intelligible way. Several such libraries are described, as are implementation technologies. 1 Introduction This paper attempts to document a trend toward libraries which take an active role in generating code and interacting with programming tools. We call these Active Libraries. They solve the problem of how to provide efficient domainspecific abstractions (Section 1.1): active libraries are able to define abstractions, and also control how they are optimized. Several existing libraries for arrays, parallel physics, linear algebra and Fast Fourier Transforms fit the description of active libraries (Section 2). These libraries ...