Results 1 - 10
of
18
FFTW: An Adaptive Software Architecture For The FFT
, 1998
"... FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have ..."
Abstract
-
Cited by 372 (4 self)
- Add to MetaCart
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machine-specific, vendor-optimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
A Fast Fourier Transform Compiler
, 1999
"... FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the perform ..."
Abstract
-
Cited by 129 (5 self)
- Add to MetaCart
FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
SPIRAL: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
- Journal of High Performance Computing and Applications
, 2004
"... SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical fra ..."
Abstract
-
Cited by 62 (18 self)
- Add to MetaCart
SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations
The Fastest Fourier Transform in the West
- the Proceedings of the 1998 International Conference on Acoustics, Speech, and Signal Processing, ICASSP '98
, 1997
"... This paper describes FFTW, a portable C package for computing the one- and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT software, including the well-known FFTPACK and the code from Numerical Recipes. More interestingly, FFT ..."
Abstract
-
Cited by 59 (2 self)
- Add to MetaCart
This paper describes FFTW, a portable C package for computing the one- and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT software, including the well-known FFTPACK and the code from Numerical Recipes. More interestingly, FFTW is competitive with or better than proprietary, highly-tuned codes such as Sun's Performance Library and IBM's ESSL library. FFTW implements the Cooley-Tukey fast Fourier transform, and is freely available on the Web at http://theory.lcs.mit.edu/fftw. Three main ideas are the keys to FFTW's performance. First, the computation of the transform is performed by an executor consisting of highly-optimized, composable blocks of C code called codelets. Second, at runtime, a planner finds an efficient way (called a `plan') to compose the codelets. Through the planner, FFTW adapts itself to the architecture of the machine it is running on. Third, the codelets are automatically generated by a code...
Multidigit Multiplication For Mathematicians
"... . This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the SchonhageStrass ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
. This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the SchonhageStrassen trick, Schonhage's trick, Nussbaumer's trick, the cyclic SchonhageStrassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms. 1.
Fast Automatic Generation of DSP Algorithms
, 2001
"... SPIRAL is a generator of optimized, platform-adapted libraries for digital signal processing algorithms. SPIRAL's strategy translates the implementation task into a search in an expanded space of alternatives. ..."
Abstract
-
Cited by 21 (10 self)
- Add to MetaCart
SPIRAL is a generator of optimized, platform-adapted libraries for digital signal processing algorithms. SPIRAL's strategy translates the implementation task into a search in an expanded space of alternatives.
Stochastic Search for Signal Processing Algorithm Optimization
, 2001
"... This paper presents an evolutionary algorithm for searching for the optimal implementations of signal transforms and compares this approach against other search techniques. A single signal processing algorithm can be represented by a very large number of different but mathematically equivalent ..."
Abstract
-
Cited by 18 (5 self)
- Add to MetaCart
This paper presents an evolutionary algorithm for searching for the optimal implementations of signal transforms and compares this approach against other search techniques. A single signal processing algorithm can be represented by a very large number of different but mathematically equivalent formulas. When these formulas are implemented in actual code, unfortunately their running times differ significantly. Signal processing algorithm optimization aims at finding the fastest formula.
Portable High-Performance Programs
, 1999
"... right notice and this permission notice are preserved on all copies. ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
right notice and this permission notice are preserved on all copies.
Optimizing Sorting with Genetic Algorithms
- In The International Symposium on Code Generation and Optimization
, 2005
"... 1 ..."
Automatic Generation of Prime Length FFT Programs
- IEEE Transactions on Signal Processing
, 1996
"... We describe a set of programs for circular convolution and prime length FFTs that are relatively short, possess great structure, share many computational procedures, and cover a large variety of lengths. The programs make clear the structure of the algorithms and clearly enumerate independent comput ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
We describe a set of programs for circular convolution and prime length FFTs that are relatively short, possess great structure, share many computational procedures, and cover a large variety of lengths. The programs make clear the structure of the algorithms and clearly enumerate independent computational branches that can be performed in parallel. Moreover, each of these independent operations is made up of a sequence of sub-operations which can be implemented as vector/parallel operations. This is in contrast with previously existing programs for prime length FFTs: they consist of straight line code, no code is shared between them, and they can not be easily adapted for vector/parallel implementations. We have also developed a program that automatically generates these programs for prime length FFTs. This code generating program requires information only about a set of modules for computing cyclotomic convolutions. Contact Address: Ivan W. Selesnick Electrical and Computer Engineer...

