Results 1 - 10
of
40
FFTW: An Adaptive Software Architecture For The FFT
, 1998
"... FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have ..."
Abstract
-
Cited by 372 (4 self)
- Add to MetaCart
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machine-specific, vendor-optimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
The design and implementation of FFTW3
- Proceedings of the IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our cu ..."
Abstract
-
Cited by 255 (4 self)
- Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
A Fast Fourier Transform Compiler
, 1999
"... FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the perform ..."
Abstract
-
Cited by 129 (5 self)
- Add to MetaCart
FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
Generalized FFTs -- A Survey Of Some Recent Results
, 1995
"... In this paper we survey some recent work directed towards generalizing the fast Fourier transform (FFT). We work primarily from the point of view of group representation theory. In this setting the classical FFT can be viewed as a family of efficient algorithms for computing the Fourier transform of ..."
Abstract
-
Cited by 48 (8 self)
- Add to MetaCart
In this paper we survey some recent work directed towards generalizing the fast Fourier transform (FFT). We work primarily from the point of view of group representation theory. In this setting the classical FFT can be viewed as a family of efficient algorithms for computing the Fourier transform of either a function defined on a finite abelian group, or a bandlimited function on a compact abelian group. We discuss generalizations of the FFT to arbitrary finite groups and compact Lie groups.
The Fractional Fourier Transform and Applications
, 1995
"... This paper describes the "fractional Fourier transform", which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e \Gamma2ßi=n , the fractional Fourier transf ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
This paper describes the "fractional Fourier transform", which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e \Gamma2ßi=n , the fractional Fourier transform is based on fractional roots of unity e \Gamma2ßiff , where ff is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with non-integer periodicities, performing high-resolution trigonometric interpolation, detecting lines in noisy images and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster by arbitrarily large factors than conventional techniques. Bailey is with the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center, Moffett Field,...
Multidigit Multiplication For Mathematicians
"... . This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the SchonhageStrass ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
. This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the SchonhageStrassen trick, Schonhage's trick, Nussbaumer's trick, the cyclic SchonhageStrassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms. 1.
Space-variant Fourier Analysis: the Exponential Chirp Transform
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Space-variant, or foveating, vision architectures are of importance in both machine and biological vision. In this paper we focus on a particular space-variant map, the log-polar map, which approximates the primate visual map and which has been applied in machine vision by a number of investigators ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
Space-variant, or foveating, vision architectures are of importance in both machine and biological vision. In this paper we focus on a particular space-variant map, the log-polar map, which approximates the primate visual map and which has been applied in machine vision by a number of investigators during the past two decades. Associated with the log-polar map, we define a new linear integral transform, which we call the exponential chirp transform. This transform provides frequency domain image processing for spacevariant image formats, while preserving the major aspects of the shift-invariant properties of the usual Fourier transform. We then show that a log-polar coordinate transform in frequency (similar to the MellinTransform) provides a fast exponential chirp transform. This provides size and rotation, in addition to shift, invariant properties in the transformed space. Finally, we demonstrate the use of the fast exponential chirp algorithm on a data-base of images in a template matching task, and also demonstrate its uses for spatial filtering. Given the general lack of algorithms in space-variant image processing, we expect that the fast exponential chirp transform will provide a fundamental tool for applications in this area.
An Adaptive Software Library for Fast Fourier Transforms
- In Proceedings of the International Conference on Supercomputing
, 2000
"... In this paper we present an adaptive and portable software library for the fast Fourier transform (FFT). The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at run-time by selec ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
In this paper we present an adaptive and portable software library for the fast Fourier transform (FFT). The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at run-time by selecting the fastest strategy among all possible strategies, given available codelets, for a given transform size. We also presentanefficient automatic method of generating the library modules by using a special--purpose compiler. The code generator is written in C and it generates a library of C codelets. The code generator is shown to be flexible and extensible and the entire library can be generated in a matter of seconds. Wehaveevaluated the library for performance on the IBM--SP2, SGI--2000, HP--Exemplar and Intel Pentium systems. We use the results from these evaluations to build performance models for the FFT library on different platforms. The library is shown to be portable, adaptive and efficient. 1.
Portable High-Performance Programs
, 1999
"... right notice and this permission notice are preserved on all copies. ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
right notice and this permission notice are preserved on all copies.
Automatic Generation of Fast Discrete Signal Transforms
, 2001
"... This paper presents an algorithm that derives fast versions for a broad class of discrete signal transforms symbolically. The class includes but is not limited to the discrete Fourier and the discrete trigonometric transforms. This is achieved by finding fast sparse matrix factorizations for the mat ..."
Abstract
-
Cited by 14 (7 self)
- Add to MetaCart
This paper presents an algorithm that derives fast versions for a broad class of discrete signal transforms symbolically. The class includes but is not limited to the discrete Fourier and the discrete trigonometric transforms. This is achieved by finding fast sparse matrix factorizations for the matrix representations of these transforms. Unlike previous methods, the algorithm is entirely automatic and uses the defining matrix as its sole input. The sparse matrix factorization algorithm consists of two steps: First, the "symmetry" of the matrix is computed in the form of a pair of group representations; second, the representations are stepwise decomposed, giving rise to a sparse factorization of the original transform matrix. We have successfully demonstrated the method by computing automatically efficient transforms in several important cases: For the DFT, we obtain the Cooley--Tukey FFT; for a class of transforms including the DCT, type II, the number of arithmetic operations for our fast transforms is the same as for the best-known algorithms. Our approach provides new insights and interpretations for the structure of these signal transforms and the question of why fast algorithms exist. The sparse matrix factorization algorithm is implemented within the software package AREP.

