FFTW: An Adaptive Software Architecture For The FFT
1998
Cited by 372 (4 self)
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machine-specific, vendor-optimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
The design and implementation of FFTW3
Proceedings of the IEEE, 2005
Cited by 255 (4 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
Superfast solution of real positive definite Toeplitz systems
SIAM J. Matrix Anal. Appl., 1988
Cited by 44 (1 self)
Abstract. We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order $n + 1$, where $n = 2^\nu$. Our implementation uses the split-radix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the $n$th Szegő polynomial using fewer than $8n \log_2^2 n$ real arithmetic operations without explicit use of the bit-reversal permutation. Since Levinson’s algorithm requires slightly more than $2n^2$ operations to obtain this polynomial, we achieve crossover with Levinson’s algorithm at $n = 256$. Key words. Toeplitz matrix, Schur’s algorithm, split-radix Fast Fourier Transform
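The crossover quoted in this abstract can be checked with a little arithmetic. The sketch below is only an illustration: it compares the two leading-order figures from the abstract (the bound 8n log2(n)^2 and the estimate 2n^2) and ignores the "fewer than" and "slightly more than" qualifiers. The function names are hypothetical and do not come from the paper.

```python
import math


def schur_bound(n):
    """Upper bound 8 n (log2 n)^2 quoted for the superfast Schur solver."""
    return 8 * n * math.log2(n) ** 2


def levinson_estimate(n):
    """Leading term 2 n^2 quoted for Levinson's algorithm."""
    return 2 * n ** 2


def crossover(max_exp=16):
    """Smallest n = 2^v (v >= 1) where the Schur bound does not exceed the Levinson estimate."""
    for v in range(1, max_exp + 1):
        n = 2 ** v
        if schur_bound(n) <= levinson_estimate(n):
            return n
    return None  # no crossover found in the scanned range
```

Under these assumptions the two expressions coincide at n = 256, which agrees with the crossover point stated in the abstract.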
Multidigit Multiplication For Mathematicians
Cited by 25 (9 self)
This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the Schönhage-Strassen trick, Schönhage's trick, Nussbaumer's trick, the cyclic Schönhage-Strassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms.
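As a concrete instance of the first technique in that list, here is a minimal sketch of Karatsuba multiplication. It is specialised to non-negative Python integers rather than the general commutative rings the survey treats, and the function name and the `cutoff` parameter are assumptions made for illustration.

```python
def karatsuba(x, y, cutoff=2 ** 32):
    """Multiply non-negative integers with three recursive products instead of four."""
    if x < cutoff or y < cutoff:
        return x * y  # small operands: fall back to the built-in product

    # Split both operands at the same bit position: x = x_hi * 2^half + x_lo.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    lo = karatsuba(x_lo, y_lo, cutoff)
    hi = karatsuba(x_hi, y_hi, cutoff)
    # (x_lo + x_hi)(y_lo + y_hi) - lo - hi equals the cross term x_lo*y_hi + x_hi*y_lo.
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff) - lo - hi

    return (hi << (2 * half)) + (mid << half) + lo
```

The saving comes from recovering the cross term from a single extra product, so each level of recursion performs three half-size multiplications rather than four.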
Portable High-Performance Programs
1999
Cited by 16 (0 self)
right notice and this permission notice are preserved on all copies.
A Modified Split-Radix FFT With Fewer Arithmetic Operations
2007
Cited by 14 (3 self)
Recent results by Van Buskirk et al. have broken the record set by Yavne in 1968 for the lowest exact count of real additions and multiplications to compute a power-of-two discrete Fourier transform (DFT). Here, we present a simple recursive modification of the split-radix algorithm that computes the DFT with asymptotically about 6% fewer operations than Yavne, matching the count achieved by Van Buskirk’s program-generation framework. We also discuss the application of our algorithm to real-data and real-symmetric (discrete cosine) transforms, where we are again able to achieve lower arithmetic counts than previously published algorithms.
An Approach To Low-Power, High-Performance, Fast Fourier Transform Processor Design
Cited by 1 (0 self)
The Fast Fourier Transform (FFT) is one of the most widely used digital signal processing algorithms. While advances in semiconductor processing technology have enabled the performance and integration of FFT processors to increase steadily, these advances have also caused the power consumed by processors to increase as well. This power increase has resulted in a situation where the number of potential FFT applications limited by maximum power budgets - not performance - is significant and growing.
Application-Specific Architecture For Fast Transforms Based On The Successive Doubling Method, Part I: A Constant Geometry Approach
1994
Cited by 1 (1 self)
The successive doubling method is an efficient procedure for the design of fast algorithms for orthogonal transforms of length $N = r^n$, where the radix $r$ is a power of 2. It reduces the algorithmic complexity from $N^2$ to $N \log_r N$. In this work we present a partitioned systolic architecture for the two standard radix successive doubling algorithms: ascend and descend communication patterns. The systolization and partitioning procedure we have used is made up of three actions. First, we transform the flow chart of the data for the successive doubling algorithm into a new chart of constant geometry in all its stages ($n$). We obtain the constant geometry by means of the perfect unshuffle (ascending algorithm) or shuffle (descending algorithm) permutations of order $\log_2 r$. We then carry out the decomposition of these permutations into elementary permutations, which can be implemented electronically. Finally, we project the index space of the data onto the index space associ...
Constant Geometry Split-Radix Algorithms
J. VLSI Signal Processing, 1995
Cited by 1 (1 self)
The split-radix algorithm (SR) is a highly efficient version of the successive doubling method. Its application to the Fourier transform results in an algorithm that brings together the advantages of the radix 2 and radix 4 algorithms. In this work we present the generalization of the method that leads to the SR algorithm in the FFT and the implementation of a constant geometry (CG) version of it. In particular, we develop a CG algorithm of the successive doubling method that factorizes a sequence of length $N$ into $p$ sequences of length $N/r$ and into $(r - p)r$ of length $N/r^2$ ($r \ge 2$; $0 < p < r$). After this, the method is generalized for its application to SR $r, r^2, \ldots, r^u$ algorithms, that is, to those based on the factorization of a sequence of length $N$ into $p_1$ subsequences of length $N/r$, $p_2 r$ of length $N/r^2$, ..., $p_u r^{u-1}$ of length $N/r^u$ ($p_1 + p_2 + \cdots + p_u = r$). The results are applied to the implementation of a pipeline with identical...
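For the classical case r = 2, p = 1, the factorization described in this abstract splits a length-N sequence into one subsequence of length N/2 and two of length N/4. The following is a minimal recursive sketch of that textbook split-radix FFT, assuming a power-of-two input length. It is not the constant-geometry variant the paper develops, and the function names are hypothetical.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix DFT of a sequence whose length is a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]

    even = split_radix_fft(x[0::2])   # one sub-transform of length n/2
    odd1 = split_radix_fft(x[1::4])   # two sub-transforms of length n/4
    odd3 = split_radix_fft(x[3::4])

    out = [0j] * n
    quarter = n // 4
    for k in range(quarter):
        # Twiddle factors w^k and w^(3k) with w = exp(-2*pi*i/n).
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-6j * cmath.pi * k / n) * odd3[k]
        s = t1 + t3
        d = -1j * (t1 - t3)
        out[k] = even[k] + s
        out[k + 2 * quarter] = even[k] - s
        out[k + quarter] = even[k + quarter] + d
        out[k + 3 * quarter] = even[k + quarter] - d
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only to check the sketch above."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Comparing `split_radix_fft` against `naive_dft` on a short test vector is a quick way to confirm that the butterfly signs are right.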
The tangent FFT
in [Boztas and Lu 2007]. URL: http://cr.yp.to/papers.html#tangentfft. ID a9a77cef9a7b77f9b8b305e276d5fe25. Citations in this document: §2.9
Cited by 1 (1 self)
Abstract. The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n−6n+8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk’s software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the “tangent FFT,” a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk’s algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk’s improvement but also for minimizing the memory-access costs of the FFT. Keywords: tangent FFT; split-radix FFT; modified split-radix FFT; scaled odd tail; DFT; convolution; polynomial multiplication; algebraic complexity; communication complexity
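The two operation counts in this abstract can be compared directly. The sketch below evaluates the split-radix formula 4n lg n − 6n + 8 and derives the asymptotic saving from the leading coefficients 4 and 34/9 only. The lower-order terms of the tangent FFT count are not given in the abstract and are deliberately left out. All names are hypothetical.

```python
from fractions import Fraction


def split_radix_ops(n):
    """Real-operation count 4 n lg n - 6 n + 8 for a size-n split-radix FFT (n a power of 2, n >= 2)."""
    lg = n.bit_length() - 1
    if n < 2 or (1 << lg) != n:
        raise ValueError("n must be a power of 2 and at least 2")
    return 4 * n * lg - 6 * n + 8


# Leading coefficients of n lg n quoted in the abstract.
SPLIT_RADIX_COEFF = Fraction(4)
TANGENT_COEFF = Fraction(34, 9)


def asymptotic_saving():
    """Fraction of operations saved in the limit, from the leading terms alone."""
    return 1 - TANGENT_COEFF / SPLIT_RADIX_COEFF
```

The limiting saving works out to 1/18, roughly 5.6%, which is in line with the "about 6%" figure quoted for the modified split-radix algorithm elsewhere in this listing.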

