Results 1-10 of 43
The design and implementation of FFTW3
 PROCEEDINGS OF THE IEEE
, 2005
Cited by 678 (3 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm.
FFTW: An Adaptive Software Architecture For The FFT
, 1998
Cited by 569 (4 self)
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machine-specific, vendor-optimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
FAST FOURIER TRANSFORMS: A TUTORIAL REVIEW AND A STATE OF THE ART
, 1990
Cited by 113 (2 self)
The publication of the Cooley-Tukey fast Fourier transform (FFT) algorithm in 1965 has opened a new area in digital signal processing by reducing the order of complexity of some crucial computational tasks like Fourier transform and convolution from $N^2$ to $N \log_2 N$, where $N$ is the problem size. The development of the major algorithms (Cooley-Tukey and split-radix FFT, prime factor algorithm and Winograd fast Fourier transform) is reviewed. Then, an attempt is made to indicate the state of the art on the subject, showing the standing of research, open problems and implementations.
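The complexity reduction described in this abstract can be illustrated with a minimal sketch. It compares direct evaluation of the DFT definition with a textbook recursive radix-2 Cooley-Tukey FFT; the function names `naive_dft` and `fft_radix2` are illustrative choices, not taken from the paper, and the sketch assumes the input length is a power of two.

```python
import cmath


def naive_dft(x):
    """Evaluate the DFT definition directly: roughly N^2 complex multiplications."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (assumes a power-of-two length)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each recursion level does about N/2 twiddle multiplications and there are log2 N levels, which is where the N log2 N count comes from. The split-radix, prime factor and Winograd algorithms named in the abstract are not shown here.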
Superfast solution of real positive definite Toeplitz systems
 SIAM J. Matrix Anal. Appl
, 1988
Cited by 71 (1 self)
Abstract. We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order $n + 1$, where $n = 2^\nu$. Our implementation uses the split-radix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the $n$th Szegő polynomial using fewer than $8n \log_2^2 n$ real arithmetic operations without explicit use of the bit-reversal permutation. Since Levinson’s algorithm requires slightly more than $2n^2$ operations to obtain this polynomial, we achieve crossover with Levinson’s algorithm at $n = 256$. Key words. Toeplitz matrix, Schur’s algorithm, split-radix Fast Fourier Transform
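The crossover claim can be checked with simple arithmetic. The sketch below only evaluates the two leading-order expressions quoted in the abstract (the 8n log2² n bound and the 2n² estimate); the function names are illustrative, and the "fewer than" and "slightly more than" qualifiers in the abstract are ignored.

```python
import math


def superfast_bound(n):
    """Bound quoted in the abstract: 8 n (log2 n)^2 real operations."""
    return 8 * n * math.log2(n) ** 2


def levinson_estimate(n):
    """Leading term quoted in the abstract: 2 n^2 operations."""
    return 2 * n ** 2


# Tabulate both expressions for n = 2^nu, nu = 6..10.
for nu in range(6, 11):
    n = 2 ** nu
    print(n, superfast_bound(n), levinson_estimate(n))
```

At n = 256 both expressions equal 131072, since 8 · 256 · 8² = 2 · 256². For smaller powers of two the quadratic estimate is lower, and for larger ones the superfast bound is lower, which is consistent with the stated crossover.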
A modified split-radix FFT with fewer arithmetic operations
 IEEE TRANS. SIGNAL PROCESSING
, 2006
Multidigit Multiplication For Mathematicians
, 2001
Cited by 34 (8 self)
This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the Schönhage-Strassen trick, Schönhage's trick, Nussbaumer's trick, the cyclic Schönhage-Strassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms.
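As a concrete instance of the first technique listed, here is a minimal sketch of Karatsuba multiplication on non-negative Python integers. The paper treats general commutative rings; restricting to integers, the base-case cutoff of 16 and the function name `karatsuba` are assumptions made for illustration.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three half-size products instead of four."""
    if x < 16 or y < 16:
        return x * y  # small operands: fall back to the built-in product
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask  # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & mask  # y = y_hi * 2^half + y_lo
    lo = karatsuba(x_lo, y_lo)
    hi = karatsuba(x_hi, y_hi)
    # (x_lo + x_hi)(y_lo + y_hi) - lo - hi equals x_lo*y_hi + x_hi*y_lo
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - lo - hi
    return (hi << (2 * half)) + (mid << half) + lo
```

The saving comes from computing the cross term with one recursive product rather than two, at the cost of a few extra additions and subtractions.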
Portable High-Performance Programs
, 1999
Cited by 17 (0 self)
right notice and this permission notice are preserved on all copies.
Performance models and search methods for optimal FFT implementations
, 2000
Cited by 9 (0 self)
This work was supported by DARPA through the ARO grant # DABT63-98-1-0004. This thesis considers systematic methodologies for finding optimized implementations for the fast Fourier transform (FFT). By employing rewrite rules (e.g., the Cooley-Tukey formula), we obtain a divide and conquer procedure (decomposition) that breaks down the initial transform into combinations of different smaller size subtransforms, which are graphically represented as breakdown trees. Recursive application of the rewrite rules generates a set of algorithms and alternative codes for the FFT computation. The set of "all" possible implementations (within the given set of the rules) results in pairing the possible breakdown trees with the code implementation alternatives. To evaluate the quality of these implementations, we develop analytical and experimental performance models. Based on these models, we derive methods (dynamic programming, soft decision dynamic programming, and exhaustive search) to find the implementation with minimal runtime. Our test results demonstrate that good algorithms and codes, accurate performance
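The dynamic programming search over breakdown trees can be sketched as follows. This is a toy under stated assumptions: the cost model (`leaf_cost`, and n twiddle multiplications per split) is hypothetical and stands in for the thesis's analytical and experimental performance models, and the tree encoding and the name `best_plan` are illustrative.

```python
import math
from functools import lru_cache


def leaf_cost(n):
    """Hypothetical cost of computing a size-n transform directly from the definition."""
    return n * n


@lru_cache(maxsize=None)
def best_plan(n):
    """Return (cost, tree) for the cheapest breakdown of a size-n transform.

    A tree is either the integer n (an unsplit leaf) or a pair
    (left_tree, right_tree) for a Cooley-Tukey split n = r * s.
    """
    best = (leaf_cost(n), n)
    for r in range(2, math.isqrt(n) + 1):
        if n % r:
            continue
        s = n // r
        cost_r, tree_r = best_plan(r)  # memoized optimal subplans
        cost_s, tree_s = best_plan(s)
        # s transforms of size r, r transforms of size s, plus n twiddle multiplications
        cost = s * cost_r + r * cost_s + n
        if cost < best[0]:
            best = (cost, (tree_r, tree_s))
    return best
```

Under this toy model `best_plan(16)` selects the split 16 = 4 × 4 with both size-4 factors left as leaves. Because the model is symmetric in r and s, only splits with r ≤ s are tried; a measured, machine-dependent cost function would need both orders.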
An Approach To Low-Power, High-Performance, Fast Fourier Transform Processor Design
Cited by 6 (0 self)
The Fast Fourier Transform (FFT) is one of the most widely used digital signal processing algorithms. While advances in semiconductor processing technology have enabled the performance and integration of FFT processors to increase steadily, these advances have also caused the power consumed by processors to increase as well. This power increase has resulted in a situation where the number of potential FFT applications limited by maximum power budgets, not performance, is significant and growing. We present
Peak-to-average power ratio reduction of OFDM systems using cross entropy method
 in 17th Int. Conf. on Wireless Commun
, 2005
Cited by 5 (4 self)
copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatever without the author’s prior written permission.