Results 1  10
of
31
FFTW: An Adaptive Software Architecture For The FFT
, 1998
"... FFT literature has been mostly concerned with minimizing the number of floatingpoint operations performed by an algorithm. Unfortunately, on presentday microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have ..."
Abstract

Cited by 461 (4 self)
 Add to MetaCart
FFT literature has been mostly concerned with minimizing the number of floatingpoint operations performed by an algorithm. Unfortunately, on presentday microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's selfoptimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machinespecific, vendoroptimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our cu ..."
Abstract

Cited by 414 (3 self)
 Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for realdata DFTs of prime size, a new way of implementing DFTs by means of machinespecific singleinstruction, multipledata (SIMD) instructions, and how a specialpurpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
Superfast solution of real positive definite Toeplitz systems
 SIAM J. Matrix Anal. Appl
, 1988
"... Abstract. We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order n + 1, where n = 2ν. Our implementation uses the splitradix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the ..."
Abstract

Cited by 55 (1 self)
 Add to MetaCart
Abstract. We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order n + 1, where n = 2ν. Our implementation uses the splitradix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the nth Szegő polynomial using fewer than 8n log2 2 n real arithmetic operations without explicit use of the bitreversal permutation. Since Levinson’s algorithm requires slightly more than 2n2 operations to obtain this polynomial, we achieve crossover with Levinson’s algorithm at n = 256. Key words. Toeplitz matrix, Schur’s algorithm, splitradix Fast Fourier Transform
Multidigit Multiplication For Mathematicians
, 2001
"... This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the splitradix FFT trick, Good's trick, the SchönhageStr ..."
Abstract

Cited by 27 (9 self)
 Add to MetaCart
This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the splitradix FFT trick, Good's trick, the SchönhageStrassen trick, Schönhage's trick, Nussbaumer's trick, the cyclic SchönhageStrassen trick, and the CantorKaltofen theorem. It emphasizes the underlying ring homomorphisms.
A modified splitradix FFT with fewer arithmetic operations
 IEEE Trans. Signal Processing
, 2007
"... ..."
Portable HighPerformance Programs
, 1999
"... right notice and this permission notice are preserved on all copies. ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
right notice and this permission notice are preserved on all copies.
Performance models and search methods for optimal FFT implementations
, 2000
"... This work was supported by DARPA through the ARO grant # DABT639810004 This thesis considers systematic methodologies for finding optimized implementations for the fast Fourier transform (FFT). By employing rewrite rules (e.g., the CooleyTukey formula), we obtain a divide and conquer procedure (de ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
This work was supported by DARPA through the ARO grant # DABT639810004 This thesis considers systematic methodologies for finding optimized implementations for the fast Fourier transform (FFT). By employing rewrite rules (e.g., the CooleyTukey formula), we obtain a divide and conquer procedure (decomposition) that breaks down the initial transform into combinations of different smaller size subtransforms, which are graphically represented as breakdown trees. Recursive application of the rewrite rules generates a set of algorithms and alternative codes for the FFT computation. The set of &quot;all &quot; possible implementations (within the given set of the rules) results in pairing the possible breakdown trees with the code implementation alternatives. To evaluate the quality of these implementations, we develop analytical and experimental performance models. Based on these models, we derive methods dynamic programming, soft decision dynamic programming and exhaustive search to find the implementation with minimal runtime. Our test results demonstrate that good algorithms and codes, accurate performance
An Approach To LowPower, HighPerformance, Fast Fourier Transform Processor Design
"... The Fast Fourier Transform (FFT) is one of the most widely used digital signal processing algorithms. While advances in semiconductor processing technology have enabled the performance and integration of FFT processors to increase steadily, these advances have also caused the power consumed by proce ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
The Fast Fourier Transform (FFT) is one of the most widely used digital signal processing algorithms. While advances in semiconductor processing technology have enabled the performance and integration of FFT processors to increase steadily, these advances have also caused the power consumed by processors to increase as well. This power increase has resulted in a situation where the number of potential FFT applications limited by maximum power budgets  not performance  is significant and growing. We present
Application of Mixedradix FFT Algorithms in Multiband GNSS Signal Acquisition Engines
 Journal of Global Positioning Systems
, 2009
"... Abstract. Due to their fast operation, Fast Fourier Transform (FFT)based coarse signal synchronization methods are an attractive option for Global Navigation Satellite System (GNSS) receiver baseband signal processing. However, there are several reasons why the utility of FFTbased methods is depen ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Abstract. Due to their fast operation, Fast Fourier Transform (FFT)based coarse signal synchronization methods are an attractive option for Global Navigation Satellite System (GNSS) receiver baseband signal processing. However, there are several reasons why the utility of FFTbased methods is dependent on understanding the tradeoff between synchronization speed and the required processing power. Firstly, the new signals of the GNSS family, for instance Galileo and GPS modernization, employ longer period Pseudo Random Noise (PRN) codes and higher signal bandwidths, which demand FFTs of large transform lengths. Secondly, to gain an advantage in positioning performance, next generation receivers target multiple GNSS signals, and since each signal has its own code length (and hence a minimum sampling frequency), the receiver should accommodate FFT blocks of varying lengths. This paper discusses the requirements of FFTbased algorithms for such a multiband receiver and analyzes the application of primefactor and mixedradix FFT algorithms. A novel way of factorizing different transform lengths into smaller transforms and then combining these smallerpoint FFTs to compute the larger required FFTs is described. It is shown that the use of the proposed architecture reduces the computational load (or processor cycles) and increases the reusability of the acquisition search engine to process different signals.