FFTW: An Adaptive Software Architecture For The FFT
, 1998
Cited by 444 (4 self)
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machine-specific, vendor-optimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
Cited by 393 (3 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords: Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
A Fast Fourier Transform Compiler
, 1999
Cited by 151 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
Generalized FFTs - A Survey Of Some Recent Results
, 1995
Cited by 51 (8 self)
In this paper we survey some recent work directed towards generalizing the fast Fourier transform (FFT). We work primarily from the point of view of group representation theory. In this setting the classical FFT can be viewed as a family of efficient algorithms for computing the Fourier transform of either a function defined on a finite abelian group, or a band-limited function on a compact abelian group. We discuss generalizations of the FFT to arbitrary finite groups and compact Lie groups.
The Fractional Fourier Transform and Applications
, 1995
Cited by 43 (2 self)
This paper describes the "fractional Fourier transform", which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity $e^{-2\pi i/n}$, the fractional Fourier transform is based on fractional roots of unity $e^{-2\pi i\alpha}$, where $\alpha$ is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with non-integer periodicities, performing high-resolution trigonometric interpolation, detecting lines in noisy images and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster by arbitrarily large factors than conventional techniques. Bailey is with the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center, Moffett Field,...
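The relationship between the two kinds of roots of unity can be illustrated with a short sketch. It assumes the transform is defined as G_k = sum_j x_j exp(-2*pi*i*j*k*alpha), which the abstract implies but does not state in full, and it evaluates that sum directly in O(n^2) time. It is not the fast algorithm the paper describes; the function names are hypothetical.

```python
import cmath


def fractional_dft(x, alpha):
    """Direct O(n^2) evaluation of G_k = sum_j x[j] * exp(-2*pi*i*j*k*alpha).

    Assumed definition (see the note above); alpha may be any real number.
    """
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k * alpha) for j in range(n))
        for k in range(n)
    ]


def dft(x):
    """Ordinary DFT, recovered as the special case alpha = 1/n."""
    return fractional_dft(x, 1.0 / len(x))
```

Choosing alpha = 1/n reproduces the ordinary DFT, while any other alpha samples the spectrum on a different frequency grid, which is what makes uses such as non-integer periodicities possible.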
Explicit Lie–Poisson Integration and the Euler Equations
Cited by 32 (7 self)
Abstract. We give a wide class of Lie-Poisson systems for which explicit, Lie-Poisson integrators, preserving all Casimirs, can be constructed. The integrators are extremely simple. Examples are the rigid body, a moment truncation, and a new, fast algorithm for the sine-bracket truncation of the 2D Euler equations. Hamiltonian systems are fundamental, and symplectic integrators (SI’s) have been increasingly used to do useful extremely-long-time numerical integrations of them. Wisdom [17] has used fast SI’s to integrate the solar system far more efficiently than with standard methods; there are numerous examples illustrating the
Multidigit Multiplication For Mathematicians
, 2001
Cited by 27 (9 self)
This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the Schönhage-Strassen trick, Schönhage's trick, Nussbaumer's trick, the cyclic Schönhage-Strassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms.
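As an illustration of the first technique in that list, here is a minimal sketch of Karatsuba multiplication for polynomials with integer coefficients, stored as coefficient lists with the lowest degree first. It is a textbook formulation written for this listing, not code from the paper; the helper names and the `cutoff` parameter are hypothetical.

```python
def poly_add(a, b):
    """Coefficient-wise sum of two polynomials of possibly different length."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


def poly_sub(a, b):
    """Coefficient-wise difference a - b."""
    return poly_add(a, [-c for c in b])


def schoolbook(a, b):
    """Quadratic-time product, used as the recursion base case."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b, cutoff=4):
    """Product of a and b using three half-size products instead of four."""
    if not a or not b:
        return []
    if min(len(a), len(b)) <= cutoff:
        return schoolbook(a, b)
    m = max(len(a), len(b)) // 2
    a0, a1 = a[:m], a[m:]          # a = a0 + x^m * a1
    b0, b1 = b[:m], b[m:]          # b = b0 + x^m * b1
    low = karatsuba(a0, b0, cutoff)
    high = karatsuba(a1, b1, cutoff)
    # (a0 + a1)(b0 + b1) - low - high equals a0*b1 + a1*b0.
    mid = karatsuba(poly_add(a0, a1), poly_add(b0, b1), cutoff)
    mid = poly_sub(poly_sub(mid, low), high)
    out = [0] * (len(a) + len(b) - 1)
    for i, c in enumerate(low):
        out[i] += c
    for i, c in enumerate(mid):
        out[i + m] += c
    for i, c in enumerate(high):
        out[i + 2 * m] += c
    return out
```

The saving comes from computing the cross term with one recursive product rather than two, which gives roughly O(n^1.585) coefficient multiplications instead of O(n^2).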
Space-variant Fourier Analysis: the Exponential Chirp Transform
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
Cited by 20 (4 self)
Space-variant, or foveating, vision architectures are of importance in both machine and biological vision. In this paper we focus on a particular space-variant map, the log-polar map, which approximates the primate visual map and which has been applied in machine vision by a number of investigators during the past two decades. Associated with the log-polar map, we define a new linear integral transform, which we call the exponential chirp transform. This transform provides frequency domain image processing for space-variant image formats, while preserving the major aspects of the shift-invariant properties of the usual Fourier transform. We then show that a log-polar coordinate transform in frequency (similar to the Mellin-Transform) provides a fast exponential chirp transform. This provides size and rotation, in addition to shift, invariant properties in the transformed space. Finally, we demonstrate the use of the fast exponential chirp algorithm on a database of images in a template matching task, and also demonstrate its uses for spatial filtering. Given the general lack of algorithms in space-variant image processing, we expect that the fast exponential chirp transform will provide a fundamental tool for applications in this area.
Fast algorithms for component-by-component construction of rank-1 lattice rules in shift-invariant reproducing kernel Hilbert spaces
 Math. Comp
, 2004
Cited by 19 (3 self)
Abstract. We reformulate the original component-by-component algorithm for rank-1 lattices in a matrix-vector notation so as to highlight its structural properties. For function spaces similar to a weighted Korobov space, we derive a technique which has construction cost O(sn log(n)), in contrast with the original algorithm which has construction cost O(sn^2). Herein s is the number of dimensions and n the number of points (taken prime). In contrast to other approaches to speed up construction, our fast algorithm computes exactly the same quantity as the original algorithm. The presented algorithm can also be used to construct randomly shifted lattice rules in weighted Sobolev spaces.
An Adaptive Software Library for Fast Fourier Transforms
 In Proceedings of the International Conference on Supercomputing
, 2000
Cited by 18 (2 self)
In this paper we present an adaptive and portable software library for the fast Fourier transform (FFT). The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at runtime by selecting the fastest strategy among all possible strategies, given available codelets, for a given transform size. We also present an efficient automatic method of generating the library modules by using a special-purpose compiler. The code generator is written in C and it generates a library of C codelets. The code generator is shown to be flexible and extensible and the entire library can be generated in a matter of seconds. We have evaluated the library for performance on the IBM-SP2, SGI2000, HP-Exemplar and Intel Pentium systems. We use the results from these evaluations to build performance models for the FFT library on different platforms. The library is shown to be portable, adaptive and efficient.