## A Fast Fourier Transform Compiler (1999)

Citations: 177 (5 self)

### BibTeX

@MISC{Frigo99afast,
    author = {Matteo Frigo},
    title = {A Fast Fourier Transform Compiler},
    year = {1999}
}

### Abstract

The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry because it provides excellent performance on a variety of machines (it is competitive with, or even faster than, equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
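To make the idea of a special-purpose DFT code generator concrete, here is a minimal sketch in Python. It is not genfft: genfft is written in Objective Caml and, according to the paper, also simplifies and schedules an expression dag. The sketch only expands the DFT definition for a fixed size `n` into straight-line C statements and folds the trivial constants 0, 1 and -1. The names `gen_dft_codelet`, `snap`, `term`, `join`, and the array names `re_in`, `im_in`, `re_out`, `im_out` are all hypothetical.

```python
import cmath
import math


def snap(v, eps=1e-12):
    """Snap values within eps of -1, 0 or 1 to that exact constant."""
    for exact in (-1.0, 0.0, 1.0):
        if abs(v - exact) < eps:
            return exact
    return v


def term(coef, var):
    """Return [(sign, text)] for coef * var, folding the constants 0 and +/-1."""
    if coef == 0:
        return []  # multiplication by zero: drop the term entirely
    sign = "-" if coef < 0 else "+"
    mag = abs(coef)
    text = var if mag == 1 else f"{mag!r} * {var}"
    return [(sign, text)]


def join(terms):
    """Render signed terms as one expression, e.g. 'a - 0.5 * b'."""
    head_sign, head = terms[0]
    expr = head if head_sign == "+" else f"-{head}"
    for sign, text in terms[1:]:
        expr += f" {sign} {text}"
    return expr


def gen_dft_codelet(n):
    """Emit straight-line C statements for a size-n complex DFT (hypothetical helper)."""
    lines = []
    for k in range(n):
        re_terms, im_terms = [], []
        for j in range(n):
            # Twiddle w = exp(-2*pi*i*j*k/n) = c + i*s, exponent reduced mod n.
            w = cmath.exp(-2j * math.pi * ((j * k) % n) / n)
            c, s = snap(w.real), snap(w.imag)
            # (c + i*s) * (x + i*y) = (c*x - s*y) + i*(s*x + c*y)
            re_terms += term(c, f"re_in[{j}]") + term(-s, f"im_in[{j}]")
            im_terms += term(s, f"re_in[{j}]") + term(c, f"im_in[{j}]")
        lines.append(f"re_out[{k}] = {join(re_terms)};")
        lines.append(f"im_out[{k}] = {join(im_terms)};")
    return lines


if __name__ == "__main__":
    print("\n".join(gen_dft_codelet(4)))
```

For `n = 4` every twiddle factor is one of 1, -1, i, -i, so the emitted statements contain only additions and subtractions. This is a small instance of the kind of arithmetic saving that specializing to a fixed size makes possible. The sketch still performs O(n^2) operations and shares no subexpressions, so it says nothing about the operation counts genfft actually achieves.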

### Citations

2430 | The Art of Computer Programming - Knuth - 1998 |

1062 | Advanced Compiler Design and Implementation - Muchnick - 1997 |

751 | The Art of Computer Programming, Volume 2: Seminumerical Algorithms - Knuth - 1998 |

577 |
The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...1 from [HK81] states that the execution of the FFT graph of size $n = 2^k$ on a machine with $C$ registers (where $C \le n$) requires at least $\Omega(n \log n / \log C)$ register spills. Aggarwal and Vitter [AV88] generalize this result to disk I/O, where a single I/O operation can transfer a block of elements. In addition, Aggarwal and Vitter give a schedule that matches the lower bound. Their schedule is con... |

513 | FFTW: an adaptive software architecture for FFT - Frigo, Johnson - 1998 |

384 | The implementation of the Cilk-5 multithreaded language
- Frigo, Leiserson, et al.
- 1998
Citation Context ...ivious algorithm. Consider the Cooley-Tukey algorithm applied to a trans... [footnote 10] We say "cache-" and not "register-oblivious" since this notion first arose from the analysis of the caching behavior of Cilk [FLR98] programs using shared memory. Work is still in progress to understand and define cache-obliviousness formally, and this concept does not yet appear in the literature. Simple divide-and-conquer cache-... |

252 | Algorithms for parallel memory I: Two-level memories. Algorithmica - Vitter, Shriver - 1994 |

184 |
I/O complexity: The red-blue pebble game
- Hong, Kung
- 1981
Citation Context ...e asymptotic number of register spills, no matter how many registers the target machine has. This truly remarkable fact can be derived from the analysis of the red-blue pebbling game of Hong and Kung [HK81], as we shall see in Section 6. For transforms of other sizes the scheduling strategy is no longer provably good, but it still works well in practice. Again, the scheduler depends heavily on the topol... |

152 |
An algorithm for the machine computation of complex Fourier series
- Cooley, Tukey
- 1965
Citation Context ...$\sum_{j_2=0}^{n_2-1} \left[ \left( \sum_{j_1=0}^{n_1-1} X[j_1 n_2 + j_2] \, \omega_{n_1}^{-i_1 j_1} \right) \omega_n^{-i_1 j_2} \right] \omega_{n_2}^{-i_2 j_2}$. This formula yields the Cooley-Tukey fast Fourier transform algorithm (FFT) [CT65]. The algorithm computes $n_2$ transforms of size $n_1$ (the inner sum), it multiplies the result by the so-called twiddle factors $\omega_n^{-i_1 j_2}$, and finally it computes $n_1$ transforms of size $n_2$ (... |

122 |
Using C++ template metaprograms
- Veldhuizen
- 1996
Citation Context ...rk for Haskell [Par92]. Unlike my program, this genfft is limited to transforms of size $2^k$. The program in nofib is not documented at all, but apparently it can be traced back to [HV92]. Veldhuizen [Vel95] used a template metaprograms technique to generate C++ programs. The technique exploits the template facility of C++ to force the C++ compiler to perform computations at compile time. All these syste... |

108 | An analysis of dag-consistent distributed shared-memory algorithms - Blumofe, Frigo, et al. - 1996 |

104 | How to Declare an Imperative
- Wadler
- 1995
Citation Context ...s paper, we consider only the generation of transforms of a fixed size. ...besides noticing common subexpressions, the simplifier also attempts to create them. The simplifier is written in monadic style [Wad97], which allowed me to deal with the dag as if it were a tree, making the implementation much easier. 3. In the scheduler, genfft produces a topological sort of the dag (a "schedule") that, for transfo... |

91 |
Fast Fourier transforms: A tutorial review and a state of the art
- Duhamel, Vetterli
- 1990
Citation Context ...8, FJ], a comprehensive collection of fast C routines for computing the discrete Fourier transform (DFT) in one or more dimensions, of both real and complex data, and of arbitrary input size. The DFT [DV90] is one of the most important computational problems, and many real-world applications require that the transform be computed as quickly as possible. FFTW is one of the fastest DFT programs available ... |

91 |
Algorithms for the Discrete Fourier Transform and Convolution
- Tolimieri, An, et al.
- 1989
Citation Context ...output of genfft is at times completely unexpected. For example, for a complex transform of size n = 13, the generator employs an algorithm due to Rader, in the form presented by Tolimieri and others [TAL97]. In its most sophisticated variant, this algorithm performs 214 real (floating-point) additions and 76 real multiplications. (See [TAL97, Page 161].) The generated code in FFTW for the same algorithm... |

88 |
On computing the discrete Fourier transform
- Winograd
- 1976
Citation Context ...erts the transform into a circular convolution of size $n - 1$. The circular convolution can be computed recursively using two Fourier transforms, or by means of a clever technique due to Winograd [Win78] (genfft does not employ this technique yet, however). Other algorithms are known for prime sizes, and this is still the subject of active research. See [TAL97] for a recent compendium on the topic. An... |

80 | The nofib benchmark suite of Haskell programs
- Partain
- 1993
Citation Context ...lgorithms that includes the FFT and matrix multiplication (including Strassen's algorithm). Another program called genfft generating Haskell FFT subroutines is part of the nofib benchmark for Haskell [Par92]. Unlike my program, this genfft is limited to transforms of size $2^k$. The program in nofib is not documented at all, but apparently it can be traced back to [HV92]. Veldhuizen [Vel95] used a templat... |

73 |
Discrete Fourier transforms when the number of data samples is prime
- Rader
- 1968
Citation Context ...e 619].) If n is a multiple of 4, the split-radix algorithm [DV90] can save some operations with respect to Cooley-Tukey. If n is prime, genfft uses either Equation (1) directly, or Rader's algorithm [Rad68], which converts the transform into a circular convolution of size $n - 1$. The circular convolution can be computed recursively using two Fourier transforms, or by means of a clever technique due ... |

70 | The Fastest Fourier Transform in the West
- Frigo, Johnson
- 1997
Citation Context ... 2, 3, 5, and 7. This distinction is important because the DFT algorithm depends on the factorization of the size, and most implementations of the DFT are optimized for the case of powers of two. See [FJ97] for additional experimental results. FFTW was compiled with Sun's C compiler (WorkShop Compilers 4.2 30 Oct 1996 C 4.2). ...(which amounts to 95% of the total code) was generated automatically by a spec... |

69 | Algorithms for parallel memory II: Hierarchical multilevel memories. Algorithmica - Vitter, Shriver - 1994 |

62 |
Real-valued fast Fourier transform algorithms
- Sorensen, et al.
- 1987
Citation Context ..., which occurs frequently in applications. This specialization is nontrivial, and in the past the design of an efficient real DFT algorithm required a serious effort that was well worth a publication [SJHB87]. genfft, however, automatically derives real DFT programs from the complex algorithms, and the resulting programs have the same arithmetic complexity as those discussed by [SJHB87, Table II]. 3 The g... |

25 |
Analysis of linear digital networks
- Crochiere, Oppenheim
- 1975
Citation Context ...ll floating point constants are made positive, the generated code runs faster. (See Section 5.) Another important transformation is dag transposition, which derives from the theory of linear networks [CO75]. Moreover, [footnote 1: In the actual FFTW system, some codelets perform more tasks, however. For the purposes of this paper, we consider only the generation of transforms of a fixed size.] besides noticing comm... |

25 |
The design of optimal DFT algorithms using dynamic programming
- Johnson, Burrus
- 1983
Citation Context ... prime factor FFT algorithm. This program is limited to complex transforms of size n, where n must be factorable into mutually prime factors in the set $\{2, 3, 4, 5, 7, 8, 9, 16\}$. Johnson and Burrus [JB83] applied dynamic programming to the automatic design of DFT modules. Selesnick and Burrus [SB96] used a program to generate MATLAB subroutines for DFTs of certain prime sizes. In many cases, these sub... |

15 | A framework for generating distributed-memory parallel programs for block recursive algorithms
- Gupta, Huang, et al.
- 1996
Citation Context ... a program to generate MATLAB subroutines for DFTs of certain prime sizes. In many cases, these subroutines are the best known in terms of arithmetic complexity. The EXTENT system by Gupta and others [GHSJ96] generates FORTRAN code in response to an input expressed in a tensor product language. Using the tensor product abstraction one can express concisely a variety of algorithms that includes the FFT and... |

13 | Automatic generation of prime length FFT programs
- Selesnick, Burrus
- 1996
Citation Context ...erated by the algorithm are statements of the form $a = c + d$, $b = c - d$, $e = a + b$. The generator simplifies these statements to $e = 2c$, provided a and b are not needed elsewhere. Incidentally, [SB96] reports an algorithm with 188 additions and 40 multiplications, using a more involved DFT algorithm that I have not implemented yet. To my knowledge, the program generated by genfft performs the lowe... |

8 |
Factorization method for crystallographic Fourier transforms
- An, Cooley, et al.
- 1990
Citation Context ... used in JPEG). I am confident that the techniques described in this paper will prove valuable in this sort of application. Recently, I modified genfft to generate crystallographic Fourier transforms [ACT90]. In this particular application, the input consists of 2D or 3D data with certain symmetries. For example, the input data set might be invariant with respect to rotations of 60 degrees, and it is des... |

7 |
A prime factor FFT algorithm implementation using a program generation technique
- Perez, Takaoka
- 1987
Citation Context ..., the first generator of FFT programs was FOURGEN, written by J. A. Maruhn [Mar76]. It was written in PL/I and it generated FORTRAN. FOURGEN is limited to transforms of size $2^k$. Perez and Takaoka [PT87] present a generator of Pascal programs implementing a prime factor FFT algorithm. This program is limited to complex transforms of size n, where n must be factorable into mutually prime factors in th... |

3 | The FFTW web - Frigo, Johnson |

3 |
Arrays in a lazy functional language—a case study: the fast Fourier transform
- Hartel, Vree
- 1992
Citation Context ...f the nofib benchmark for Haskell [Par92]. Unlike my program, this genfft is limited to transforms of size $2^k$. The program in nofib is not documented at all, but apparently it can be traced back to [HV92]. Veldhuizen [Vel95] used a template metaprograms technique to generate C++ programs. The technique exploits the template facility of C++ to force the C++ compiler to perform computations at compile t... |

3 |
Implementing compiler optimizations using parallel graph reduction
- Kulik
- 1995
Citation Context ...ase where either a' or b' is 0 or 1, and so on. The code for stimesM is shown in Figure 10. The neat trick of using memoization for graph traversal was invented by Joanna Kulik in her master's thesis [Kul95], as far as I can tell. Common-subexpression elimination (CSE) is performed behind the scenes by the monadic operator returnM. The CSE algorithm is essentially the classical bottom-up let rec algsimpM... |

3 |
The Objective Caml system release 2.00, Institut National de Recherche en Informatique et Automatique (INRIA)
- Leroy
- 1998
Citation Context ...compiled with Sun's C compiler (WorkShop Compilers 4.2 30 Oct 1996 C 4.2). ...(which amounts to 95% of the total code) was generated automatically by a special-purpose compiler written in Objective Caml [Ler98]. This paper explains how this compiler works. FFTW does not implement a single DFT algorithm, but it is structured as a library of codelets---sequences of C code that can be composed in many ways. In... |

3 |
FOURGEN: a fast Fourier transform program generator
- Maruhn
- 1976
Citation Context ... at least twenty years, possibly to avoid the tedium of getting all the implementation details right by hand. To my knowledge, the first generator of FFT programs was FOURGEN, written by J. A. Maruhn [Mar76]. It was written in PL/I and it generated FORTRAN. FOURGEN is limited to transforms of size $2^k$. Perez and Takaoka [PT87] present a generator of Pascal programs implementing a prime factor FFT algo... |

3 |
The nofib benchmark suite of Haskell programs
- Partain
- 1992
Citation Context ...of algorithms that includes the FFT and matrix multiplication (including Strassen’s algorithm). Another program called genfft generating Haskell FFT subroutines is part of the nofib benchmark for Haskell [Par92]. Unlike my program, this genfft is limited to transforms of size $2^k$. The program in nofib is not documented at all, but apparently it can be traced back to [HV92]. Veldhuizen [Vel95] used a template metap... |

1 | The FFTW web page - Frigo, Johnson |

1 | The Art of Computer Programming, volume 2 (Seminumerical Algorithms). Addison-Wesley, 3rd edition - Knuth - 1998 |

1 | The FFTW web page. http://theory.lcs.mit.edu/~fftw - Frigo, Johnson |
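The Cooley-Tukey decomposition quoted in the [CT65] citation context can be checked numerically. The Python sketch below follows the structure described there: inner transforms of size n1, multiplication by twiddle factors, then outer transforms of size n2. It rests on two assumptions that the truncated excerpt does not state: that omega_n = exp(2*pi*i/n), and that the output index is i1 + i2*n1. The function names `omega`, `dft`, and `cooley_tukey_step` are hypothetical and do not come from genfft or FFTW.

```python
import cmath


def omega(n, e):
    """Return omega_n ** (-e), assuming omega_n = exp(2*pi*i/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def dft(x):
    """Direct O(n^2) DFT: Y[i] = sum_j X[j] * omega_n^(-i*j)."""
    n = len(x)
    return [sum(x[j] * omega(n, i * j) for j in range(n)) for i in range(n)]


def cooley_tukey_step(x, n1, n2):
    """One decomposition step for n = n1 * n2, following the quoted formula."""
    n = n1 * n2
    assert len(x) == n
    # Inner sums: n2 transforms of size n1, one for each j2.
    inner = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    y = [0j] * n
    for i1 in range(n1):
        # Multiply by the twiddle factors omega_n^(-i1*j2).
        twiddled = [inner[j2][i1] * omega(n, i1 * j2) for j2 in range(n2)]
        # Outer sum: one transform of size n2 per i1 (n1 of them in total).
        outer = dft(twiddled)
        for i2 in range(n2):
            y[i1 + i2 * n1] = outer[i2]  # assumed output index i1 + i2*n1
    return y
```

For any factorization n = n1 * n2, `cooley_tukey_step(x, n1, n2)` should agree with `dft(x)` up to floating-point rounding. Applying the same step recursively to the inner and outer transforms gives the usual FFT recursion.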