Results 1-10 of 49
The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
Cited by 393 (3 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords: Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
A Fast Fourier Transform Compiler
, 1999
Cited by 151 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
Gauss and the History of the Fast Fourier Transform, Archive for History of Exact Sciences
, 1985
Cited by 43 (0 self)
The fast Fourier transform (FFT) has become well known as a very efficient algorithm for calculating the discrete Fourier transform (DFT), a formula for evaluating the N Fourier coefficients from a sequence of N numbers. The DFT is used in many disciplines to obtain the spectrum or frequency content of a signal
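As a minimal sketch of the formula this abstract refers to (not taken from the paper), the Python below evaluates the N DFT coefficients directly and then with a recursive radix-2 FFT. The function names `dft` and `fft_radix2`, and the sign convention exp(-2πi jk/N), are assumptions made for illustration.

```python
import cmath

def dft(x):
    """Direct evaluation of the N DFT coefficients: O(N^2) operations."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Both functions should agree up to rounding error on any power-of-two input; the recursive version needs on the order of N log N operations instead of N².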
Multidigit Multiplication For Mathematicians
, 2001
Cited by 27 (9 self)
This paper surveys techniques for multiplying elements of various commutative rings. It covers Karatsuba multiplication, dual Karatsuba multiplication, Toom multiplication, dual Toom multiplication, the FFT trick, the twisted FFT trick, the split-radix FFT trick, Good's trick, the Schönhage-Strassen trick, Schönhage's trick, Nussbaumer's trick, the cyclic Schönhage-Strassen trick, and the Cantor-Kaltofen theorem. It emphasizes the underlying ring homomorphisms.
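To illustrate the first technique on this list in the simplest setting, here is a hedged sketch of Karatsuba multiplication for non-negative Python integers. The paper treats general commutative rings; the integer setting, the function name `karatsuba`, and the cutoff of 16 are assumptions chosen for brevity.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three recursive half-size products."""
    if x < 16 or y < 16:
        return x * y  # small operands: fall back to built-in multiplication
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask  # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & mask  # y = y_hi * 2^half + y_lo
    lo = karatsuba(x_lo, y_lo)
    hi = karatsuba(x_hi, y_hi)
    # (x_lo + x_hi)(y_lo + y_hi) - lo - hi equals the cross term x_lo*y_hi + x_hi*y_lo
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - lo - hi
    return (hi << (2 * half)) + (mid << half) + lo
```

The saving comes from replacing the four half-size products of schoolbook multiplication with three, at the cost of a few extra additions.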
A Modified Split-Radix FFT With Fewer Arithmetic Operations
, 2007
Cited by 21 (2 self)
Recent results by Van Buskirk et al. have broken the record set by Yavne in 1968 for the lowest exact count of real additions and multiplications to compute a power-of-two discrete Fourier transform (DFT). Here, we present a simple recursive modification of the split-radix algorithm that computes the DFT with asymptotically about 6% fewer operations than Yavne, matching the count achieved by Van Buskirk’s program-generation framework. We also discuss the application of our algorithm to real-data and real-symmetric (discrete cosine) transforms, where we are again able to achieve lower arithmetic counts than previously published algorithms.
Algebraic signal processing theory: Foundation and 1-D time
 IEEE TRANS. SIGNAL PROCESS
, 2008
Cited by 11 (6 self)
This paper introduces a general and axiomatic approach to linear signal processing (SP) that we refer to as the algebraic signal processing theory (ASP). Basic to ASP is the linear signal model defined as a triple (A, M, Φ) where familiar concepts like the filter space and the signal space are cast as an algebra A and a module M, respectively. The mapping Φ generalizes the concept of a z-transform to bijective linear mappings from a vector space of signal samples into the module M. Common concepts like filtering, spectrum, or Fourier transform have their equivalent counterparts in ASP. Once these concepts and their properties are defined and understood in the context of ASP, they remain true and apply to specific instantiations of the ASP signal model. For example, to develop signal processing theories for infinite and finite discrete time signals, for infinite or finite discrete space signals, or for multidimensional signals, we need only to instantiate the signal model to one that makes sense for that specific class of signals. Filtering, spectrum, Fourier transform, and other notions follow then from the corresponding ASP concepts. Similarly, common assumptions in SP translate into requirements on the ASP signal model. For example, shift-invariance is equivalent to A being commutative. For finite (duration) signals, shift-invariance then restricts A to polynomial algebras. We explain how to design signal models from the specification of a special filter, the shift. The paper illustrates the general ASP theory with the standard time shift, presenting a unique signal model for infinite time and several signal models for finite time. The latter models illustrate the role played by boundary conditions and recover the discrete Fourier transform (DFT) and its variants as associated Fourier transforms. Finally, ASP provides a systematic methodology to derive fast algorithms for linear transforms. This topic and the application of ASP to space-dependent signals and to multidimensional signals are pursued in companion papers.
Fast Fourier Analysis for Abelian Group Extensions
, 1995
Cited by 10 (3 self)
Let G be a finite group, f any complex-valued function defined on G, and ρ an irreducible complex matrix representation of G. The Fourier transform of f at ρ is defined to be the matrix $\hat{f}(\rho) = \sum_{s \in G} f(s)\rho(s)$. The Fourier transforms of f at all the irreducible representations of G determine f via the Fourier inversion formula $f(s) = \frac{1}{|G|} \sum_{\rho} d_\rho \, \mathrm{trace}(\hat{f}(\rho)\rho(s^{-1}))$. Direct computation of all Fourier transforms of f involves on the order of $|G|^2$ operations, as does direct computation of Fourier inversion. Here fast algorithms are obtained for both operations in the case in which G contains some nontrivial normal subgroup K such that G/K is abelian. Consequently, fast algorithms for computing convolutions on G in this situation are also determined. Under the simplifying assumption of exponent for matrix multiplication equal to 2 (it is 2.38 as of this writing) it is shown that the number of operations needed to compute all Fourier transforms on G is O(|G| |K| ...
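The definition and inversion formula in this abstract can be checked numerically in the simplest case, the cyclic group Z_n, where every irreducible representation is one-dimensional (so each degree is 1 and the trace is the value itself). The sketch below is an illustration under that assumption, not the paper's fast algorithm; the names `character`, `group_fourier_transform`, and `group_fourier_inversion` are hypothetical.

```python
import cmath

def character(n, k, s):
    """One-dimensional irreducible representation rho_k of Z_n evaluated at s."""
    return cmath.exp(2j * cmath.pi * k * s / n)

def group_fourier_transform(f):
    """f_hat(rho_k) = sum over s in Z_n of f(s) * rho_k(s), for every k."""
    n = len(f)
    return [sum(f[s] * character(n, k, s) for s in range(n)) for k in range(n)]

def group_fourier_inversion(f_hat):
    """f(s) = (1/|G|) * sum over rho of d_rho * trace(f_hat(rho) * rho(s^-1)), d_rho = 1."""
    n = len(f_hat)
    # In Z_n the inverse of s is (-s) mod n.
    return [sum(f_hat[k] * character(n, k, (-s) % n) for k in range(n)) / n
            for s in range(n)]
```

Both routines use on the order of |G|² operations, which is the direct cost the abstract mentions; for Z_n the transform coincides with the ordinary DFT up to sign convention.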
A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock
 IEEE Trans. on Circuits and Systems for Video Tech
, 2000
Cited by 9 (0 self)
The ability to construct an intracoded frame from motion-compensated intercoded frames directly in the compressed domain is important for efficient video manipulation and composition. In the context of motion-compensated discrete cosine transform (DCT)-based coding of video as in MPEG video, this problem of DCT-domain inverse motion compensation has been studied and, subsequently, improved faster algorithms were proposed. These schemes, however, treat each 8 × 8 block as a fundamental unit, and do not take into account the fact that in MPEG, a macroblock consists of several such blocks. In this paper, we show how shared information within a macroblock, such as a motion vector and common blocks, can be exploited to yield substantial speedup in computation. Compared to previous brute-force approaches, our algorithms yield about 44% improvement. Our technique is independent of the underlying computational or processor model, and thus can be implemented on top of any optimized solution. We demonstrate an improvement by about 19%, and 13.5% in the worst case, on top of the optimized solutions presented in existing literature. Index Terms: Compressed domain processing, DCT-domain inverse motion compensation, MPEG video, video composition, video processing.
Performance models and search methods for optimal FFT implementations
, 2000
Cited by 8 (0 self)
This work was supported by DARPA through the ARO grant # DABT639810004. This thesis considers systematic methodologies for finding optimized implementations for the fast Fourier transform (FFT). By employing rewrite rules (e.g., the Cooley-Tukey formula), we obtain a divide-and-conquer procedure (decomposition) that breaks down the initial transform into combinations of different smaller-size subtransforms, which are graphically represented as breakdown trees. Recursive application of the rewrite rules generates a set of algorithms and alternative codes for the FFT computation. The set of "all" possible implementations (within the given set of the rules) results from pairing the possible breakdown trees with the code implementation alternatives. To evaluate the quality of these implementations, we develop analytical and experimental performance models. Based on these models, we derive search methods (dynamic programming, soft-decision dynamic programming, and exhaustive search) to find the implementation with minimal runtime. Our test results demonstrate that good algorithms and codes, accurate performance
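The dynamic-programming search over breakdown trees described in this abstract can be sketched as follows. This is a toy illustration, not the thesis's method: the cost model (n² operations for an unfactored leaf, plus n twiddle multiplications per Cooley-Tukey split) is a hypothetical stand-in for the analytical and measured performance models the thesis develops, and the names `divisor_pairs` and `best_plan` are invented here.

```python
from functools import lru_cache

def divisor_pairs(n):
    """All factorizations n = r * m with 1 < r < n."""
    return [(r, n // r) for r in range(2, n) if n % r == 0]

@lru_cache(maxsize=None)
def best_plan(n):
    """Return (cost, tree) minimizing a toy operation-count model for size n.

    A tree is either an int (leaf: direct DFT of that size) or a pair of subtrees.
    """
    best = (n * n, n)  # leaf: direct DFT, assumed to cost n^2 operations
    for r, m in divisor_pairs(n):
        cost_r, tree_r = best_plan(r)
        cost_m, tree_m = best_plan(m)
        # Cooley-Tukey step: m transforms of size r, r transforms of size m,
        # plus n twiddle multiplications (hypothetical cost model).
        cost = m * cost_r + r * cost_m + n
        if cost < best[0]:
            best = (cost, (tree_r, tree_m))
    return best
```

Memoization is what makes this dynamic programming: the best subtree for each size is computed once and reused, on the assumption that the optimal plan for a subtransform does not depend on its context. Measured runtimes need not satisfy that assumption, which is one reason exhaustive search is a useful cross-check.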
Hardness Results and Spectral Techniques for Combinatorial Problems on Circulant Graphs
 Linear Algebra Appl
, 1998
Cited by 8 (0 self)
We show that computing (and even approximating) MAXIMUM CLIQUE and MINIMUM GRAPH COLORING for circulant graphs is essentially as hard as in the general case. In contrast, we show that, under additional constraints, e.g., prime order and/or sparseness, GRAPH ISOMORPHISM and MINIMUM GRAPH COLORING become easier in the circulant case, and we take advantage of spectral techniques for their efficient computation.
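One standard fact behind spectral techniques for circulant graphs is that the eigenvalues of a circulant matrix are given by a discrete Fourier transform of its first row. The sketch below illustrates that fact only; it is not the paper's algorithm, and the names `circulant_eigenvalues` and `cycle_first_row` are hypothetical.

```python
import cmath

def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix with the given first row (a DFT of that row)."""
    n = len(first_row)
    return [sum(first_row[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def cycle_first_row(n):
    """First adjacency row of the cycle graph C_n (vertex 0 adjacent to 1 and n-1)."""
    row = [0] * n
    row[1] = row[n - 1] = 1
    return row
```

For the cycle graph C_n this yields the eigenvalues 2cos(2πk/n) for k = 0, ..., n-1, so the whole spectrum is available in O(n log n) time with an FFT rather than through a general eigenvalue solver.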