The design and implementation of FFTW3
Proceedings of the IEEE, 2005
Cited by 396 (6 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords: adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
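The planner idea at the heart of FFTW can be sketched in a few lines of Python. This illustrates the adaptive approach only, not FFTW's actual C API: time several candidate DFT strategies once on the actual machine, then reuse the winner for all subsequent transforms of that size.

```python
import time
import numpy as np

# Sketch of FFTW-style planning (not FFTW's API): measure candidate DFT
# strategies on this hardware once, then reuse the fastest one.
def dft_direct(x):
    """Direct O(n^2) DFT via an explicit matrix-vector product."""
    n = len(x)
    W = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    return W @ x

def dft_library(x):
    """Library FFT, O(n log n)."""
    return np.fft.fft(x)

def plan(n, candidates, trials=5):
    """Return the fastest candidate for size-n transforms on this machine."""
    x = np.random.default_rng(0).standard_normal(n) + 0j
    def cost(f):
        t0 = time.perf_counter()
        for _ in range(trials):
            f(x)
        return time.perf_counter() - t0
    return min(candidates, key=cost)

best = plan(1024, [dft_direct, dft_library])  # "planning" step, done once
x = np.arange(1024, dtype=complex)
print(np.allclose(best(x), np.fft.fft(x)))    # whichever wins, it computes the DFT
```

FFTW's planner does the same at a much finer granularity, composing measured "codelets" rather than choosing between whole algorithms.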
FFTs for the 2-Sphere: Improvements and Variations
Journal of Fourier Analysis and Applications, 2003
Cited by 104 (2 self)
Earlier work by Driscoll and Healy [18] has produced an efficient algorithm for computing the Fourier transform of bandlimited functions on the 2-sphere. In this article we present a reformulation and variation of the original algorithm which results in a greatly improved inverse transform, and a consequently improved convolution algorithm for such functions. All require at most O(N log² N) operations, where N is the number of sample points. We also address implementation considerations and give heuristics for allowing reliable and computationally efficient floating-point implementations of slightly modified algorithms. These claims are supported by extensive numerical experiments with our implementation in C on DEC, HP, SGI, and Linux Pentium platforms. These results indicate that variations of the algorithm are both reliable and efficient for a large range of useful problem sizes. Performance appears to be architecture-dependent. The article concludes with a brief discussion of a few potential applications.
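Separation of variables reduces the spherical transform to ordinary FFTs in longitude plus associated Legendre transforms in latitude; the latitude step is the part Driscoll and Healy accelerate to O(N log² N). As a point of reference, the m = 0 latitude step can be sketched naively in O(N²) via Gauss-Legendre quadrature (a generic textbook discretization, not the paper's sampling scheme):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Naive O(N^2) discrete Legendre transform: the latitudinal sub-problem of the
# spherical Fourier transform, for azimuthal order m = 0.
B = 16                        # bandlimit: keep coefficients c_0 .. c_{B-1}
x, w = leggauss(B)            # Gauss-Legendre nodes/weights on [-1, 1]

def legendre_coeffs(f):
    """c_l = (2l+1)/2 * integral of f * P_l over [-1, 1], for l < B."""
    fx = f(x)
    # np.eye(B)[l] selects the single Legendre polynomial P_l for legval
    return np.array([(2 * l + 1) / 2 * np.sum(w * fx * legval(x, np.eye(B)[l]))
                     for l in range(B)])

f = lambda t: 3 * t ** 2 - t + 0.5       # bandlimited test function (degree 2)
c = legendre_coeffs(f)
print(np.allclose(legval(x, c), f(x)))   # True: exact analysis/synthesis roundtrip
```

Because the quadrature is exact for polynomials of degree below 2B, the roundtrip is exact for bandlimited inputs; the fast algorithm computes the same coefficients with fewer operations.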
Models of Computation: Exploring the Power of Computing
Cited by 57 (7 self)
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s, programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter as computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many language-centered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although ...
Fast parallel circuits for the quantum Fourier transform
Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS '00), 2000
Cited by 55 (3 self)
We give new bounds on the circuit complexity of the quantum Fourier transform (QFT). We give an upper bound of O(log n + log log(1/ε)) on the circuit depth for computing an approximation of the QFT with respect to the modulus 2^n with error bounded by ε. Thus, even for exponentially small error, our circuits have depth O(log n). The best previous depth bound was O(n), even for approximations with constant error. Moreover, our circuits have size O(n log(n/ε)). We also give an upper bound of O(n (log n)² log log n) on the circuit size of the exact QFT modulo 2^n, for which the best previous bound was O(n²). As an application of the above depth bound, we show that Shor's factoring algorithm may be based on quantum circuits with depth only O(log n) and polynomial size, in combination with classical polynomial-time pre- and post-processing. In the language of computational complexity, this implies that factoring is in the complexity class ZPP^BQNC, where BQNC is the class of problems computable with bounded-error probability by quantum circuits with polylogarithmic depth and polynomial size. Finally, we prove an Ω(log n) lower bound on the depth complexity of approximations of the ...
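The approximate QFT behind such bounds can be emulated at matrix level for small n. The sketch below builds the textbook QFT circuit gate by gate in numpy (assuming the standard construction with qubit 0 as the most significant bit and a final bit-reversal) and drops the controlled rotations R_k with k above a cutoff; the resulting unitary differs from the exact DFT matrix only by a small truncation error, which is why shallow approximate circuits suffice:

```python
import numpy as np

def qft_circuit(n, max_k=None):
    """n-qubit QFT unitary built gate by gate; controlled rotations R_k with
    k > max_k are dropped (approximate QFT). Qubit 0 = most significant bit."""
    N = 2 ** n
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    bit = lambda s, q: (s >> (n - 1 - q)) & 1
    U = np.eye(N, dtype=complex)
    for i in range(n):
        # Hadamard on qubit i
        Hi = np.eye(1)
        for q in range(n):
            Hi = np.kron(Hi, H if q == i else np.eye(2))
        U = Hi @ U
        # controlled phase rotations R_k from qubits j > i (diagonal gates)
        for j in range(i + 1, n):
            k = j - i + 1
            if max_k is not None and k > max_k:
                continue  # drop the fine rotation
            phase = np.exp(2j * np.pi / 2 ** k)
            d = np.array([phase if (bit(s, i) and bit(s, j)) else 1.0
                          for s in range(N)], dtype=complex)
            U = np.diag(d) @ U
    # final bit-reversal permutation
    P = np.zeros((N, N))
    for s in range(N):
        P[int(format(s, f'0{n}b')[::-1], 2), s] = 1
    return P @ U

n = 4
N = 2 ** n
F = np.array([[np.exp(2j * np.pi * j * k / N) for k in range(N)]
              for j in range(N)]) / np.sqrt(N)
exact = qft_circuit(n)              # all rotations kept: reproduces the DFT matrix
approx = qft_circuit(n, max_k=3)    # rotations finer than 2*pi/2^3 dropped
print(np.linalg.norm(exact - F, 2))   # ~0
print(np.linalg.norm(approx - F, 2))  # small but nonzero truncation error
```

Each dropped R_k perturbs the unitary by at most |e^(2πi/2^k) - 1|, so the total error shrinks geometrically in the cutoff, matching the log log(1/ε) dependence in the depth bound.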
Random Butterfly Transformations with Applications in Computational Linear Algebra
1995
Cited by 21 (7 self)
Theory and practice of computational linear algebra differ over the issue of degeneracy. Block matrix decompositions are used heavily in theory, but less in practice, since even when a matrix is nondegenerate (has full rank) its block submatrices can be degenerate. The potential degeneracy of block submatrices can completely prevent practical use of block matrix algorithms. Gaussian elimination is an important example of an algorithm affected by the possibility of degeneracy. While the basic elimination procedure is simple to state and implement, it becomes more complicated with the addition of a pivoting procedure, which handles degenerate matrices having zeros on the diagonal. Pivoting can significantly complicate the algorithm, increase data movement, and reduce speed, particularly on high-performance computers. We propose a randomization scheme that preconditions an input matrix by multiplying it with random matrices, where this multiplication can be performed efficiently. At the e...
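A single level of the random butterfly construction can be sketched as follows. The sketch assumes the common form B = (1/√2)·[[D1, D2], [D1, -D2]] with random diagonal blocks (here random ±1 signs, which makes B orthogonal); applying such matrices on both sides of a matrix with a zero leading pivot mixes rows and columns so that elimination without pivoting becomes possible:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_butterfly(n):
    """One random butterfly: B = 1/sqrt(2) * [[D1, D2], [D1, -D2]] with random
    diagonal D1, D2. Using random signs makes B orthogonal, and applying B
    costs only O(n) operations per vector."""
    d1 = rng.choice([-1.0, 1.0], n // 2)
    d2 = rng.choice([-1.0, 1.0], n // 2)
    return np.block([[np.diag(d1), np.diag(d2)],
                     [np.diag(d1), -np.diag(d2)]]) / np.sqrt(2)

n = 8
# A full-rank matrix on which plain Gaussian elimination stalls immediately:
# the leading pivot A[0, 0] is zero.
A = np.array([[0.0 if i == j == 0 else 1.0 / (i + j + 1) for j in range(n)]
              for i in range(n)])
U, V = random_butterfly(n), random_butterfly(n)
Ap = U @ A @ V.T   # two-sided preconditioning; solves of Ap relate to A via U, V

print(np.allclose(U @ U.T, np.eye(n)))  # True: the butterfly is orthogonal
print(abs(Ap[0, 0]) > 1e-12)            # True: the leading pivot is now nonzero
```

The full scheme composes log-many such levels, so the preconditioning multiplies cost only O(n² log n) overall while making degenerate pivots occur with probability zero.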
Automatic Generation of Fast Discrete Signal Transforms
2001
Cited by 15 (7 self)
This paper presents an algorithm that derives fast versions for a broad class of discrete signal transforms symbolically. The class includes but is not limited to the discrete Fourier and the discrete trigonometric transforms. This is achieved by finding fast sparse matrix factorizations for the matrix representations of these transforms. Unlike previous methods, the algorithm is entirely automatic and uses the defining matrix as its sole input. The sparse matrix factorization algorithm consists of two steps: first, the "symmetry" of the matrix is computed in the form of a pair of group representations; second, the representations are stepwise decomposed, giving rise to a sparse factorization of the original transform matrix. We have successfully demonstrated the method by computing automatically efficient transforms in several important cases: for the DFT, we obtain the Cooley-Tukey FFT; for a class of transforms including the DCT, type II, the number of arithmetic operations for our fast transforms is the same as for the best-known algorithms. Our approach provides new insights and interpretations for the structure of these signal transforms and the question of why fast algorithms exist. The sparse matrix factorization algorithm is implemented within the software package AREP.
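What "fast sparse matrix factorization" means can be checked numerically in the smallest interesting case: the well-known Cooley-Tukey factorization of the 4-point DFT into sparse factors, shown here only as an illustration of the kind of output such an algorithm derives:

```python
import numpy as np

# Cooley-Tukey sparse factorization of the 4-point DFT:
#   F4 = (F2 (x) I2) . T . (I2 (x) F2) . P
# with twiddle matrix T = diag(1, 1, 1, -i) and even/odd permutation P.
F2 = np.array([[1, 1], [1, -1]], dtype=complex)
F4 = np.array([[1j ** (-(j * k)) for k in range(4)] for j in range(4)])
I2 = np.eye(2)
T = np.diag([1, 1, 1, -1j])
P = np.zeros((4, 4))
P[0, 0] = P[1, 2] = P[2, 1] = P[3, 3] = 1   # (x0,x1,x2,x3) -> (x0,x2,x1,x3)

product = np.kron(F2, I2) @ T @ np.kron(I2, F2) @ P
print(np.allclose(product, F4))  # True
```

The dense F4 costs 16 multiplications to apply directly; each sparse factor costs O(n), which is exactly the saving the FFT generalizes to all sizes.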
The Cooley-Tukey FFT and group theory
Notices Amer. Math. Soc., 2001
Cited by 13 (3 self)
In 1965, J. Cooley and J. Tukey published an article detailing an efficient algorithm to compute the Discrete Fourier Transform, necessary for processing the newly available reams of digital time series produced by recently invented analog-to-digital converters. Since then, the Cooley-Tukey Fast Fourier Transform and its variants have been a staple of digital signal processing. Among the many casts of the algorithm, a natural one is as an efficient algorithm for computing the Fourier expansion of a function on a finite abelian group. In this paper we survey some of our recent work on the "separation of variables" approach to computing a Fourier transform on an arbitrary finite group. This is a natural generalization of the Cooley-Tukey algorithm. In addition, we touch on extensions of this idea to compact and noncompact groups.

Pure and Applied Mathematics: Two Sides of a Coin. The Bulletin of the AMS for November 1979 had a paper by L. Auslander and R. Tolimieri [3] with the delightful title "Is computing with the Finite Fourier Transform pure or applied mathematics?" This rhetorical question was answered by showing that, in fact, the finite Fourier transform, and the family of efficient algorithms used to compute it, the Fast Fourier Transform (FFT), a pillar of the world of digital signal processing, were of interest to both pure and applied mathematicians.
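For the cyclic group Z/nZ, the Cooley-Tukey algorithm is the familiar radix-2 FFT: split the input into even- and odd-indexed subsequences, transform each recursively, and combine with twiddle factors. A minimal sketch:

```python
import numpy as np

def fft(x):
    """Radix-2 Cooley-Tukey FFT for power-of-two lengths: recurse on the
    even- and odd-indexed halves, then combine with twiddle factors."""
    n = len(x)
    if n == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = np.exp(-2j * np.pi * np.arange(n // 2) / n)  # twiddle factors
    return np.concatenate([even + tw * odd, even - tw * odd])

x = np.asarray([1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5], dtype=complex)
print(np.allclose(fft(x), np.fft.fft(x)))  # True: matches the direct DFT
```

The group-theoretic view reads the even/odd split as restriction to the subgroup 2Z/nZ and its coset, which is the structure the "separation of variables" approach generalizes to arbitrary finite groups.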
Decomposing Monomial Representations of Solvable Groups
2002
Cited by 13 (4 self)
We present an efficient algorithm which decomposes...
Algebraic signal processing theory: Foundation and 1-D time
IEEE Trans. Signal Process., 2008
Cited by 11 (6 self)
This paper introduces a general and axiomatic approach to linear signal processing (SP) that we refer to as the algebraic signal processing theory (ASP). Basic to ASP is the linear signal model, defined as a triple (A, M, Φ), where familiar concepts like the filter space and the signal space are cast as an algebra A and a module M, respectively. The mapping Φ generalizes the concept of a z-transform to bijective linear mappings from a vector space of signal samples into the module. Common concepts like filtering, spectrum, or Fourier transform have their equivalent counterparts in ASP. Once these concepts and their properties are defined and understood in the context of ASP, they remain true and apply to specific instantiations of the ASP signal model. For example, to develop signal processing theories for infinite and finite discrete-time signals, for infinite or finite discrete-space signals, or for multidimensional signals, we need only instantiate the signal model to one that makes sense for that specific class of signals. Filtering, spectrum, Fourier transform, and other notions then follow from the corresponding ASP concepts. Similarly, common assumptions in SP translate into requirements on the ASP signal model. For example, shift-invariance is equivalent to the algebra being commutative. For finite (duration) signals, shift-invariance then restricts to polynomial algebras. We explain how to design signal models from the specification of a special filter, the shift. The paper illustrates the general ASP theory with the standard time shift, presenting a unique signal model for infinite time and several signal models for finite time. The latter models illustrate the role played by boundary conditions and recover the discrete Fourier transform (DFT) and its variants as associated Fourier transforms. Finally, ASP provides a systematic methodology to derive fast algorithms for linear transforms. This topic and the application of ASP to space-dependent signals and to multidimensional signals are pursued in companion papers.
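One concrete instantiation (assuming the standard finite-time model with periodic boundary condition, a textbook example rather than a claim about the paper's exact construction) takes both algebra and module to be C[x]/(xⁿ - 1): filtering is then polynomial multiplication modulo xⁿ - 1, i.e. circular convolution, and the associated Fourier transform is the DFT, which diagonalizes it:

```python
import numpy as np

# Finite-time signal model sketch: signals and filters both live in
# C[x]/(x^n - 1). Filtering = polynomial product reduced mod x^n - 1
# (circular convolution); the DFT is the associated Fourier transform.
n = 8
rng = np.random.default_rng(1)
h, s = rng.standard_normal(n), rng.standard_normal(n)  # filter and signal coeffs

# h(x) * s(x) reduced mod x^n - 1: fold high-degree terms back (x^n == 1)
full = np.convolve(h, s)            # plain product, degree <= 2n - 2
filtered = full[:n].copy()
filtered[:n - 1] += full[n:]        # wrap-around from the reduction

# in the Fourier domain the same filtering is pointwise multiplication
spectral = np.fft.ifft(np.fft.fft(h) * np.fft.fft(s)).real
print(np.allclose(filtered, spectral))  # True
```

Swapping the boundary condition (e.g. to xⁿ = 0, or to Chebyshev-type algebras) changes the associated transform, which is how the framework recovers the DFT variants and the DCTs.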
Split-Radix Algorithms for Discrete Trigonometric Transforms
Preprint, Gerhard-Mercator-Univ. Duisburg, 2002
Cited by 7 (5 self)
In this paper, we derive new split-radix DCT algorithms of radix-2 length, which are based on real factorizations of the corresponding cosine matrices into products of sparse, orthogonal matrices. These algorithms use only permutations, scaling with √2, butterfly operations, and plane rotations/rotation-reflections. They can be seen as analogues of the well-known split-radix FFT. Our new algorithms have a very low arithmetic complexity which compares with the best-known fast DCT algorithms. Further, a detailed analysis of the roundoff errors for the new split-radix DCT algorithm shows its excellent numerical stability, which outperforms the real fast DCT algorithms based on polynomial arithmetic.
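The cosine matrices in question are orthogonal under the standard DCT-II normalization, which is what makes a factorization into sparse orthogonal factors numerically stable: orthogonal factors have condition number 1, so roundoff errors cannot be amplified. A quick numpy check of the orthogonality (using the standard normalized DCT-II matrix, not the paper's specific factorization):

```python
import numpy as np

def dct2_matrix(n):
    """Orthogonal DCT-II matrix:
    C[j, k] = sqrt(2/n) * eps_j * cos(j * (2k+1) * pi / (2n)),
    with eps_0 = 1/sqrt(2) and eps_j = 1 otherwise."""
    j, k = np.mgrid[0:n, 0:n]
    C = np.sqrt(2.0 / n) * np.cos(j * (2 * k + 1) * np.pi / (2 * n))
    C[0] /= np.sqrt(2)   # scale the DC row so the matrix is orthogonal
    return C

C = dct2_matrix(8)
print(np.allclose(C @ C.T, np.eye(8)))  # True: the cosine matrix is orthogonal
```

A split-radix algorithm rewrites this dense C as a short product of sparse matrices of the same orthogonal kind, trading the O(n²) dense product for O(n log n) operations without sacrificing stability.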