Results 1  10
of
98
FFTW: An Adaptive Software Architecture For The FFT
, 1998
"... FFT literature has been mostly concerned with minimizing the number of floatingpoint operations performed by an algorithm. Unfortunately, on presentday microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have ..."
Abstract

Cited by 444 (4 self)
 Add to MetaCart
FFT literature has been mostly concerned with minimizing the number of floatingpoint operations performed by an algorithm. Unfortunately, on presentday microprocessors this measure is far less important than it used to be, and interactions with the processor pipeline and the memory hierarchy have a larger impact on performance. Consequently, one must know the details of a computer architecture in order to design a fast algorithm. In this paper, we propose an adaptive FFT program that tunes the computation automatically for any particular hardware. We compared our program, called FFTW, with over 40 implementations of the FFT on 7 machines. Our tests show that FFTW's selfoptimizing approach usually yields significantly better performance than all other publicly available software. FFTW also compares favorably with machinespecific, vendoroptimized libraries. 1. INTRODUCTION The discrete Fourier transform (DFT) is an important tool in many branches of science and engineering [1] and...
The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our cu ..."
Abstract

Cited by 396 (6 self)
 Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for realdata DFTs of prime size, a new way of implementing DFTs by means of machinespecific singleinstruction, multipledata (SIMD) instructions, and how a specialpurpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords—Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
The Landscape of Parallel Computing Research: A View from Berkeley
 TECHNICAL REPORT, UC BERKELEY
, 2006
"... All rights reserved. ..."
A Fast Fourier Transform Compiler
, 1999
"... FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the perform ..."
Abstract

Cited by 155 (6 self)
 Add to MetaCart
FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performancecritical code was generated automatically by a specialpurpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this specialpurpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
The FourierSeries Method For Inverting Transforms Of Probability Distributions
, 1991
"... This paper reviews the Fourierseries method for calculating cumulative distribution functions (cdf's) and probability mass functions (pmf's) by numerically inverting characteristic functions, Laplace transforms and generating functions. Some variants of the Fourierseries method are remarkably easy ..."
Abstract

Cited by 149 (51 self)
 Add to MetaCart
This paper reviews the Fourierseries method for calculating cumulative distribution functions (cdf's) and probability mass functions (pmf's) by numerically inverting characteristic functions, Laplace transforms and generating functions. Some variants of the Fourierseries method are remarkably easy to use, requiring programs of less than fifty lines. The Fourierseries method can be interpreted as numerically integrating a standard inversion integral by means of the trapezoidal rule. The same formula is obtained by using the Fourier series of an associated periodic function constructed by aliasing; this explains the name of the method. This Fourier analysis applies to the inversion problem because the Fourier coefficients are just values of the transform. The mathematical centerpiece of the Fourierseries method is the Poisson summation formula, which identifies the discretization error associated with the trapezoidal rule and thus helps bound it. The greatest difficulty is approximately calculating the infinite series obtained from the inversion integral. Within this framework, lattice cdf's can be calculated from generating functions by finite sums without truncation. For other cdf's, an appropriate truncation of the infinite series can be determined from the transform based on estimates or bounds. For Laplace transforms, the numerical integration can be made to produce a nearly alternating series, so that the convergence can be accelerated by techniques such as Euler summation. Alternatively, the cdf can be perturbed slightly by convolution smoothing or windowing to produce a truncation error bound independent of the original cdf. Although error bounds can be determined, an effective approach is to use two different methods without elaborate error analysis. For this...
CacheOblivious Algorithms
, 1999
"... This thesis presents "cacheoblivious" algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cac ..."
Abstract

Cited by 79 (1 self)
 Add to MetaCart
This thesis presents "cacheoblivious" algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cacheline length need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobistyle multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size Z and cacheline length L, where Z =# (L 2 ), the number of cache misses for an m × n matrix transpose is #(1 + mn=L). The number of cache misses for either an npoint FFT or the sorting of n numbers is #(1 + (n=L)(1 + log Z n)). The cache complexity of computing n ...
A PrecorrectedFFT Method for Electrostatic Analysis of Complicated 3D Structures
 IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems
, 1997
"... In this paper we present a new algorithm for accelerating the potential calculation which occurs in the inner loop of iterative algorithms for solving electromagnetic boundary integral equations. Such integral equations arise, for example, in the extraction of coupling capacitances in threedimensio ..."
Abstract

Cited by 68 (26 self)
 Add to MetaCart
In this paper we present a new algorithm for accelerating the potential calculation which occurs in the inner loop of iterative algorithms for solving electromagnetic boundary integral equations. Such integral equations arise, for example, in the extraction of coupling capacitances in threedimensional (3D) geometries. We present extensive experimental comparisons with the capacitance extraction code FASTCAP [1] and demonstrate that, for a wide variety of geometries commonly encountered in integrated circuit packaging, onchip interconnect and microelectromechanical systems, the new "precorrectedFFT " algorithm is superior to the fast multipole algorithm used in FASTCAP in terms of execution time and memory use. At engineering accuracies, in terms of a speedmemory product, the new algorithm can be superior to the fast multipole based schemes by more than an order of magnitude.
The Fastest Fourier Transform in the West
 the Proceedings of the 1998 International Conference on Acoustics, Speech, and Signal Processing, ICASSP '98
, 1997
"... This paper describes FFTW, a portable C package for computing the one and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT software, including the wellknown FFTPACK and the code from Numerical Recipes. More interestingly, FFT ..."
Abstract

Cited by 66 (2 self)
 Add to MetaCart
This paper describes FFTW, a portable C package for computing the one and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT software, including the wellknown FFTPACK and the code from Numerical Recipes. More interestingly, FFTW is competitive with or better than proprietary, highlytuned codes such as Sun's Performance Library and IBM's ESSL library. FFTW implements the CooleyTukey fast Fourier transform, and is freely available on the Web at http://theory.lcs.mit.edu/fftw. Three main ideas are the keys to FFTW's performance. First, the computation of the transform is performed by an executor consisting of highlyoptimized, composable blocks of C code called codelets. Second, at runtime, a planner finds an efficient way (called a `plan') to compose the codelets. Through the planner, FFTW adapts itself to the architecture of the machine it is running on. Third, the codelets are automatically generated by a code...
The Fractional Fourier Transform and Applications
, 1995
"... This paper describes the "fractional Fourier transform", which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e \Gamma2ßi=n , the fractional Fourier transf ..."
Abstract

Cited by 43 (2 self)
 Add to MetaCart
This paper describes the "fractional Fourier transform", which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e \Gamma2ßi=n , the fractional Fourier transform is based on fractional roots of unity e \Gamma2ßiff , where ff is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with noninteger periodicities, performing highresolution trigonometric interpolation, detecting lines in noisy images and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster by arbitrarily large factors than conventional techniques. Bailey is with the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center, Moffett Field,...
Workload analysis and demand prediction of enterprise data center applications
, 2007
"... Abstract — Advances in virtualization technology are enabling the creation of resource pools of servers that permit multiple application workloads to share each server in the pool. Understanding the nature of enterprise workloads is crucial to properly designing and provisioning current and future s ..."
Abstract

Cited by 34 (6 self)
 Add to MetaCart
Abstract — Advances in virtualization technology are enabling the creation of resource pools of servers that permit multiple application workloads to share each server in the pool. Understanding the nature of enterprise workloads is crucial to properly designing and provisioning current and future services in such pools. This paper considers issues of workload analysis, performance modeling, and capacity planning. Our goal is to automate the efficient use of resource pools when hosting large numbers of enterprise services. We use a trace based approach for capacity management that relies on i) the characterization of workload demand patterns, ii) the generation of synthetic workloads that predict future demands based on the patterns, and iii) a workload placement recommendation service. The accuracy of capacity planning predictions depends on our ability to characterize workload demand patterns, to recognize trends for expected changes in future demands, and to reflect business forecasts for otherwise unexpected changes in future demands. A workload analysis demonstrates the burstiness and repetitive nature of enterprise workloads. Workloads are automatically classified according to their periodic behavior. The similarity among repeated occurrences of patterns is evaluated. Synthetic workloads are generated from the patterns in a manner that maintains the periodic nature, burstiness, and trending behavior of the workloads. A case study involving six months of data for 139 enterprise applications is used to apply and evaluate the enterprise workload analysis and related capacity planning methods. The results show that when consolidating to 8 processor systems, we predicted future perserver required capacity to within one processor 95 % of the time. The accuracy of predictions for required capacity suggests that such resource savings can be achieved with little risk. I.