Results 1-10 of 106
The design and implementation of FFTW3
Proceedings of the IEEE, 2005
Cited by 401 (3 self)
Abstract:
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm.
Keywords: Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
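The abstract mentions deriving discrete cosine transforms automatically from a DFT algorithm. As a rough numeric illustration of why that is possible (this is not FFTW's code generator or its API), the sketch below computes an unnormalized DCT-II of length n from a length-2n DFT of the even-symmetric extension of the input and compares it with the direct definition. The naive `dft` helper and the other function names are hypothetical stand-ins chosen for this example.

```python
import cmath
import math


def dft(a):
    """Naive O(n^2) DFT; stands in for any FFT routine."""
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dct2_direct(x):
    """Unnormalized DCT-II: X_k = sum_j x_j cos(pi (j + 1/2) k / n)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]


def dct2_via_dft(x):
    """DCT-II from a length-2n DFT of the even-symmetric extension of x."""
    n = len(x)
    y = list(x) + list(reversed(x))  # [x_0 .. x_{n-1}, x_{n-1} .. x_0]
    big = dft(y)
    # The half-sample phase shift pairs the two exponentials into 2*cos(...).
    return [(cmath.exp(-1j * math.pi * k / (2 * n)) * big[k]).real / 2
            for k in range(n)]


x = [1.0, -2.0, 0.5, 3.0, 4.0]
print(max(abs(a - b) for a, b in zip(dct2_via_dft(x), dct2_direct(x))))
```

The paper attributes this kind of derivation to a special-purpose compiler; the numeric version above only demonstrates the underlying identity and performs none of the simplification a compiler would apply to the redundant symmetric input.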
Numerical Recipes in C: The Art of Scientific Computing, Second Edition
1992
Cited by 105 (0 self)
Abstract:
This reprinting is corrected to software version 2.10
The Fractional Fourier Transform and Applications
1995
Cited by 43 (2 self)
Abstract:
This paper describes the "fractional Fourier transform", which admits computation by an algorithm whose complexity is proportional to that of the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity $e^{-2\pi i/n}$, the fractional Fourier transform is based on fractional roots of unity $e^{-2\pi i \alpha}$, where $\alpha$ is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with non-integer periodicities, performing high-resolution trigonometric interpolation, detecting lines in noisy images, and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster than conventional techniques by arbitrarily large factors.
Bailey is with the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center, Moffett Field,...
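To make the definition concrete, assume the fractional Fourier transform of a length-m sequence is $G_k = \sum_{j=0}^{m-1} x_j e^{-2\pi i j k \alpha}$, which reduces to the DFT when $\alpha = 1/m$. The sketch below evaluates that assumed definition two ways: directly in $O(m^2)$, and with a chirp-based fast algorithm built on the identity $2jk = j^2 + k^2 - (k-j)^2$, which turns the sum into a convolution computed with zero-padded power-of-two FFTs. It uses only the standard library, and the function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frft_direct(x, alpha):
    """G_k = sum_j x_j exp(-2 pi i j k alpha), evaluated in O(m^2)."""
    m = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k * alpha) for j in range(m))
            for k in range(m)]


def frft_fast(x, alpha):
    """Same transform in O(p log p), with p the next power of two >= 2m - 1."""
    m = len(x)
    p = 1
    while p < 2 * m - 1:
        p *= 2
    # Pre-multiply by the chirp exp(-pi i j^2 alpha) and zero-pad to length p.
    y = [x[j] * cmath.exp(-1j * cmath.pi * j * j * alpha) for j in range(m)]
    y += [0j] * (p - m)
    # Conjugate chirp exp(+pi i d^2 alpha) for lags d = -(m-1)..(m-1);
    # negative lags wrap around to the end of the circular buffer.
    z = [0j] * p
    for d in range(m):
        z[d] = cmath.exp(1j * cmath.pi * d * d * alpha)
        if d > 0:
            z[p - d] = z[d]
    conv = fft([u * v for u, v in zip(fft(y), fft(z))], invert=True)
    # Post-multiply by exp(-pi i k^2 alpha); divide by p to finish the inverse FFT.
    return [cmath.exp(-1j * cmath.pi * k * k * alpha) * conv[k] / p for k in range(m)]
```

Setting `alpha = 1 / m` reproduces an ordinary DFT of any length m, including prime lengths, while other values of `alpha` sample the spectrum at frequencies spaced by $2\pi\alpha$ rather than $2\pi/m$.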
Fast Discrete Polynomial Transforms with Applications to Data Analysis for Distance Transitive Graphs
1997
Cited by 36 (7 self)
Abstract:
Let $P = \{P_0, \ldots, P_{n-1}\}$ denote a set of polynomials with complex coefficients. Let $Z = \{z_0, \ldots, z_{n-1}\} \subset \mathbb{C}$ denote any set of sample points. For any $f = (f_0, \ldots, f_{n-1}) \in \mathbb{C}^n$ the discrete polynomial transform of $f$ (with respect to $P$ and $Z$) is defined as the collection of sums $\{\hat{f}(P_0), \ldots, \hat{f}(P_{n-1})\}$, where $\hat{f}(P_j) = \langle f, P_j \rangle = \sum_{i=0}^{n-1} f_i P_j(z_i) w(i)$ for some associated weight function $w$. These sorts of transforms find important applications in areas such as medical imaging and signal processing. In this paper we present fast algorithms for computing discrete orthogonal polynomial transforms. For a system of $N$ orthogonal polynomials of degree at most $N-1$ we give an $O(N \log^2 N)$ algorithm for computing a discrete polynomial transform at an arbitrary set of points instead of the $N^2$ operations required by direct evaluation. Our algorithm depends only on the fact that orthogonal polynomial sets satisfy a thre...
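The definition above can be evaluated directly in $O(N^2)$ operations, which is the baseline the $O(N \log^2 N)$ algorithm improves on. The sketch below implements only that naive baseline, not the paper's fast algorithm. For illustration it assumes the $P_j$ are Chebyshev polynomials of the first kind, generated by the three-term recurrence $T_{j+1}(z) = 2zT_j(z) - T_{j-1}(z)$, and that the caller supplies the weights $w(i)$; the function names are hypothetical.

```python
def chebyshev_table(n, points):
    """table[j][i] = T_j(points[i]) for j < n, built with the three-term recurrence."""
    points = list(points)
    table = [[1.0 for _ in points]]          # T_0 = 1
    if n > 1:
        table.append(list(points))           # T_1(z) = z
    for j in range(1, n - 1):                # T_{j+1} = 2 z T_j - T_{j-1}
        table.append([2 * z * table[j][i] - table[j - 1][i]
                      for i, z in enumerate(points)])
    return table[:n]


def discrete_polynomial_transform(f, points, weights):
    """hat f(P_j) = sum_i f_i P_j(z_i) w(i) for j = 0..n-1, in O(n^2) operations."""
    n = len(f)
    table = chebyshev_table(n, points)
    return [sum(f[i] * table[j][i] * weights[i] for i in range(n))
            for j in range(n)]


print(discrete_polynomial_transform([1.0, 0.0, 0.0], [0.5, -0.25, 1.0], [1.0, 1.0, 1.0]))
```

Any other family satisfying a three-term recurrence could replace `chebyshev_table` without changing the transform loop; the quadratic cost comes from filling the full $N \times N$ table.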
Stereo Inverse Perspective Mapping: Theory and Applications
Image and Vision Computing Journal, 1998
Cited by 31 (17 self)
Abstract:
This paper discusses an extension of the Inverse Perspective Mapping geometrical transform to the processing of stereo images and presents the calibration method used on the ARGO autonomous vehicle. The article also features an example application in the automotive field, in which the stereo Inverse Perspective Mapping helps to speed up the process.
1 Introduction
The processing of images is generally performed at different levels, the lowest of which is characterized by the preservation of the data structure after the processing. Different techniques have been introduced for low-level image processing, and they can be classified into three main categories: Pointwise operations, Cellular Automaton operations, and Global operations [1]. In particular, Global operations are transforms between different domains; their application simplifies the detection of image features which, conversely, would require a more complex computation in the original domain. They are not based on a one-to-one map...
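The abstract does not spell out the mapping itself. As a hedged sketch of the basic single-camera version that a stereo extension builds on, the code below assumes a pinhole camera at height `h` above a flat road, pitched down by an angle `theta`, with focal length `f` and principal point (`cu`, `cv`) in pixels, and maps an image pixel to road-plane coordinates by intersecting its viewing ray with the ground. The parameter names and the flat-ground model are assumptions; the paper's own formulation and the ARGO calibration procedure are not reproduced here.

```python
import math


def pixel_to_ground(u, v, h, theta, f, cu, cv):
    """Map image pixel (u, v) to road-plane coordinates (X lateral, Y forward).

    Assumes a pinhole camera at height h, pitched down by theta radians,
    with image x pointing right and image y pointing down. Returns None when
    the pixel's viewing ray does not reach the ground in front of the camera.
    """
    a = (u - cu) / f  # normalized horizontal image coordinate
    b = (v - cv) / f  # normalized vertical image coordinate (down is positive)
    # Ray direction in the world frame (X right, Y forward, Z up):
    #   (a, cos(theta) - b*sin(theta), -(b*cos(theta) + sin(theta)))
    down = b * math.cos(theta) + math.sin(theta)
    if down <= 0:
        return None  # ray at or above the horizon
    t = h / down  # ray parameter at which the height drops from h to 0
    return (t * a, t * (math.cos(theta) - b * math.sin(theta)))


# Principal point of a camera 1 m high, pitched down 45 degrees: about 1 m ahead.
print(pixel_to_ground(320, 240, 1.0, math.pi / 4, 500.0, 320, 240))
```

Resampling every road pixel this way produces a bird's-eye view of the road plane; in a stereo setup each calibrated camera would be remapped onto the same plane with its own parameters.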
Analyzing Stochastic Fixed-Priority Real-Time Systems
In Proceedings of the Fifth International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Joint European Conferences on Theory and Practice of Software, 1999
Cited by 19 (1 self)
Abstract:
Traditionally, real-time systems require that the deadlines of all jobs be met. For many applications, however, this is an overly stringent requirement. An occasional missed deadline may cause decreased performance but is nevertheless acceptable. We present an analysis technique by which a lower bound on the percentage of deadlines that a periodic task meets is determined, and compare the lower bound with simulation results for an example system. We have implemented the technique in the PERTS real-time system prototyping environment [6, 7].
1 Introduction
A distinguishing characteristic of real-time computer systems is the requirement that the system meet its temporal constraints. While there are many different types of constraints, the most common form is expressed in terms of deadlines: a job must complete its execution by its deadline. In a hard real-time system, all jobs must meet their deadlines and a missed deadline is treated as a fatal fault. Hence hard real-time systems are desi...
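The abstract compares its analytical lower bound with simulation results. The analysis technique itself is not described here, so the sketch below illustrates only the simulation side, under these assumptions: preemptive fixed-priority scheduling with rate-monotonic priorities (shorter period means higher priority), deadlines equal to periods, integer time steps, execution times drawn from a discrete distribution, and jobs aborted once their deadline passes. It is neither PERTS nor the paper's method; `simulate` and its arguments are hypothetical.

```python
import random


def simulate(tasks, horizon, seed=0):
    """Return, per task, the fraction of released jobs that met their deadlines.

    tasks: list of (period, dist), where dist is a list of
           (execution_time, probability) pairs with positive integer times.
    horizon: number of time steps; should be a multiple of every period so
             that each released job's deadline falls inside the run.
    """
    rng = random.Random(seed)
    # Rate-monotonic fixed priorities: the task with the shorter period runs first.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    queues = [[] for _ in tasks]  # per-task FIFO of [remaining, deadline]
    met = [0] * len(tasks)
    released = [0] * len(tasks)
    for t in range(horizon):
        # Abort jobs whose deadline has passed; they count as misses.
        for q in queues:
            while q and q[0][1] <= t:
                q.pop(0)
        # Release a new job of every task whose period divides t.
        for i, (period, dist) in enumerate(tasks):
            if t % period == 0:
                times, probs = zip(*dist)
                cost = rng.choices(times, weights=probs)[0]
                queues[i].append([cost, t + period])
                released[i] += 1
        # Run the highest-priority pending job for one time unit.
        for i in order:
            if queues[i]:
                queues[i][0][0] -= 1
                if queues[i][0][0] == 0:
                    queues[i].pop(0)
                    met[i] += 1  # finishes at t + 1, which is <= its deadline
                break
    return [m / r for m, r in zip(met, released)]


print(simulate([(4, [(1, 0.5), (3, 0.5)]), (6, [(2, 0.7), (4, 0.3)])], 120, seed=1))
```

Averaging such runs over many seeds gives an empirical deadline-met rate against which an analytical lower bound can be checked; aborting late jobs is one design choice, and letting them run to completion would give different numbers.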
Performing out-of-core FFTs on parallel disk systems
Parallel Computing, 1998
Cited by 19 (7 self)
Abstract:
The Fast Fourier Transform (FFT) plays a key role in many areas of computational science and engineering. Although most one-dimensional FFT problems can be solved entirely in main memory, some important classes of applications require out-of-core techniques. For these, use of parallel I/O systems can improve performance considerably. This paper shows how to perform one-dimensional FFTs using a parallel disk system with independent disk accesses. We present both analytical and experimental results for performing out-of-core FFTs in two ways: using traditional virtual memory with demand paging, and using a provably asymptotically optimal algorithm for the Parallel Disk Model (PDM) of Vitter and Shriver. When run on a DEC 2100 server with a large memory and eight parallel disks, the optimal algorithm for the PDM runs up to 144.7 times faster than in-core methods under demand paging. Moreover, even including I/O costs, the normalized times for the optimal PDM algorithm are competitive with, or better than, those for in-core methods even when they run entirely in memory.
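A standard way to make a large one-dimensional FFT friendly to external memory is to factor $N = n_1 n_2$ and split the transform into independent short transforms plus a twiddle-factor pass (often called the four-step decomposition), so each pass works on pieces that can fit in memory. The sketch below performs that decomposition in memory and checks it against a direct DFT. It is a generic illustration, not the paper's PDM-optimal algorithm, and it does no disk I/O; the helper names are hypothetical.

```python
import cmath


def dft(a):
    """Naive O(n^2) DFT; stands in for any in-core FFT."""
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 built from n1 length-n2 and n2 length-n1 transforms.

    Uses input index i1 + n1*i2 and output index n2*k1 + k2.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 independent length-n2 transforms over stride-n1 subsequences.
    a = [dft([x[i1 + n1 * i2] for i2 in range(n2)]) for i1 in range(n1)]
    # Step 2: multiply by the twiddle factors exp(-2 pi i * i1 * k2 / n).
    for i1 in range(n1):
        for k2 in range(n2):
            a[i1][k2] *= cmath.exp(-2j * cmath.pi * i1 * k2 / n)
    # Step 3: n2 independent length-n1 transforms, scattered to n2*k1 + k2.
    out = [0j] * n
    for k2 in range(n2):
        col = dft([a[i1][k2] for i1 in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


x = [complex(i % 5, (i * 3) % 7) for i in range(12)]
print(max(abs(p - q) for p, q in zip(four_step_fft(x, 3, 4), dft(x))))
```

In an out-of-core setting, steps 1 and 3 would each stream blocks of subsequences between disk and memory, and the strided access pattern is what makes the data layout across parallel disks matter.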
Lossless Acceleration Of Fractal Image Compression By Fast Convolution
Proc. IEEE Int. Conf. on Image Processing, 1996
Cited by 18 (8 self)
Abstract:
In fractal image compression the encoding step is computationally expensive. We present a new technique for reducing the computational complexity. It is lossless, i.e., it does not sacrifice any image quality for the sake of the speedup. It is based on a codebook coherence characteristic of fractal image compression and leads to a novel application of fast Fourier transform-based convolution. The method provides a new conceptual view of fractal image compression. This paper focuses on the implementation issues and presents the first empirical experiments analyzing the performance benefits of the convolution approach to fractal image compression depending on image size, range size, and codebook size. The results show acceleration factors of up to 23 for large ranges (larger factors are possible), outperforming all other currently known lossless acceleration methods for such range sizes.
1. INTRODUCTION
In fractal image compression [1, 2] image blocks (ranges) have to be compared against a...
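The speedup rests on the fact that the inner products between one range block and the block at every offset of a larger signal form a correlation, which one FFT-based convolution computes all at once. The sketch below shows this in one dimension (images would use a 2-D FFT analogously) and checks it against the direct sums. It illustrates only FFT-based correlation; codebook construction, domain downsampling, and the paper's specific scheme are not reproduced, and all names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlate_direct(signal, block):
    """c[s] = sum_j block[j] * signal[s + j] for every full overlap s."""
    m = len(block)
    return [sum(block[j] * signal[s + j] for j in range(m))
            for s in range(len(signal) - m + 1)]


def correlate_fft(signal, block):
    """Same correlation via one zero-padded FFT convolution."""
    n, m = len(signal), len(block)
    p = 1
    while p < n + m - 1:  # pad so the circular convolution does not wrap
        p *= 2
    a = fft([complex(v) for v in signal] + [0j] * (p - n))
    # Convolving with the reversed block is the same as correlating with it.
    b = fft([complex(v) for v in reversed(block)] + [0j] * (p - m))
    conv = fft([u * v for u, v in zip(a, b)], invert=True)
    # conv[s + m - 1] / p equals sum_j block[j] * signal[s + j].
    return [conv[s + m - 1].real / p for s in range(n - m + 1)]


sig = [3.0, 1.0, -2.0, 4.0, 0.5, 2.0, -1.0, 5.0, 0.0]
print(correlate_fft(sig, [1.0, -1.0, 2.0]))
```

For a single offset the direct sum is cheaper; the FFT route pays off only when one block is compared against many offsets at once.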
Automatic Generation of Fast Discrete Signal Transforms
2001
Cited by 15 (7 self)
Abstract:
This paper presents an algorithm that symbolically derives fast versions of a broad class of discrete signal transforms. The class includes, but is not limited to, the discrete Fourier and the discrete trigonometric transforms. This is achieved by finding fast sparse matrix factorizations for the matrix representations of these transforms. Unlike previous methods, the algorithm is entirely automatic and uses the defining matrix as its sole input. The sparse matrix factorization algorithm consists of two steps: first, the "symmetry" of the matrix is computed in the form of a pair of group representations; second, the representations are decomposed stepwise, giving rise to a sparse factorization of the original transform matrix. We have successfully demonstrated the method by automatically computing efficient transforms in several important cases: for the DFT, we obtain the Cooley-Tukey FFT; for a class of transforms including the DCT of type II, the number of arithmetic operations for our fast transforms is the same as for the best-known algorithms. Our approach provides new insights and interpretations for the structure of these signal transforms and the question of why fast algorithms exist. The sparse matrix factorization algorithm is implemented within the software package AREP.
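For the DFT, the abstract says the method recovers the Cooley-Tukey FFT. In the Kronecker-product notation commonly used for such factorizations, one form of it is $\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, T\, (I_m \otimes \mathrm{DFT}_n)\, L$, with a diagonal twiddle matrix $T$ and a stride permutation $L$. The sketch below builds these matrices densely and checks that their product equals $\mathrm{DFT}_{mn}$. It only shows what a sparse factorization looks like and says nothing about how AREP finds one; the index conventions for `T` and `L` are the ones stated in the code comments, and the function names are hypothetical.

```python
import cmath


def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)] for r in range(n)]


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker product: (A (x) B)[i][j] = A[i // rb][j // cb] * B[i % rb][j % cb]."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def stride_perm(m, n):
    """L: output position i1*n + i2 reads input position i1 + m*i2 (stride-m gather)."""
    size = m * n
    p = [[0.0] * size for _ in range(size)]
    for i1 in range(m):
        for i2 in range(n):
            p[i1 * n + i2][i1 + m * i2] = 1.0
    return p


def twiddle(m, n):
    """T: diagonal entry at i1*n + k2 is exp(-2 pi i * i1 * k2 / (m*n))."""
    size = m * n
    t = [[0j] * size for _ in range(size)]
    for i1 in range(m):
        for k2 in range(n):
            t[i1 * n + k2][i1 * n + k2] = cmath.exp(-2j * cmath.pi * i1 * k2 / size)
    return t


def cooley_tukey_factors(m, n):
    """Sparse factors whose product, taken left to right, is DFT_{mn}."""
    return [kron(dft_matrix(m), identity(n)), twiddle(m, n),
            kron(identity(m), dft_matrix(n)), stride_perm(m, n)]


def product(mats):
    result = mats[0]
    for mat in mats[1:]:
        result = matmul(result, mat)
    return result
```

Each factor has at most max(m, n) nonzeros per row, which is where the operation-count savings over the dense $(mn)^2$ matrix come from once the factors are applied to a vector one at a time.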