The design and implementation of FFTW3
 Proceedings of the IEEE
, 2005
Cited by 396 (6 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. Keywords: Adaptive software, cosine transform, fast Fourier transform (FFT), Fourier transform, Hartley transform, I/O tensor.
A Fast Fourier Transform Compiler
, 1999
Cited by 155 (6 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
Superfast solution of real positive definite Toeplitz systems
 SIAM J. Matrix Anal. Appl
, 1988
Cited by 54 (1 self)
We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order $n + 1$, where $n = 2^\nu$. Our implementation uses the split-radix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the $n$th Szegő polynomial using fewer than $8n \log_2^2 n$ real arithmetic operations without explicit use of the bit-reversal permutation. Since Levinson’s algorithm requires slightly more than $2n^2$ operations to obtain this polynomial, we achieve crossover with Levinson’s algorithm at $n = 256$. Key words: Toeplitz matrix, Schur’s algorithm, split-radix fast Fourier transform.
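The crossover at n = 256 can be checked directly from the two operation counts quoted in the abstract. This is only a rough check: it treats both bounds as exact counts, which is an assumption, since the abstract says "fewer than" and "slightly more than".

```latex
\[
8n\log_2^2 n \,\Big|_{n=256} = 8 \cdot 256 \cdot 8^2 = 131072,
\qquad
2n^2 \,\Big|_{n=256} = 2 \cdot 256^2 = 131072 .
\]
```

The two leading terms coincide at n = 256, which agrees with the stated crossover point.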
PocketSphinx: A free, real-time continuous speech recognition system for handheld devices
 in Proceedings of ICASSP
, 2006
Cited by 35 (1 self)
The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU SPHINX-II, a popular open-source large-vocabulary continuous speech recognition (LVCSR) system, to handheld devices. The resulting system operates at an average of 0.87 times real time on a 206 MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first handheld LVCSR system available under an open-source license.
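The two speed figures together imply the baseline's real-time factor. The line below is a rough reading that assumes "8.03 times faster" means a ratio of run times; the abstract does not state the baseline figure itself.

```latex
\[
\text{baseline} \approx 0.87 \times 8.03 \approx 6.99 \ \text{times real time}
\]
```

Under that reading, the unoptimized port needed roughly seven seconds of computation per second of audio on the same device.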
A Modified Split-Radix FFT With Fewer Arithmetic Operations
, 2007
Cited by 24 (5 self)
Recent results by Van Buskirk et al. have broken the record set by Yavne in 1968 for the lowest exact count of real additions and multiplications to compute a power-of-two discrete Fourier transform (DFT). Here, we present a simple recursive modification of the split-radix algorithm that computes the DFT with asymptotically about 6% fewer operations than Yavne, matching the count achieved by Van Buskirk’s program-generation framework. We also discuss the application of our algorithm to real-data and real-symmetric (discrete cosine) transforms, where we are again able to achieve lower arithmetic counts than previously published algorithms.
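The "about 6%" figure follows from the leading terms of the two operation counts. Those terms are not given in the abstract; the ones used here are the commonly quoted counts (about 4N log2 N for Yavne's split-radix count and about (34/9) N log2 N for the modified algorithm) and should be read as an assumption.

```latex
\[
\frac{\tfrac{34}{9}\, N \log_2 N}{4\, N \log_2 N} = \frac{17}{18} \approx 0.944,
\qquad
1 - \frac{17}{18} = \frac{1}{18} \approx 5.6\% .
\]
```

The saving is asymptotic: the lower-order terms, omitted here, matter for small N.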
Portable High-Performance Programs
, 1999
Cited by 17 (0 self)
right notice and this permission notice are preserved on all copies.
A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments
, 2006
Analytic Confidence Level Calculations Using the Likelihood Ratio and Fourier Transform CERN
, 2000
Cited by 10 (0 self)
The interpretation of new particle search results involves a confidence level calculation on either the discovery hypothesis or the background-only (“null”) hypothesis. A typical approach uses toy Monte Carlo experiments to build an expected experiment estimator distribution against which an observed experiment’s estimator may be compared. In this note, a new approach is presented which calculates analytically the experiment estimator distribution via a Fourier transform, using the likelihood ratio as an ordering estimator. The analytic approach enjoys an enormous speed advantage over the toy Monte Carlo method, making it possible to quickly and precisely calculate confidence level results.
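The abstract does not spell out the calculation, so the following Python sketch only illustrates the general idea that a Fourier transform can replace sampling. It assumes the combined estimator is a sum of independent per-channel estimators, so that its distribution is the convolution of the per-channel distributions. The function names, the binning, and the channel values are all hypothetical, and the DFT is a naive one kept simple on purpose.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (not a fast algorithm)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(total / n if inverse else total)
    return out


def combine_channels(pmfs):
    """Distribution of the sum of independent binned estimators.

    Each pmf lists probabilities over bins 0, 1, 2, ...
    The convolution is computed as a product in Fourier space.
    """
    size = sum(len(p) - 1 for p in pmfs) + 1  # support of the sum
    product = [1 + 0j] * size
    for p in pmfs:
        padded = list(p) + [0.0] * (size - len(p))  # zero-pad: no wrap-around
        product = [a * b for a, b in zip(product, dft(padded))]
    return [x.real for x in dft(product, inverse=True)]


# Two hypothetical channels with made-up binned estimator distributions.
channel_a = [0.5, 0.3, 0.2]
channel_b = [0.6, 0.4]
combined = combine_channels([channel_a, channel_b])

# Probability of a combined estimator in bin 1 or lower (a tail sum).
tail = sum(combined[:2])
```

A tail sum of this kind plays the role that counting toy experiments plays in the Monte Carlo approach. The actual method in the note may differ in its estimator and in how the transform is evaluated.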
Active contour external force using vector field convolution for image segmentation
 IEEE Transactions on Image Processing
Cited by 8 (1 self)
Snakes, or active contours, have been widely used in image processing applications. Typical roadblocks to consistent performance include limited capture range, noise sensitivity, and poor convergence to concavities. This paper proposes a new external force for active contours, called vector field convolution (VFC), to address these problems. VFC is calculated by convolving the edge map generated from the image with the user-defined vector field kernel. We propose two structures for the magnitude function of the vector field kernel, and we provide an analytical method to estimate the parameter of the magnitude function. Mixed VFC is introduced to alleviate the possible leakage problem caused by choosing inappropriate parameters. We also demonstrate that the standard external force and the gradient vector flow (GVF) external force are special cases of VFC in certain scenarios. Examples and comparisons with GVF are presented in this paper to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field. Index Terms: Active contours, deformable models, external force, gradient vector flow (GVF), snakes, vector field convolution (VFC).
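The convolution step described in the abstract can be sketched in a few lines of Python. Several details are assumptions that the abstract does not give: the kernel vectors point at the kernel centre, the magnitude function is (r + eps) ** -gamma, and the image border is zero-padded. The names `vfc_kernel` and `convolve2d` are hypothetical.

```python
import math


def vfc_kernel(radius, gamma=2.0, eps=1e-8):
    """Build x and y components of a vector field kernel.

    Assumed form: each vector points at the kernel centre and has
    magnitude (r + eps) ** -gamma, where r is the distance to the centre.
    """
    size = 2 * radius + 1
    kx = [[0.0] * size for _ in range(size)]
    ky = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            dx, dy = j - radius, i - radius
            r = math.hypot(dx, dy)
            if r == 0:
                continue  # centre vector is left at zero
            mag = (r + eps) ** -gamma
            kx[i][j] = -dx / r * mag
            ky[i][j] = -dy / r * mag
    return kx, ky


def convolve2d(image, kernel):
    """Same-size 2-D convolution with zero padding at the border."""
    h, w = len(image), len(image[0])
    size = len(kernel)
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(size):
                for j in range(size):
                    yy, xx = y - (i - r), x - (j - r)
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx] * kernel[i][j]
            out[y][x] = total
    return out


# Hypothetical 5x5 edge map with a single edge pixel at the centre.
edge_map = [[0.0] * 5 for _ in range(5)]
edge_map[2][2] = 1.0

kx, ky = vfc_kernel(radius=2)
force_x = convolve2d(edge_map, kx)
force_y = convolve2d(edge_map, ky)
```

With this edge map, the force at every other pixel points toward the centre pixel and weakens with distance, which is the long-range attraction that the abstract credits for the larger capture range.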
Calculating the FHT in Hardware
, 1992
Cited by 6 (2 self)
We have developed a parallel, pipelined architecture for calculating the Fast Hartley Transform. Hardware implementation of the FHT introduces two challenges: retrograde indexing and data scaling. We propose a novel addressing scheme that permits the fast computation of FHT butterflies, and describe a hardware implementation of conditional block floating point scaling that reduces error due to data growth with little extra cost. Simulations reveal a processor capable of transforming a 1K-point sequence in 170 microseconds using a 15.4 MHz clock.
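For reference, the transform that the FHT computes can be written directly from its definition. The Python sketch below is a naive O(n^2) discrete Hartley transform, not the fast algorithm and not the paper's pipelined architecture. The function name `dht` and the test signal are hypothetical.

```python
import math


def dht(x):
    """Discrete Hartley transform straight from the definition.

    H[k] = sum_j x[j] * cas(2*pi*j*k/n), with cas(t) = cos(t) + sin(t).
    """
    n = len(x)
    out = []
    for k in range(n):
        total = 0.0
        for j, v in enumerate(x):
            angle = 2.0 * math.pi * j * k / n
            total += v * (math.cos(angle) + math.sin(angle))
        out.append(total)
    return out


signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dht(signal)

# The transform is its own inverse up to a factor of n.
recovered = [v / len(signal) for v in dht(spectrum)]
```

The transform takes real input to real output and is its own inverse up to scaling, so a single datapath can serve both directions.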