The design and implementation of FFTW3
PROCEEDINGS OF THE IEEE, 2005
Cited by 678 (3 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm.
A Fast Fourier Transform Compiler
1999
Cited by 186 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
FAST FOURIER TRANSFORMS: A TUTORIAL REVIEW AND A STATE OF THE ART
1990
Cited by 113 (2 self)
The publication of the Cooley-Tukey fast Fourier transform (FFT) algorithm in 1965 opened a new area in digital signal processing by reducing the order of complexity of some crucial computational tasks, such as the Fourier transform and convolution, from $N^2$ to $N \log_2 N$, where $N$ is the problem size. The development of the major algorithms (Cooley-Tukey and split-radix FFT, prime factor algorithm and Winograd fast Fourier transform) is reviewed. Then, an attempt is made to indicate the state of the art on the subject, showing the standing of research, open problems and implementations.
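The complexity reduction described in this abstract can be illustrated with a minimal sketch. It is not code from the reviewed paper: the function names `naive_dft` and `fft_radix2` are hypothetical, and the recursive radix-2 decimation-in-time form shown here is only the simplest textbook variant of the Cooley-Tukey idea, restricted to power-of-two lengths.

```python
import cmath


def naive_dft(x):
    """Direct evaluation of the DFT definition: about N^2 complex multiplications."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT: about N log2 N operations.

    Assumption for this sketch: len(x) is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half, then one butterfly per pair.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, 0, 0, 0]
    fast = fft_radix2(signal)
    slow = naive_dft(signal)
    # Largest disagreement between the two methods (rounding error only).
    print(max(abs(a - b) for a, b in zip(fast, slow)))
```

Each level of the recursion does work proportional to N, and there are log2 N levels, which is where the N log2 N count comes from.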
Superfast solution of real positive definite Toeplitz systems
SIAM J. Matrix Anal. Appl., 1988
Cited by 71 (1 self)
We describe an implementation of the generalized Schur algorithm for the superfast solution of real positive definite Toeplitz systems of order $n + 1$, where $n = 2^\nu$. Our implementation uses the split-radix fast Fourier transform algorithms for real data of Duhamel. We are able to obtain the $n$th Szegő polynomial using fewer than $8n \log_2^2 n$ real arithmetic operations without explicit use of the bit-reversal permutation. Since Levinson’s algorithm requires slightly more than $2n^2$ operations to obtain this polynomial, we achieve crossover with Levinson’s algorithm at $n = 256$. Key words: Toeplitz matrix, Schur’s algorithm, split-radix fast Fourier transform
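The stated crossover can be checked by hand. As a rough sketch, assume both operation counts are taken at their stated values (the abstract gives an upper bound for the superfast method and a slight underestimate for Levinson's algorithm, so this is only an approximation):

```latex
\begin{align*}
n = 128:\quad & 8n\log_2^2 n = 8 \cdot 128 \cdot 7^2 = 50176, & 2n^2 &= 2 \cdot 128^2 = 32768,\\
n = 256:\quad & 8n\log_2^2 n = 8 \cdot 256 \cdot 8^2 = 131072, & 2n^2 &= 2 \cdot 256^2 = 131072,\\
n = 512:\quad & 8n\log_2^2 n = 8 \cdot 512 \cdot 9^2 = 331776, & 2n^2 &= 2 \cdot 512^2 = 524288.
\end{align*}
```

The two counts coincide at $n = 256$; below it Levinson's count is smaller, and above it the superfast bound is smaller.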
PocketSphinx: A free, real-time continuous speech recognition system for handheld devices
In Proceedings of ICASSP, 2006
Cited by 53 (3 self)
The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU SPHINX-II, a popular open source large vocabulary continuous speech recognition (LVCSR) system, to handheld devices. The resulting system operates at an average of 0.87 times real-time on a 206 MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first handheld LVCSR system available under an open-source license.
A modified split-radix FFT with fewer arithmetic operations
IEEE TRANS. SIGNAL PROCESSING, 2006
Active contour external force using vector field convolution for image segmentation
IEEE Transactions on Image Processing
Cited by 23 (3 self)
Snakes, or active contours, have been widely used in image processing applications. Typical roadblocks to consistent performance include limited capture range, noise sensitivity, and poor convergence to concavities. This paper proposes a new external force for active contours, called vector field convolution (VFC), to address these problems. VFC is calculated by convolving the edge map generated from the image with the user-defined vector field kernel. We propose two structures for the magnitude function of the vector field kernel, and we provide an analytical method to estimate the parameter of the magnitude function. Mixed VFC is introduced to alleviate the possible leakage problem caused by choosing inappropriate parameters. We also demonstrate that the standard external force and the gradient vector flow (GVF) external force are special cases of VFC in certain scenarios. Examples and comparisons with GVF are presented in this paper to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field. Index Terms: active contours, deformable models, external force, gradient vector flow (GVF), snakes, vector field convolution (VFC).
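The convolution step named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the magnitude functions, so the inverse-power magnitude `(r + eps) ** -gamma`, the parameters `radius`, `gamma`, and `eps`, and the function names are all hypothetical choices made for the example. Kernel vectors point toward the kernel origin, so the resulting force at each pixel points toward nearby edges.

```python
import math


def vfc_kernel(radius, gamma=2.0, eps=1e-8):
    """Build the x and y components of a vector field kernel.

    Each vector points toward the kernel origin. The magnitude
    (r + eps) ** -gamma is an assumed form chosen for this sketch.
    """
    size = 2 * radius + 1
    ku = [[0.0] * size for _ in range(size)]
    kv = [[0.0] * size for _ in range(size)]
    for iy in range(size):
        for ix in range(size):
            x, y = ix - radius, iy - radius
            r = math.hypot(x, y)
            if r == 0:
                continue  # no direction is defined at the origin
            m = (r + eps) ** (-gamma)
            ku[iy][ix] = m * (-x / r)
            kv[iy][ix] = m * (-y / r)
    return ku, kv


def convolve2d_same(image, kernel):
    """Plain 2-D convolution with zero padding; output has the image's shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    sy, sx = y - (j - ry), x - (i - rx)
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += image[sy][sx] * kernel[j][i]
            out[y][x] = acc
    return out


def vfc_field(edge_map, radius=3, gamma=2.0):
    """Convolve the edge map with each kernel component to get the force field."""
    ku, kv = vfc_kernel(radius, gamma)
    return convolve2d_same(edge_map, ku), convolve2d_same(edge_map, kv)


if __name__ == "__main__":
    # A single edge pixel in the middle of a 7x7 edge map.
    edge = [[0.0] * 7 for _ in range(7)]
    edge[3][3] = 1.0
    u, v = vfc_field(edge, radius=3)
    # To the right of the edge pixel, the x-component is negative,
    # so the force points back toward the edge.
    print(u[3][5], v[3][5])
```

The nested loops are written for clarity; a practical version would use an array library or an FFT-based convolution.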
Portable High-Performance Programs
1999
Cited by 17 (0 self)
A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments
2006
Energy-Aware Architectures for a Real-Valued FFT Implementation
In Proceedings of the 2003 International Symposium on Low Power Electronics and Design (ISLPED 2003), Seoul, Korea, 25–27, 2003
Cited by 12 (3 self)