The Fastest Fourier Transform in the West (1997)

by M Frigo, S G Johnson

Results 1 - 10 of 40

Automated empirical optimizations of software and the ATLAS project

by R. Clint Whaley, Antoine Petitet, Jack J. Dongarra - Parallel Computing, 2001
Cited by 233 (31 self)
This paper describes the automatically tuned linear algebra software (ATLAS) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term automated empirical optimization of software (AEOS); this style of library management has been created in order to allow software to keep pace with the incredible rate of hardware advancement inherent in Moore's Law. ATLAS is the application of this new paradigm to linear algebra software, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical,
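
The empirical-optimization idea can be illustrated with a minimal Python sketch: time one kernel under several candidate parameter values on the target machine and keep the fastest. The kernel `blocked_sum_of_squares`, the helper `empirical_tune`, and the candidate block sizes are hypothetical names and values chosen for illustration; they are not part of ATLAS, which tunes BLAS kernels rather than this toy loop.

```python
import timeit


def blocked_sum_of_squares(data, block):
    """Sum of squares computed block by block; `block` is the tunable parameter."""
    total = 0.0
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        total += sum(x * x for x in chunk)
    return total


def empirical_tune(kernel, data, candidates, repeats=3):
    """Time the kernel once per candidate parameter and return the fastest one."""
    timings = {}
    for param in candidates:
        # Keep the best of several runs to reduce timing noise.
        runs = timeit.repeat(lambda: kernel(data, param), number=1, repeat=repeats)
        timings[param] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings


data = [float(i) for i in range(10000)]
best_block, timings = empirical_tune(blocked_sum_of_squares, data, [16, 256, 4096])
```

Which block size wins depends on the machine running the search, which is the point of the approach: the choice is measured rather than predicted.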

A Fast Fourier Transform Compiler

by Matteo Frigo, 1999
Cited by 129 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
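
As a rough illustration of what a special-purpose code generator does, the following Python sketch emits straight-line source for a fixed-size DFT and drops multiplications by a twiddle factor of 1. It is only an analogy: genfft is written in Objective Caml, emits C, and applies far more sophisticated simplifications. The names `generate_dft_source` and `dft4` are hypothetical.

```python
import cmath


def generate_dft_source(n):
    """Return Python source for a fully unrolled DFT of fixed size n."""
    lines = [f"def dft{n}(x):", "    return ["]
    for k in range(n):
        terms = []
        for j in range(n):
            exponent = (j * k) % n
            if exponent == 0:
                # Twiddle factor is exactly 1, so emit no multiplication.
                terms.append(f"x[{j}]")
            else:
                w = cmath.exp(-2j * cmath.pi * exponent / n)
                terms.append(f"({w!r}) * x[{j}]")
        lines.append("        " + " + ".join(terms) + ",")
    lines.append("    ]")
    return "\n".join(lines)


# Compile the generated source and pull out the specialized function.
namespace = {}
source = generate_dft_source(4)
exec(source, namespace)
dft4 = namespace["dft4"]
```

Calling `dft4([1, 2, 3, 4])` evaluates the generated expressions directly, with no loops or run-time twiddle computation.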

Similarity search over time series data using wavelets

by Ivan Popivanov - In ICDE, 2002
Cited by 50 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
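
A minimal sketch of wavelet-based dimensionality reduction, using the Haar wavelet that the abstract takes as its baseline: transform each series with an orthonormal Haar transform, keep only the first k coefficients, and compare series in the reduced space. Because the transform is orthonormal, the reduced distance never exceeds the true Euclidean distance, so a search on the reduced vectors produces no false dismissals. The function names are hypothetical, and the sketch assumes series whose length is a power of two.

```python
import math


def haar_transform(series):
    """Orthonormal Haar transform of a sequence whose length is a power of two."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in series]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser detail coefficients are placed in front of finer ones.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def reduce_series(series, k):
    """Keep the k coarsest Haar coefficients as the low-dimensional key."""
    return haar_transform(series)[:k]


x = [1, 3, 5, 7, 2, 4, 6, 8]
y = [2, 2, 6, 5, 1, 5, 7, 7]
reduced_distance = euclidean(reduce_series(x, 2), reduce_series(y, 2))
true_distance = euclidean(x, y)
```

An index built on the k-dimensional keys can prune candidates with `reduced_distance` and then verify the survivors against `true_distance`.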

Virtual Radios

by Vanu Bose, Mike Ismert, Matt Welborn, John Guttag, 1998
Cited by 39 (3 self)
Conventional software radios take advantage of vastly improved A/D converters and DSP hardware. Our approach, which we refer to as virtual radios, also depends upon high performance A/D converters. However, rather than use DSPs, we have chosen to ride the curve of rapidly improving workstation hardware. We use wideband digitization and then perform all of the digital signal processing in user space on a general purpose workstation. This approach allows us to experiment with new approaches to signal processing that exploit the hardware and software resources of the workstation. Furthermore, it allows us to experiment with different ways of structuring systems in which the radio components of communication devices are integrated with higher-level applications. This paper describes the design and performance of an environment we have constructed that facilitates building virtual radios and of two applications built using that environment. The environment consists of an I/O subsystem that p...

Architecture-Cognizant Divide and Conquer Algorithms

by Kang Su Gatlin, Larry Carter, 1999
Cited by 26 (5 self)
Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecture-cognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally equivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion, an exhaustive search requires $\Theta(v^h)$ experiments (where v is the number of variants). We present a method based on dynamic programming that reduces this to $\Theta(h^c)$ (where c is typically a small constant) experiments for a class of architecture-cognizant programs. We verify our technique on two kernels (matrix multiply and 2-D Point Jacobi) using three architectures. Our technique improves performance by up to a factor of two, compared...
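
To get a feel for the gap between the two search costs, here is a worked example that ignores constant factors. The values v = 3 variants, h = 10 levels of recursion, and c = 2 are assumptions chosen for the arithmetic, not figures taken from the paper.

```latex
\[
v^{h} = 3^{10} = 59049
\qquad\text{versus}\qquad
h^{c} = 10^{2} = 100 .
\]
```

Under these assumed values the dynamic-programming search needs roughly 590 times fewer experiments than exhaustive enumeration, and the gap widens exponentially as h grows.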

Memory Characteristics of Iterative Methods

by Christian Weiss, Wolfgang Karl, Markus Kowarschik, Ulrich Rüde, 1999
Cited by 21 (9 self)
Conventional implementations of iterative numerical algorithms, especially multigrid methods, merely reach a disappointingly small percentage of the theoretically available CPU performance when applied to representative large problems. One of the most important reasons for this phenomenon is that the current DRAM technology cannot provide the data fast enough to keep the CPU busy. Although the fundamentals of cache optimizations are quite simple, current compilers cannot optimize even elementary iterative schemes. In this paper, we analyze the memory and cache behavior of iterative methods with extensive profiling and describe program transformation techniques to improve the cache performance of two- and three-dimensional multigrid algorithms.

1 Introduction

Multigrid methods [11, 5] are among the most attractive algorithms for the solution of large sparse systems of equations that arise in the solution of elliptic partial differential equations (PDEs). However, even simple multi...

Scheduling Threads for Low Space Requirement and Good Locality

by Girija J. Narlikar - In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1999
Cited by 17 (1 self)
The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth $D$ and serial space requirement $S_1$, we show that the expected space requirement is $S_1 + O(K \cdot p \cdot D)$ on $p$ processors. Here, $K$ is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads or Pthreads, and evaluated its p...

An Adaptive Software Library for Fast Fourier Transforms

by Dragan Mirkovic, Rishad Mahasoom - In Proceedings of the International Conference on Supercomputing, 2000
Cited by 17 (2 self)
In this paper we present an adaptive and portable software library for the fast Fourier transform (FFT). The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at run-time by selecting the fastest strategy among all possible strategies, given available codelets, for a given transform size. We also present an efficient automatic method of generating the library modules by using a special-purpose compiler. The code generator is written in C and it generates a library of C codelets. The code generator is shown to be flexible and extensible and the entire library can be generated in a matter of seconds. We have evaluated the library for performance on the IBM-SP2, SGI-2000, HP-Exemplar and Intel Pentium systems. We use the results from these evaluations to build performance models for the FFT library on different platforms. The library is shown to be portable, adaptive and efficient.
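
The run-time selection described here can be sketched in Python under strong simplifying assumptions: a single "codelet" (a direct DFT) handles all sub-problems up to a chosen leaf size, a radix-2 recursion handles the rest, and a planner times each leaf size and keeps the fastest. The names `direct_dft`, `fft`, and `plan` are hypothetical, and the strategy space of the actual library is far richer than one leaf-size parameter.

```python
import cmath
import timeit


def direct_dft(x):
    """O(n^2) DFT, standing in for a small straight-line codelet."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft(x, leaf):
    """Radix-2 FFT that hands sub-problems of size <= leaf to the codelet."""
    n = len(x)
    if n <= leaf or n % 2:
        return direct_dft(x)
    even = fft(x[0::2], leaf)
    odd = fft(x[1::2], leaf)
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def plan(n, leaves=(1, 2, 4, 8, 16), repeats=3):
    """Return the leaf size with the smallest measured run time for length n."""
    sample = [complex(i % 7, 0) for i in range(n)]
    timings = {
        leaf: min(timeit.repeat(lambda: fft(sample, leaf), number=1, repeat=repeats))
        for leaf in leaves
    }
    return min(timings, key=timings.get)


best_leaf = plan(64)
spectrum = fft([1, 2, 3, 4, 5, 6, 7, 8], best_leaf)
```

Every leaf size yields the same transform; only the running time differs, so the planner is free to choose purely on measured speed.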

Adaptive Use of Iterative Methods in Predictor-Corrector Interior Point Methods for Linear Programming

by Weichung Wang, Dianne P. O'Leary - Numerical Algorithms, 1999
Cited by 13 (4 self)
In this paper we develop an adaptive algorithm that changes strategy over the course of the interior point algorithm. It determines dynamically whether the preconditioner should be held constant, updated, or recomputed, and it switches to a direct method when it predicts that an iterative method will be too expensive. In our experiments, we use a preconditioned conjugate gradient iteration on the linear system involving the matrix ADA

Uniform Frequency Images: Adding Geometry to Images to Produce Space-Efficient Textures

by Adam Hunter, Jonathan D Cohen - IEEE Visualization, 2000
Cited by 13 (0 self)
We discuss the concept of uniform frequency images, which exhibit uniform local frequency properties. Such images make optimal use of space when sampled close to their Nyquist limit. A warping function may be applied to an arbitrary image to redistribute its local frequency content, reducing its highest frequencies and increasing its lowest frequencies in order to approach this uniform frequency ideal. The warped image may then be downsampled according to its new, reduced Nyquist limit, thereby reducing its storage requirements. To reconstruct the original image, the inverse warp is applied. We present a general, top-down algorithm to automatically generate a piecewise-linear warping function with this frequency balancing property for a given input image. The image size is reduced by applying the warp and then downsampling. We store this warped, downsampled image plus a small number of polygons with texture coordinates to describe the inverse warp. The original image is later recon...