Results 1 - 8 of 8
A Fast Fourier Transform Compiler
, 1999
Abstract

Cited by 158 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
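To make the idea of a DFT program specialized for one input length concrete, here is a minimal sketch in Python. It is not genfft output and does not reflect how genfft works internally; the function names `dft_naive` and `dft4_unrolled` are hypothetical. It only contrasts the generic definition of the DFT with straight-line code written by hand for length 4, where every twiddle factor is 1, -1, i, or -i, so no general complex multiplications are needed.

```python
import cmath


def dft_naive(x):
    """Direct O(n^2) evaluation of X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft4_unrolled(x):
    """Hypothetical hand-written length-4 DFT: straight-line code, no loops.

    Illustrative only; this is an assumption about what fixed-size
    specialization can look like, not code produced by genfft.
    """
    x0, x1, x2, x3 = (complex(v) for v in x)
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = x1 - x3
    # Multiplying by -i maps (re, im) to (im, -re): a swap and a sign change.
    d_times_minus_i = complex(d.imag, -d.real)
    return [a + c, b + d_times_minus_i, a - c, b - d_times_minus_i]
```

The unrolled version uses eight complex additions or subtractions and no multiplications, and it agrees with `dft_naive` on any length-4 input up to floating-point rounding.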
Portable High-Performance Programs
, 1999
Abstract

Cited by 17 (0 self)
Exploiting symmetry on parallel architectures
, 1995
Abstract

Cited by 10 (1 self)
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
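As a small illustration of what a group-equivariant matrix is, the sketch below uses the simplest case: the cyclic group acting on indices by shifts, for which the equivariant matrices are exactly the circulant matrices. This choice of group and all function names are assumptions made for illustration; the thesis itself treats more general groups (dihedral, symmetric) and parallel algorithms that are not reproduced here.

```python
def circulant(first_col):
    """Build the n x n matrix with M[i][j] = first_col[(i - j) mod n]."""
    n = len(first_col)
    return [[first_col[(i - j) % n] for j in range(n)] for i in range(n)]


def is_shift_equivariant(m):
    """Check M[g(i)][g(j)] == M[i][j] for the generator g(i) = (i + 1) mod n."""
    n = len(m)
    return all(
        m[(i + 1) % n][(j + 1) % n] == m[i][j]
        for i in range(n)
        for j in range(n)
    )


def matvec(m, x):
    """Ordinary dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def equivariant_matvec(first_col, x):
    """Same product computed from one column only (a cyclic convolution).

    The symmetry means the n*n entries are determined by n of them, so the
    full matrix never has to be stored.
    """
    n = len(x)
    return [
        sum(first_col[(i - j) % n] * x[j] for j in range(n))
        for i in range(n)
    ]
```

Here the symmetry reduces storage from n² entries to n; the cyclic convolution could in turn be evaluated with a Fourier transform over the cyclic group, which is the ordinary DFT.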
Multilinear Algebra and Chess Endgames
 of No Chance: Combinatorial Games at MSRI
, 1996
Abstract

Cited by 3 (0 self)
Abstract. This article has three chief aims: (1) To show the wide utility of multilinear algebraic formalism for high-performance computing. (2) To describe an application of this formalism in the analysis of chess endgames, and results obtained thereby that would have been impossible to compute using earlier techniques, including a win requiring a record 243 moves. (3) To contribute to the study of the history of chess endgames, by focusing on the work of Friedrich Amelung (in particular his apparently lost analysis of certain six-piece endgames) and that of Theodor Molien, one of the founders of modern group representation theory and the first person to have systematically numerically analyzed a pawnless endgame.
An Equivariant Fast Fourier Transform Algorithm
Abstract
This paper presents a generalization of the Cooley-Tukey fast Fourier transform algorithm that respects group symmetries. The algorithm, when applied to a function invariant under a group of symmetries, fully exploits these symmetries to reduce both the number of arithmetic operations and the amount of memory used. The symmetries accommodated by the algorithm include all of the crystallographic groups. These groups arise in crystallographic structure analysis, which was the motivating application for the algorithm. In this paper, it is shown that a generalization of the Cooley-Tukey Fast Fourier Transform (FFT), presented by the authors in [1], can be modified to take advantage of a wide class of symmetries in the data, including crystallographic group symmetries, to produce a fast O(N log N) algorithm that fully
Abstract The
Abstract
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.