Results 1-10 of 526
LogP: Towards a Realistic Model of Parallel Computation
, 1993
Cited by 497 (14 self)
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
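To make the four-parameter idea concrete, here is a minimal Python sketch of how a LogP-style cost estimate might be computed. The abstract describes the parameters only abstractly; the names used below (L for latency, o for per-message overhead, g for the minimum gap between sends, P for the processor count) follow the usual presentation of the model and are an assumption relative to this listing. The greedy broadcast schedule and the numeric values are illustrative, not taken from the abstract.

```python
import heapq


def point_to_point_time(L, o):
    """Cost of one small message: send overhead + latency + receive overhead."""
    return o + L + o


def broadcast_time(P, L, o, g):
    """Time until all P processors hold a datum that starts on one processor.

    Assumed greedy schedule: every informed processor keeps sending to
    uninformed ones, starting a new send every max(g, o) time units.
    """
    if P <= 1:
        return 0
    # Min-heap of times at which some informed processor can start a send.
    ready = [0]
    informed, finish = 1, 0
    while informed < P:
        start = heapq.heappop(ready)
        arrival = start + point_to_point_time(L, o)
        heapq.heappush(ready, start + max(g, o))  # sender's next send slot
        heapq.heappush(ready, arrival)            # receiver may now send too
        informed += 1
        finish = max(finish, arrival)
    return finish


if __name__ == "__main__":
    # Illustrative parameter values in abstract time units.
    print(point_to_point_time(L=6, o=2))
    print(broadcast_time(P=8, L=6, o=2, g=4))
```

Changing L, o, or g changes the shape of the best broadcast tree, which is one way to read the abstract's remark that portable algorithms adapt to the machine configuration in terms of these parameters.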
Expander Graphs and their Applications
, 2003
Cited by 188 (5 self)
Contents
1 The Magical Mystery Tour
1.1 Some Problems
1.1.1 Hardness results for linear transformation
1.1.2 Error Correcting Codes
1.1.3 Derandomizing Algorithms
1.2 Magical Graphs
1.2.1 A Super Concentrator with O(n) edges
1.2.2 Error Correcting Codes
1.2.3 Derandomizing Random Algorithms
1.3 Conclusions
Brook for GPUs: Stream Computing on Graphics Hardware
 ACM TRANSACTIONS ON GRAPHICS
, 2004
Cited by 143 (8 self)
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
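For readers unfamiliar with the two BLAS operators named above, the following Python sketch shows what SAXPY and SGEMV compute, phrased as a kernel applied element by element to streams. It is not Brook code and does not use Brook's syntax or runtime; the helper name `map_kernel` is hypothetical and only mimics the data-parallel style the abstract describes.

```python
def map_kernel(kernel, *streams):
    """Apply a per-element kernel across equally long input streams."""
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(alpha, x, y):
    """SAXPY: result[i] = alpha * x[i] + y[i]."""
    return map_kernel(lambda xi, yi: alpha * xi + yi, x, y)


def sgemv(alpha, A, x, beta, y):
    """SGEMV: result = alpha * (A @ x) + beta * y, with A as a list of rows."""
    return [
        alpha * sum(a * xj for a, xj in zip(row, x)) + beta * yi
        for row, yi in zip(A, y)
    ]


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(sgemv(1.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], 0.0, [0.0, 0.0]))
```

Each output element depends only on the corresponding input elements (SAXPY) or on one matrix row (SGEMV), which is what makes these operators natural fits for a streaming coprocessor.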
FFTs for the 2-Sphere - Improvements and Variations
 JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS
, 2003
Cited by 104 (2 self)
Earlier work by Driscoll and Healy [18] has produced an efficient algorithm for computing the Fourier transform of band-limited functions on the 2-sphere. In this article we present a reformulation and variation of the original algorithm which results in a greatly improved inverse transform, and consequent improved convolution algorithm for such functions. All require at most O(N log^2 N) operations, where N is the number of sample points. We also address implementation considerations and give heuristics for allowing reliable and computationally efficient floating-point implementations of slightly modified algorithms. These claims are supported by extensive numerical experiments from our implementation in C on DEC, HP, SGI and Linux Pentium platforms. These results indicate that variations of the algorithm are both reliable and efficient for a large range of useful problem sizes. Performance appears to be architecture-dependent. The article concludes with a brief discussion of a few potential applications.
Selection of a convolution function for Fourier inversion using gridding [computerised tomography application]
 IEEE Trans. Medical Imaging
, 1991
Cited by 84 (1 self)
In fields ranging from radio astronomy to magnetic resonance imaging, Fourier inversion of data not falling on a Cartesian grid has been a problem. As a result, multiple algorithms have been created for reconstructing images from nonuniform frequency samples. In the technique known as gridding, the data samples are weighted for sampling density and convolved with a finite kernel, then resampled on a grid preparatory to a fast Fourier transform. This paper compares the utility of several convolution functions, including one that outperforms the “optimal” prolate spheroidal wave function in some situations.
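The gridding steps described above (density weighting, convolution with a finite kernel, resampling onto a grid, then Fourier inversion) can be sketched in one dimension as follows. This is a minimal Python illustration under stated assumptions: the triangle kernel is a stand-in chosen for simplicity and is not one of the convolution functions the paper compares, the function names are hypothetical, and a naive inverse DFT takes the place of the fast Fourier transform.

```python
import cmath
import math


def triangle_kernel(distance, half_width):
    """Finite-support kernel (illustrative stand-in, not from the paper)."""
    return max(0.0, 1.0 - abs(distance) / half_width)


def grid_samples(positions, values, weights, grid_size, half_width=1.0):
    """Spread density-weighted nonuniform samples onto a uniform grid.

    positions: sample locations in grid units (floats, not necessarily integers)
    values:    complex sample values
    weights:   density-compensation weight for each sample
    """
    grid = [0j] * grid_size
    for k, v, w in zip(positions, values, weights):
        lo = math.ceil(k - half_width)
        hi = math.floor(k + half_width)
        for i in range(lo, hi + 1):
            if 0 <= i < grid_size:
                grid[i] += w * v * triangle_kernel(i - k, half_width)
    return grid


def inverse_dft(grid):
    """Naive O(n^2) inverse DFT, standing in for the FFT step."""
    n = len(grid)
    return [
        sum(grid[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


if __name__ == "__main__":
    gridded = grid_samples([0.5, 1.5, 2.25], [1 + 0j, 2 + 0j, 1j], [1.0, 0.5, 1.0], 4)
    print(gridded)
    print(inverse_dft(gridded))
```

A full reconstruction would normally also correct the image for the kernel's own transform; that step is omitted here to keep the sketch short.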
A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
 IEEE Trans. on Circuits and Systems
, 1990
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
SPIRAL: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
 Journal of High Performance Computing and Applications
, 2004
Cited by 71 (20 self)
SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be reoptimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations
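As a rough illustration of what a "space of alternative formulas" can look like, the Python sketch below enumerates the recursive breakdowns of a size-n transform and picks one with a search. It assumes a single divide-and-conquer rule that splits size n = m * k into sub-transforms of sizes m and k; the function names, the tree encoding, and the depth-based cost are all hypothetical placeholders and are not SPIRAL's API or its actual cost measure.

```python
def ruletrees(n):
    """All recursive breakdowns of a size-n transform.

    A tree is either the int n (a leaf computed directly, used here when n
    has no nontrivial factorization) or a pair (left, right) for n = m * k.
    """
    trees = []
    for m in range(2, n):
        if n % m == 0:
            for left in ruletrees(m):
                for right in ruletrees(n // m):
                    trees.append((left, right))
    return trees or [n]


def depth(tree):
    """Toy cost: recursion depth (placeholder for a measured runtime)."""
    if isinstance(tree, int):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


def search(n, cost):
    """Exhaustive search: return the breakdown with the lowest cost."""
    return min(ruletrees(n), key=cost)


if __name__ == "__main__":
    for tree in ruletrees(16):
        print(tree, depth(tree))
    print("selected:", search(16, depth))
```

Even this one rule yields several distinct breakdowns for size 16, and the count grows quickly with n, which is why exhaustive enumeration stops being practical and a guided search becomes attractive.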
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
Cited by 59 (2 self)
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefix-sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
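A small Python sketch can show how such a recursive description reads for the Fast Fourier Transform. It models a powerlist as a Python list whose length is a power of two, with two ways of combining equal-length lists: concatenation (called `tie` here) and interleaving (called `zip_` here). These function names, the list encoding, and the sign convention for the roots of unity are assumptions made for illustration, not definitions quoted from the abstract.

```python
import cmath


def tie(p, q):
    """Concatenate two equal-length powerlists."""
    return p + q


def zip_(p, q):
    """Interleave two equal-length powerlists: p0, q0, p1, q1, ..."""
    return [x for pair in zip(p, q) for x in pair]


def unzip(pl):
    """Inverse of zip_: split into even-indexed and odd-indexed elements."""
    return pl[0::2], pl[1::2]


def fft(pl):
    """FFT defined by recursion on the zip structure of the list."""
    n = len(pl)
    if n == 1:
        return list(pl)
    p, q = unzip(pl)
    P, Q = fft(p), fft(q)
    # Powers of the n-th root of unity (sign convention is an assumption).
    u = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    uQ = [a * b for a, b in zip(u, Q)]
    return tie([a + b for a, b in zip(P, uQ)], [a - b for a, b in zip(P, uQ)])


if __name__ == "__main__":
    print(zip_([0, 2], [1, 3]))
    print(fft([1, 0, 0, 0]))
    print(fft([1, 1, 1, 1]))
```

The input is taken apart by un-interleaving and the result is put together by concatenation, so the two combining operations carry the whole structure of the algorithm; the two recursive calls are independent and could run in parallel.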