Results 1 - 10
of
317
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract
-
Cited by 471 (14 self)
- Add to MetaCart
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
Brook for GPUs: Stream Computing on Graphics Hardware
- ACM TRANSACTIONS ON GRAPHICS
, 2004
"... In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtua ..."
Abstract
-
Cited by 114 (7 self)
- Add to MetaCart
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
Expander Graphs and their Applications
, 2003
"... Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . ..."
Abstract
-
Cited by 113 (4 self)
- Add to MetaCart
Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.1.3 De-randomizing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Magical Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.1 A Super Concentrator with O(n) edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.3 De-randomizing Random Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
FFTs for the 2-Sphere - Improvements and Variations
- The Journal of Fourier Analysis and Applications
, 1996
"... Earlier work by Driscoll and Healy [16] has produced an e#cient algorithm for computing the Fourier transform of band-limited functions on the 2-sphere. In this paper we present a reformulation and variation of the original algorithm which results in a greatly improved inverse transform, and consequ ..."
Abstract
-
Cited by 81 (2 self)
- Add to MetaCart
Earlier work by Driscoll and Healy [16] has produced an e#cient algorithm for computing the Fourier transform of band-limited functions on the 2-sphere. In this paper we present a reformulation and variation of the original algorithm which results in a greatly improved inverse transform, and consequent improved convolution algorithm for such functions. All require at most O(N log 2 N ) operations where N is the number of sample points. We also address implementation considerations and give heuristics for allowing reliable and computationally e#cient floating point implementations of slightly modified algorithms. These claims are supported by extensive numerical experiments from our implementation in C on DEC, HP and SGI platforms. These results indicate that variations of the algorithm are both reliable and e#cient for a large range of useful problem sizes. Performance appears to be architecture-dependent. The paper concludes with a brief discussion of a few potential applications. 1...
Special Purpose Parallel Computing
- Lectures on Parallel Computation
, 1993
"... A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing ..."
Abstract
-
Cited by 77 (5 self)
- Add to MetaCart
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose storedprogram sequential computer which captured the fundamental principles of...
A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
- IEEE Trans. on Circuits and Systems
, 1990
"... ..."
SPIRAL: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
- Journal of High Performance Computing and Applications
, 2004
"... SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical fra ..."
Abstract
-
Cited by 62 (18 self)
- Add to MetaCart
SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations
Powerlist: a structure for parallel recursion
- ACM Transactions on Programming Languages and Systems
, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract
-
Cited by 55 (2 self)
- Add to MetaCart
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
Generalized FFTs -- A Survey Of Some Recent Results
, 1995
"... In this paper we survey some recent work directed towards generalizing the fast Fourier transform (FFT). We work primarily from the point of view of group representation theory. In this setting the classical FFT can be viewed as a family of efficient algorithms for computing the Fourier transform of ..."
Abstract
-
Cited by 48 (8 self)
- Add to MetaCart
In this paper we survey some recent work directed towards generalizing the fast Fourier transform (FFT). We work primarily from the point of view of group representation theory. In this setting the classical FFT can be viewed as a family of efficient algorithms for computing the Fourier transform of either a function defined on a finite abelian group, or a bandlimited function on a compact abelian group. We discuss generalizations of the FFT to arbitrary finite groups and compact Lie groups.
Fast parallel circuits for the quantum Fourier transform
- PROCEEDINGS 41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS’00)
, 2000
"... We give new bounds on the circuit complexity of the quantum Fourier transform (QFT). We give an upper bound of O(log n + log log(1/ε)) on the circuit depth for computing an approximation of the QFT with respect to the modulus 2 n with error bounded by ε. Thus, even for exponentially small error, our ..."
Abstract
-
Cited by 45 (2 self)
- Add to MetaCart
We give new bounds on the circuit complexity of the quantum Fourier transform (QFT). We give an upper bound of O(log n + log log(1/ε)) on the circuit depth for computing an approximation of the QFT with respect to the modulus 2 n with error bounded by ε. Thus, even for exponentially small error, our circuits have depth O(log n). The best previous depth bound was O(n), even for approximations with constant error. Moreover, our circuits have size O(n log(n/ε)). We also give an upper bound of O(n(log n) 2 log log n) on the circuit size of the exact QFT modulo 2 n, for which the best previous bound was O(n 2). As an application of the above depth bound, we show that Shor’s factoring algorithm may be based on quantum circuits with depth only O(log n) and polynomial-size, in combination with classical polynomial-time pre- and post-processing. In the language of computational complexity, this implies that factoring is in the complexity class ZPP BQNC, where BQNC is the class of problems computable with bounded-error probability by quantum circuits with polylogarithmic depth and polynomial size. Finally, we prove an Ω(log n) lower bound on the depth complexity of approximations of the

