Results 1-10 of 129
A survey of general-purpose computation on graphics hardware
, 2007
Cited by 488 (18 self)
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware.
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
 ACM Trans. Graph
, 2003
Cited by 275 (3 self)
Brook for GPUs: Stream Computing on Graphics Hardware
 ACM TRANSACTIONS ON GRAPHICS
, 2004
Cited by 194 (9 self)
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
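For readers unfamiliar with the SAXPY benchmark named in this abstract: it is the BLAS operation that scales a vector x by a scalar a and adds a vector y. The snippet below is a plain Python sketch of that operation, not Brook code; the function name `saxpy` and the list-based vectors are illustrative assumptions. Each output element depends only on the inputs at the same index, which is the property that lets such an operation be expressed as a data-parallel kernel over streams.

```python
def saxpy(a, x, y):
    """Return a*x + y element-wise (illustrative sketch, not the Brook API)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Every output element is independent of the others, so a
    # data-parallel system could compute all of them concurrently.
    return [a * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```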
Photon Mapping on Programmable Graphics Hardware
 GRAPHICS HARDWARE
, 2003
Cited by 145 (4 self)
We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadth-first photon tracing to distribute photons using the GPU. The photons are stored in a grid-based photon map that is constructed directly on the graphics hardware using one of two methods: the first method is a multipass technique that uses fragment programs to directly sort the photons into a compact grid. The second method uses a single rendering pass combining a vertex program and the stencil buffer to route photons to their respective grid cells, producing an approximate photon map. We also present an efficient method for locating the nearest photons in the grid, which makes it possible to compute an estimate of the radiance at any surface location in the scene. Finally, we describe a breadth-first stochastic ray tracer that uses the photon map to simulate full global illumination directly on the graphics hardware. Our implementation demonstrates that current graphics hardware is capable of fully simulating global illumination with progressive, interactive feedback to the user.
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware
 In Graphics Hardware 2003
, 2003
Cited by 113 (3 self)
We present a method for using programmable graphics hardware to solve a variety of boundary value problems. The time evolution of such problems is frequently governed by partial differential equations, which are used to describe a wide range of dynamic phenomena including heat transfer and fluid mechanics. The need to solve these equations efficiently arises in many areas of computational science. Finite difference methods are commonly used for solving partial differential equations; we show that this approach can be mapped onto a modern graphics processor. We demonstrate an implementation of the multigrid method, a fast and popular approach to solving boundary value problems, on two modern graphics architectures. Our initial tests with available hardware show speedups of roughly 15x compared to traditional software implementation. This work presents a novel use of computer hardware and raises the intriguing possibility that we can make the inexpensive power of modern commodity graphics hardware accessible to and useful for the simulation community.
Physically-Based Visual Simulation on Graphics Hardware
, 2002
Cited by 99 (5 self)
In this paper, we present a method for real-time visual simulation of diverse dynamic phenomena using programmable graphics hardware. The simulations we implement use an extension of cellular automata known as the coupled map lattice (CML). CML represents the state of a dynamic system as continuous values on a discrete lattice. In our implementation we store the lattice values in a texture, and use pixel-level programming to implement simple next-state computations on lattice nodes and their neighbors. We apply these computations successively to produce interactive visual simulations of convection, reaction-diffusion, and boiling. We have built an interactive framework for building and experimenting with CML simulations running on graphics hardware, and have integrated them into interactive 3D graphics applications.
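To make the coupled map lattice idea in this abstract concrete, here is a minimal CPU sketch in Python of one next-state computation. It is not the paper's GPU implementation: the one-dimensional lattice, the periodic boundary, the nearest-neighbour diffusive coupling rule, and the names `cml_step`, `coupling`, and `local_map` are all assumptions chosen for illustration. Each node holds a continuous value, and its next value is computed from itself and its two neighbours.

```python
def cml_step(lattice, coupling, local_map=lambda v: v):
    """Advance a 1D periodic coupled map lattice by one step (illustrative).

    Assumed update rule (nearest-neighbour diffusive coupling):
        new[i] = (1 - c) * f(x[i]) + (c / 2) * (f(x[i-1]) + f(x[i+1]))
    where c is `coupling` and f is `local_map`.
    """
    n = len(lattice)
    # Apply the local map to every node first.
    mapped = [local_map(v) for v in lattice]
    # Blend each node with its left and right neighbours; index -1 and
    # the modulo wrap the lattice around (periodic boundary).
    return [
        (1.0 - coupling) * mapped[i]
        + 0.5 * coupling * (mapped[i - 1] + mapped[(i + 1) % n])
        for i in range(n)
    ]


if __name__ == "__main__":
    state = [0.0, 0.0, 1.0, 0.0]
    print(cml_step(state, 0.5))
```

Because each node's new value reads only the previous state, all nodes can be updated independently, which is what makes this kind of rule a natural fit for per-pixel computation over a texture.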
GPU cluster for high performance computing
 Proceedings of ACM/IEEE Supercomputing Conference
, 2004
Cited by 94 (2 self)
Inspired by the attractive Flops/dollar ratio and the incredible growth in the speed of modern graphics processing units (GPUs), we propose to use a cluster of GPUs for high performance scientific computing. As an example application, we have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, our simulation can compute a 480x400x80 LBM in 0.31 second/step, a speed which is 4.6 times faster than that of our CPU cluster implementation. Besides the LBM, we also discuss other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM.
RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing
 ACM Trans. Graph
, 2005
Cited by 88 (4 self)
Recursive ray tracing is a simple yet powerful and general approach for accurately computing global light transport and rendering high quality images. While recent algorithmic improvements and optimized parallel software implementations have increased ray tracing performance to realtime levels, no compact and programmable hardware solution has been available yet. This paper describes the architecture and a prototype implementation of a single-chip, fully programmable Ray Processing Unit (RPU). It combines the flexibility of general-purpose CPUs with the efficiency of current GPUs for data-parallel computations. This design allows for realtime ray tracing of dynamic scenes with programmable material, geometry, and illumination shaders. Although running at only 66 MHz, the prototype FPGA implementation already renders images at up to 20 frames per second, which in many cases beats the performance of highly optimized software running on multi-GHz desktop CPUs. The performance and efficiency of the proposed architecture are analyzed using a variety of benchmark scenes.
Real-time kd-tree construction on graphics hardware
 ACM Transactions on Graphics
, 2008
Cited by 78 (8 self)
We present an algorithm for constructing kd-trees on GPUs. This algorithm achieves real-time performance by exploiting the GPU's streaming architecture at all stages of kd-tree construction. Unlike previous parallel kd-tree algorithms, our method builds tree nodes completely in BFS (breadth-first search) order. We also develop a special strategy for large nodes at upper tree levels so as to further exploit the fine-grained parallelism of GPUs. For these nodes, we parallelize the computation over all geometric primitives instead of nodes at each level. Finally, in order to maintain kd-tree quality, we introduce novel schemes for fast evaluation of node split costs. As far as we know, ours is the first real-time kd-tree algorithm on the GPU. The kd-trees built by our algorithm are of comparable quality to those constructed by offline CPU algorithms. In terms of speed, our algorithm is significantly faster than well-optimized single-core CPU algorithms and competitive with multicore CPU algorithms. Our algorithm provides a general way for handling dynamic scenes on the GPU. We demonstrate the potential of our algorithm in applications involving dynamic scenes, including GPU ray tracing, interactive photon mapping, and point cloud modeling.
Ray Tracing on Programmable Graphics Hardware
, 2002
Cited by 71 (1 self)
Recently a breakthrough has occurred in graphics hardware: fixed function pipelines have been replaced with programmable vertex and fragment processors. In the near future, the graphics pipeline is likely to evolve into a general programmable stream processor capable of more than simply feed-forward triangle rendering.