Results 1 - 10
of
131
A survey of general-purpose computation on graphics hardware
, 2007
"... The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware acompelling platform for computationally demanding tasks in awide variety of application domains. In this report, we describe, summarize, and analyze the l ..."
Abstract
-
Cited by 231 (11 self)
- Add to MetaCart
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware acompelling platform for computationally demanding tasks in awide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware.
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
- ACM Trans. Graph
, 2003
"... Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given ..."
Abstract
-
Cited by 174 (2 self)
- Add to MetaCart
Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission
Brook for GPUs: Stream Computing on Graphics Hardware
- ACM TRANSACTIONS ON GRAPHICS
, 2004
"... In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtua ..."
Abstract
-
Cited by 114 (7 self)
- Add to MetaCart
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
Photon Mapping on Programmable Graphics Hardware
- GRAPHICS HARDWARE
, 2003
"... We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadth-first photon tracing to distribute photons using the GPU. The photons are stored in a grid-based photon map that is constructed directly on the graphics hardware using one of two met ..."
Abstract
-
Cited by 112 (4 self)
- Add to MetaCart
We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadth-first photon tracing to distribute photons using the GPU. The photons are stored in a grid-based photon map that is constructed directly on the graphics hardware using one of two methods: the first method is a multipass technique that uses fragment programs to directly sort the photons into a compact grid. The second method uses a single rendering pass combining a vertex program and the stencil buffer to route photons to their respective grid cells, producing an approximate photon map. We also present an efficient method for locating the nearest photons in the grid, which makes it possible to compute an estimate of the radiance at any surface location in the scene. Finally, we describe a breadth-first stochastic ray tracer that uses the photon map to simulate full global illumination directly on the graphics hardware. Our implementation demonstrates that current graphics hardware is capable of fully simulating global illumination with progressive, interactive feedback to the user.
Scan Primitives for GPU Computing
- GRAPHICS HARDWARE 2007
, 2007
"... The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.Us ..."
Abstract
-
Cited by 70 (4 self)
- Add to MetaCart
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
UberFlow: A GPU-Based Particle Engine
, 2004
"... We present a system for real-time animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation an ..."
Abstract
-
Cited by 66 (3 self)
- Add to MetaCart
We present a system for real-time animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation and rendering of particle data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform particle manipulation are essential. Our system implements a versatile particle engine, including inter-particle collisions and visibility sorting. By combining memory objects with floating-point fragment programs, we have implemented a particle engine that entirely avoids the transfer of particle data at run-time. Our system can be seen as a forerunner of a new class of graphics algorithms, exploiting memory objects or similar concepts on upcoming graphics hardware to avoid bus bandwidth becoming the major performance bottleneck.
Fast Computation of Database Operations Using Graphics Processors
- Proc. of ACM SIGMOD
, 2004
"... We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical databa ..."
Abstract
-
Cited by 61 (14 self)
- Add to MetaCart
We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating boolean predicate combinations and semi-linear queries on attributes and executing database operations e#ciently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA's GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPU-based algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an e#ective co-processor for performing database operations.
Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
, 2004
"... Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's ..."
Abstract
-
Cited by 59 (1 self)
- Add to MetaCart
Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs but, surprisingly we find even nearoptimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.
Lu-gpu: Efficient algorithms for solving dense linear systems on graphics hardware
- in SC ’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing
, 2005
"... We present a novel algorithm to solve dense linear systems using graphics processors (GPUs). We reduce matrix decomposition and row operations to a series of rasterization problems on the GPU. These include new techniques for streaming index pairs, swapping rows and columns and parallelizing the com ..."
Abstract
-
Cited by 44 (4 self)
- Add to MetaCart
We present a novel algorithm to solve dense linear systems using graphics processors (GPUs). We reduce matrix decomposition and row operations to a series of rasterization problems on the GPU. These include new techniques for streaming index pairs, swapping rows and columns and parallelizing the computation to utilize multiple vertex and fragment processors. We also use appropriate data representations to match the rasterization order and cache technology of graphics processors. We have implemented our algorithm on different GPUs and compared the performance with optimized CPU implementations. In particular, our implementation on a NVIDIA GeForce 7800 GPU outperforms a CPU-based ATLAS implementation. Moreover, our results show that our algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and the core GPU clock rate. We use our algorithm for fluid flow simulation and demonstrate that the commodity GPU is a useful co-processor for many scientific applications. 1
Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware
- In IEEE Visualization
, 2003
"... Deformable isosurfaces, implemented with level-set methods, have demonstrated a great potential in visualization for applications such as segmentation, surface processing, and surface reconstruction. Their usefulness has been limited, however, by two problems. First, 3D level sets are relatively ..."
Abstract
-
Cited by 42 (12 self)
- Add to MetaCart
Deformable isosurfaces, implemented with level-set methods, have demonstrated a great potential in visualization for applications such as segmentation, surface processing, and surface reconstruction. Their usefulness has been limited, however, by two problems. First, 3D level sets are relatively slow to compute. Second, their formulation usually entails several free parameters that can be dicult to tune correctly for speci c applications. The second problem is compounded by the rst. This paper presents a solution to these challenges by describing graphics processor (GPU) based algorithms for solving and visualizing level-set solutions at interactive rates. Our ecient GPUbased solution relies on packing the level-set isosurface data into a dynamic, sparse texture format. As the level set moves, this sparse data structure is updated via a novel GPU to CPU message passing scheme. When the level-set solver is integrated with a real-time volume renderer operating on the same packed format, a user can visualize and steer the deformable level-set surface as it evolves. In addition, the resulting isosurface can serve as a region-of-interest speci er for the volume renderer. This paper demonstrates the capabilities of this technology for interactive volume visualization and segmentation.

