Results 1  10
of
321
A survey of generalpurpose computation on graphics hardware
, 2007
"... The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware acompelling platform for computationally demanding tasks in awide variety of application domains. In this report, we describe, summarize, and analyze the l ..."
Abstract

Cited by 554 (15 self)
 Add to MetaCart
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware acompelling platform for computationally demanding tasks in awide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping generalpurpose computation to graphics hardware. We begin with the technical motivations that underlie generalpurpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping generalpurpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in generalpurpose application development on graphics hardware.
Random walks for image segmentation
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2006
"... A novel method is proposed for performing multilabel, interactive image segmentation. Given a small number of pixels with userdefined (or predefined) labels, one can analytically and quickly determine the probability that a random walker starting at each unlabeled pixel will first reach one of the ..."
Abstract

Cited by 387 (21 self)
 Add to MetaCart
(Show Context)
A novel method is proposed for performing multilabel, interactive image segmentation. Given a small number of pixels with userdefined (or predefined) labels, one can analytically and quickly determine the probability that a random walker starting at each unlabeled pixel will first reach one of the prelabeled pixels. By assigning each pixel to the label for which the greatest probability is calculated, a highquality image segmentation may be obtained. Theoretical properties of this algorithm are developed along with the corresponding connections to discrete potential theory and electrical circuits. This algorithm is formulated in discrete space (i.e., on a graph) using combinatorial analogues of standard operators and principles from continuous potential theory, allowing it to be applied in arbitrary dimension on arbitrary graphs.
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
 ACM Trans. Graph
, 2003
"... Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given ..."
Abstract

Cited by 297 (3 self)
 Add to MetaCart
Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission
Brook for GPUs: Stream Computing on Graphics Hardware
 ACM TRANSACTIONS ON GRAPHICS
, 2004
"... In this paper, we present Brook for GPUs, a system for generalpurpose computation on programmable graphics hardware. Brook extends C to include simple dataparallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtua ..."
Abstract

Cited by 224 (9 self)
 Add to MetaCart
In this paper, we present Brook for GPUs, a system for generalpurpose computation on programmable graphics hardware. Brook extends C to include simple dataparallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to handwritten GPU code and up to seven times faster than their CPU counterparts.
Scan Primitives for GPU Computing
 GRAPHICS HARDWARE 2007
, 2007
"... The scan primitives are powerful, generalpurpose dataparallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.Us ..."
Abstract

Cited by 173 (9 self)
 Add to MetaCart
The scan primitives are powerful, generalpurpose dataparallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrixvector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallowwater fluid simulation using the scan framework for a tridiagonal matrix solver.
Photon Mapping on Programmable Graphics Hardware
 GRAPHICS HARDWARE
, 2003
"... We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadthfirst photon tracing to distribute photons using the GPU. The photons are stored in a gridbased photon map that is constructed directly on the graphics hardware using one of two met ..."
Abstract

Cited by 151 (4 self)
 Add to MetaCart
We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadthfirst photon tracing to distribute photons using the GPU. The photons are stored in a gridbased photon map that is constructed directly on the graphics hardware using one of two methods: the first method is a multipass technique that uses fragment programs to directly sort the photons into a compact grid. The second method uses a single rendering pass combining a vertex program and the stencil buffer to route photons to their respective grid cells, producing an approximate photon map. We also present an efficient method for locating the nearest photons in the grid, which makes it possible to compute an estimate of the radiance at any surface location in the scene. Finally, we describe a breadthfirst stochastic ray tracer that uses the photon map to simulate full global illumination directly on the graphics hardware. Our implementation demonstrates that current graphics hardware is capable of fully simulating global illumination with progressive, interactive feedback to the user.
Accelerating large graph algorithms on the GPU using CUDA
"... Abstract. Graph algorithms are fundamental to many disciplines and application areas. Large graphs involving millions of vertices are common in scientific and engineering applications. Practicaltime implementations using highend computing resources have been reported but are accessible only to a f ..."
Abstract

Cited by 149 (5 self)
 Add to MetaCart
(Show Context)
Abstract. Graph algorithms are fundamental to many disciplines and application areas. Large graphs involving millions of vertices are common in scientific and engineering applications. Practicaltime implementations using highend computing resources have been reported but are accessible only to a few. Graphics Processing Units (GPUs) are fast emerging as inexpensive parallel processors due to their high computation power and low price. The G80 line of Nvidia GPUs provides the CUDA programming model that treats the GPU as a SIMD processor array. We present a few fundamental algorithms using this programming model on large graphs. We report results on breadth first search, single source shortest path, and allpairs shortest path. Our implementation is able to compute single source shortest paths on a graph with 10 million vertices in about 1.5 seconds using the Nvidia 8800GTX GPU costing $600. The GPU can then be used as an inexpensive coprocessor to accelerate parts of the applications. 1
Fast Computation of Database Operations using Graphics Processors
"... We present new algorithms on commodity graphics processors to perform fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semilinear queries, which are essential computational components of typical database, ..."
Abstract

Cited by 117 (12 self)
 Add to MetaCart
(Show Context)
We present new algorithms on commodity graphics processors to perform fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semilinear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating boolean predicate combinations and semilinear queries on attributes and executing database operations efficiently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA’s GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPUbased algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an effective coprocessor for performing database operations.
UberFlow: A GPUBased Particle Engine
, 2004
"... We present a system for realtime animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation an ..."
Abstract

Cited by 108 (3 self)
 Add to MetaCart
We present a system for realtime animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation and rendering of particle data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform particle manipulation are essential. Our system implements a versatile particle engine, including interparticle collisions and visibility sorting. By combining memory objects with floatingpoint fragment programs, we have implemented a particle engine that entirely avoids the transfer of particle data at runtime. Our system can be seen as a forerunner of a new class of graphics algorithms, exploiting memory objects or similar concepts on upcoming graphics hardware to avoid bus bandwidth becoming the major performance bottleneck.
Understanding the Efficiency of GPU Algorithms for MatrixMatrix Multiplication
, 2004
"... Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model&a ..."
Abstract

Cited by 95 (1 self)
 Add to MetaCart
Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an indepth analysis of dense matrixmatrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrixmatrix multiplication as an obvious candidate for efficient evaluation on GPUs but, surprisingly we find even nearoptimal GPU implementations are pronouncedly less efficient than current cacheaware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.