A survey of general-purpose computation on graphics hardware
2007
Cited by 231 (11 self)
Abstract
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware.
Scan Primitives for GPU Computing
Graphics Hardware 2007
Cited by 70 (4 self)
Abstract
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation that uses the scan framework for a tridiagonal matrix solver.
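As a rough illustration of what a segmented scan computes, here is a small Python sketch; it is not the paper's CUDA implementation. It assumes a head-flag representation, where `flags[i] == 1` marks the first element of a segment, and uses addition as the operator. `segmented_scan_serial` gives the plain reference semantics. `segmented_scan_steps` uses the standard trick of lifting the operator to (flag, value) pairs so that it stays associative, then simulates a Hillis-Steele style log-step scan. That pattern is chosen for brevity (it does O(n log n) work) and is not meant to reproduce the paper's formulation. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[bool, int]  # (segment-head flag, partial sum)


def segmented_scan_serial(values: Sequence[int], flags: Sequence[int]) -> List[int]:
    """Reference inclusive +-scan that restarts wherever flags[i] is set."""
    out: List[int] = []
    running = 0
    for v, head in zip(values, flags):
        running = v if head else running + v
        out.append(running)
    return out


def seg_op(left: Pair, right: Pair) -> Pair:
    """Associative operator on (flag, value) pairs; `left` precedes `right`.

    If `right` already spans a segment head, the sum from `left` belongs to an
    earlier segment and must not cross into it, so `right`'s value is kept.
    """
    left_flag, left_val = left
    right_flag, right_val = right
    return (left_flag or right_flag, right_val if right_flag else left_val + right_val)


def segmented_scan_steps(values: Sequence[int], flags: Sequence[int]) -> List[int]:
    """Inclusive segmented scan in ceil(log2 n) data-parallel steps (serial simulation)."""
    data: List[Pair] = [(bool(f), v) for v, f in zip(values, flags)]
    n = len(data)
    offset = 1
    while offset < n:
        # Each step reads only the previous array, as one parallel pass would.
        data = [seg_op(data[i - offset], data[i]) if i >= offset else data[i]
                for i in range(n)]
        offset *= 2
    return [v for _, v in data]


values = [1, 2, 3, 4, 5, 6]
flags = [1, 0, 0, 1, 0, 1]  # segments: [1, 2, 3] [4, 5] [6]
print(segmented_scan_serial(values, flags))  # [1, 3, 6, 4, 9, 6]
print(segmented_scan_steps(values, flags))   # [1, 3, 6, 4, 9, 6]
```

The restart-at-flag behavior is what makes segmented scan useful for irregular workloads: for example, a sparse matrix-vector multiply can place the products of each matrix row in their own segment and read each row's sum from the last element of that segment.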
Glift: Generic, efficient, random-access GPU data structures
In Proc. of SIGGRAPH ’05, 2005
Cited by 32 (4 self)
Abstract
This paper presents Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures. Like modern CPU data structure libraries, Glift enables GPU programmers to separate algorithms from data structure definitions, thereby greatly simplifying algorithmic development and enabling reusable and interchangeable data structures. We characterize a large body of previously published GPU data structures in terms of our abstraction and present several new GPU data structures. The new structures (a stack, a quadtree, and an octree) are explained using simple Glift concepts and implemented using reusable Glift components. We also describe two applications of these structures not previously demonstrated on GPUs: adaptive shadow maps and octree 3D paint. Lastly, we show that our example Glift data structures perform comparably to handwritten implementations while requiring only a fraction of the programming effort.
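The idea of separating an algorithm from a data structure definition can be sketched in a few lines of Python. This is not Glift's C++ API: the classes `Physical2D`, `RowMajorTranslator`, `ColumnMajorTranslator`, and `VirtualArray` are hypothetical, and modeling a container as physical storage (a stand-in for a 2D texture) plus an address translator from virtual indices to physical coordinates is an assumption made for illustration. The algorithm `scale_in_place` sees only the virtual interface, so the physical layout can be swapped without touching it.

```python
from typing import List, Protocol, Tuple


class Physical2D:
    """Hypothetical stand-in for a width x height 2D texture."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height
        self.texels: List[List[float]] = [[0.0] * width for _ in range(height)]


class Translator(Protocol):
    def translate(self, index: int) -> Tuple[int, int]: ...


class RowMajorTranslator:
    """Virtual 1D index -> physical (x, y), filling rows first."""

    def __init__(self, width: int) -> None:
        self.width = width

    def translate(self, index: int) -> Tuple[int, int]:
        return index % self.width, index // self.width


class ColumnMajorTranslator:
    """Virtual 1D index -> physical (x, y), filling columns first."""

    def __init__(self, height: int) -> None:
        self.height = height

    def translate(self, index: int) -> Tuple[int, int]:
        return index // self.height, index % self.height


class VirtualArray:
    """A 1D container defined as physical memory plus an address translator."""

    def __init__(self, size: int, phys: Physical2D, trans: Translator) -> None:
        self.size, self.phys, self.trans = size, phys, trans

    def __len__(self) -> int:
        return self.size

    def _locate(self, index: int) -> Tuple[int, int]:
        if not 0 <= index < self.size:
            raise IndexError(index)  # also ends iteration via list(arr)
        return self.trans.translate(index)

    def __getitem__(self, index: int) -> float:
        x, y = self._locate(index)
        return self.phys.texels[y][x]

    def __setitem__(self, index: int, value: float) -> None:
        x, y = self._locate(index)
        self.phys.texels[y][x] = value


def scale_in_place(arr: VirtualArray, k: float) -> None:
    """An 'algorithm' written only against the virtual interface."""
    for i in range(len(arr)):
        arr[i] = arr[i] * k


for trans in (RowMajorTranslator(width=4), ColumnMajorTranslator(height=2)):
    phys = Physical2D(width=4, height=2)
    arr = VirtualArray(6, phys, trans)
    for i in range(len(arr)):
        arr[i] = float(i)
    scale_in_place(arr, 10.0)
    print(list(arr), phys.texels[0])
```

Both runs print the same virtual contents, while the first physical row differs between the two layouts: the algorithm stays fixed and only the data structure definition changes.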
The irregular Z-buffer: Hardware acceleration for irregular data structures
2005
Cited by 18 (1 self)
Abstract
The classical Z-buffer visibility algorithm samples a scene at regularly spaced points on an image plane. Previously, we introduced an extension of this algorithm called the irregular Z-buffer that permits sampling of the scene from arbitrary points on the image plane. These sample points are stored in a two-dimensional spatial data structure. Here we present a set of architectural enhancements to classical Z-buffer acceleration hardware that support efficient execution of the irregular Z-buffer. These enhancements enable efficient parallel construction and query of certain irregular data structures, including the grid of linked lists used by our algorithm. The enhancements include flexible atomic read-modify-write units located near the memory controller, an internal routing network between these units and the fragment processors, and a MIMD fragment processor design. We simulate the performance of this new architecture and demonstrate that it can be used to render high-quality shadows in geometrically complex scenes at interactive frame rates. We also discuss other uses of the irregular Z-buffer algorithm and the implications of our architectural changes for the design of chip multiprocessors.
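The grid of linked lists mentioned above can be sketched serially in Python. This is an illustration under stated assumptions, not the paper's hardware design: sample points are binned into a uniform grid whose cells hold singly linked lists encoded as a `head` array plus a `next_sample` array; the two-step push-front stands in for what would be a single atomic exchange on parallel hardware (our assumption, in the spirit of the atomic read-modify-write units the abstract describes); and each triangle visits the cells under its bounding box, a simplification of conservative rasterization. All names are hypothetical, and triangles are assumed non-degenerate.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Vertex = Tuple[float, float, float]  # (x, y, depth)


def cell_of(p: Point2, cell: float, gw: int, gh: int) -> Tuple[int, int]:
    """Grid cell containing point p, clamped to the grid."""
    cx = min(gw - 1, max(0, int(math.floor(p[0] / cell))))
    cy = min(gh - 1, max(0, int(math.floor(p[1] / cell))))
    return cx, cy


def build_lists(samples: Sequence[Point2], cell: float, gw: int, gh: int):
    head = [-1] * (gw * gh)            # head[c]: first sample in cell c, or -1
    next_sample = [-1] * len(samples)  # next_sample[i]: next sample in i's cell, or -1
    for i, p in enumerate(samples):
        cx, cy = cell_of(p, cell, gw, gh)
        c = cy * gw + cx
        # Push-front; in parallel this pair would be one atomic exchange on head[c].
        next_sample[i] = head[c]
        head[c] = i
    return head, next_sample


def barycentric(p: Point2, a: Vertex, b: Vertex, c: Vertex) -> Tuple[float, float, float]:
    # Assumes a non-degenerate triangle (d != 0).
    d = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / d
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / d
    return w0, w1, 1.0 - w0 - w1


def rasterize(tri, samples, head, next_sample, depth, cell, gw, gh) -> None:
    """Depth-test one triangle against every sample in the cells under its bounding box."""
    a, b, c = tri
    x0, y0 = cell_of((min(a[0], b[0], c[0]), min(a[1], b[1], c[1])), cell, gw, gh)
    x1, y1 = cell_of((max(a[0], b[0], c[0]), max(a[1], b[1], c[1])), cell, gw, gh)
    for cy in range(y0, y1 + 1):
        for cx in range(x0, x1 + 1):
            i = head[cy * gw + cx]
            while i != -1:  # walk this cell's linked list
                w0, w1, w2 = barycentric(samples[i], a, b, c)
                if min(w0, w1, w2) >= 0.0:  # sample lies inside the triangle
                    z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                    depth[i] = min(depth[i], z)
                i = next_sample[i]


samples = [(0.5, 0.5), (1.5, 0.5), (3.5, 3.5)]  # arbitrary, not pixel-aligned
head, next_sample = build_lists(samples, cell=1.0, gw=4, gh=4)
depth = [math.inf] * len(samples)
for tri in [((0, 0, 2), (3, 0, 2), (0, 3, 2)), ((0, 0, 1), (1.2, 0, 1), (0, 1.2, 1))]:
    rasterize(tri, samples, head, next_sample, depth, 1.0, 4, 4)
print(depth)  # nearest depth per sample; inf where no triangle covers it
```

Because samples are tested individually rather than snapped to pixel centers, each stored point receives an exact visibility answer, which is what makes the structure attractive for shadow-map style queries.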
Deconstructing Hardware Usage for General Purpose Computation on GPUs
Cited by 2 (1 self)
Abstract
The high programmability and numerous compute resources of Graphics Processing Units (GPUs) have allowed researchers to dramatically accelerate many non-graphics applications. This initial success has generated great interest in mapping applications to GPUs. Accordingly, several works have focused on helping application developers rewrite their application kernels for the explicitly parallel but restricted GPU programming model. However, there has been far less work examining how these applications actually utilize the underlying hardware. This paper deconstructs how general-purpose applications on GPUs (GPGPU applications) utilize the underlying GPU pipeline. It identifies which parts of the pipeline are utilized, how they are utilized, and why they are suitable for general-purpose computation. For the parts that are underutilized, it examines the underlying causes of the underutilization and suggests changes that would make them more useful for GPGPU applications. We hope this analysis will help designers include the most useful features when designing novel parallel hardware for general-purpose applications, and help them avoid restrictions that limit the utility of the hardware. Furthermore, by highlighting the capabilities of existing GPU components, this paper should also help GPGPU developers make more efficient use of the GPU.
Particle Level Set Advection for the Interactive Visualization of Unsteady 3D Flow
Cited by 2 (0 self)
Abstract
Typically, flow volumes are visualized by defining their boundary as an isosurface of a level set function. Grid-based level sets offer a good global representation but suffer from numerical diffusion of surface detail, whereas particle-based methods preserve details more accurately but introduce the problem of unequal global representation. The particle level set (PLS) method combines the advantages of both approaches by exchanging information between the grid and the particles. Our work demonstrates that the PLS technique can be adapted to volumetric dye advection via streak volumes, and to visualization by time surfaces and path volumes. We achieve this with a modified and extended PLS, including a model for dye injection. A new algorithmic interpretation of PLS is introduced to exploit the efficiency of the GPU, leading to interactive visualization. Finally, we demonstrate the high quality and usefulness of PLS flow visualization by providing quantitative results on volume preservation and by discussing typical applications of 3D flow visualization.
Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - Curve, surface, solid, and object representations
Fast Exact String Matching on the GPU
Cited by 1 (0 self)
Abstract
We present a string-matching program that runs on the GPU. Our program, Cmatch, achieves a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version. String matching has a long history in computational biology, with roots in finding similar proteins and gene sequences in a database of known sequences. The explosion in sequence data available in the 80s and 90s motivated the development of ever faster techniques for searching for similar sequences, and ultimately led to the parallelized execution of string-matching algorithms using sophisticated data structures called suffix trees. Suffix trees can be constructed in time proportional to the length of the corpus, and provide exact matching of a query in time proportional to the length of the query, independent of the size of the corpus. Here, we present our string-matching kernel for the Compute Unified Device Architecture (CUDA), which executes parallelized searching of a suffix tree to find exact matches for a set of query strings. We compare our GPGPU suffix tree search to a serial CPU version of the algorithm and to analogous components of the widely used CPU program MUMmer, and explore issues associated with storing a suffix tree in a graphics card’s memory and distributing data among the GPU’s processing units.
Index Terms: computational biology, GPGPU, suffix tree, string matching, data reordering.
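To make the query-time claim concrete, here is a minimal Python sketch. It deliberately swaps in a simpler structure: an uncompressed suffix trie built naively in O(n²) time and space, rather than a linear-time suffix tree construction, and it is not Cmatch's CUDA kernel. Matching still follows one edge per query character, so the search cost is proportional to the query length (plus the number of reported hits) and independent of the corpus size. The names `TrieNode`, `build_suffix_trie`, and `find_exact` are hypothetical.

```python
from typing import Dict, List


class TrieNode:
    __slots__ = ("children", "starts")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.starts: List[int] = []  # corpus offsets of suffixes passing through this node


def build_suffix_trie(text: str) -> TrieNode:
    """Insert every suffix of `text` character by character (O(n^2) for brevity)."""
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            child.starts.append(start)  # starts stay sorted: outer loop is increasing
            node = child
    return root


def find_exact(root: TrieNode, query: str) -> List[int]:
    """Return every start offset of a non-empty `query` in the corpus, in increasing order."""
    node = root
    for ch in query:  # one step per query character
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.starts)


corpus = "banana"
trie = build_suffix_trie(corpus)
# A batch of independent queries; on a GPU each could map to its own thread.
for q in ["ana", "ban", "nab", "a"]:
    print(q, find_exact(trie, q))
```

A compacted suffix tree replaces chains of single-child nodes with labeled edges, which is what brings storage down to linear size; that compaction is also why storing the tree in GPU memory and laying out its nodes becomes a design question in its own right.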
Utilizing GPUs on Cluster Computers: Project Work in TDT4715 Algorithm Construction and Visualization, Depth Study Fall 2006
Abstract
Information Science (IDI) at the Norwegian University of Science and Technology (NTNU), Norway. The work was done over a period of four months and was assigned by Schlumberger Limited. The main supervisor for the project was Associate Professor Dr. Anne Cathrine Elster of IDI-NTNU. The project was co-supervised by Tore Fevang of Schlumberger Limited, Trondheim. I am very grateful for the excellent support I have received from my supervisors during the project. I would like to thank Dr. Elster for her excellent support, for connecting me with Schlumberger and this project assignment, and for always taking the time to talk with me even when she was busy preparing a 50 MNOK supercomputer. A very special thanks goes to my co-supervisor Tore Fevang of Schlumberger Trondheim, who has been supportive and helpful at a level that far exceeded everything I could have hoped for. Mr. Fevang has been very helpful throughout the entire project, always had time for questions, and always tried very hard to answer all of them. I would also like to thank Schlumberger Limited and the manager at Schlumberger Trondheim, Wolfgang Hochweller, for providing me with a great place to work and giving me

