A parallel algorithm for construction of uniform grids
In HPG ’09: Proceedings of the Conference on High Performance Graphics 2009, 2009
Cited by 18 (1 self)
Abstract
We present a fast, parallel GPU algorithm for the construction of uniform grids for ray tracing, which we implement in CUDA. The algorithm's performance does not depend on the primitive distribution, because we reduce the problem to sorting pairs of primitive and cell indices. Our implementation takes full advantage of the parallel architecture of the GPU, and it constructs grids faster than CPU algorithms running on multiple cores. Its scalability and robustness make it superior to alternative approaches, especially for scenes with complex primitive distributions.
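The reduction of grid construction to sorting can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' CUDA implementation: primitives are given as axis-aligned bounding boxes, the grid is a hypothetical `res = (nx, ny, nz)` array of cubic cells of side `cell_size` starting at `scene_min`, and Python's `sort` stands in for a GPU radix sort. Every (cell, primitive) overlap becomes one pair; after sorting by cell index, the primitives of each cell form a contiguous range.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # axis-aligned bounds: (min corner, max corner)


def build_uniform_grid(boxes: Sequence[Box], scene_min: Vec3,
                       cell_size: float, res: Tuple[int, int, int]):
    """Return (start, end, prim_refs): cell c holds prim_refs[start[c]:end[c]]."""
    nx, ny, nz = res

    def cell_coord(value: float, axis: int) -> int:
        # Map a coordinate to a cell index along one axis, clamped to the grid.
        c = int((value - scene_min[axis]) // cell_size)
        return min(max(c, 0), res[axis] - 1)

    # 1) Emit one (cell index, primitive index) pair per overlapped cell.
    pairs: List[Tuple[int, int]] = []
    for prim, (lo, hi) in enumerate(boxes):
        c0 = [cell_coord(lo[a], a) for a in range(3)]
        c1 = [cell_coord(hi[a], a) for a in range(3)]
        for z in range(c0[2], c1[2] + 1):
            for y in range(c0[1], c1[1] + 1):
                for x in range(c0[0], c1[0] + 1):
                    pairs.append((x + nx * (y + ny * z), prim))

    # 2) Sort pairs by cell index (a GPU version would use a parallel radix sort).
    pairs.sort()

    # 3) A cell's range starts and ends where the sorted cell index changes.
    num_cells = nx * ny * nz
    start, end = [0] * num_cells, [0] * num_cells
    for i, (cell, _) in enumerate(pairs):
        if i == 0 or pairs[i - 1][0] != cell:
            start[cell] = i
        if i == len(pairs) - 1 or pairs[i + 1][0] != cell:
            end[cell] = i + 1
    prim_refs = [prim for _, prim in pairs]
    return start, end, prim_refs
```

Each stage (pair emission, sort, boundary detection) is a flat pass over the pairs, so the work scales with the number of pairs rather than with how primitives cluster in space. Empty cells keep `start == end`, which encodes an empty range.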
Micropolygon ray tracing with defocus and motion blur
Cited by 17 (3 self)
Abstract
Figure 1 (panel b: motion blur + defocus): A car rendered with defocus, motion blur, mirror reflection, and ambient occlusion at 1280 × 720 resolution with 23 × 23 supersampling. The scene is tessellated into 48.9M micropolygons (i.e., 53.1 micropolygons per pixel). The blurred image is rendered in 4 minutes on an NVIDIA GTX 285 GPU. The image rendered in perfect focus takes 2 minutes and is provided to help the reader assess the defocus and motion blur effects.
We present a micropolygon ray tracing algorithm that is capable of efficiently rendering high-quality defocus and motion blur effects. A key component of our algorithm is a BVH (bounding volume hierarchy) based on 4D hyper-trapezoids that project into 3D OBBs (oriented bounding boxes) in the spatial dimensions. This acceleration structure provides tight bounding volumes for scene geometry and is thus efficient at pruning intersection tests during ray traversal. More importantly, it can exploit the natural coherence along the time dimension in motion-blurred scenes. The structure can be constructed quickly by utilizing the micropolygon grids generated during micropolygon tessellation. Ray tracing of defocused and motion-blurred scenes is performed efficiently by traversing the structure. Both the BVH construction and ray traversal are easily implemented on GPUs and integrated into a GPU-based micropolygon renderer. In our experiments, our ray tracer performs up to an order of magnitude faster than state-of-the-art rasterizers while consistently delivering image quality equivalent to a maximum-quality rasterizer. We also demonstrate that the ray tracing algorithm can be extended to handle a variety of effects, such as secondary ray effects and transparency.
Efficient Stream Compaction on Wide SIMD Many-Core Architectures
Cited by 14 (1 self)
Abstract
Stream compaction is a common parallel primitive used to remove unwanted elements from sparse data. This allows highly parallel algorithms to maintain performance over several processing steps and reduces overall memory usage. For wide SIMD many-core architectures, we present a novel stream compaction algorithm and explore several variations thereof. Our algorithm is designed to maximize concurrent execution, with minimal use of synchronization. Bandwidth and auxiliary storage requirements are reduced significantly, which allows for substantially better performance. We have tested our algorithms using CUDA on a PC with an NVIDIA GeForce GTX 280 GPU. On this hardware, our reference implementation provides a 3× speedup over previously published algorithms.
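For context, the widely used scan-based formulation of stream compaction (flag each element, compute an exclusive prefix sum of the flags, then scatter kept elements to the computed offsets) is sketched below in sequential Python. This is a baseline illustration only, not the paper's algorithm, whose details the abstract does not give; on a GPU each of the three passes would run in parallel across threads.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def exclusive_scan(flags: Sequence[int]) -> Tuple[List[int], int]:
    """Exclusive prefix sum: out[i] = flags[0] + ... + flags[i-1]; also returns the total."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out, running


def compact(values: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    # Pass 1: flag elements to keep (1) or drop (0).
    flags = [1 if keep(v) else 0 for v in values]
    # Pass 2: the exclusive scan gives each kept element its output slot,
    # and the total gives the size of the compacted output.
    offsets, count = exclusive_scan(flags)
    # Pass 3: scatter. Kept elements write to distinct slots, so in a
    # parallel version no two threads would write the same location.
    out: List[T] = [None] * count  # type: ignore[list-item]
    for v, f, o in zip(values, flags, offsets):
        if f:
            out[o] = v
    return out
```

The extra flag and offset arrays, and the separate passes over memory, are the kind of bandwidth and auxiliary-storage cost that the abstract says its algorithm reduces.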
Memory-scalable GPU spatial hierarchy construction
2009
Cited by 13 (2 self)
Abstract
Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the BFS (breadth-first search) construction order. While it exploits the massive parallelism of the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose the PBFS (partial breadth-first search) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first, for kd-trees, automatically balances the level of parallelism against intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1GB of memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second is an out-of-core BVH (bounding volume hierarchy) construction algorithm for very large scenes, also based on the PBFS construction order. At each iteration, all constructed nodes are transferred to CPU memory, and the GPU memory is freed for use in the next iteration. In this way, the algorithm is able to build trees that are too large to be stored in GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20M triangles, several times larger than those handled by previous GPU algorithms.
Keywords: memory bound, kd-tree, bounding volume hierarchy
g-Planner: Real-time motion planning and global navigation using GPUs
In Proceedings of the AAAI Conference on Artificial Intelligence, 2010
Cited by 13 (6 self)
Abstract
We present novel randomized algorithms for solving global motion planning problems that exploit the computational capabilities of many-core GPUs. Our approach uses thread and data parallelism to achieve high performance for all components of sample-based algorithms, including random sampling, nearest neighbor computation, local planning, collision queries, and graph search. This approach can efficiently solve both the multi-query and single-query versions of the problem and obtains considerable speedups over prior CPU-based algorithms. We demonstrate the efficiency of our algorithms by applying them to a number of 6-DOF planning benchmarks in 3D environments. Overall, this is the first algorithm that can perform real-time motion planning and global navigation using commodity hardware.
Slusallek P.: Object Partitioning Considered Harmful: Space Subdivision for BVHs
In Proceedings of the Conference on High Performance Graphics 2009, 2009
Cited by 11 (0 self)
Abstract
A major factor in the efficiency of ray tracing is the use of good acceleration structures. Recently, bounding volume hierarchies (BVHs) have become the preferred acceleration structures, due to their competitive performance and greater flexibility compared to KD trees. In this paper, we present a study of algorithms for the construction of optimal BVHs. Due to the exponential nature of the problem, constructing optimal BVHs for ray tracing remains an open problem. By exploiting the linearity of the surface area heuristic (SAH), we develop an algorithm that can find optimal partitions in polynomial time. We further generalize this algorithm and show that every SAH-based KD tree or BVH construction algorithm is a special case of the generic algorithm. Based on a number of experiments with the generic algorithm, we conclude that the assumption of non-terminating rays in the surface area cost model becomes a major obstacle to using the full potential of BVHs. We also observe that enforcing space subdivision helps to improve BVH performance. Finally, we develop a simple space partitioning algorithm for building efficient BVHs.
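The surface area heuristic mentioned in this abstract scores a split of a node P into children L and R as C = C_trav + C_isect * (A(L) * N_L + A(R) * N_R) / A(P), where A is bounding-box surface area and N is the primitive count. The Python sketch below evaluates this cost for the conventional greedy object-partitioning sweep along one axis (the kind of split the paper argues against); it is not the paper's optimal or space-subdivision algorithm, and the unit costs `c_trav` and `c_isect` are assumed defaults.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def surface_area(box: Box) -> float:
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a: Box, b: Box) -> Box:
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return lo, hi


def best_sah_split(boxes: Sequence[Box], axis: int,
                   c_trav: float = 1.0, c_isect: float = 1.0):
    """Greedy SAH sweep over primitives sorted by centroid along `axis`.

    Returns (best cost, [left primitive ids, right primitive ids]).
    Assumes at least two primitives and a non-degenerate parent box.
    """
    n = len(boxes)
    # Sorting by lo + hi orders primitives by centroid (it is twice the centroid).
    order = sorted(range(n), key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])

    # right_area[k] = surface area of the bounds of order[k:], for k = 1..n-1.
    right_area = [0.0] * n
    acc = boxes[order[n - 1]]
    right_area[n - 1] = surface_area(acc)
    for k in range(n - 2, 0, -1):
        acc = union(acc, boxes[order[k]])
        right_area[k] = surface_area(acc)
    parent_area = surface_area(union(acc, boxes[order[0]]))  # acc bounds order[1:]

    # Sweep left to right, growing the left bounds and evaluating every split.
    best_cost, best_k = float("inf"), 1
    left = boxes[order[0]]
    for k in range(1, n):  # split is order[:k] | order[k:]
        if k > 1:
            left = union(left, boxes[order[k - 1]])
        cost = c_trav + c_isect * (surface_area(left) * k
                                   + right_area[k] * (n - k)) / parent_area
        if cost < best_cost:
            best_cost, best_k = cost, k
    return best_cost, [order[:best_k], order[best_k:]]
```

The area ratios A(L)/A(P) and A(R)/A(P) are the hit probabilities of the cost model, which assume rays that pass through the whole node without terminating; this is the assumption the abstract identifies as an obstacle.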
A Nearest Neighbor Data Structure for Graphics Hardware
Cited by 11 (1 self)
Abstract
Nearest neighbor search is a core computational task in database systems and throughout data analysis. It is also a major computational bottleneck, and hence an enormous body of research has been devoted to data structures and algorithms for accelerating the task. Recent advances in graphics hardware provide tantalizing speedups on a variety of tasks and suggest an alternate approach to the problem: simply run brute force search on a massively parallel system. In this paper we combine the two approaches with a novel data structure that can make effective use of parallel systems such as graphics cards. The architectural complexities of graphics hardware (the high degree of parallelism, the small amount of memory relative to instruction throughput, and the single instruction, multiple data design) present significant challenges for data structure design. Furthermore, the brute force approach applies perfectly to graphics hardware, leading one to question whether an intelligent algorithm or data structure can even hope to outperform this basic approach. Despite these challenges and misgivings, we demonstrate that our data structure, termed a Random Ball Cover, provides significant speedups over the GPU-based brute force approach.
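A simplified sketch of a one-shot Random Ball Cover search, assuming the variant in which each randomly chosen representative owns a fixed-size list of its nearest database points (lists may overlap). Both query stages are small brute-force searches, which is what makes the idea fit graphics hardware. The sketch runs sequentially, the names `build_rbc`, `query_rbc`, `n_reps`, and `list_size` are hypothetical, and results are approximate whenever the true nearest neighbor is missing from the chosen representative's list.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def build_rbc(points: Sequence[Point], n_reps: int, list_size: int, seed: int = 0):
    """One-shot RBC build: random representatives, each owning its nearest points."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(points)), n_reps)
    owned: Dict[int, List[int]] = {}
    for r in reps:
        # Brute force: rank all database points by distance to this representative.
        ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], points[r]))
        owned[r] = ranked[:list_size]
    return reps, owned


def query_rbc(points: Sequence[Point], reps: List[int],
              owned: Dict[int, List[int]], q: Point) -> int:
    # Stage 1: brute-force search over the (few) representatives.
    nearest_rep = min(reps, key=lambda r: math.dist(points[r], q))
    # Stage 2: brute-force search over that representative's list only.
    return min(owned[nearest_rep], key=lambda i: math.dist(points[i], q))
```

With both `n_reps` and `list_size` on the order of the square root of the database size, each query scans roughly that many points per stage instead of the whole database; larger lists trade speed for a higher chance of returning the exact neighbor.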
Efficient ray traced soft shadows using multi-frusta tracing
In High Performance Graphics 2009
Cited by 10 (2 self)
Abstract
Ray tracing has long been considered superior to rasterization because its ability to trace arbitrary rays allows it to simulate virtually any physical light transport effect simply by tracing rays. Yet, to look plausible, effects such as soft shadows typically require extraordinary numbers of rays. This makes the prospects of real-time performance rather remote. Rasterization, in contrast, has a record of producing such effects in real time by employing specialized and approximate solutions for individual effects. Though ray tracing may still be the right choice for effects like reflections and refractions, using specialized solutions for certain important effects also makes sense for a ray tracer. In this paper, we propose a special solution for ray tracing soft shadows that is particularly targeted at Intel’s Larrabee architecture. We use a specialized frustum tracing approach that traces multiple frusta of specialized “light-weight” shadow packets in parallel, while generating rays within each frustum on demand. The technique can easily be integrated into any packet ray tracer, and fits well into the wide SIMD and cache-size constraints of the Larrabee architecture. Our technique reaches rates of up to several dozen million rays per second per Larrabee core, outperforming traditional packet techniques by up to 6×. This high performance, combined with a simple light-weight illumination filtering step, makes it possible to achieve real-time soft shadows for game-like scenes.
Collision-streams: Fast GPU-based collision detection for deformable models
In ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2011
Cited by 9 (1 self)
Abstract
... into a funnel and pass through it under the pressure of a ball. This model has 47K vertices, 92K triangles, and many self-collisions. Our novel GPU-based CCD (continuous collision detection) algorithm takes 4.4 ms and 10 ms per frame to compute all the collisions on an NVIDIA GeForce GTX 480 and an NVIDIA GeForce GTX 285, respectively.
We present a fast GPU-based streaming algorithm to perform collision queries between deformable models. Our approach is based on hierarchical culling and reduces the computation to generating different streams. We present a novel stream registration method to compact the streams and efficiently compute the potentially colliding pairs of primitives. We also use a deferred front tracking method to lower the memory overhead. The overall algorithm has been implemented on different GPUs, and we have evaluated its performance on non-rigid and deformable simulations. We highlight our speedups over prior CPU-based and GPU-based algorithms. In practice, our algorithm can perform inter-object and intra-object computations on models composed of hundreds of thousands of triangles in tens of milliseconds.
Fast Parallel Construction of High-Quality Bounding Volume Hierarchies
In Proceedings of HPG 2013, ACM SIGGRAPH/Eurographics
Cited by 8 (1 self)
Abstract
We propose a new massively parallel algorithm for constructing high-quality bounding volume hierarchies (BVHs) for ray tracing. The algorithm is based on modifying an existing BVH to improve its quality, and executes in linear time at a rate of almost 40M triangles/sec on an NVIDIA GTX Titan. We also propose an improved approach for parallel splitting of triangles prior to tree construction. Averaged over 20 test scenes, the resulting trees offer over 90% of the ray tracing performance of the best offline construction method (SBVH), while previous fast GPU algorithms offer only about 50%. Compared to the state of the art, our method offers a significant improvement in the majority of practical workloads that need to construct the BVH for each frame. On average, it gives the best overall performance when tracing between 7 million and 60 billion rays per frame. This covers most interactive applications, product and architectural design, and even movie rendering.