Results 1 - 10 of 164
Larrabee: a many-core x86 architecture for visual computing
- In SIGGRAPH ’08: ACM SIGGRAPH 2008 papers, 2008
Cited by 279 (12 self)
This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die 2nd level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely with software in Larrabee, rather than in fixed function logic. The customizable software graphics rendering pipeline for this
Understanding the Efficiency of Ray Traversal on GPUs
Cited by 119 (8 self)
We discuss the mapping of elementary ray tracing operations (acceleration structure traversal and primitive intersection) onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that tells the upper bound of performance for a given kernel. We observe that previously known methods are a factor of 1.5–2.5X off from theoretical optimum, and most of the gap is not explained by memory bandwidth, but rather by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion and diffuse interreflection rays.
Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies
- ACM Transactions on Graphics, 2006
Cited by 115 (21 self)
The most significant deficiency of most of today’s interactive ray tracers is that they are restricted to static walkthroughs. This restriction is due to the static nature of the acceleration structures used. While the best reported frame rates for static geometric models have been achieved using carefully constructed kd-trees, this article shows that bounding volume hierarchies (BVHs) can be used to efficiently ray trace large static models. More importantly, the BVH can be used to ray trace deformable models (sets of triangles whose positions change over time) with little loss of performance. A variety of efficiency techniques are used to achieve this performance, but three algorithmic changes to the typical BVH algorithm are mainly responsible. First, the BVH is built using a variant of the surface area heuristic conventionally used to build kd-trees. Second, the topology of the BVH is not changed over time so that only the bounding volumes need to be refit from frame-to-frame. Third, and most importantly, packets of rays are traced together through the BVH using a novel integrated packet-frustum traversal scheme. This traversal scheme elegantly combines the advantages of both packet traversal and frustum traversal and allows for rapid hierarchy descent for packets that hit bounding volumes as well as rapid exits for packets that miss. A BVH-based ray tracing system using these techniques is shown to achieve performance for deformable models comparable to that previously available only for static models.
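The second of the three changes named in this abstract, keeping the BVH topology fixed and refitting only the bounding volumes each frame, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Node` class, the one-triangle-per-leaf layout, and the helper names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


def triangle_box(tri):
    """Axis-aligned bounding box of one triangle as (min_corner, max_corner)."""
    lo = tuple(min(v[axis] for v in tri) for axis in range(3))
    hi = tuple(max(v[axis] for v in tri) for axis in range(3))
    return (lo, hi)


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)


@dataclass
class Node:
    # Hypothetical node layout: inner nodes have two children,
    # leaves reference exactly one triangle by index.
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tri_index: Optional[int] = None
    box: Optional[tuple] = None


def refit(node, triangles):
    """Recompute boxes bottom-up; the tree topology is left untouched."""
    if node.tri_index is not None:
        node.box = triangle_box(triangles[node.tri_index])
    else:
        node.box = union(refit(node.left, triangles),
                         refit(node.right, triangles))
    return node.box
```

After the vertices move, calling `refit(root, triangles)` again updates every box in one pass over the tree, which is why refitting is cheaper than rebuilding the hierarchy from scratch.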
Ray Tracing Animated Scenes Using Coherent Grid Traversal
- Proceedings of ACM SIGGRAPH
Cited by 104 (25 self)
model (78K triangles). c) Animated wind-up toys (11K triangles) walking and jumping incoherently around each other. d) A rigid-body dynamics simulation of marbles (8.8K triangles). e) A complex scene of 174K animated triangles, where a fairy and a dragonfly dance through an animated forest. Scenes are rebuilt from scratch every frame, allowing fully dynamic animation. Including shading, texturing, and hard shadows, as used in the above images, we can render these scenes at 1024 × 1024 pixels with 15.3, 7.8, 10.2, 26.2, and 1.4 frames per second on a dual 3.2 GHz Xeon. Excluding shading, texturing, and shadows, we achieve 34.5, 15.8, 29.3, 57.1, and 3.4 frames per second. We present a new approach to interactive ray tracing of moderate-sized animated scenes based on traversing frustum-bounded packets of coherent rays through uniform grids. By incrementally computing the overlap of the frustum with a slice of grid cells, we accelerate grid traversal by more than a factor of 10, and achieve ray tracing performance competitive with the fastest known packet-based kd-tree ray tracers. The ability to efficiently rebuild the grid on every frame enables this performance even for fully dynamic scenes that typically challenge interactive ray tracing systems.
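The per-frame grid rebuild mentioned in this abstract can be pictured with a small Python sketch. It covers only the rebuild step, not the coherent frustum traversal, and it is an assumption-laden illustration rather than the authors' code: the function name `build_grid`, its parameters, and the conservative choice of binning each triangle into every cell its bounding box overlaps are all hypothetical.

```python
from collections import defaultdict
from math import floor


def build_grid(triangles, grid_min, cell_size, resolution):
    """Bin triangle indices into every grid cell their bounding box overlaps."""
    cells = defaultdict(list)
    for index, tri in enumerate(triangles):
        first, last = [], []
        for axis in range(3):
            lo = min(v[axis] for v in tri)
            hi = max(v[axis] for v in tri)
            top = resolution[axis] - 1
            # Convert world coordinates to cell coordinates, clamped to the grid.
            first.append(min(max(floor((lo - grid_min[axis]) / cell_size), 0), top))
            last.append(min(max(floor((hi - grid_min[axis]) / cell_size), 0), top))
        for x in range(first[0], last[0] + 1):
            for y in range(first[1], last[1] + 1):
                for z in range(first[2], last[2] + 1):
                    cells[(x, y, z)].append(index)
    return cells
```

Because the build is a single pass over the triangles with no sorting or hierarchy, it can be repeated every frame for a fully dynamic scene.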
OptiX: A General Purpose Ray Tracing Engine
Cited by 87 (3 self)
Figure 1: Images from various applications built with OptiX. Top: Physically based light transport through path tracing. Bottom: Ray tracing of a procedural Julia set, photon mapping, large-scale line of sight and collision detection, Whitted-style ray tracing of dynamic geometry, and ray traced ambient occlusion. All applications are interactive. The NVIDIA® OptiX™ ray tracing engine is a programmable system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key observation that most ray tracing algorithms can be implemented using a small set of programmable operations. Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom ray tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal. This enables the implementation of a highly diverse set of ray tracing-based algorithms and applications, including interactive rendering, offline rendering, collision detection systems, artificial intelligence queries, and scientific simulations such as sound propagation. OptiX achieves high performance through a compact object model and application of several ray tracing-specific compiler optimizations. For ease of use it exposes a single-ray programming model with full support for recursion and a dynamic dispatch mechanism similar to virtual function calls.
Real-time kd-tree construction on graphics hardware
- ACM Transactions on Graphics, 2008
Cited by 82 (8 self)
We present an algorithm for constructing kd-trees on GPUs. This algorithm achieves real-time performance by exploiting the GPU’s streaming architecture at all stages of kd-tree construction. Unlike previous parallel kd-tree algorithms, our method builds tree nodes completely in BFS (breadth-first search) order. We also develop a special strategy for large nodes at upper tree levels so as to further exploit the fine-grained parallelism of GPUs. For these nodes, we parallelize the computation over all geometric primitives instead of nodes at each level. Finally, in order to maintain kd-tree quality, we introduce novel schemes for fast evaluation of node split costs. As far as we know, ours is the first real-time kd-tree algorithm on the GPU. The kd-trees built by our algorithm are of comparable quality to those constructed by off-line CPU algorithms. In terms of speed, our algorithm is significantly faster than well-optimized single-core CPU algorithms and competitive with multi-core CPU algorithms. Our algorithm provides a general way for handling dynamic scenes on the GPU. We demonstrate the potential of our algorithm in applications involving dynamic scenes, including GPU ray tracing, interactive photon mapping, and point cloud modeling.
RT-DEFORM: Interactive Ray Tracing of Dynamic Scenes using BVHs
- In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing, 2006
Cited by 70 (16 self)
Figure 1: Dress simulation: Four different images of a 210 step sequence taken from a dynamic cloth simulation and consisting of 40K triangles. By updating in real-time instead of rebuilding the BVH of the deforming model according to our heuristic, we are able to render the animation at 13 frames per second with 512² screen resolution using a dual-core P4 processor at 2.8 GHz. We present an efficient approach for interactive ray tracing of deformable or animated models. Unlike many of the recent approaches for ray tracing static scenes, we use bounding volume hierarchies (BVHs) instead of kd-trees as the underlying acceleration structure. Our algorithm makes no assumptions about the simulation or the motion of objects in the scene and dynamically updates or recomputes the BVHs. We also describe a method to detect BVH quality degradation during the simulation in order to determine when the hierarchy needs to be rebuilt. Furthermore, we show that the ray coherence techniques introduced for kd-trees can be naturally extended to BVHs and yield similar improvements. Finally, we compare BVHs to spatial kd-trees, which have been used recently as a replacement for AABB hierarchies. Our algorithm has been applied to different scenarios arising in animation and simulation and consisting of tens of thousands to a million triangles. In practice, our system can ray trace these models at 3-13 frames a second on a desktop PC including secondary rays.
On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)
- In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing, 2006
Cited by 67 (11 self)
Though a large variety of efficiency structures for ray tracing exist, kd-trees today seem to slowly become the method of choice. In particular, kd-trees built with cost estimation functions such as a surface area heuristic (SAH) seem to be important for reaching high performance. Unfortunately, most algorithms for building such trees have a time complexity of O(N log² N), or even O(N²). In this paper, we analyze the state of the art in building good kd-trees for ray tracing, and eventually propose an algorithm that builds SAH kd-trees in O(N log N), the theoretical lower bound.
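For readers unfamiliar with the surface area heuristic named in this abstract, the following Python sketch evaluates the commonly used form of the SAH cost for one candidate split. It shows the cost function only, not the paper's O(N log N) construction algorithm. The function names and the cost constants `c_trav` and `c_isect` are illustrative assumptions.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(parent, left, right, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """Expected cost of splitting `parent` into `left` and `right`.

    The chance that a ray crossing the parent also crosses a child is
    estimated by the ratio of surface areas. c_trav and c_isect are
    assumed, tunable costs for one traversal step and one intersection.
    """
    p_left = surface_area(left) / surface_area(parent)
    p_right = surface_area(right) / surface_area(parent)
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)
```

A builder would evaluate this cost for many candidate planes and keep the cheapest; a split that isolates many primitives in a small child box scores lower than an even spatial split of the same node.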
Instant Ray Tracing: The Bounding Interval Hierarchy
- In Rendering Techniques 2006 – Proceedings of the 17th Eurographics Symposium on Rendering, 2006
Cited by 62 (2 self)
We introduce a new ray tracing algorithm that exploits the best of previous methods: Similar to bounding volume hierarchies the memory of the acceleration data structure is linear in the number of objects to be ray traced and can be predicted prior to construction, while the traversal of the hierarchy is as efficient as the one of kd-trees. The construction algorithm can be considered a variant of quicksort and for the first time is based on a global space partitioning heuristic, which is much cheaper to evaluate than the classic surface area heuristic. Compared to spatial partitioning schemes only a fraction of the memory is used and a higher numerical precision is intrinsic. The new method is simple to implement and its high performance is demonstrated by extensive measurements including massive as well as dynamic scenes, where we focus on the total time to image including the construction cost rather than on only frames per second.
Design for Parallel Interactive Ray Tracing Systems
- In Proceedings of IEEE Symposium on Interactive Ray Tracing, 2006
Cited by 48 (7 self)
Figure 1: Images generated using interactive ray tracing. From left to right, time step 225 of a Richtmyer-Meshkov instability simulation from