Larrabee: a many-core x86 architecture for visual computing
- In SIGGRAPH ’08: ACM SIGGRAPH 2008 papers, 2008
Cited by 104 (6 self)
This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die 2nd level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely with software in Larrabee, rather than in fixed function logic. The customizable software graphics rendering pipeline for this
Shader Algebra
- 2004
Cited by 14 (0 self)
An algebra consists of a set of objects and a set of operators that act on those objects. We treat shader programs as first-class objects and define two operators: connection and combination. Connection is functional composition: the outputs of one shader are fed into the inputs of another. Combination concatenates the input channels, output channels, and computations of two shaders. Similar operators can be used to manipulate streams and apply computational kernels expressed as shaders to streams. Connecting a shader program to a stream applies that program to all elements of the stream; combining streams concatenates the record definitions of those streams.
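The two operators described in this abstract can be illustrated with a small sketch. It assumes a shader can be modelled as a plain function from a tuple of input channels to a tuple of output channels; the names `Shader`, `connect`, `combine`, `apply_to_stream`, and `combine_streams` are hypothetical and are not taken from the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Record = Tuple[float, ...]


@dataclass(frozen=True)
class Shader:
    """Hypothetical model of a shader: n_in input channels, n_out output channels."""
    n_in: int
    n_out: int
    fn: Callable[[Record], Record]

    def __call__(self, *args: float) -> Record:
        if len(args) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(args)}")
        out = tuple(self.fn(tuple(args)))
        if len(out) != self.n_out:
            raise ValueError(f"expected {self.n_out} outputs, got {len(out)}")
        return out


def connect(a: Shader, b: Shader) -> Shader:
    """Connection: functional composition, the outputs of b feed the inputs of a."""
    if b.n_out != a.n_in:
        raise ValueError("output channels of b must match input channels of a")
    return Shader(b.n_in, a.n_out, lambda xs: a(*b(*xs)))


def combine(a: Shader, b: Shader) -> Shader:
    """Combination: concatenate input channels, output channels, and computations."""
    return Shader(
        a.n_in + b.n_in,
        a.n_out + b.n_out,
        lambda xs: a(*xs[:a.n_in]) + b(*xs[a.n_in:]),
    )


def apply_to_stream(shader: Shader, stream: List[Record]) -> List[Record]:
    """Connecting a shader to a stream applies it to every element of the stream."""
    return [shader(*record) for record in stream]


def combine_streams(s1: List[Record], s2: List[Record]) -> List[Record]:
    """Combining streams concatenates their record definitions element by element."""
    if len(s1) != len(s2):
        raise ValueError("streams must have the same length")
    return [r1 + r2 for r1, r2 in zip(s1, s2)]


# Two toy shaders used only for illustration.
scale = Shader(1, 1, lambda xs: (2.0 * xs[0],))
add = Shader(2, 1, lambda xs: (xs[0] + xs[1],))
```

With these toy shaders, `connect(scale, add)` takes two inputs and produces one output, while `combine(scale, add)` takes three inputs and produces two outputs.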
Toward Acceleration of RSA Using 3D Graphics Hardware
Cited by 13 (0 self)
Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in commodity devices capable of astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable, however, this performance has made them an attractive target for other domains. Specifically, they have motivated the transformation of costly algorithms from a general-purpose computational model into a form that executes on graphics hardware. We investigate the implementation and performance of modular exponentiation using a graphics accelerator, with a view to using it to execute operations required in the RSA public key cryptosystem.
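For orientation, the operation this abstract targets can be sketched on the CPU as follows. This is the textbook right-to-left square-and-multiply method, not the GPU formulation investigated in the paper; the function name `mod_exp` is hypothetical.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by binary square-and-multiply."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result


# Toy RSA parameters (far too small for real use): n = 61 * 53.
n, e, d = 3233, 17, 2753
ciphertext = mod_exp(65, e, n)
recovered = mod_exp(ciphertext, d, n)
```

RSA encryption and decryption are both single calls to this operation, which is why accelerating it is the focus. In Python the built-in three-argument `pow(base, exponent, modulus)` computes the same value.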
Concurrent number cruncher: a GPU implementation of a general sparse linear solver
- Int. J. Parallel Emerg. Distrib. Syst
Cited by 13 (0 self)
A wide class of numerical methods needs to solve a linear system, where the matrix pattern of non-zero coefficients can be arbitrary. These problems can greatly benefit from the highly multithreaded computational power and large memory bandwidth available on GPUs, especially since dedicated general-purpose APIs such as CTM (AMD-ATI) and CUDA (NVIDIA) have appeared. CUDA even provides a BLAS implementation, but only for dense matrices (CuBLAS). Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage, register blocking and vectorization) to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned Conjugate Gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0, making it attractive for applications that are content with single precision.
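The algorithm named in this abstract can be sketched in a few lines. This is a plain CPU illustration under simplifying assumptions: it uses ordinary compressed row storage (CSR) rather than the block variant with register blocking described in the paper, it runs in double precision, and the function names are hypothetical.

```python
from typing import List


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """y = A x for a matrix stored in compressed row storage."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def csr_diagonal(values: List[float], col_idx: List[int],
                 row_ptr: List[int]) -> List[float]:
    """Extract the diagonal, which is all the Jacobi preconditioner needs."""
    diag = [0.0] * (len(row_ptr) - 1)
    for i in range(len(diag)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with diagonal preconditioning."""
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * len(b)
    r = list(b)                                   # residual for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm2 = dot(b, b)
    for _ in range(max_iter):
        if dot(r, r) <= tol * tol * b_norm2:      # relative residual test
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Every step is either a sparse matrix-vector product, a dot product, or an element-wise vector update, which is what makes the method a natural fit for data-parallel hardware.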
Markovian segmentation and parameter estimation on graphics hardware
- J. Electron. Imag, 2006
Cited by 9 (8 self)
Département d’informatique et de recherche opérationnelle
GPU-based cell projection for interactive volume rendering
- SIBGRAPI, 2006
Cited by 8 (3 self)
This dissertation presents a practical approach to the Projected Tetrahedra (PT) algorithm for interactive volume rendering of unstructured data using programmable graphics cards. Unlike similar works reported earlier, the proposed method employs two fragment shaders, one for computing the tetrahedra projections and another for rendering the volume. The proposed algorithm achieves interactive rates by storing the model in texture memory and avoiding the redundant projections of earlier implementations that used vertex shaders. The algorithm is capable of rendering over 2 million tetrahedra per second on current graphics hardware, making it competitive with recent ray casting approaches, while occupying a substantially smaller memory footprint.
Vector texture maps on the GPU
- 2005
Cited by 6 (0 self)
Figure 1: Vector Texture Maps applied onto an object. The ACM logo is represented by four gradient shaders, combined by VTMs. By indexing characters in a font represented by a compressed VTM (that uses 128 KB), the anti-aliased text on the paper roll only uses 8 bytes per character (two RGBA texels). Performance is 54 FPS (3DLabs Wildcat Realizm 200).
This paper presents VTMs (Vector Texture Maps), a novel representation of vector images that can be used as a texture by the GPU for real-time rendering. A VTM decomposes texture space into different regions, represented analytically by a set of implicit degree-3 polynomials. Each region can be rendered by a different fragment shading function. Accurate anti-aliasing is performed in real time, based on an estimate of fragment coverage. As a consequence, infinite zooming can be applied without any pixel discretization artifact. Based on a hierarchical data structure, our representation has low memory requirements. Its versatility is demonstrated in various settings, including a font engine completely implemented on the GPU.
How to solve systems of conservation laws numerically using the graphics processor as a high-performance computational engine
- Quak (Eds.), Geometric Modelling, Numerical Simulation, and Optimization: Industrial Mathematics at SINTEF, 2005
Cited by 6 (2 self)
The paper has two main themes: The first theme is to give the reader an introduction to modern methods for systems of conservation laws. To this end, we start by introducing two classical schemes, the Lax–Friedrichs scheme and the Lax–Wendroff scheme. Using a simple example, we show how these two schemes fail to give accurate approximations to solutions containing discontinuities. We then introduce a general class of semi-discrete finite-volume schemes that are designed to produce accurate resolution of both smooth and nonsmooth parts of the solution. Using this special class we wish to introduce the reader to the basic principles used to design modern high-resolution schemes. As examples of systems of conservation laws, we consider the shallow-water equations for water waves and the Euler equations for the dynamics of an ideal gas. The second theme in the paper is how programmable graphics processor units (GPUs or graphics cards) can be used to efficiently compute numerical solutions of these systems. In contrast to instruction-driven micro-processors (CPUs), GPUs subscribe to the data-stream-based computing paradigm and have been optimised for high throughput of large data streams. Most modern numerical methods for hyperbolic conservation laws are explicit schemes defined over a grid, in which the unknowns at each grid point or in each grid cell can be updated independently of the others. Therefore such methods are particularly attractive for implementation using data-stream-based processing.
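The independence of per-cell updates mentioned in this abstract can be seen in a minimal sketch of the Lax–Friedrichs scheme. The sketch assumes a scalar conservation law u_t + f(u)_x = 0 on a periodic grid, which is simpler than the systems treated in the paper; the function name and parameters are hypothetical.

```python
from typing import Callable, List


def lax_friedrichs_step(u: List[float], flux: Callable[[float], float],
                        dt_over_dx: float) -> List[float]:
    """Advance u by one time step with the classical Lax-Friedrichs scheme.

    Each new cell value depends only on the two neighbouring old values,
    so all cells can be updated independently of one another.
    """
    n = len(u)
    f = [flux(v) for v in u]
    new_u = []
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n    # periodic boundaries
        new_u.append(0.5 * (u[left] + u[right])
                     - 0.5 * dt_over_dx * (f[right] - f[left]))
    return new_u


# Linear advection f(u) = u of a square pulse, at half the CFL limit.
u = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(4):
    u = lax_friedrichs_step(u, lambda v: v, 0.5)
```

After a few steps at this time step the sharp edges of the pulse are smeared across several cells, which is the kind of inaccuracy near discontinuities that the abstract attributes to the classical schemes.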
Real-time GPU silhouette refinement using adaptively blended Bézier patches
- 2007
Cited by 4 (3 self)
We present an algorithm for detecting and extracting the silhouette edges of a triangle mesh in real time using GPUs (Graphical Processing Units). We also propose a tessellation strategy for visualizing the mesh with smooth silhouettes through a continuous blend between Bézier patches with varying level of detail. Furthermore, we show how our techniques can be integrated with displacement and normal mapping. We give details on our GPU implementation and provide a performance analysis with respect to mesh size.
Implementing Ray Tracing on GPU
- Diploma thesis, University of Applied Sciences, 2004
"... Assistant Professor: Marcus Hudritsch ..."

