Results 1–10 of 60
Larrabee: a many-core x86 architecture for visual computing
In SIGGRAPH ’08: ACM SIGGRAPH 2008 papers, 2008
Cited by 149 (8 self)
This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling in Larrabee is performed entirely in software, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this
Toward Acceleration of RSA Using 3D Graphics Hardware
Cited by 19 (0 self)
Abstract. Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in commodity devices capable of astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable, however, this performance has made them an attractive target for other domains. Specifically, they have motivated the transformation of costly algorithms from a general-purpose computational model into a form that executes on said graphics hardware. We investigate the implementation and performance of modular exponentiation using a graphics accelerator, with a view to using it to execute operations required in the RSA public-key cryptosystem.
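The core operation here is modular exponentiation. As a minimal CPU reference sketch (in Python, not the authors' GPU implementation, which the abstract does not detail), left-to-right square-and-multiply computes b^e mod n with one squaring per exponent bit and one extra multiplication per set bit. The function name `modexp` and the toy key are illustrative assumptions; Python's arbitrary-precision integers hide the multi-word arithmetic that a GPU version would have to implement explicitly.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent % modulus by left-to-right square-and-multiply."""
    if exponent < 0 or modulus < 1:
        raise ValueError("need exponent >= 0 and modulus >= 1")
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    # Scan the exponent's bits from most to least significant.
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # always square
        if bit == "1":
            result = (result * base) % modulus    # multiply when the bit is set
    return result


# Toy RSA key (p = 61, q = 53): illustration only, far too small to be secure.
n, e, d = 3233, 17, 2753
message = 65
cipher = modexp(message, e, n)        # encrypt: c = m^e mod n
recovered = modexp(cipher, d, n)      # decrypt: m = c^d mod n
```

Encryption and decryption are the same primitive with different exponents, which is why accelerating this one routine covers both RSA operations.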
Concurrent number cruncher: a GPU implementation of a general sparse linear solver
 Int. J. Parallel Emerg. Distrib. Syst
Cited by 18 (0 self)
A wide class of numerical methods needs to solve a linear system in which the pattern of nonzero matrix coefficients can be arbitrary. These problems can greatly benefit from the highly multithreaded computational power and large memory bandwidth available on GPUs, especially since dedicated general-purpose APIs such as CTM (AMD/ATI) and CUDA (NVIDIA) have appeared. CUDA even provides a BLAS implementation (CUBLAS), but only for dense matrices. Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage, register blocking, and vectorization) to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned Conjugate Gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0, making it attractive for applications that can be content with single precision.
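To make the solver structure concrete, here is a minimal sequential sketch of Jacobi-preconditioned Conjugate Gradient. It assumes a symmetric positive-definite matrix with a nonzero diagonal, stored in plain compressed row storage (CSR), a simplification of the block compressed row storage the paper uses; all helper names are hypothetical. On a GPU, the matrix-vector product, the dot products, and the vector updates would be the data-parallel kernels.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A @ x for A stored in compressed row storage (CSR)."""
    y = []
    for i in range(len(row_ptr) - 1):
        # Row i's nonzeros occupy values[row_ptr[i]:row_ptr[i + 1]].
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Conjugate Gradient, preconditioned by M = diag(A)."""
    n = len(b)
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]   # assumes a nonzero diagonal
    x = [0.0] * n
    r = list(b)                                  # r = b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


# 3x3 example: A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], exact solution x = [1, 1, 1].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [3.0, 2.0, 3.0]
x = jacobi_pcg(values, col_idx, row_ptr, b)
```

The Jacobi preconditioner is attractive on parallel hardware because applying it is an independent per-entry scaling, with no triangular solves.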
Shader Algebra
2004
Cited by 15 (0 self)
An algebra consists of a set of objects and a set of operators that act on those objects. We treat shader programs as first-class objects and define two operators: connection and combination. Connection is functional composition: the outputs of one shader are fed into the inputs of another. Combination concatenates the input channels, output channels, and computations of two shaders. Similar operators can be used to manipulate streams and apply computational kernels expressed as shaders to streams. Connecting a shader program to a stream applies that program to all elements of the stream; combining streams concatenates the record definitions of those streams.
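The two operators can be modelled in a few lines. This is a hypothetical Python sketch, not the paper's implementation: a shader is treated as a pure function from a tuple of inputs to a tuple of outputs with fixed arities, `connect` is functional composition, and `combine` concatenates inputs, outputs, and computations. The stream helpers mirror the last two sentences of the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Shader:
    """A 'shader' modelled as a pure function with fixed input/output arity."""
    n_in: int
    n_out: int
    fn: Callable[[Tuple], Tuple]

    def __call__(self, *inputs):
        if len(inputs) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(inputs)}")
        outputs = tuple(self.fn(inputs))
        if len(outputs) != self.n_out:
            raise ValueError(f"expected {self.n_out} outputs, got {len(outputs)}")
        return outputs


def connect(a: Shader, b: Shader) -> Shader:
    """Connection: functional composition, a's outputs feed b's inputs."""
    if a.n_out != b.n_in:
        raise ValueError("cannot connect: output/input arity mismatch")
    return Shader(a.n_in, b.n_out, lambda xs: b(*a(*xs)))


def combine(a: Shader, b: Shader) -> Shader:
    """Combination: concatenate the inputs, outputs, and computations of a and b."""
    return Shader(a.n_in + b.n_in, a.n_out + b.n_out,
                  lambda xs: a(*xs[:a.n_in]) + b(*xs[a.n_in:]))


def apply_to_stream(shader: Shader, stream):
    """Connecting a shader to a stream applies it to every record."""
    return [shader(*record) for record in stream]


def combine_streams(s1, s2):
    """Combining streams concatenates their record definitions, element-wise."""
    return [r1 + r2 for r1, r2 in zip(s1, s2)]


scale = Shader(1, 1, lambda xs: (2 * xs[0],))
offset = Shader(1, 1, lambda xs: (xs[0] + 1,))
```

Here `connect(scale, offset)(3)` evaluates `offset(scale(3))` and gives `(7,)`, while `combine(scale, offset)(3, 3)` runs both side by side and gives `(6, 4)`.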
Markovian segmentation and parameter estimation on graphics hardware
J. Electron. Imag., 2006
Cited by 9 (8 self)
Département d’informatique et de recherche opérationnelle (Department of Computer Science and Operations Research)
GPU-based cell projection for interactive volume rendering
SIBGRAPI, 2006
Cited by 8 (3 self)
This dissertation presents a practical approach to the Projected Tetrahedra (PT) algorithm for interactive volume rendering of unstructured data using programmable graphics cards. Unlike similar previously reported work, the proposed method employs two fragment shaders, one for computing the tetrahedra projections and another for rendering the volume. The proposed algorithm achieves interactive rates by storing the model in texture memory and by avoiding the redundant projections of earlier implementations that use vertex shaders. The algorithm is capable of rendering over 2 million tetrahedra per second on current graphics hardware, making it competitive with recent ray-casting approaches while occupying a substantially smaller memory footprint.
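The PT algorithm splits each tetrahedron's screen projection into triangles, and the split depends on the shape of the projected silhouette. A minimal CPU sketch of that classification, assuming an orthographic projection along z and ignoring degenerate cases (coincident or collinear projected vertices): if one projected vertex falls inside the triangle of the other three, the projection decomposes into 3 triangles; otherwise it is a quadrilateral decomposed into 4. The function names are hypothetical, and the dissertation performs its projections in a fragment shader rather than like this.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def project(vertices: Sequence[Tuple[float, float, float]]) -> List[Point2]:
    """Orthographic projection along z: drop the depth coordinate."""
    return [(x, y) for x, y, _ in vertices]


def cross(o: Point2, a: Point2, b: Point2) -> float:
    """z-component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_triangle(p: Point2, a: Point2, b: Point2, c: Point2) -> bool:
    """True if p lies inside (or on the boundary of) triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def pt_triangle_count(projected: Sequence[Point2]) -> int:
    """Number of triangles in the projected-tetrahedron decomposition."""
    for i in range(4):
        others = [projected[j] for j in range(4) if j != i]
        if inside_triangle(projected[i], *others):
            return 3   # silhouette is a triangle with one interior vertex
    return 4           # silhouette is a quadrilateral, split at its diagonals
```

For example, a tetrahedron with apex (0.2, 0.2, 1) above the base (0, 0, 0), (1, 0, 0), (0, 1, 0) projects to a triangle with an interior vertex and yields 3 triangles.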
Vector texture maps on the GPU
2005
Cited by 6 (0 self)
Figure 1: Vector Texture Maps applied onto an object. The ACM logo is represented by four gradient shaders, combined by VTMs. By indexing characters in a font represented by a compressed VTM (128 KB), the antialiased text on the paper roll uses only 8 bytes per character (two RGBA texels). Performance is 54 FPS (3DLabs Wildcat Realizm 200).
This paper presents VTMs (Vector Texture Maps), a novel representation of vector images that the GPU can use as a texture for real-time rendering. A VTM decomposes texture space into regions, each represented analytically by a set of implicit polynomials of degree 3. Each region can be rendered by a different fragment shading function. Accurate antialiasing is performed in real time, based on an estimate of fragment coverage. As a consequence, the texture can be zoomed infinitely without pixel discretization artifacts. Based on a hierarchical data structure, our representation has low memory requirements. Its versatility is demonstrated in various settings, including a font engine implemented entirely on the GPU.
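The abstract says antialiasing is based on an estimate of fragment coverage but does not give the estimator. A common first-order choice, assumed here and not necessarily the one VTMs use, approximates the signed distance to the curve f = 0 as f / |∇f| and converts it into a coverage fraction for a pixel of width `pixel_size`. The cubic `f` and all function names below are hypothetical.

```python
import math


def coverage(f, grad, x, y, pixel_size):
    """Estimate the fraction of a pixel centred at (x, y) inside the region f < 0."""
    value = f(x, y)
    gx, gy = grad(x, y)
    norm = math.hypot(gx, gy)
    if norm < 1e-12:
        # Vanishing gradient: no distance estimate, use a hard inside/outside test.
        return 1.0 if value < 0 else 0.0
    signed_distance = value / norm           # first-order distance to the curve f = 0
    return min(1.0, max(0.0, 0.5 - signed_distance / pixel_size))


# Hypothetical cubic region: f(x, y) = x^2 + y^2 + 0.3 x^3 - 1, inside where f < 0.
def f(x, y):
    return x * x + y * y + 0.3 * x ** 3 - 1.0


def grad_f(x, y):
    return (2.0 * x + 0.9 * x * x, 2.0 * y)
```

A pixel centred on the curve, such as (0, 1) where f = 0, gets coverage 0.5; pixels well inside or outside saturate to 1 or 0. Because the estimate is evaluated per fragment from the analytic polynomial, the edge stays about one pixel wide at any zoom level.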
Towards utilizing GPUs in information visualization: A model and implementation of image-space operations
IEEE TVCG, 2009
Cited by 6 (2 self)
Modern programmable GPUs represent a vast potential in terms of performance and visual flexibility for information visualization research, but surprisingly few applications even begin to utilize this potential. In this paper, we conjecture that this may be due to the mismatch between the high-level abstract data types commonly visualized in our field and the low-level floating-point model supported by current GPU shader languages. To help remedy this situation, we present a refinement of the traditional information visualization pipeline that is amenable to implementation using GPU shaders. The refinement consists of a final image-space step in the pipeline, where the multivariate data of the visualization is sampled at the resolution of the current view. To make the theoretical aspects of this work concrete, we also present a visual programming environment for constructing visualization shaders using a simple drag-and-drop interface. Finally, we give some examples of the use of shaders for well-known visualization techniques.
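As an illustration of what an image-space step can look like, the hypothetical sketch below samples multivariate points (x, y, value) at the resolution of the view by binning them into a pixel grid, then runs a per-pixel function that plays the role of a fragment shader. The binning scheme, the color mapping, and all names are assumptions for illustration, not the model defined in the paper; `value` is assumed to lie in [0, 1].

```python
def image_space_aggregate(points, width, height, bounds):
    """Bin (x, y, value) records into a width x height grid of counts and sums."""
    xmin, xmax, ymin, ymax = bounds
    count = [[0] * width for _ in range(height)]
    total = [[0.0] * width for _ in range(height)]
    for x, y, value in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue                      # outside the current view
        px = min(int((x - xmin) / (xmax - xmin) * width), width - 1)
        py = min(int((y - ymin) / (ymax - ymin) * height), height - 1)
        count[py][px] += 1
        total[py][px] += value
    return count, total


def shade(count, total, max_count):
    """Per-pixel 'shader': map density to red and the mean value to green/blue."""
    image = []
    for crow, trow in zip(count, total):
        row = []
        for c, t in zip(crow, trow):
            if c == 0:
                row.append((0.0, 0.0, 0.0))
            else:
                mean = t / c
                density = min(c / max_count, 1.0)
                row.append((density, mean, 1.0 - mean))
        image.append(row)
    return image
```

Because the per-pixel function sees aggregated data at view resolution, its cost depends on the number of pixels rather than on the number of records.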
How to solve systems of conservation laws numerically using the graphics processor as a high-performance computational engine
Quak (Eds.), Geometric Modelling, Numerical Simulation, and Optimization: Industrial Mathematics at SINTEF, 2005
Cited by 5 (2 self)
Summary. The paper has two main themes. The first theme is to give the reader an introduction to modern methods for systems of conservation laws. To this end, we start by introducing two classical schemes, the Lax–Friedrichs scheme and the Lax–Wendroff scheme. Using a simple example, we show how these two schemes fail to give accurate approximations to solutions containing discontinuities. We then introduce a general class of semi-discrete finite-volume schemes that are designed to produce accurate resolution of both smooth and non-smooth parts of the solution. Using this special class, we wish to introduce the reader to the basic principles used to design modern high-resolution schemes. As examples of systems of conservation laws, we consider the shallow-water equations for water waves and the Euler equations for the dynamics of an ideal gas. The second theme in the paper is how programmable graphics processing units (GPUs or graphics cards) can be used to efficiently compute numerical solutions of these systems. In contrast to instruction-driven microprocessors (CPUs), GPUs subscribe to the data-stream-based computing paradigm and have been optimised for high throughput of large data streams. Most modern numerical methods for hyperbolic conservation laws are explicit schemes defined over a grid, in which the unknowns at each grid point or in each grid cell can be updated independently of the others. Therefore, such methods are particularly attractive for implementation using data-stream-based processing.
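The two classical schemes can be sketched for the simplest conservation law, linear advection u_t + a u_x = 0 on a periodic grid (an assumed model problem; the paper's examples are the shallow-water and Euler equations). With Courant number c = aΔt/Δx, stable for |c| ≤ 1, every cell's new value depends only on neighbouring values from the previous time level, which is the independence that makes such schemes map well to data-stream processing. The function names are illustrative.

```python
import numpy as np


def lax_friedrichs(u, c):
    """One Lax-Friedrichs step for u_t + a u_x = 0 with Courant number c."""
    up, um = np.roll(u, -1), np.roll(u, 1)      # u_{j+1}, u_{j-1}, periodic
    return 0.5 * (up + um) - 0.5 * c * (up - um)


def lax_wendroff(u, c):
    """One Lax-Wendroff step: second order in space and time."""
    up, um = np.roll(u, -1), np.roll(u, 1)
    return u - 0.5 * c * (up - um) + 0.5 * c ** 2 * (up - 2.0 * u + um)


def advect(step, u0, c, n_steps):
    if abs(c) > 1.0:
        raise ValueError("CFL condition |c| <= 1 violated")
    u = u0.copy()
    for _ in range(n_steps):
        u = step(u, c)      # every cell is updated from the previous level only
    return u


# Square wave on 100 periodic cells; 200 steps at c = 0.5 is one full period.
cells = np.arange(100)
u0 = np.where((cells >= 20) & (cells < 40), 1.0, 0.0)
u_lf = advect(lax_friedrichs, u0, 0.5, 200)
u_lw = advect(lax_wendroff, u0, 0.5, 200)
```

On this square wave, Lax–Friedrichs stays within [0, 1] but smears the discontinuities, while Lax–Wendroff produces spurious oscillations around them, which illustrates the failure the paper uses to motivate high-resolution schemes.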
Implementing Ray Tracing on GPU
Diploma thesis, University of Applied Sciences, 2004
Assistant Professor: Marcus Hudritsch