Larrabee: a manycore x86 architecture for visual computing
In SIGGRAPH ’08: ACM SIGGRAPH 2008 papers, 2008
Cited by 267 (12 self)
Abstract. This paper presents a manycore visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die 2nd-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely in software in Larrabee, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this ...
Concurrent number cruncher: a GPU implementation of a general sparse linear solver
Int. J. Parallel Emerg. Distrib. Syst.
Cited by 40 (0 self)
A wide class of numerical methods needs to solve a linear system in which the pattern of nonzero matrix coefficients can be arbitrary. These problems can greatly benefit from the highly multithreaded computational power and large memory bandwidth available on GPUs, especially since dedicated general-purpose APIs such as CTM (AMD/ATI) and CUDA (NVIDIA) have appeared. CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage, register blocking and vectorization) to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned Conjugate Gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0, making it attractive for applications that can make do with single precision.
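The abstract names the Jacobi-preconditioned Conjugate Gradient method over a compressed row format. A minimal CPU reference sketch in Python follows, assuming a plain (unblocked) CSR layout and a symmetric positive-definite matrix; it leaves out the block storage, register blocking, and vectorization the paper applies on the GPU, and the function names are hypothetical.

```python
import math
from typing import List


def csr_matvec(values: List[float], col_idx: List[int], row_ptr: List[int],
               x: List[float]) -> List[float]:
    """y = A x for a matrix stored in compressed row storage (CSR)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def jacobi_pcg(values: List[float], col_idx: List[int], row_ptr: List[int],
               b: List[float], tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve A x = b (A symmetric positive-definite) with Jacobi-preconditioned CG."""
    n = len(b)
    # Jacobi preconditioner: the inverse of the diagonal of A.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    b_norm = math.sqrt(sum(v * v for v in b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(sum(v * v for v in r)) <= tol * b_norm:
            break  # converged (relative residual below tol)
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
        z = [inv_diag[i] * r[i] for i in range(n)]  # apply the preconditioner
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x
```

For example, the 3x3 matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] is stored as values [4, 1, 1, 3, 1, 1, 2], col_idx [0, 1, 0, 1, 2, 1, 2] and row_ptr [0, 2, 5, 7]. The inner loop of `csr_matvec` is the part that a GPU version would parallelize across rows.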
Toward Acceleration of RSA Using 3D Graphics Hardware
Cited by 27 (1 self)
Abstract. Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in commodity devices capable of astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable, however, this performance has made them an attractive target for other domains. Specifically, they have motivated the transformation of costly algorithms from a general-purpose computational model into a form that executes on such graphics hardware. We investigate the implementation and performance of modular exponentiation using a graphics accelerator, with a view to using it to execute the operations required in the RSA public-key cryptosystem.
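The core operation being accelerated is modular exponentiation, m^e mod n. Below is a minimal sketch of the standard left-to-right square-and-multiply method, written with Python's arbitrary-precision integers. It does not reflect how the paper maps multi-precision arithmetic onto graphics hardware, and `mod_exp` is a hypothetical name.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Scans the exponent's bits from most to least significant: square the
    running result for every bit, and also multiply by the base when the
    bit is 1. Assumes exponent >= 0 and modulus >= 1.
    """
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square
        if bit == "1":
            result = (result * base) % modulus   # multiply
    return result
```

With the toy textbook key n = 3233 (p = 61, q = 53), e = 17 and d = 2753, `mod_exp(mod_exp(65, 17, 3233), 2753, 3233)` recovers 65, and results agree with Python's built-in three-argument `pow`. Real RSA moduli are 1024 bits or more, which is what makes the multi-precision multiply the expensive part.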
Shader Algebra
2004
Cited by 19 (1 self)
An algebra consists of a set of objects and a set of operators that act on those objects. We treat shader programs as first-class objects and define two operators: connection and combination. Connection is functional composition: the outputs of one shader are fed into the inputs of another. Combination concatenates the input channels, output channels, and computations of two shaders. Similar operators can be used to manipulate streams and apply computational kernels expressed as shaders to streams. Connecting a shader program to a stream applies that program to all elements of the stream; combining streams concatenates the record definitions of those streams.
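A toy Python model of the two operators may make the definitions concrete. This is only an illustrative sketch, assuming shaders are pure functions on tuples of float channels; the paper's operators act on real shader programs, the names `Shader`, `connect`, `combine` and `apply_to_stream` are hypothetical, and combination here simply places two shaders side by side.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Channels = Tuple[float, ...]


@dataclass(frozen=True)
class Shader:
    """A 'shader' modeled as a pure function from n_in channels to n_out channels."""
    n_in: int
    n_out: int
    fn: Callable[[Channels], Channels]

    def __call__(self, inputs: Channels) -> Channels:
        if len(inputs) != self.n_in:
            raise ValueError("wrong number of input channels")
        out = tuple(self.fn(inputs))
        if len(out) != self.n_out:
            raise ValueError("wrong number of output channels")
        return out


def connect(a: Shader, b: Shader) -> Shader:
    """Connection: the outputs of `a` feed the inputs of `b` (functional composition)."""
    if a.n_out != b.n_in:
        raise ValueError("outputs of the first shader must match inputs of the second")
    return Shader(a.n_in, b.n_out, lambda x: b(a(x)))


def combine(a: Shader, b: Shader) -> Shader:
    """Combination: concatenate the inputs, outputs and computations of `a` and `b`."""
    def fn(x: Channels) -> Channels:
        return a(x[:a.n_in]) + b(x[a.n_in:])
    return Shader(a.n_in + b.n_in, a.n_out + b.n_out, fn)


def apply_to_stream(s: Shader, stream: List[Channels]) -> List[Channels]:
    """Connecting a shader to a stream applies it to every record of the stream."""
    return [s(record) for record in stream]
```

For instance, connecting a shader that doubles its input to one that adds 1 yields a shader mapping (3.0,) to (7.0,), while combining the same two yields a two-input, two-output shader mapping (3.0, 3.0) to (6.0, 4.0).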
A graphics processing unit implementation of the particle filter
In Proceedings of the 15th European Statistical Signal Processing Conference (EUSIPCO ’07), 2007
Cited by 18 (1 self)
Modern graphics cards for computers, and especially their graphics processing units (GPUs), are designed for fast rendering of graphics. To achieve this, GPUs are equipped with a parallel architecture that can be exploited for general-purpose computing on GPU (GPGPU) as a complement to the central processing unit (CPU). In this paper, GPGPU techniques are used to make a parallel GPU implementation of state-of-the-art recursive Bayesian estimation using particle filters (PF). The modifications made to obtain a parallel particle filter, especially for the resampling step, are discussed, and the performance of the resulting GPU implementation is compared to that of a traditional CPU implementation. For large numbers of particles, the resulting GPU filter is faster than the CPU filter while achieving the same accuracy, and it shows how the particle filter can be parallelized.
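The resampling step is singled out above as the part that needs modification. The abstract does not say which resampling scheme the paper uses, so the following Python sketch shows systematic resampling, one common choice, as an assumption; `systematic_resample` is a hypothetical name and the code is sequential.

```python
import random
from typing import List, Optional


def systematic_resample(weights: List[float], u0: Optional[float] = None) -> List[int]:
    """Draw len(weights) ancestor indices with systematic resampling.

    A single uniform offset u0 in [0, 1) generates N evenly spaced points
    (i + u0) / N; each point selects the particle whose cumulative-weight
    interval contains it.
    """
    n = len(weights)
    total = sum(weights)
    # Normalized cumulative sum of the weights (a prefix sum).
    cdf = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point round-off
    if u0 is None:
        u0 = random.random()
    indices = []
    j = 0
    for i in range(n):
        u = (i + u0) / n
        # Advance to the first particle whose cumulative weight exceeds u.
        while cdf[j] <= u:
            j += 1
        indices.append(j)
    return indices
```

With equal weights [1, 1, 1, 1] and u0 = 0.5 every particle is kept once ([0, 1, 2, 3]). Because the cumulative sum is a prefix sum and each output index can be located independently (for example by binary search), the scheme maps well onto parallel hardware.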
GPU Random Numbers via the Tiny Encryption Algorithm
2010
Cited by 12 (0 self)
Random numbers are extensively used on the GPU. As more computation is ported to the GPU, it can no longer be treated as rendering hardware alone. Random number generators (RNGs) are expected to cater to general-purpose and graphics applications alike. Such diversity adds to the requirements of an RNG. A good GPU RNG should be able to provide repeatability, random access, multiple independent streams, speed, and random numbers free from detectable statistical bias. A specific application may require some, if not all, of the above characteristics at one time. In particular, we hypothesize that not all algorithms need the highest-quality random numbers, so a good GPU RNG should provide a speed/quality tradeoff that can be tuned for fast, low-quality or slower, high-quality random numbers. We propose that the Tiny Encryption Algorithm satisfies all of the requirements of a good GPU pseudo-random number generator. We compare our technique against previous approaches, and present an evaluation using standard randomness test suites as well as Perlin noise and a Monte Carlo shadow algorithm. We show that the quality of random number generation directly affects the quality of the noise produced; however, good-quality noise can still be produced with a lower-quality random number generator.
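One way a block cipher such as TEA can serve as a GPU RNG is as a counter-based generator: encrypting a (stream, counter) pair gives repeatability, random access and independent streams, and the number of TEA cycles acts as the speed/quality knob. A Python sketch of that idea follows. The key, the default of 8 cycles, and the way the plaintext is formed are illustrative assumptions rather than the paper's parameters, and `tea_random` is a hypothetical helper.

```python
from typing import Tuple

DELTA = 0x9E3779B9  # TEA key-schedule constant
MASK = 0xFFFFFFFF   # keep all arithmetic in 32 bits
# Arbitrary example key (hypothetical choice, not taken from the paper).
KEY = (0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E)


def tea_encrypt(v0: int, v1: int, key=KEY, cycles: int = 32) -> Tuple[int, int]:
    """Encrypt the 64-bit block (v0, v1); standard TEA uses 32 cycles."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + DELTA) & MASK
        v0 = (v0 + ((((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1)) & MASK)) & MASK
        v1 = (v1 + ((((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3)) & MASK)) & MASK
    return v0, v1


def tea_decrypt(v0: int, v1: int, key=KEY, cycles: int = 32) -> Tuple[int, int]:
    """Invert tea_encrypt for the same key and cycle count."""
    k0, k1, k2, k3 = key
    s = (DELTA * cycles) & MASK
    for _ in range(cycles):
        v1 = (v1 - ((((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3)) & MASK)) & MASK
        v0 = (v0 - ((((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1)) & MASK)) & MASK
        s = (s - DELTA) & MASK
    return v0, v1


def tea_random(stream: int, counter: int, cycles: int = 8) -> float:
    """Uniform float in [0, 1) from encrypting a (stream, counter) pair.

    Fewer cycles are faster but mix the bits less thoroughly.
    """
    v0, _ = tea_encrypt(stream & MASK, counter & MASK, KEY, cycles)
    return v0 / 2.0**32
```

Because each value depends only on (stream, counter), any element of any stream can be regenerated directly, without stepping through earlier values.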
Markovian segmentation and parameter estimation on graphics hardware
J. Electron. Imag., 2006
Cited by 11 (9 self)
Département d’informatique et de recherche opérationnelle
Towards utilizing GPUs in information visualization: A model and implementation of image-space operations
IEEE TVCG, 2009
Cited by 10 (3 self)
Modern programmable GPUs represent a vast potential in terms of performance and visual flexibility for information visualization research, but surprisingly few applications even begin to utilize this potential. In this paper, we conjecture that this may be due to the mismatch between the high-level abstract data types commonly visualized in our field and the low-level floating-point model supported by current GPU shader languages. To help remedy this situation, we present a refinement of the traditional information visualization pipeline that is amenable to implementation using GPU shaders. The refinement consists of a final image-space step in the pipeline where the multivariate data of the visualization is sampled at the resolution of the current view. To concretize the theoretical aspects of this work, we also present a visual programming environment for constructing visualization shaders using a simple drag-and-drop interface. Finally, we give some examples of the use of shaders for well-known visualization techniques.
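To illustrate what an image-space step can mean in practice, here is a hypothetical Python sketch: records are binned at the resolution of the current view, and a per-pixel function, standing in for a fragment shader, turns the bins into intensities. The binning and the density mapping are assumptions chosen for illustration; the paper's actual model and shader interface are not reproduced here.

```python
from typing import List, Sequence, Tuple


def sample_to_view(points: Sequence[Tuple[float, float]], width: int, height: int,
                   x_range: Tuple[float, float],
                   y_range: Tuple[float, float]) -> List[List[int]]:
    """Count data records per pixel of a width x height view (one bin per pixel)."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # record falls outside the current view
        # Clamp so that points on the upper edge land in the last pixel.
        px = min(int((x - x0) / (x1 - x0) * width), width - 1)
        py = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[py][px] += 1
    return counts


def shade_density(counts: List[List[int]]) -> List[List[float]]:
    """Per-pixel 'shader': map each count to an intensity in [0, 1]."""
    peak = max(max(row) for row in counts) or 1
    return [[c / peak for c in row] for row in counts]
```

Because the binning depends on the view's width, height and data ranges, zooming or resizing simply re-runs the sampling at the new resolution, and the per-pixel function never needs to know the original data types.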
How to solve systems of conservation laws numerically using the graphics processor as a high-performance computational engine
Quak (Eds.), Geometric Modelling, Numerical Simulation, and Optimization: Industrial Mathematics at SINTEF, 2005
Cited by 9 (2 self)
Summary. The paper has two main themes. The first theme is to give the reader an introduction to modern methods for systems of conservation laws. To this end, we start by introducing two classical schemes, the Lax–Friedrichs scheme and the Lax–Wendroff scheme. Using a simple example, we show how these two schemes fail to give accurate approximations to solutions containing discontinuities. We then introduce a general class of semi-discrete finite-volume schemes that are designed to produce accurate resolution of both smooth and nonsmooth parts of the solution. Using this special class, we wish to introduce the reader to the basic principles used to design modern high-resolution schemes. As examples of systems of conservation laws, we consider the shallow-water equations for water waves and the Euler equations for the dynamics of an ideal gas. The second theme in the paper is how programmable graphics processor units (GPUs or graphics cards) can be used to efficiently compute numerical solutions of these systems. In contrast to instruction-driven microprocessors (CPUs), GPUs subscribe to the data-stream-based computing paradigm and have been optimised for high throughput of large data streams. Most modern numerical methods for hyperbolic conservation laws are explicit schemes defined over a grid, in which the unknowns at each grid point or in each grid cell can be updated independently of the others. Therefore such methods are particularly attractive for implementation using data-stream-based processing.
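As a concrete instance of the first classical scheme mentioned, a minimal Python sketch of one Lax–Friedrichs step for u_t + f(u)_x = 0 on a periodic 1D grid follows; the periodic boundary and the function name are assumptions for illustration. Any scalar flux can be passed in, for example Burgers' f(u) = u^2/2 or linear advection f(u) = a u, and the step is stable when the CFL condition dt * max|f'(u)| <= dx holds.

```python
from typing import Callable, List


def lax_friedrichs_step(u: List[float], flux: Callable[[float], float],
                        dt: float, dx: float) -> List[float]:
    """One Lax-Friedrichs step for u_t + f(u)_x = 0 on a periodic grid.

    u_i^{n+1} = (u_{i-1} + u_{i+1}) / 2 - dt / (2 dx) * (f(u_{i+1}) - f(u_{i-1}))

    Each new cell value depends only on old neighbouring values, so every
    cell can be updated independently of the others.
    """
    n = len(u)
    f = [flux(v) for v in u]
    # In Python, u[i - 1] with i = 0 wraps to u[-1], giving periodic boundaries.
    return [0.5 * (u[i - 1] + u[(i + 1) % n])
            - dt / (2.0 * dx) * (f[(i + 1) % n] - f[i - 1])
            for i in range(n)]
```

Advecting a square pulse with f(u) = u and dt = 0.5 dx shows the smearing of discontinuities typical of this scheme, while the sum of u over the periodic grid is conserved up to round-off, since the flux differences telescope.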
Vector texture maps on the GPU
2005
Cited by 9 (0 self)
Figure 1: Vector Texture Maps applied onto an object. The ACM logo is represented by four gradient shaders, combined by VTMs. By indexing characters in a font represented by a compressed VTM (128 KB), the antialiased text on the paper roll uses only 8 bytes per character (two RGBA texels). Performance is 54 FPS (3DLabs Wildcat Realizm 200). This paper presents VTMs (Vector Texture Maps), a novel representation of vector images that can be used as a texture by the GPU for real-time rendering. A VTM decomposes texture space into different regions, represented analytically by a set of implicit degree-3 polynomials. Each region can be rendered by a different fragment shading function. Accurate antialiasing is performed in real time, based on an estimate of fragment coverage. As a consequence, infinite zooming can be applied without any pixel discretization artifacts. Based on a hierarchical data structure, our representation has low memory requirements. Its versatility is demonstrated in various settings, including a font engine implemented entirely on the GPU.
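The abstract does not spell out how fragment coverage is estimated, so the following is only a hedged sketch of one common approach, assuming a region defined by f(x, y) < 0 for an implicit cubic f: divide f by the length of its gradient to approximate the signed distance to the boundary, then map that distance to a coverage fraction across one pixel. The example polynomial and all function names are hypothetical; in a fragment shader the gradient would typically come from analytic derivatives or from screen-space derivative instructions.

```python
import math
from typing import Callable, Tuple


def coverage(f: Callable[[float, float], float],
             grad_f: Callable[[float, float], Tuple[float, float]],
             x: float, y: float, pixel_size: float) -> float:
    """Estimate the fraction of the pixel centred at (x, y) inside the region f < 0.

    d = f / |grad f| is a first-order estimate of the signed distance to the
    curve f = 0; a linear ramp one pixel wide turns it into a coverage value.
    """
    value = f(x, y)
    gx, gy = grad_f(x, y)
    g = math.hypot(gx, gy)
    if g == 0.0:  # degenerate gradient: fall back to a hard inside/outside test
        return 1.0 if value < 0.0 else 0.0
    d = value / g
    return min(1.0, max(0.0, 0.5 - d / pixel_size))


# Example region: below the cubic curve y = x^3, i.e. f(x, y) = y - x^3 < 0.
def cubic(x: float, y: float) -> float:
    return y - x ** 3


def cubic_grad(x: float, y: float) -> Tuple[float, float]:
    return (-3.0 * x ** 2, 1.0)
```

A pixel centred exactly on the curve gets coverage 0.5, and pixels more than half a pixel inside or outside saturate to 1 or 0. Since the estimate is recomputed per fragment at the current pixel size, it stays sharp at any zoom level.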