Results 1 – 7 of 7
Performance and accuracy of hardware-oriented native, emulated and mixed-precision solvers in FEM simulations
 International Journal of Parallel, Emergent and Distributed Systems
Abstract

Cited by 28 (8 self)
In a previous publication, we have examined the fundamental difference between computational precision and result accuracy in the context of the iterative solution of linear systems as they typically arise in the Finite Element discretization of Partial Differential Equations (PDEs) [1]. In particular, we evaluated mixed- and emulated-precision schemes on commodity graphics processors (GPUs), which at that time only supported computations in single precision. With the advent of graphics cards that natively provide double precision, this report updates our previous results. We demonstrate that with new coprocessor hardware supporting native double precision, such as NVIDIA's G200 architecture, the situation does not change qualitatively for PDEs, and the previously introduced mixed-precision schemes are still preferable to double precision alone. But the schemes achieve significant quantitative performance improvements with the more powerful hardware. In particular, we demonstrate that a multigrid scheme can accurately solve a common test problem in Finite Element settings with one million unknowns in less than 0.1 seconds, which is truly outstanding performance. We support these conclusions by exploring the algorithmic design space enlarged by the availability of double precision directly in the hardware.
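The mixed-precision idea in this abstract can be sketched as a defect-correction loop: an inner solver runs entirely in single precision, while the outer loop computes residuals and accumulates the solution in double precision, so the final accuracy is that of the double-precision format. The sketch below is illustrative only, using plain Jacobi sweeps as the inner solver in place of the paper's GPU multigrid; all function and variable names are ours.

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-10, max_outer=50, inner_iters=100):
    """Defect correction: inner Jacobi sweeps in float32, outer
    residuals and solution updates in float64."""
    A32 = A.astype(np.float32)
    D32 = np.diag(A32)                       # Jacobi needs the diagonal
    x = np.zeros_like(b)
    for _ in range(max_outer):
        r = b - A @ x                        # residual in double precision
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        # approximately solve A d = r entirely in single precision
        r32 = r.astype(np.float32)
        d = np.zeros_like(r32)
        for _ in range(inner_iters):
            d = d + (r32 - A32 @ d) / D32
        x = x + d.astype(np.float64)         # accumulate correction in double
    return x
```

Because each outer iteration rescales the residual, the float32 rounding error of the inner solver limits only the per-step reduction factor, not the final accuracy; the loop converges to double-precision residuals for reasonably conditioned, diagonally dominant systems.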
Concurrent number cruncher: a GPU implementation of a general sparse linear solver
 Int. J. Parallel Emerg. Distrib. Syst
Abstract

Cited by 18 (0 self)
A wide class of numerical methods needs to solve a linear system where the pattern of nonzero matrix coefficients can be arbitrary. These problems can greatly benefit from the highly multithreaded computational power and large memory bandwidth available on GPUs, especially since dedicated general-purpose APIs such as CTM (AMD/ATI) and CUDA (NVIDIA) have appeared. CUDA even provides a BLAS implementation, but only for dense matrices (CuBLAS). Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage, register blocking and vectorization) to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned Conjugate Gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0x, making it attractive for applications that are content with single precision.
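Block compressed row storage (BCRS), one of the strategies named in the abstract, stores small dense blocks instead of individual scalars, so each column-index fetch is amortised over a whole block and the per-block products map naturally onto registers or GPU threads. A minimal CPU-side sketch of a BCRS matrix-vector product (our own naming, not the paper's code):

```python
import numpy as np

def bcrs_spmv(block_vals, block_cols, row_ptr, x, bs=2):
    """y = A @ x for A stored in block compressed row storage.
    block_vals: (nblocks, bs, bs) dense blocks, row by row;
    block_cols: block-column index of each stored block;
    row_ptr:    offset of each block row into block_vals."""
    nbrows = len(row_ptr) - 1
    y = np.zeros(nbrows * bs, dtype=x.dtype)
    for i in range(nbrows):
        acc = np.zeros(bs, dtype=x.dtype)          # block-row accumulator
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = block_cols[k]                      # one index per bs*bs values
            acc += block_vals[k] @ x[j * bs:(j + 1) * bs]
        y[i * bs:(i + 1) * bs] = acc
    return y
```

In the register-blocked GPU version described by the paper, the inner block product is fully unrolled so the bs*bs partial sums stay in registers; the Python loop above only illustrates the data layout.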
Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
 ACM Trans. Math. Softw
Abstract

Cited by 13 (1 self)
By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach presented here can apply not only to conventional processors but also to exotic technologies such as
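The 32/64-bit combination described here is classical iterative refinement: perform the expensive solve in single precision, then repeatedly compute the residual in double precision and solve a cheap correction system in single precision. A numpy sketch under our own naming (a production version would factorize the single-precision matrix once and reuse the factors for every correction solve, which is where the speedup comes from):

```python
import numpy as np

def iterative_refinement(A, b, max_iter=10, tol=1e-12):
    """Solve A x = b to double-precision accuracy using 32-bit inner
    solves and 64-bit residuals."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                                    # residual in 64-bit
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = np.linalg.solve(A32, r.astype(np.float32))   # correction in 32-bit
        x += d.astype(np.float64)
    return x
```

Each refinement step reduces the residual by a factor of roughly the condition number times single-precision unit roundoff, so a handful of cheap iterations recovers full 64-bit accuracy for well-conditioned systems.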
Exploring reconfigurable architectures for explicit finite difference option pricing models
 in Int. Conf. on Field Programmable Logic and Applications, 2009
Abstract

Cited by 4 (4 self)
This paper explores the application of reconfigurable hardware and Graphics Processing Units (GPUs) to the acceleration of financial computation using the finite difference (FD) method. A parallel pipelined architecture has been developed to support concurrent valuation of independent options with high pricing throughput. Our FPGA implementation, running at 106 MHz on an xc4vlx160 device, demonstrates a speed-up of 12 times over a Pentium 4 processor at 3.6 GHz in single-precision arithmetic; while the FPGA is 3.6 times slower than a Tesla C1060 240-core GPU at 1.3 GHz, it is 9 times more energy efficient.
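The explicit FD method being accelerated discretises the Black-Scholes PDE on a price grid and steps backwards in time from the payoff at expiry; each grid-point update is an independent three-point stencil, which is what makes the scheme so amenable to deep pipelining on FPGAs and to GPU threads. A scalar reference sketch in Python for a single European call (parameters and names are ours, not the paper's):

```python
import numpy as np

def explicit_fd_call(S0, K, r, sigma, T, ns=200, nt=2000):
    """Price a European call with the explicit finite-difference scheme
    for the Black-Scholes PDE, stepping back from expiry."""
    s_max = 4.0 * K
    dt = T / nt
    S = np.linspace(0.0, s_max, ns + 1)
    V = np.maximum(S - K, 0.0)                   # terminal payoff
    j = np.arange(1, ns)                         # interior grid indices
    a = 0.5 * dt * (sigma**2 * j**2 - r * j)     # three-point stencil
    b = 1.0 - dt * (sigma**2 * j**2 + r)         # coefficients
    c = 0.5 * dt * (sigma**2 * j**2 + r * j)
    for n in range(1, nt + 1):
        V[1:-1] = a * V[:-2] + b * V[1:-1] + c * V[2:]
        V[0] = 0.0                               # S = 0: call is worthless
        V[-1] = s_max - K * np.exp(-r * n * dt)  # deep-in-the-money boundary
    return float(np.interp(S0, S, V))
```

Note the explicit scheme is only conditionally stable: the time step must satisfy roughly dt < 1/(sigma^2 * ns^2), which is why the hardware must sustain many cheap time steps rather than a few expensive implicit solves.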
Mixed precision methods for convergent iterative schemes
 EDGE
, 2006
Abstract

Cited by 2 (0 self)
Most error estimates of numerical schemes are derived in the field of real or complex numbers. From a computational point of view this assumes infinite precision. For the implementation on a computer, the infinite number fields are quantized into a finite set of
Analysis of Field Programmable Gate Array-Based Kalman Filter Architectures
, 2010
Real-Number Optimisation: A Speculative, Profile-Guided Approach
, 2007
Abstract
From supercomputers for computational science to embedded processors in mobile phones, most important computing applications manipulate the set of real numbers, R. How these numbers are represented varies, with embedded applications picking fixed-point formats compatible with integer operations and larger machines using IEEE 754 floating point or a close variant. A large body of work describes methods for optimising floating-point representations using static analysis techniques; however, these must always take a conservative approach if they intend to ensure correctness. Taking our inspiration from work on speculative execution and profile-guided compiler optimisations, we lay out a series of tools and techniques to produce optimised real-number representations. Our speculative approach aims for greater reductions in hardware area and execution time than with more conservative approaches, while providing fallback options to ensure correctness in case of incorrect speculation. We describe a profiling tool for x86 binaries which reveals bucketised value ranges for floating-point operations within applications. A selection of profiling results for real-world scientific
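The "bucketised value ranges" such a profiler reports can be illustrated with a toy version: classify each observed magnitude by its binary exponent, coarsened into buckets spanning a few binades, which is enough to decide how wide a fixed-point or reduced floating-point format must be. A sketch in Python (the real tool instruments x86 binaries; this naming is entirely ours):

```python
import math
from collections import Counter

def bucketise(values, bucket_bits=4):
    """Histogram of value magnitudes by coarse binary exponent:
    each bucket covers 2**bucket_bits consecutive binades."""
    hist = Counter()
    for v in values:
        if v == 0.0:
            hist["zero"] += 1          # zeros get their own bucket
            continue
        _, exp = math.frexp(abs(v))    # abs(v) = m * 2**exp, 0.5 <= m < 1
        hist[(exp // bucket_bits) * bucket_bits] += 1
    return dict(hist)
```

For example, values near 1 land in bucket 0 while values near 2**10 land in bucket 8, so a narrow profile immediately suggests a compact representation is safe for that operation.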