GPU-Accelerated Preconditioned Iterative Linear Solvers
Cited by 10 (1 self)
This work is an overview of our preliminary experience in developing high-performance iterative linear solvers accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA Tesla C1060 show that for unstructured matrices SpMV kernels can be up to 10 times faster on the GPU than on the host Intel Xeon E5504 processor. Overall, the GPU-accelerated Incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a much smaller factor, up to 3, and the GPU-accelerated Incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speedup nearing 4. However, with preconditioning techniques better suited to GPUs, this performance can be significantly improved.
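For readers unfamiliar with the SpMV kernel this abstract refers to, here is a minimal sequential sketch in Python. It assumes the compressed sparse row (CSR) storage format; the abstract does not say which format the authors use, and the function name `csr_spmv` and the example matrix are hypothetical. On a GPU, the outer loop over rows is the part that would typically be spread across threads.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):  # rows are independent, so they could run in parallel
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Made-up 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The indirect access `x[col_idx[k]]` is irregular for unstructured matrices, which is one reason SpMV performance depends so strongly on memory behavior.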
Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs
, 2013
Cited by 3 (2 self)
Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in “flat” three-dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations and gas and oil reservoir modelling. The elliptic solve is often the bottleneck of the forecast, and to meet operational requirements an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient (both in terms of absolute performance and power consumption) for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. In this article we describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three-dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the PCG algorithm using loop fusion and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it both to a sequential CPU
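The matrix-free idea in this abstract can be sketched as follows. This is a minimal sequential Python illustration, not the authors' code: it assumes a simple 7-point stencil on a regular grid with homogeneous Dirichlet boundaries, and the function name `apply_operator` and the coupling constants `horiz` and `vert` are hypothetical stand-ins for the strong vertical anisotropy. The stencil coefficients are recomputed at each grid point instead of being read from a stored matrix.

```python
def apply_operator(u, nx, ny, nz, horiz=1.0, vert=100.0):
    """Return A @ u for an anisotropic 7-point stencil without storing A.

    u is a flat list of length nx*ny*nz with the vertical index k varying
    fastest. Values outside the grid are treated as zero.
    """
    def idx(i, j, k):
        return (i * ny + j) * nz + k

    out = [0.0] * (nx * ny * nz)
    diag = 4.0 * horiz + 2.0 * vert  # diagonal entry, recomputed, never stored
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                acc = diag * u[idx(i, j, k)]
                if i > 0:
                    acc -= horiz * u[idx(i - 1, j, k)]
                if i < nx - 1:
                    acc -= horiz * u[idx(i + 1, j, k)]
                if j > 0:
                    acc -= horiz * u[idx(i, j - 1, k)]
                if j < ny - 1:
                    acc -= horiz * u[idx(i, j + 1, k)]
                if k > 0:
                    acc -= vert * u[idx(i, j, k - 1)]
                if k < nz - 1:
                    acc -= vert * u[idx(i, j, k + 1)]
                out[idx(i, j, k)] = acc
    return out


# Apply the operator to a constant field on a 3x3x3 grid.
result = apply_operator([1.0] * 27, 3, 3, 3)
```

Because only `u` and `out` are touched in memory, the traffic per grid point is far lower than with a stored sparse matrix, which is the motivation the abstract gives for a memory-bound solver.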
Implementation of the Deflated Preconditioned Conjugate Gradient Method for Bubbly Flow
 Delft University of Technology
, 2010
"... Dr. ir. M.B. van Gijzen ..."
A GPU Memory System Comparison for an Elliptic Test Problem
Cited by 1 (1 self)
This paper presents GPU-based solutions to the Poisson equation with homogeneous Dirichlet boundary conditions in two spatial dimensions. This problem has well-understood behavior, but involves computation similar to that of many more complex real-world problems. We analyze the GPU performance using three types of memory access in the CUDA memory model (direct access to global memory, texture access, and shared memory). Based on data locality, different CUDA algorithms are designed to accommodate the different device memory performance behaviors. We present a performance study on the speedup of our GPU-based solutions on an NVIDIA Tesla C2070 over serial code. By relating the data access pattern and its spatial locality, our results show that an algorithm using global memory with coalesced reads outperforms the other memory systems and allows effective solvers using single-precision floating-point arithmetic.
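To make the test problem concrete, here is a minimal sequential sketch in Python of a five-point-stencil update for the 2D Poisson equation with homogeneous Dirichlet boundaries. The abstract does not say which iterative method the authors use; a Jacobi sweep is assumed here purely for illustration, and the name `jacobi_sweep` and the grid parameters are hypothetical.

```python
def jacobi_sweep(u, f, n, h):
    """One Jacobi sweep for -laplace(u) = f on an n x n interior grid.

    u and f are flat row-major lists of length n*n; boundary values are zero.
    """
    new = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            # Row-major layout: neighbours in j are adjacent in memory.
            west = u[i * n + j - 1] if j > 0 else 0.0
            east = u[i * n + j + 1] if j < n - 1 else 0.0
            south = u[(i - 1) * n + j] if i > 0 else 0.0
            north = u[(i + 1) * n + j] if i < n - 1 else 0.0
            new[i * n + j] = 0.25 * (west + east + south + north
                                     + h * h * f[i * n + j])
    return new


n, h = 3, 0.25
f = [1.0] * (n * n)
u = jacobi_sweep([0.0] * (n * n), f, n, h)
```

With this layout, GPU threads mapped along `j` would read consecutive addresses, which is the access pattern that coalesced global-memory reads rely on.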
Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU
Cited by 1 (1 self)
Abstract. We present an implementation of a Two-Level Precondi
Sparse-Matrix-CG-Solver in CUDA
Cited by 1 (0 self)
This paper describes the implementation of a parallelized conjugate gradient solver for linear equation systems using CUDA-C. Given a real, symmetric and positive definite coefficient matrix and a right-hand side, the parallelized CG solver is able to find a solution for that system by exploiting the massive compute power of today's GPUs. Compared with sequential CPU implementations, the algorithm achieves a speedup of 4 to 7, depending on the dimension of the coefficient matrix. Additionally, the concept of preconditioners to decrease the time to find a solution is evaluated using the SSOR method. Finally, suggestions are provided to further increase the speed of the presented CUDA CG solver.
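As a rough illustration of the method this abstract names, the following is a minimal sequential sketch in Python of conjugate gradients with an SSOR preconditioner. It is not the paper's CUDA code: it uses dense lists for brevity, and the function names, the relaxation parameter `omega`, and the example system are assumptions. The preconditioner is taken in its standard form M = ω/(2−ω) · (D/ω + L) D⁻¹ (D/ω + U), where D, L and U are the diagonal, strictly lower and strictly upper parts of A.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(A, x):
    return [dot(row, x) for row in A]


def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the SSOR preconditioner M of A."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):  # forward solve (D/omega + L) y = r
        s = r[i] - sum(A[i][j] * y[j] for j in range(i))
        y[i] = s * omega / A[i][i]
    # scale by (2 - omega)/omega * D
    w = [(2.0 - omega) / omega * A[i][i] * y[i] for i in range(n)]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward solve (D/omega + U) z = w
        s = w[i] - sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s * omega / A[i][i]
    return z


def pcg(A, b, omega=1.0, tol=1e-10, max_iter=100):
    """Preconditioned CG for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small made-up SPD system.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

The two triangular solves in `ssor_apply` are inherently sequential, which hints at why preconditioner choice matters so much for GPU implementations.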
Petascale elliptic solvers for anisotropic PDEs on GPU clusters, CoRR abs/1402.3545
Preconditioned Conjugate Gradient Solver
This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan or sublicensing, systematic supply or distribution in any form to anyone is expressly forbidden.
Flow Simulation and Visualization
Fig. 1: GPU-based Navier-Stokes simulation [3] with interactive manipulation of the boundary conditions like solid obstacles, in- and outflow conditions, and the viscosity of the fluid. The flow direction is from left to right and the obstacle leads to a Kármán vortex street. (a) Visualization of the pressure field (red means high, blue means low) combined with particle tracers. (b) Visualization of the finite-time Lyapunov exponent (blue means high, white means low) using backward-time integration.
Abstract: This report summarizes my research on real-time flow simulation and visualization. In particular, my research is concentrated on efficient GPU algorithms that can exploit the fine-granular thread parallelism of graphics hardware. Several established data structures and numerical solvers perform poorly on the GPU. For this reason, I study and develop new methods that are optimized for the underlying hardware architecture. In Eulerian flow simulations, the grid resolution plays an important role in capturing turbulence effects on a small scale. To efficiently exploit the valuable computation and memory resources of a GPU, my work contributes an algorithm for dynamic grid refinement to interactively simulate and render smoke animations on the GPU. One of the most computationally expensive steps of a typical Navier-Stokes simulation is the numerical solution of the pressure Poisson equation. In this field, my work contributes a preconditioner for the conjugate gradient method that is optimized for the Poisson problem and for efficient GPU processing. In subsequent work, I exploited the gained performance benefit to spend more computation time on advanced visualization methods that can help control the flow by interactively manipulating boundary conditions and receiving immediate feedback by visualizing Lagrangian coherent structures in real time.
For texture-based flow visualization, semi-Lagrangian advection is often employed, which is susceptible to numerical diffusion; however, higher-order interpolation methods can be employed to reduce this effect. For this reason, I contributed an evaluation paper that studies the conservation of the frequency spectrum for different interpolation methods.
Preconditioned Conjugate Gradient Method
, 2015