Results 1 - 4 of 4
Using GPUs to improve multigrid solver performance on a cluster
- J. OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2008
Abstract - Cited by 5 (1 self)
This article explores the coupling of coarse- and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. We demonstrate the viability of our approach by using commodity graphics processors (GPUs), chosen for their excellent price/performance ratio, as efficient multigrid preconditioners. We address the issue of limited precision on GPUs by applying a mixed-precision iterative refinement technique. Other restrictions are also handled by a close interplay between the GPU and CPU. From a software perspective, we integrate the GPU solvers into the existing MPI-based Finite Element package by implementing the same interfaces as the CPU solvers, so that for the application programmer they are easily interchangeable. Our results show that we do not compromise any software functionality and gain speedups of two and more for large problems. Equipped with this additional option of hardware acceleration, we compare different choices in increasing the performance of a conventional, commodity-based cluster by increasing the number
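The mixed-precision iterative refinement mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The low-precision inner solver here is a few Jacobi sweeps standing in for the GPU multigrid preconditioner, single precision is emulated on the CPU by rounding through `struct`, and all function names are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def inner_solve_f32(A, r, sweeps=50):
    """Approximately solve A c = r with Jacobi sweeps in emulated float32.

    Assumption: Jacobi is only a stand-in for the low-precision solver;
    it converges here because the test matrix is diagonally dominant.
    """
    n = len(r)
    c = [0.0] * n
    for _ in range(sweeps):
        c = [
            to_f32(
                (to_f32(r[i])
                 - sum(to_f32(A[i][j]) * c[j] for j in range(n) if j != i))
                / to_f32(A[i][i])
            )
            for i in range(n)
        ]
    return c


def refine(A, b, tol=1e-12, max_outer=20):
    """Outer loop in double precision, corrections from the float32 solver."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_outer):
        # Residual is computed in full double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            break
        # The correction only needs low precision; the update is in double.
        c = inner_solve_f32(A, r)
        x = [x[i] + c[i] for i in range(n)]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]
x = refine(A, b)
```

Each outer step recovers roughly as many digits as the inner solver delivers, so a handful of cheap low-precision solves can reach double-precision accuracy.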
Understanding Software Approaches for GPGPU Reliability
Abstract
Even though graphics processors (GPUs) are becoming increasingly popular for general purpose computing, current (and likely near future) generations of GPUs do not provide hardware support for detecting soft/hard errors in computation logic or memory storage cells since graphics applications are inherently fault tolerant. As a result, if an error occurs in GPUs during program execution, the results could be silently corrupted, which is not acceptable for general purpose computations. To improve the fidelity of general purpose computation on GPUs (GPGPU), we investigate software approaches to perform redundant execution. In particular, we propose and study three different application-level techniques. The first technique simply executes the GPU kernel program twice, and thus achieves roughly half of the throughput of a non-redundant execution. The next two techniques interleave redundant execution with the original code in different ways to take advantage of the parallelism between the original code and its redundant copy. Furthermore, we evaluate the benefits of providing hardware support, including ECC/parity protection to on-chip and off-chip memories, for each of the software techniques. Interestingly, our findings, based on six commonly used applications, indicate that the benefits of complex software approaches are both application and architecture dependent. The simple approach, which executes the kernel twice, is often sufficient and may even outperform the complex ones. Moreover, we argue that the cost of protecting memories with ECC/parity bits is not justified.
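The first technique in this abstract, executing the kernel twice, can be sketched in a few lines. This is a CPU-side illustration under stated assumptions, not the paper's code. The "kernel" is an ordinary Python function, the abstract does not describe how the two results are checked so the comparison step is assumed, and `run_redundant`, `MismatchError` and `make_faulty_kernel` are hypothetical names.

```python
class MismatchError(RuntimeError):
    """Raised when two redundant executions produce different results."""


def run_redundant(kernel, data):
    """Execute the kernel twice on the same input and compare the outputs."""
    first = kernel(list(data))
    second = kernel(list(data))
    if first != second:
        raise MismatchError("redundant executions disagree")
    return first


def make_faulty_kernel(kernel, fail_on_call):
    """Wrap a kernel so one chosen call returns a corrupted result.

    Hypothetical fault injector: it flips the lowest bit of the first
    output element to imitate a transient (soft) error.
    """
    calls = {"n": 0}

    def wrapped(data):
        calls["n"] += 1
        result = kernel(data)
        if calls["n"] == fail_on_call:
            result[0] ^= 1
        return result

    return wrapped


def square(values):
    """Toy integer kernel used for the demonstration."""
    return [v * v for v in values]


clean = run_redundant(square, [1, 2, 3])
```

Running everything twice costs about half the throughput, as the abstract notes, and it cannot catch a fault that corrupts both runs identically.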
CUDASA: Compute Unified Device and Systems Architecture
Abstract
We present an extension to the CUDA programming language which extends parallelism to multi-GPU systems and GPU-cluster environments. Following the existing model, which exposes the internal parallelism of GPUs, our extended programming language provides a consistent development interface for additional, higher levels of parallel abstraction from the bus and network interconnects. The newly introduced layers provide the key features specific to the architecture and programmability of current graphics hardware while the underlying communication and scheduling mechanisms are completely hidden from the user. All extensions to the original programming language are handled by a self-contained compiler which is easily embedded into the CUDA compile process. We evaluate our system using two different sample applications and discuss scaling behavior and performance on different system architectures.
Categories and Subject Descriptors (according to ACM CCS): [Computer-Communication Networks]: Distributed applications
Graphics Clusters Incorporating Data-Locality
Abstract
clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely the PCI bus and network interconnects. While the extended API mimics the full function set of current graphics hardware, including the concept of global memory, on all distribution layers, the underlying communication mechanisms are handled transparently for the application developer. To allow for high scalability, in particular for network interconnected environments, we introduce an automatic GPU-accelerated scheduling mechanism that is aware of data locality. This way, the overall amount of transmitted data can be heavily reduced, which leads to better GPU utilization and faster execution. We evaluate the performance and scalability of our system for bus and especially network level parallelism on typical multi-GPU systems and graphics clusters.
Index Terms: GPU computing, Graphics Clusters, Parallel Programming
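The idea of scheduling with awareness of data locality can be illustrated with a small greedy sketch. It is purely hypothetical and does not reproduce the scheduler described in the abstract: it runs on the CPU, ignores load balancing, and every name (`schedule`, the task and node labels, the block sizes) is invented for the example. Each task goes to the node that would have to receive the least data it does not already hold.

```python
def schedule(tasks, resident):
    """Assign each task to the node with the smallest transfer cost.

    tasks:    {task_name: {block_name: size_in_bytes}}
    resident: {node_name: set of block names already stored on that node}
    Returns (plan, total_bytes_transferred).
    """
    # Work on a copy so the caller's view of the cluster is not modified.
    holdings = {node: set(blocks) for node, blocks in resident.items()}
    plan = {}
    transferred = 0
    for task, blocks in tasks.items():
        def cost(node):
            # Bytes that must be sent to the node for this task.
            return sum(size for blk, size in blocks.items()
                       if blk not in holdings[node])

        # Sorting the node names makes tie-breaking deterministic.
        best = min(sorted(holdings), key=cost)
        transferred += cost(best)
        plan[task] = best
        # After the task runs, its blocks are resident on that node.
        holdings[best].update(blocks)
    return plan, transferred


tasks = {"t1": {"a": 10, "b": 5}, "t2": {"c": 8}}
resident = {"n0": {"a"}, "n1": {"c"}}
plan, transferred = schedule(tasks, resident)
```

In this toy input only block `b` has to move, so 5 bytes are transferred instead of the 23 that a locality-blind placement could require.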

