Results 1 - 5 of 5
High-level Abstractions for Performance, Portability and Continuity of Scientific Software on Future Computing Systems, 2014
In this report we present research on applying a domain-specific high-level abstractions development strategy with the aim to “future-proof” a key class of high performance computing (HPC) applications that simulate hydrodynamics computations at AWE plc. We build on an existing high-level abstraction framework, OPS, that is being developed for the solution of multi-block structured mesh-based applications at the University of Oxford. The target application is an unclassified benchmark application, CloverLeaf, that consists of algorithms of interest from the hydrodynamics workload at AWE plc. OPS uses an “active library” approach where a single application code written using the OPS API can be transformed into different parallel implementations, which can then be linked against the appropriate parallel library, enabling execution on different back-end hardware platforms. At the same time, the generated code and the platform-specific back-end libraries are highly optimized, utilizing the best low-level features of a target architecture so that an OPS application achieves near-optimal performance, including high computational efficiency and minimized memory traffic. We present (1) the conversion of CloverLeaf to utilize OPS, (2) the utilization of OPS's code …
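The “active library” separation the abstract describes, a single user-written kernel with many possible generated parallel drivers, can be sketched in plain C. This is an illustrative sketch only, not the real OPS API; the kernel and driver names are invented for the example:

```c
/* Illustrative sketch only, not the real OPS API: the user writes one
 * per-point kernel, and the driver that sweeps it over a range is the
 * part a code generator would re-emit for OpenMP, CUDA, MPI, etc. */

/* User-level kernel: 5-point average at one interior grid point. */
static void avg_kernel(const double *in, double *out, int i, int j, int n) {
    out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                             in[i * n + (j - 1)] + in[i * n + (j + 1)]);
}

/* One possible back-end: a plain sequential sweep over the interior.
 * Generated OpenMP or CUDA variants would replace this loop nest while
 * the kernel above stays untouched. */
void apply_interior(const double *in, double *out, int n) {
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            avg_kernel(in, out, i, j, n);
}
```

The key property is that only the driver depends on the target platform; the numerical kernel is written once.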
The OPS Domain Specific Abstraction for Multi-Block Structured Grid Computations
Abstract—Code maintainability, performance portability and future proofing are some of the key challenges in this era of rapid change in High Performance Computing. Domain Specific Languages and Active Libraries address these challenges by focusing on a single application domain and providing a high-level programming approach, and then subsequently using domain knowledge to deliver high performance on various hardware. In this paper, we introduce the OPS high-level abstraction and active library aimed at multi-block structured grid computations, and discuss some of its key design points; we demonstrate how OPS can be embedded in C/C++ and the API made to look like a traditional library, and how through a combination of simple text manipulation and back-end logic we can enable execution on a diverse range of hardware using different parallel programming approaches. Relying on the access-execute description of the OPS abstraction, we introduce a number of automated execution techniques that enable distributed memory parallelization, optimization of communication patterns, checkpointing and cache-blocking. Using performance results from CloverLeaf from the Mantevo suite of benchmarks, we demonstrate the utility of OPS.
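Cache-blocking, one of the automated execution techniques mentioned in the abstract, restructures a sweep into tiles whose working sets fit in cache. OPS derives such a transformation automatically from the access-execute description; the hand-written sketch below (with a hypothetical function name) shows only the loop restructuring itself:

```c
/* Hand-written cache-blocking sketch, for illustration only: the i/j
 * sweep is split into tile-by-tile blocks so each block is finished
 * before the traversal moves on, keeping its data cache-resident.
 * OPS would generate this structure rather than require it by hand. */
void tiled_fill(double *a, int n, int tile) {
    for (int ii = 0; ii < n; ii += tile) {
        for (int jj = 0; jj < n; jj += tile) {
            int imax = ii + tile < n ? ii + tile : n;
            int jmax = jj + tile < n ? jj + tile : n;
            for (int i = ii; i < imax; i++)      /* sweep one block */
                for (int j = jj; j < jmax; j++)
                    a[i * n + j] = (double)(i + j);
        }
    }
}
```

The payoff comes when consecutive loops over the same grid are chained block by block, which is exactly what the access-execute description lets the library do safely.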
Design and Initial Performance of a High-level Unstructured Mesh Framework on Heterogeneous Parallel Systems
OP2 is a high-level domain-specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2's recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using the OP2 API is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
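A central design issue named in the abstract, data dependencies through indirectly referenced data, is commonly handled in frameworks of this kind by coloring the iteration set so that no two same-colored iterations update the same mesh element. The greedy edge-coloring sketch below is a simplified illustration of that idea, not OP2's actual implementation; all names are invented for the example:

```c
/* Simplified greedy edge coloring, for illustration only: edges that
 * share a node get different colors, so all edges of one color can
 * apply their indirect increments (res[node] += ...) in parallel
 * without races, one color at a time.
 * edge2node holds two node ids per edge; returns the color count. */
int color_edges(const int *edge2node, int nedges, int *color) {
    int ncolors = 0;
    for (int e = 0; e < nedges; e++) {
        for (int c = 0; ; c++) {      /* lowest conflict-free color */
            int ok = 1;
            for (int f = 0; f < e && ok; f++) {
                if (color[f] != c) continue;
                if (edge2node[2 * f]     == edge2node[2 * e]     ||
                    edge2node[2 * f]     == edge2node[2 * e + 1] ||
                    edge2node[2 * f + 1] == edge2node[2 * e]     ||
                    edge2node[2 * f + 1] == edge2node[2 * e + 1])
                    ok = 0;           /* f shares a node with e */
            }
            if (ok) {
                color[e] = c;
                if (c + 1 > ncolors) ncolors = c + 1;
                break;
            }
        }
    }
    return ncolors;
}
```

On a path of three edges (0-1, 1-2, 2-3), the first and third edge share no node and can take the same color, so two colors suffice.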
Generalizing Run-time Tiling with the Loop Chain Abstraction
Abstract—Many scientific applications are organized in a data parallel way: as sequences of parallel and/or reduction loops. This exposes parallelism well, but does not convert data reuse between loops into data locality. This paper focuses on this issue in parallel loops whose loop-to-loop dependence structure is data-dependent due to indirect references such as A[B[i]]. Such references are a common occurrence in sparse matrix computations, molecular dynamics simulations, and unstructured-mesh computational fluid dynamics (CFD). Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality. These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the sparse matrix powers kernel; however, the run-time routines for performing sparse tiling were hand-coded per application. In this paper, we present a generalized full sparse tiling algorithm that uses the newly developed loop chain abstraction as input, improves inter-loop data locality, and creates a task graph to expose shared-memory parallelism at runtime. We evaluate the overhead and performance impact of the generalized full sparse tiling algorithm on two codes: a sparse Jacobi iterative solver and the Airfoil CFD benchmark. Keywords: inspector/executor, run-time reordering transformations, tiling
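The inspector/executor pattern named in the keywords can be illustrated in miniature: the inspector reads the index array B of an A[B[i]] access at run time and groups iterations whose accesses fall in the same block of A; the executor then runs in that reordered, tiled order. This is a heavily simplified sketch of the idea, not the paper's full sparse tiling algorithm:

```c
#include <stdlib.h>

/* Inspector (run-time): bucket iterations 0..n-1 by which block of A
 * their access A[B[i]] touches (block = B[i] / bsize), via a counting
 * sort, so order[] lists same-block iterations contiguously. */
void inspect(const int *B, int n, int bsize, int nblocks, int *order) {
    int *count = calloc(nblocks + 1, sizeof(int));
    for (int i = 0; i < n; i++) count[B[i] / bsize + 1]++;
    for (int t = 0; t < nblocks; t++) count[t + 1] += count[t];
    for (int i = 0; i < n; i++) order[count[B[i] / bsize]++] = i;
    free(count);
}

/* Executor: run the original computation (here a gather-sum) in the
 * tiled order, so consecutive iterations reuse the same block of A. */
double execute(const double *A, const int *B, int n, const int *order) {
    double sum = 0.0;
    for (int k = 0; k < n; k++) sum += A[B[order[k]]];
    return sum;
}
```

The full algorithm in the paper goes further, grouping iterations across a whole chain of loops and emitting a task graph, but the split is the same: inspect the indirection once, then execute in the improved order.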
Compilers for Regular and Irregular Stencils: Some Shared Problems and Solutions
Abstract—Solving partial differential equations results in a continuum of regular and irregular stencil computation implementations. In this paper, we use heat diffusion on a bar to show how regular and irregular stencil computations are related, and then illustrate five complicating issues that occur in implementing the continuum of regular and irregular stencil computations in full applications. These complicating issues make it difficult for compilers to discover stencil computations and to represent and generate code for the combination of regular and irregular transformations that are relevant. We overview projects at Colorado State University and elsewhere that are developing solutions to the complicating issues surrounding stencil computations in partial differential equation (PDE) solver applications. Keywords: regular and irregular stencils, sparse matrix vector multiplication (SpMV), sparse tiling, sparse polyhedral framework, heat diffusion problem, PDE solvers
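The abstract's running example, heat diffusion on a bar, makes the regular/irregular continuum concrete: the same explicit update can be written as a regular stencil or as a sparse matrix-vector product over the tridiagonal operator. A sketch under illustrative values (the diffusion coefficient k = 0.1 is an assumption for the example):

```c
/* One explicit heat-diffusion step on a 1D bar, written as a regular
 * stencil: every interior point reads its two fixed neighbors. */
void step_regular(const double *u, double *out, int n, double k) {
    for (int i = 1; i < n - 1; i++)
        out[i] = u[i] + k * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}

/* The same update as an irregular computation: CSR SpMV, where
 * interior row r holds {k, 1-2k, k} at columns r, r+1, r+2 of u.
 * The indirect u[col[j]] access is the A[B[i]] pattern of irregular
 * codes; here the compiler can no longer see the fixed neighbors. */
void step_csr(const int *rowptr, const int *col, const double *val,
              const double *u, double *out, int nrows) {
    for (int r = 0; r < nrows; r++) {
        double s = 0.0;
        for (int j = rowptr[r]; j < rowptr[r + 1]; j++)
            s += val[j] * u[col[j]];
        out[r] = s;
    }
}
```

Both functions compute the same values (up to floating-point rounding); the difference is only whether the access pattern is visible statically or hidden behind index arrays, which is precisely what makes the compiler's job hard in the irregular case.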