Simplifying the Development, Use and Sustainability of HPC Software
Cited by 1 (0 self)
Developing software to undertake complex, compute-intensive scientific processes requires a challenging combination of both specialist domain knowledge and software development skills to convert this knowledge into efficient code. As computational platforms become increasingly heterogeneous and newer types of platform such as Infrastructure-as-a-Service (IaaS) cloud computing become more widely accepted for HPC computations, scientists require more support from computer scientists and resource providers to develop efficient code and make optimal use of the resources available to them. As part of the libhpc stage 1 and 2 projects we are developing a framework to provide a richer means of job specification and efficient execution of complex scientific software on heterogeneous infrastructure. The use of such frameworks has implications for the sustainability of scientific software. In this paper we set out our developing understanding of these challenges based on work carried out in the libhpc project.
OP2 Airfoil Example, 2012
Airfoil is an industrially representative CFD benchmark application, written using OP2’s C/C++ API. This document details its development using OP2 and serves as a guide for application developers wishing to write applications using the OP2 API and framework. Full details of OP2 can be found at:
unknown title
Building scientific software for use on HPC platforms can be a complex process bringing together the specialist domain knowledge of the scientists who are likely to be the end-users of the resulting software, method
Design and Initial Performance of a High-level Unstructured Mesh Framework on Heterogeneous Parallel Systems
OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2's recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using the OP2 API is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
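The data dependency mentioned in this abstract arises when a loop over one mesh set (for example, edges) updates data on another set (nodes) through a mapping, so two iterations may write to the same node. One common way to make such a loop safe to parallelize is to color the iterations so that no two edges of the same color share a node. The following is a minimal plain-Python sketch of that idea only; it does not use the OP2 API, and the function names `color_edges` and `indirect_increment` are hypothetical, not taken from OP2 or from the paper.

```python
from collections import defaultdict


def color_edges(edge_to_nodes):
    """Greedy coloring: edges that share a node never get the same color."""
    used_by_node = defaultdict(set)  # node -> colors already used at that node
    colors = []
    for nodes in edge_to_nodes:
        taken = set()
        for n in nodes:
            taken |= used_by_node[n]
        c = 0
        while c in taken:  # smallest color not yet used at either end node
            c += 1
        colors.append(c)
        for n in nodes:
            used_by_node[n].add(c)
    return colors


def indirect_increment(edge_to_nodes, edge_weight, num_nodes):
    """Add each edge's weight onto its end nodes, one color at a time."""
    node_sum = [0.0] * num_nodes
    colors = color_edges(edge_to_nodes)
    num_colors = max(colors, default=-1) + 1
    for color in range(num_colors):
        # Edges of one color touch disjoint nodes, so the iterations of this
        # inner loop could run concurrently without locks or atomics.
        for e, nodes in enumerate(edge_to_nodes):
            if colors[e] == color:
                for n in nodes:
                    node_sum[n] += edge_weight[e]
    return node_sum, colors


# Hypothetical four-edge ring mesh: edge i connects node i to node (i + 1) % 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights = [1.0, 2.0, 3.0, 4.0]
node_sum, colors = indirect_increment(edges, weights, 4)
```

In this sketch the loops run serially; the coloring only establishes which iterations are independent. A real back end would map each color to a parallel launch.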
An Analytical Study of Loop Tiling for a Large-scale Unstructured Mesh Application
Abstract—Increasingly, the main bottleneck limiting performance on emerging multi-core and many-core processors is the movement of data between their different cores and main memory. As the number of cores increases, more and more data needs to be exchanged with memory to keep them fully utilized. This critical bottleneck is already limiting the utility of processors and our ability to leverage increased parallelism to achieve higher performance. On the other hand, considerable computer science research exists on tiling techniques (also known as sparse tiling) for reducing data transfers. Such work demonstrates how the increasing memory bottleneck could be avoided, but the difficulty has been in extending these ideas to real-world applications. These algorithms quickly become highly complicated, and it has been very difficult for a compiler to automatically detect the opportunities and implement the execution strategy. Focusing on the unstructured mesh application class, in this paper we present a preliminary analytical investigation into the performance benefits of tiling (or loop-blocking) algorithms on a real-world industrial CFD application. We analytically estimate the reductions in communications or memory accesses for the main parallel loops in this application and predict quantitatively the performance benefits that can be gained on modern multi-core and many-core hardware. The analysis demonstrates that in general a factor of four reduction in data movement can be achieved by tiling parallel loops. A major part of the savings comes from contraction of temporary or transient data arrays that need not be written back to main memory, by holding them in the last level cache (LLC) of modern processors.
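The contraction of temporary arrays described in this abstract can be illustrated with a toy traffic count. The sketch below is an assumption-laden model, not the paper's analysis: it uses two simple direct loops rather than an unstructured mesh, counts one unit of main-memory traffic per array element read or written, and assumes a tile's temporary data stays in cache for free. The function names `run_unfused` and `run_tiled` are hypothetical.

```python
def run_unfused(a):
    """Two separate loops: the temporary array makes a round trip to memory."""
    traffic = 0
    tmp = [0.0] * len(a)
    for i in range(len(a)):  # loop 1: read a[i], write tmp[i]
        tmp[i] = 2.0 * a[i]
        traffic += 2
    b = [0.0] * len(a)
    for i in range(len(a)):  # loop 2: read tmp[i], write b[i]
        b[i] = tmp[i] + 1.0
        traffic += 2
    return b, traffic


def run_tiled(a, tile_size):
    """Run both loops tile by tile; the per-tile temporary is never counted
    as memory traffic because it is assumed to stay in cache."""
    traffic = 0
    b = [0.0] * len(a)
    for start in range(0, len(a), tile_size):
        stop = min(start + tile_size, len(a))
        tmp = [2.0 * a[i] for i in range(start, stop)]  # tile-local temporary
        traffic += stop - start  # reads of a
        for i in range(start, stop):
            b[i] = tmp[i - start] + 1.0
        traffic += stop - start  # writes of b
    return b, traffic


a = [float(i) for i in range(10)]
b_unfused, traffic_unfused = run_unfused(a)
b_tiled, traffic_tiled = run_tiled(a, tile_size=4)
```

Under these assumptions the tiled version moves half as much data while producing the same result. The factor of four reported in the abstract concerns a real application with many loops and indirect accesses, which this toy model does not attempt to reproduce.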
Acceleration of a Full-scale Industrial CFD Application with OP2
Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by modern processor architectures, such as multi-core and many-core processor systems, Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework. OP2 targets the domain of unstructured mesh problems and follows the design of an active library using source-to-source translation and compilation to generate multiple parallel implementations from a single high-level application source for execution on a range of back-end hardware platforms. We chart the conversion of Hydra from its original hand-tuned production version to one that utilizes OP2, and map out the key difficulties encountered in the process. To our knowledge this research presents the first application of such a high-level framework to a full scale production code. Specifically we show (1) how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application such as Hydra, and (2) how different optimizations targeting contrasting parallel architectures can be applied to the whole applica-
Performance-Portable Finite-element Computations from High-level Specifications with FFC and PyOP2
Abstract—We present a tool chain for the fully automated synthesis of performance-portable finite-element solvers for multicore and GPGPU platforms from high-level specifications. Our runtime code generation and just-in-time compilation pathway takes finite-element forms in the domain-specific language UFL to low-level code. Automatically generated finite-element assembly kernels are passed to PyOP2, a domain-specific language for mesh-based simulation codes, which acts as an intermediate abstraction layer for executing the numerical kernels in parallel over an unstructured mesh. Easy integration of our tool chain allows transparently adding performance portability to existing simulation codes. PyOP2 [1], [2] is a Python implementation of the unstructured mesh computation framework OP2 [3], which applies numerical kernels in parallel over an unstructured mesh. Ker-