OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures

by G R Mudalige, M B Giles, I Reguly, C Bertolli, P H J Kelly

Simplifying the Development, Use and Sustainability of HPC Software

by Jeremy Cohen, Chris Cantwell, Neil Chue Hong, David Moxey, Malcolm Illingworth, Andrew Turner, John Darlington, Spencer Sherwin
Cited by 1 (0 self)
Developing software to undertake complex, compute-intensive scientific processes requires a challenging combination of both specialist domain knowledge and software development skills to convert this knowledge into efficient code. As computational platforms become increasingly heterogeneous and newer types of platform such as Infrastructure-as-a-Service (IaaS) cloud computing become more widely accepted for HPC computations, scientists require more support from computer scientists and resource providers to develop efficient code and make optimal use of the resources available to them. As part of the libhpc stage 1 and 2 projects we are developing a framework to provide a richer means of job specification and efficient execution of complex scientific software on heterogeneous infrastructure. The use of such frameworks has implications for the sustainability of scientific software. In this paper we set out our developing understanding of these challenges based on work carried out in the libhpc project.

Citation Context

...er directives to specify code that should be executed on alternative hardware, have emerged to provide a common approach to developing code that can be executed on different platforms. Similarly, OP2 [13] is a framework for running applications on clusters of GPUs or multi-core CPUs. While these language extensions and frameworks provide methods for low-level optimisation, they require skilled develop...

OP2 Airfoil Example

by Mike Giles, Gihan Mudalige, István Reguly, 2012
Airfoil is an industrially representative CFD application benchmark written using OP2’s C/C++ API. In this document we detail its development using OP2; it is intended as a guide for application developers wishing to write applications using the OP2 API and framework. Full details of OP2 can be found at:

Citation Context

...total execution of the application and performs about 100 floating-point operations per mesh edge. Extensive performance analysis of Airfoil and optimisations have been detailed in our published work [3, 4, 5, 6]. What follows is a step-by-step treatment of the stages involved in developing Airfoil. The application and generated code can be found under OP2-Common/apps/c/airfoil. 2 The Single CPU Version For t...

unknown title

by unknown authors
Building scientific software for use on HPC platforms can be a complex process bringing together the specialist domain knowledge of the scientists who are likely to be the end-users of the resulting software, method...

Citation Context

...er directives to specify code that should be executed on alternative hardware, have emerged to provide a common approach to developing code that can be executed on different platforms. Similarly, OP2 [12] is a framework for running unstructured grid applications on multiple cores of either GPUs or CPUs. Compile-time auto-tuning and runtime optimisation offer the potential of supporting a much wider ra...

Design and Initial Performance of a High-level Unstructured Mesh Framework on Heterogeneous Parallel Systems

by G. R. Mudalige, M. B. Giles, J. Thiyagalingam, I. Reguly, C. Bertolli, P. H. J. Kelly, A. E. Trefethen
OP2 is a high-level domain-specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2's recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and system energy consumption. We demonstrate that an application written once at a high level using the OP2 API is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
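The abstract names indirectly referenced data as a central parallelisation hazard: when a loop over edges increments values stored on the cells at either end, two edges that share a cell can race. The sketch below illustrates one standard remedy, greedy colouring, in plain Python. The functions `colour_edges` and `edge_loop`, the ring mesh and the flux values are hypothetical illustrations, not OP2's API or its actual execution-plan code.

```python
def colour_edges(edge_to_cells):
    """Greedy colouring: give each edge the lowest colour not yet used by
    any other edge that touches one of its cells."""
    used_at_cell = {}          # cell -> colours already taken at that cell
    colours = []
    for cells in edge_to_cells:
        taken = set()
        for c in cells:
            taken |= used_at_cell.get(c, set())
        colour = 0
        while colour in taken:
            colour += 1
        colours.append(colour)
        for c in cells:
            used_at_cell.setdefault(c, set()).add(colour)
    return colours


def edge_loop(edge_to_cells, edge_flux, n_cells):
    """Scatter each edge's flux to its two cells, one colour at a time."""
    colours = colour_edges(edge_to_cells)
    res = [0.0] * n_cells
    n_colours = max(colours) + 1 if colours else 0
    for colour in range(n_colours):
        # No two edges of this colour share a cell, so the updates in this
        # inner loop are independent and could be run concurrently.
        for e, (a, b) in enumerate(edge_to_cells):
            if colours[e] == colour:
                res[a] += edge_flux[e]
                res[b] -= edge_flux[e]
    return res


# Usage: four cells arranged in a ring, joined by four edges.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(colour_edges(ring))                        # [0, 1, 0, 1]
print(edge_loop(ring, [1.0, 2.0, 3.0, 4.0], 4))  # [-3.0, 1.0, 1.0, 1.0]
```

Because edges of one colour touch disjoint cells, each colour's inner loop could be handed to threads without atomic updates; the cost is one synchronisation point per colour.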

Citation Context

... order to gain near-optimal performance. This paper documents a number of significant developments in the design of OP2's heterogeneous back-ends and their performance, extending our previous work in [36]: (1) A major contribution is the development of OP2's MPI+OpenMP back-end design and performance which augments the MPI-only and MPI+CUDA implementations. This new back-end provides key insights into...

An Analytical Study of Loop Tiling for a Large-scale Unstructured Mesh Application

by unknown authors
Abstract—Increasingly, the main bottleneck limiting performance on emerging multi-core and many-core processors is the movement of data between their different cores and main memory. As the number of cores increases, more and more data needs to be exchanged with memory to keep them fully utilized. This critical bottleneck is already limiting the utility of processors and our ability to leverage increased parallelism to achieve higher performance. On the other hand, considerable computer science research exists on tiling techniques (also known as sparse tiling) for reducing data transfers. Such work demonstrates how the increasing memory bottleneck could be avoided, but the difficulty has been in extending these ideas to real-world applications. These algorithms quickly become highly complicated, and it has been very difficult for a compiler to automatically detect the opportunities and implement the execution strategy. Focusing on the unstructured mesh application class, in this paper we present a preliminary analytical investigation into the performance benefits of tiling (or loop-blocking) algorithms on a real-world industrial CFD application. We analytically estimate the reductions in communications or memory accesses for the main parallel loops in this application and predict quantitatively the performance benefits that can be gained on modern multi-core and many-core hardware. The analysis demonstrates that in general a factor of four reduction in data movement can be achieved by tiling parallel loops. A major part of the savings comes from contraction of temporary or transient data arrays that need not be written back to main memory, by holding them in the last level cache (LLC) of modern processors.
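The contraction argument can be illustrated with a toy count of main-memory traffic. In the hypothetical sketch below, loop 1 produces a temporary array `t` and loop 2 consumes it. Run back to back, `t` is written to memory and read back; run tile by tile, the temporary only has to live for one tile, which stands in for data held in the LLC. The loops are direct (no mesh indirection), so the sketch leaves out the dependency analysis that makes sparse tiling hard on unstructured meshes, and its factor-of-two saving is a property of this toy, not the factor reported in the paper.

```python
def untiled(a):
    """Loop 1 writes the whole temporary t; loop 2 then reads it back."""
    traffic = 0
    t = [0.0] * len(a)
    for i in range(len(a)):
        t[i] = 2.0 * a[i]
        traffic += 2                     # read a[i], write t[i]
    b = [0.0] * len(a)
    for i in range(len(a)):
        b[i] = t[i] + 1.0
        traffic += 2                     # read t[i], write b[i]
    return b, traffic


def tiled(a, tile=4):
    """Run both loops tile by tile; the temporary lives only for one tile,
    modelling data kept in the last-level cache, so it is never counted."""
    traffic = 0
    b = [0.0] * len(a)
    for start in range(0, len(a), tile):
        end = min(start + tile, len(a))
        t_tile = [2.0 * a[i] for i in range(start, end)]  # loop 1 on this tile
        traffic += end - start                            # read a[start:end]
        for i in range(start, end):                       # loop 2 on this tile
            b[i] = t_tile[i - start] + 1.0
        traffic += end - start                            # write b[start:end]
    return b, traffic


a = [float(i) for i in range(10)]
b1, moved_untiled = untiled(a)
b2, moved_tiled = tiled(a)
print(b1 == b2, moved_untiled, moved_tiled)      # True 40 20
```

Both versions compute the same result; only the traffic attributed to the temporary disappears, which is the mechanism the abstract credits for most of the savings.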

Citation Context

...the benefits of optimizing communications or data accesses. Hydra is implemented based on the OPlus [12], [13] abstraction layer and is currently being converted to use its successor, OP2 [10], [14], [15]. Both OPlus and OP2 aim to decouple the specification of an unstructured mesh problem from its parallel implementation. OPlus provided an abstraction layer for parallelizing an application on distrib...
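The decoupling described here can be sketched as a small declarative interface: the mesh is described by sets (edges, cells), maps between sets, and data defined on sets, and a user kernel only ever sees the values belonging to one element. The classes `Set`, `Map`, `Dat`, the helper `arg` and the function `par_loop` below are hypothetical stand-ins loosely modelled on that idea, not OP2's C/C++ API; the `par_loop` shown is only a sequential reference back end.

```python
from dataclasses import dataclass

READ, INC = "READ", "INC"          # hypothetical access descriptors


@dataclass
class Set:
    name: str
    size: int


@dataclass
class Map:
    source: Set        # e.g. edges
    target: Set        # e.g. cells
    arity: int         # targets per source element
    values: list       # flat list, length source.size * arity


@dataclass
class Dat:
    on: Set
    dim: int           # values per element
    data: list         # flat list, length on.size * dim


def arg(dat, mode, via=None, idx=0):
    """Describe one kernel argument: direct, or indirect through map `via`."""
    return dat, mode, via, idx


def par_loop(kernel, iter_set, *args):
    """Sequential reference back end: gather, call the kernel, scatter INCs."""
    for e in range(iter_set.size):
        views, targets = [], []
        for dat, mode, via, idx in args:
            i = e if via is None else via.values[e * via.arity + idx]
            targets.append(i)
            if mode == READ:
                views.append(dat.data[i * dat.dim:(i + 1) * dat.dim])
            else:                          # INC: kernel adds into a zeroed buffer
                views.append([0.0] * dat.dim)
        kernel(*views)
        for (dat, mode, _, _), i, v in zip(args, targets, views):
            if mode == INC:
                for k in range(dat.dim):
                    dat.data[i * dat.dim + k] += v[k]


# Usage: three cells joined by two edges; each edge carries one flux value.
cells, edges = Set("cells", 3), Set("edges", 2)
edge_to_cell = Map(edges, cells, 2, [0, 1, 1, 2])
flux = Dat(edges, 1, [1.0, 2.0])
res = Dat(cells, 1, [0.0, 0.0, 0.0])


def flux_kernel(f, r_left, r_right):
    """User kernel: sees only the values for one edge and its two cells."""
    r_left[0] += f[0]
    r_right[0] -= f[0]


par_loop(flux_kernel, edges,
         arg(flux, READ), arg(res, INC, edge_to_cell, 0), arg(res, INC, edge_to_cell, 1))
print(res.data)                               # [1.0, 1.0, -2.0]
```

Since `flux_kernel` never touches the connectivity, a different `par_loop` (coloured, threaded or distributed) could be substituted without changing the application code, which is the separation the snippet describes.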

Edinburgh

by Core Architectures, M. Sergio Campobasso
Abstract not found

Citation Context

...r improve computational efficiency, for a given simulation. An increasing number of research programmes aimed at developing efficient hybrid parallelisation technologies are under way. The OP2 library [13] provides users with functionality to implement CFD applications using unstructured meshes on a range of different computational hardware, including multi-core and many-core (particularly GPGPU) syste...

Acceleration of a Full-scale Industrial CFD Application with OP2

by István Z. Reguly
Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls-Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by modern processor architectures, such as multi-core and many-core processor systems, Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework. OP2 targets the domain of unstructured mesh problems and follows the design of an active library using source-to-source translation and compilation to generate multiple parallel implementations from a single high-level application source for execution on a range of back-end hardware platforms. We chart the conversion of Hydra from its original hand-tuned production version to one that utilizes OP2, and map out the key difficulties encountered in the process. To our knowledge this research presents the first application of such a high-level framework to a full-scale production code. Specifically we show (1) how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application such as Hydra, and (2) how different optimizations targeting contrasting parallel architectures can be applied to the whole applica...

Citation Context

... Vol. xx, No. xx, Article xx, Publication date: October 2013. xx:4 I.Z. Reguly et al. OP2’s design and development [Giles et al. 2012; Giles et al. 2013] and its performance on heterogeneous systems [Mudalige et al. 2012]. These works investigated the performance through a standard unstructured mesh finite volume computational fluid dynamics (CFD) benchmark, called “Airfoil”, written in C using the OP2 API and parall...

Performance-Portable Finite-element Computations from High-level Specifications with FFC and PyOP2

by Paul H. J. Kelly
Abstract—We present a tool chain for the fully automated synthesis of performance-portable finite-element solvers for multicore and GPGPU platforms from high-level specifications. Our runtime code generation and just-in-time compilation pathway takes finite-element forms in the domain-specific language UFL to low-level code. Automatically generated finite-element assembly kernels are passed to PyOP2, a domain-specific language for mesh-based simulation codes, which acts as an intermediate abstraction layer for executing the numerical kernels in parallel over an unstructured mesh. Easy integration of our tool chain allows transparently adding performance portability to existing simulation codes. PyOP2 [1], [2] is a Python implementation of the unstructured mesh computation framework OP2 [3], which applies numerical kernels in parallel over an unstructured mesh. Ker...

Citation Context

...ration of our tool chain allows transparently adding performance portability to existing simulation codes. PyOP2 [1], [2] is a Python implementation of the unstructured mesh computation framework OP2 [3], which applies numerical kernels in parallel over an unstructured mesh. Kernels and parallel loop invocation code are just-in-time (JIT) compiled at runtime and cached. Subsequent parallel loops usin...