Results 1 -
7 of
7
Accelerating code on multi-cores with FastFlow
- In Proc. of 17th Intl. Euro-Par ’11 Parallel Processing, LNCS
, 2011
"... FastFlow is a programming framework specifically targeting cache-coherent shared-memory multicores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In t ..."
Abstract
-
Cited by 21 (15 self)
- Add to MetaCart
(Show Context)
FastFlow is a programming framework specifically targeting cache-coherent shared-memory multicores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code via offloading onto a dynamically created software accelerator is presented. The new methodology has been validated using a set of simple micro-benchmarks and some real applications. keywords: offload, patterns, multi-core, lock-free synchronization, C++. 1
An abstract annotation model for skeletons
"... Abstract. Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In th ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In this work a first step toward the automated optimization of high level skeleton-based parallel code is discussed. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mapping of parallel activities on a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code. 1
1Parallel Visual Data Restoration on Multi-GPGPUs using Stencil-Reduce Pattern
"... In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.
Parallel video denoising on heterogeneous platforms
"... In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a m ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.
Task number: T8.1/2/3
"... Editor and editor’s address: Todd Wilde Project co-funded by the European Commission within the Seventh Framework Programme PU Public Dissemination Level PP Restricted to other programme participants (including the Commission Services) RE Restricted to a group specified by the consortium (including ..."
Abstract
- Add to MetaCart
(Show Context)
Editor and editor’s address: Todd Wilde Project co-funded by the European Commission within the Seventh Framework Programme PU Public Dissemination Level PP Restricted to other programme participants (including the Commission Services) RE Restricted to a group specified by the consortium (including the Commission Services) CO Confidential, only for members of the consortium (including the Commission Services) Executive Summary This document describes overall plans for the use and dissimination of foreground knowledge in the ParaPhrase project. It describes the dissemination activities that have taken place at each of the consortium partners in the first year of the projects, outlines the general plans for the use and dissmenination of knowledge at each
The Loop-of-Stencil-Reduce paradigm
"... Abstract—In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both. Keywords—skeletons, fastflow, parallel patterns, multi-core, OpenCL, GPUs, heterogeneous platforms