Optimising Data-Parallel Programs Using the BSP Cost Model
In Europar'98, 1998
Cited by 10 (5 self)
We describe the use of the BSP cost model to optimise programs using skeletons or data-parallel operations, in which program components may have multiple implementations. The use of BSP transforms the problem of finding the best implementation choice for each component that minimises overall execution time into a one-dimensional minimisation problem. An algorithm which finds optimal implementations in time linear in the length of the program is given.
1 Problem Setting
Many parallel programming models gain expressiveness by raising the level of abstraction. Important examples are skeletons and data-parallel languages such as HPF. Programs in these models are compositions of moderately large building blocks, each of which hides significant parallel computation internally. There are typically multiple implementations for each of these building blocks, and it is straightforward to order these implementations by execution cost. What makes the problem difficult is that different implemen...
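As background for the entry above: the standard BSP cost model charges one superstep w + h*g + l, where w is the largest local computation, h the largest number of words any processor sends or receives, g the per-word communication cost and l the barrier cost. The sketch below is not the paper's algorithm. It assumes a hypothetical setting in which each component offers several implementations, each described by its own (w, h), and in which switching to a different implementation index between consecutive components costs a flat `switch_cost`. The names `superstep_cost`, `cheapest_chain` and `switch_cost` are illustrative only.

```python
from typing import List, Sequence, Tuple

# (w, h): local work and words communicated for one implementation.
Impl = Tuple[float, float]


def superstep_cost(w: float, h: float, g: float, l: float) -> float:
    """Standard BSP cost of one superstep: w + h*g + l."""
    return w + h * g + l


def cheapest_chain(
    components: Sequence[Sequence[Impl]],
    switch_cost: float,
    g: float,
    l: float,
) -> Tuple[float, List[int]]:
    """Pick one implementation per component, minimising the total cost.

    Assumption (not taken from the paper): using a different implementation
    index from the previous component costs a flat `switch_cost`.
    """
    if not components:
        return 0.0, []
    # best[j] = (cost, choices) of the cheapest prefix ending in implementation j
    best = [
        (superstep_cost(w, h, g, l), [j])
        for j, (w, h) in enumerate(components[0])
    ]
    for impls in components[1:]:
        new_best = []
        for j, (w, h) in enumerate(impls):
            step = superstep_cost(w, h, g, l)
            # Cheapest way to arrive at implementation j from any predecessor.
            cost, path = min(
                (c + (0.0 if p[-1] == j else switch_cost), p) for c, p in best
            )
            new_best.append((cost + step, path + [j]))
        best = new_best
    return min(best)


if __name__ == "__main__":
    program = [[(10, 2), (4, 8)], [(6, 1), (3, 3)]]
    print(cheapest_chain(program, switch_cost=5, g=1, l=1))
```

The loop visits each component once and keeps only one running total per implementation, so a single scalar cost is all that is compared at each step.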
Compiling and supporting skeletons on MPP
In Proceedings of Massively Parallel Programming Models (MPPM) '97, 1997
Cited by 5 (2 self)
Parallel programming needs a high-level programming model in which compilers and run-time support take care of traditionally intractable problems related to efficient usage of the target machine (mapping, scheduling, data decomposition, etc.). The matter of designing a real system providing such a model is highly simplified by constructing the parallel programs using scalable skeletons which capture common structural components of parallel computations. The key problem is the efficient implementation of programs composed of several nested skeleton instances. This requires optimizing the resulting process graph structure and mapping it onto the available resources in order to balance load and minimize communications. The paper describes how this can be done, despite the intractability of the problems involved, by exploiting the `structure' imposed by the skeleton approach.
1. Introduction
Skeleton based parallel models are becoming increasingly popular as the basis of high level systems ...
Compile-time Cost Analysis for Parallel Programming
In Proceedings of EUROPAR96, 1996
Cited by 5 (0 self)
This paper focuses on the compile-time cost analysis of programs expressed in the BMF-style, which results in the selection of a cost-effective parallel implementation on a given topology.
1 Introduction
Developing efficient software for parallel computers is a difficult task, even for the specialist. An ideal parallel programming model should provide architecture-independence, a high level of abstraction, and accurate performance estimates. The HOPP (Higher-order Parallel Programming) model is based on the Bird-Meertens Formalism (BMF) [Bir89]. BMF comprises a set of useful higher-order functions, many of which are implicitly parallel, and which enable programs to be expressed at a high level of abstraction. HOPP is based on a functional paradigm and therefore automatically inherits architecture-independence. The behaviour of the BMF functions is predetermined, and this feature is exploited in building a cost model that aims to accurately predict the costs of programs. HOPP mo...
Cost-Driven Autonomous Mobility
2007
Cited by 4 (2 self)
Autonomous mobile programs (AMPs) offer a novel decentralised load management technology where periodic use is made of cost models to decide where to execute in a network. In this paper we demonstrate how sequential programs can be automatically converted into AMPs. The AMPs are generated by an automatic continuation cost analyser that replaces iterations with Costed Autonomous Mobility Skeletons (CAMS), which encapsulate autonomous mobility. The CAMS cost model uses an entirely novel continuation cost semantics to predict both the cost of the current iteration and the continuation cost of the remainder of the program. We show that CAMS convey significant performance advantages, e.g. reducing execution time by up to 53%; that the continuation cost models are consistent with the existing AMP cost models; and that the overheads of collecting and utilising the continuation costs are relatively small. We discuss example AMPs generated by the analyser and demonstrate that they have very similar performance to hand-costed CAMS programs.
Compilation of a Specialized Functional Language for Massively Parallel Computers
Journal of Functional Programming, 2000
Cited by 3 (0 self)
We propose a parallel specialized language that ensures portable and cost-predictable implementations on parallel computers. The language is basically a first-order, recursion-less, strict functional language equipped with a collection of higher-order functions or skeletons. These skeletons apply to (nested) vectors and can be grouped in four classes: computation, reorganization, communication, and mask skeletons. The compilation process is described as a series of transformations and analyses leading to SPMD-like functional programs which can be directly translated into real parallel code. The language restrictions enforce a programming discipline whose benefit is to allow a static, symbolic, and accurate cost analysis. The parallel cost takes into account both load balancing and communications, and can be statically evaluated even when the actual size of vectors or the number of processors is unknown. It is used to automatically select the best data distribution among a set of standard distributions. Interestingly, this work can be seen as a cross-fertilization between techniques developed within the Fortran parallelization, skeleton, and functional programming communities.
A Scheme For Nesting Algorithmic Skeletons
Department of Computer Science, University College London, 1998
Cited by 3 (1 self)
A scheme for arbitrary nesting of algorithmic skeletons is explained which is based on the idea of groups in MPI. Two skeletons were developed which run in a nested mode: a binary divide and conquer and a process farm, for a parallel implementation of the fold and map HOFs respectively. An example showing various cases for nesting the two skeletons is presented. The experiment was conducted on the Fujitsu AP1000 parallel machine.
1 Introduction
It is well known that parallelism adds an additional level of difficulty to software development. Following Cole's characterisation [1], algorithmic skeletons have been recognised widely as a valuable basis for parallel software construction. A skeleton abstracts a control structure which may be instantiated subsequently with specific functions to carry out specific tasks. Therefore, the encapsulation of parallel algorithms into skeletons is a promising approach to high-level specification of parallel algorithms. Normally, functional programming la...
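To make the pairing in the entry above concrete (fold as a binary divide and conquer, map as a process farm), here is a minimal sequential sketch. It uses no MPI and is not the paper's implementation; `dc_fold`, `farm_map` and the `workers` parameter are hypothetical names chosen for illustration, and the round-robin dealing only imitates how a farm might hand out tasks.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dc_fold(op: Callable[[A, A], A], xs: Sequence[A]) -> A:
    """Fold a non-empty sequence by binary divide and conquer.

    `op` must be associative for the result to equal a left-to-right fold.
    """
    if not xs:
        raise ValueError("dc_fold needs a non-empty sequence")
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    # In a parallel version, each half could be given to its own process group.
    return op(dc_fold(op, xs[:mid]), dc_fold(op, xs[mid:]))


def farm_map(f: Callable[[A], B], xs: Sequence[A], workers: int = 4) -> List[B]:
    """Apply `f` to every item, dealing items round-robin to `workers` slots."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    results: list = [None] * len(xs)
    for worker in range(workers):
        # Items worker, worker + workers, ... belong to this (simulated) worker.
        for i in range(worker, len(xs), workers):
            results[i] = f(xs[i])
    return results


if __name__ == "__main__":
    rows = [[1, 2, 3], [4, 5], [6]]
    add = lambda a, b: a + b
    # Nesting: a farm whose tasks are folds, followed by an outer fold.
    print(dc_fold(add, farm_map(lambda row: dc_fold(add, row), rows)))
```

Nesting here is plain function composition; the abstract indicates that the parallel scheme relies on MPI groups so that an inner skeleton runs on a subset of the processes.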
Skeleton Implementations Based on Generic Data Distributions
2nd Intern. Workshop on Constructive Methods for Parallel Programming (CMPP'2000), 2000
Cited by 2 (1 self)
Data distribution algebras are an abstract notion for the description of parallel programs. In this paper we describe a generic implementation of data distributions (covers) which generalizes operations common to all data distributions and is parameterized by a description of the properties of the specific cover. In particular, the communication and synchronization operations are realized in a generic way, so that every specific cover can specify the overlapping data in its implementation in a purely functional manner and rely on the general operations, which internally generate the necessary communication schedule. This serves as the basis for arbitrary data distributions as well as for the implementation of general skeletons. As an example, we provide a skeletal implementation of the Wang algorithm.
Parallel Standard ML with Skeletons
More or Less Explicit Parallelism in Concurrent Clean. Workshop on Parallel Functional Programming in association with IFL’98, 1998
Cited by 1 (0 self)
We present an overview of our system for automatically extracting parallelism from Standard ML programs using algorithmic skeletons. This system identifies a small number of higher-order functions as sites of parallelism and the compiler uses profiling and transformation techniques to exploit these.
Key words. automated parallelization, higher-order functions, algorithmic skeletons.
1. Introduction. The
A Survey of Cost Models for Algorithmic Skeletons
1999
Cited by 1 (1 self)
This report presents a survey of performance models for parallel algorithmic skeletons. First, higher-order functions (HOFs) are presented according to the modelled skeletons. Next, the corresponding parallel implementations (skeletons) for the HOFs are discussed with the performance models that were constructed for the skeletons.
1 Introduction
Effective portability depends crucially on the predictability of performance. Therefore, accurate performance models are required to predict the behaviour of a given skeleton before successfully porting the application that is currently using this skeleton. Given the importance of performance models for algorithmic skeletons, this report surveys some of them.
2 Higher Order Functions for the Modeled Skeletons
This section briefly presents the HOFs used to describe the skeletons in Section 3. The HOFs are classified into two major classes: general and application specific HOFs.
2.1 General Higher Order Functions
The HOFs are presented i...

