Nested Algorithmic Skeletons from Higher Order Functions
, 2000
Cited by 25 (12 self)
Algorithmic skeletons provide a promising basis for the automatic utilisation of parallelism at sites of higher-order function use through static program analysis. However, decisions about whether or not to realise particular higher-order function instances as skeletons must be based on information about available processing resources, and such resources may change subsequent to program analysis. In principle, nested higher-order functions may be realised as nested skeletons. However, where higher-order function arguments result from partially applied functions, free-variable bindings must be identified and communicated through the corresponding skeleton hierarchy to where those arguments are actually applied. Here, a skeleton-based parallelising compiler from Standard ML to native code is presented. Hybrid skeletons, which can change from parallel to serial evaluation at run-time, are considered and mechanisms for their nesting are discussed. Compilation stages are illustra...
The Functional Imperative: Shape!
- 7th European Symposium on Programming, ESOP'98, held as part of the Joint European Conferences on Theory and Practice of Software, ETAPS'98
, 1997
Cited by 20 (7 self)
Introduction FiSh is a new programming language for array computation that compiles higher-order polymorphic programs into simple imperative programs expressed in a sub-language Turbot, which can then be translated into, say, C. Initial tests show that the resulting code is extremely fast: two orders of magnitude faster than Haskell, and two to four times faster than Objective Caml, one of the fastest ML variants for array programming. Every functional program must ultimately be converted into imperative code, but the mechanism for this is often hidden. FiSh achieves this transparently, using the "equation" from which it is named: Functional = Imperative + Shape. Shape here refers to the structure of data, e.g. the length of a vector, or the number of rows and columns of a matrix. The FiSh compiler reads the equation from left to right: it converts functions into procedures by using
Towards Parallel Programming by Transformation: The FAN Skeleton Framework
, 2001
Cited by 16 (8 self)
A Functional Abstract Notation (FAN) is proposed for the specification and design of parallel algorithms by means of skeletons - high-level patterns with parallel semantics. The main weakness of the current programming systems based on skeletons is that the user is still responsible for finding the most appropriate skeleton composition for a given application and a given parallel architecture. We describe a transformational framework for the development of skeletal programs which is aimed at filling this gap. The framework makes use of transformation rules which are semantic equivalences among skeleton compositions. For a given problem, an initial, possibly inefficient skeleton specification is refined by applying a sequence of transformations. Transformations are guided by a set of performance prediction models which forecast the behavior of each skeleton and the performance benefits of different rules. The design process is supported by a graphical tool which locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space. We give an overview of the FAN framework and exemplify its use with performance-directed program derivations for simple case studies. Our experience can be viewed as a first feasibility study of methods and tools for transformational, performance-directed parallel programming using skeletons.
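The idea of refining a skeleton composition through semantics-preserving rewrite rules, with a performance model deciding whether a rule pays off, can be sketched in a few lines. This is only an illustration in Python, not FAN notation: the stage encoding, the names `fuse_maps`, `run` and `estimated_cost`, and the cost formula are all assumptions made for the example.

```python
from functools import reduce


def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def fuse_maps(stages):
    """Rewrite rule: two adjacent maps (g, then f) become one map of f . g."""
    fused = []
    for kind, fn in stages:
        if kind == "map" and fused and fused[-1][0] == "map":
            _, prev = fused.pop()
            fused.append(("map", compose(fn, prev)))
        else:
            fused.append((kind, fn))
    return fused


def run(stages, xs):
    """Sequential reference semantics: apply the stages left to right."""
    for kind, fn in stages:
        if kind == "map":
            xs = [fn(x) for x in xs]
        elif kind == "reduce":
            xs = [reduce(fn, xs)]
        else:
            raise ValueError(f"unknown skeleton: {kind}")
    return xs


def estimated_cost(stages, n, overhead=10):
    """Toy model (assumption): each stage costs n element steps plus a fixed overhead."""
    return len(stages) * (n + overhead)


# Hypothetical initial specification: two maps followed by a reduction.
program = [
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
    ("reduce", lambda a, b: a + b),
]
refined = fuse_maps(program)
```

Under these assumptions, `run(program, xs)` and `run(refined, xs)` return the same result for any non-empty `xs`, while `estimated_cost` reports a lower figure for `refined` because one stage has been eliminated. A real framework would use a calibrated model per skeleton and per target architecture.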
Stream Parallel Skeleton Optimization
- In Proceedings of the 11th IASTED International Conference on Parallel and Distributed Computing and Systems, MIT
, 1999
Cited by 14 (12 self)
We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition, delivering a service time which is equal to or even better than the service time of the original skeleton composition, and achieving a better utilization of the processors used. The normal form is defined as a single farm built around a sequential worker code. Experimental results are discussed that validate this normal form. Keywords: skeletons, rewriting, stream parallelism. 1 Introduction Skeleton based programming models represent an interesting subject in the field of parallel programming models. Cole introduced the skeleton concept in the late 80's [1]. Cole's skeletons represented parallelism exploitation patterns that can be used (instantiated) to model common parallel ap...
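A minimal sketch of the normal-form idea for the simplest case, a pipeline of sequential stages, might look as follows. It is written in Python for illustration only. The function names and the two service-time formulas are assumptions: they encode an idealised model with no communication cost, one processor per pipeline stage, and perfectly balanced farm workers.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline(stages, stream):
    """Reference semantics of a pipe: every item flows through all stages in order."""
    out = []
    for item in stream:
        for stage in stages:
            item = stage(item)
        out.append(item)
    return out


def normal_form(stages, stream, workers=4):
    """A single farm whose worker runs the sequential composition of all stages."""
    def worker(item):
        for stage in stages:
            item = stage(item)
        return item

    # Threads only illustrate the structure here; CPython threads do not
    # speed up CPU-bound stages.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, stream))  # map keeps the input order


def ideal_pipe_service_time(latencies):
    """Idealised assumption: a pipe is limited by its slowest stage."""
    return max(latencies)


def ideal_farm_service_time(latencies, processors):
    """Idealised assumption: each worker needs the summed latency per item."""
    return sum(latencies) / processors


# Hypothetical two-stage pipeline used to compare both forms.
stages = [lambda x: x + 1, lambda x: x * 2]
```

Both forms compute the same output stream. In this idealised model, a farm using as many processors as the pipe has stages achieves a service time equal to the mean stage latency, which can never exceed the maximum stage latency that bounds the pipe.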
Space Profiling for Parallel Functional Programs
Cited by 9 (2 self)
This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies yet abstracts away from many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.
Tuning Task Granularity and Data Locality of Data Parallel GpH Programs
, 2001
Cited by 7 (3 self)
The performance of data parallel programs often hinges on two key coordination aspects: the computational costs of the parallel tasks relative to their management overhead (task granularity), and the communication costs induced by the distance between tasks and their data (data locality). In data parallel programs both granularity and locality can be improved by clustering, i.e. arranging for parallel tasks to operate on related sub-collections of data.
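Clustering can be sketched as a parallel map that hands each task a contiguous chunk rather than a single element. The example below uses Python instead of GpH, and the names `chunks` and `clustered_map` as well as the `chunk_size` and `workers` parameters are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(xs, size):
    """Split xs into contiguous sub-lists of at most `size` elements."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def clustered_map(f, xs, chunk_size, workers=4):
    """Parallel map with one task per chunk instead of one task per element."""
    def work(chunk):
        # Coarser granularity: a single task processes a whole chunk.
        # Locality: the elements of a chunk are neighbours in xs.
        return [f(x) for x in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, chunks(xs, chunk_size)))
    # Flatten the per-chunk results back into one list, preserving order.
    return [y for part in parts for y in part]
```

A larger `chunk_size` means fewer tasks and less management overhead but also less available parallelism, so in practice the value has to be tuned for the workload and the machine.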
Costing Parallel Programs as a Function of Shapes
- Science of Computer Programming
, 1999
Cited by 6 (1 self)
Portable, efficient, parallel programming requires cost models to compare different possible implementations. In turn, these require knowledge of the shapes of the data structures being used, as well as knowledge of the hardware parameters. This paper shows how shape analysis techniques developed in the FISh programming language could be exploited to produce a data parallel language with an accurate, portable cost model. 1 Introduction The problem of constructing portable efficient parallel programs is still unsolved. It originates in the observation that an algorithm that executes efficiently in one setting may be extremely inefficient in another. Hence, the challenge is to automatically adapt the algorithm to match the circumstances. To do this during compilation requires a cost model that is able to identify which of two alternative algorithms is faster. To date, most work has focussed on measuring the impact of changes to hardware as observed through a small suite of hardwar...
Compilation of a Specialized Functional Language for Massively Parallel Computers
- Journal of Functional Programming
, 2000
Cited by 3 (0 self)
We propose a parallel specialized language that ensures portable and cost-predictable implementations on parallel computers. The language is basically a first-order, recursion-less, strict functional language equipped with a collection of higher-order functions or skeletons. These skeletons apply on (nested) vectors and can be grouped in four classes: computation, reorganization, communication, and mask skeletons. The compilation process is described as a series of transformations and analyses leading to spmd-like functional programs which can be directly translated into real parallel code. The language restrictions enforce a programming discipline whose benefit is to allow a static, symbolic, and accurate cost analysis. The parallel cost takes into account both load balancing and communications, and can be statically evaluated even when the actual size of vectors or the number of processors are unknown. It is used to automatically select the best data distribution among a set of standard distributions. Interestingly, this work can be seen as a cross fertilization between techniques developed within the Fortran parallelization, skeleton, and functional programming communities.
Polymorphism Over Nested Regular Arrays: Theory and Implementation in
, 1998
Cited by 2 (1 self)
this paper we introduce a wider range of benchmarks, 4

