Program optimization in the domain of high-performance parallelism
- In this volume
, 2004
Cited by 5 (0 self)
Abstract. I consider the problem of the domain-specific optimization of programs. I review different approaches, discuss their potential, and sketch instances of them from the practice of high-performance parallelism. Readers need not be familiar with high-performance computing.
A Compiler for HDC
- Fakultät für Mathematik und Informatik
, 1999
Cited by 5 (3 self)
We present a compiler for the functional language HDC, which aims at the generation of efficient code from high-level programs. HDC, which is syntactically a subset of the widely used language Haskell, facilitates the clean integration of skeletons with a predefined efficient parallel implementation into a functional program. Skeletons are higher-order functions which represent program schemata that can be specialized by providing customizing functions as parameters. The only restriction on customizing functions is their type. Skeletons can be composed of skeletons again. With HDC, we focus on the divide-and-conquer paradigm, which has a high potential for an efficient parallelization. We describe the most important phases of the compiler: desugaring, elimination of higher-order functions, generation of an optimized directed acyclic graph and code generation, with a focus on the integration of skeletons. The effect of the transformations on the target code is demonstrated on the examp...
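The notion of a skeleton as a higher-order function specialized by customizing functions can be illustrated with a minimal sketch. It is written in Python rather than HDC or Haskell, it is sequential, and it is not the compiler's implementation. The names `divide_and_conquer`, `is_trivial`, `solve`, `divide`, `combine`, and the merge sort instance are all assumptions made for illustration.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def divide_and_conquer(
    is_trivial: Callable[[P], bool],
    solve: Callable[[P], S],
    divide: Callable[[P], List[P]],
    combine: Callable[[List[S]], S],
) -> Callable[[P], S]:
    """Hypothetical skeleton: returns a solver built from four customizing functions."""

    def run(problem: P) -> S:
        if is_trivial(problem):
            return solve(problem)
        # The subproblems are independent of each other; a parallel
        # implementation of the skeleton could evaluate them concurrently.
        return combine([run(sub) for sub in divide(problem)])

    return run


def merge(parts: List[List[int]]) -> List[int]:
    """Customizing function: merge two sorted lists into one sorted list."""
    left, right = parts
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


# Specializing the skeleton yields merge sort (illustrative instance only).
merge_sort = divide_and_conquer(
    is_trivial=lambda xs: len(xs) <= 1,
    solve=lambda xs: list(xs),
    divide=lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2 :]],
    combine=merge,
)
```

In this sketch, `merge_sort([5, 2, 9, 1, 5, 6])` returns `[1, 2, 5, 5, 6, 9]`. As in the abstract, the only constraint on the customizing functions is that their types fit together.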
LinSolv: a Case Study in Strategic Parallelism
- In Glasgow Workshop on Functional Programming, Ullapool
, 1997
Cited by 4 (4 self)
This paper discusses the parallelisation and performance tuning of a typical computer algebra algorithm, LinSolv, using evaluation strategies. We present three steps in the parallelisation process starting with a naive parallel version. As this algorithm uses infinite data structures as intermediate values it is necessary to define very sophisticated strategies in order to improve parallel performance. We also compare the strategic parallel code with pre-strategy code. This comparison shows how evaluation strategies help to localise changes needed for parallelisation. In particular, the separation between algorithmic and parallel code makes the structure of the parallelism much clearer.
1 Introduction
Tuning the performance of a parallel algorithm can be a long, tiresome process. A parallel programming model should aid the programmer especially in this stage, allowing him to experiment with different patterns of parallel behaviour. Based on our experiences with developing p...
Haskell on a Shared-Memory Multiprocessor. Pages 49–61 of
- Proceedings of the ACM SIGPLAN Workshop on
, 2005
Cited by 4 (0 self)
Categories and Subject Descriptors D.3.2 [Language Classifications]: Applicative (functional) languages
General Terms Languages, Performance
1. Introduction
For many years the easiest approach to getting software to go faster has been to sit around and save up for a new machine (and then preferably run the old software on it). It is becoming clear, however, that this free lunch is over [22]. Processor manufacturers have stopped struggling to push clock speeds much further, and are turning their attention to parallelism instead. Multi-core processors, with several symmetric processing cores on a single chip, will be the norm in consumer machines within the next 1-2 years. The software challenge is to take advantage of this extra processing power through parallelism.
Higher Order Demand Propagation
- Lecture Notes in Computer Science
, 1998
Cited by 4 (0 self)
A new denotational semantics is introduced for realistic non-strict functional languages, which have a polymorphic type system and support higher order functions and user definable algebraic data types. It maps each function definition to a demand propagator, which is a higher order function, that propagates context demands to function arguments. The relation of this "higher order demand propagation semantics" to the standard semantics is explained and it is used to define a backward strictness analysis. The strictness information deduced by this analysis is very accurate, because demands can actually be constructed during the analysis. These demands conform better to the analysed functions than abstract values, which are constructed alone with respect to types like in other existing strictness analyses. The richness of the semantic domains of higher order demand propagation makes it possible to express generalised strictness information for higher order functions even ac...
Lazy Tree Splitting
Cited by 4 (0 self)
Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
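The decomposition trade-off described in this abstract can be made concrete with a small sketch. It is not Lazy Tree Splitting or Lazy Binary Splitting: it shows plain eager binary splitting with a fixed cutoff, which is the kind of per-program tuning knob the abstract says LTS avoids. The function `eager_split_sum` and its `grain` parameter are hypothetical names, and the code runs sequentially, only counting how many chunks of work a parallel runtime would have to schedule.

```python
from typing import List, Tuple


def eager_split_sum(xs: List[int], grain: int) -> Tuple[int, int]:
    """Sum xs by eager binary splitting; return (total, number_of_leaf_chunks).

    `grain` is a hypothetical tuning knob: a range of at most `grain`
    items is processed sequentially as a single chunk of work.
    """
    if grain < 1:
        raise ValueError("grain must be at least 1")
    if len(xs) <= grain:
        return sum(xs), 1
    mid = len(xs) // 2
    # The two halves are independent and could go to different workers.
    left_total, left_chunks = eager_split_sum(xs[:mid], grain)
    right_total, right_chunks = eager_split_sum(xs[mid:], grain)
    return left_total + right_total, left_chunks + right_chunks


data = list(range(1024))
fine = eager_split_sum(data, grain=1)        # many tiny chunks
coarse = eager_split_sum(data, grain=1024)   # a single sequential chunk
balanced = eager_split_sum(data, grain=64)   # an intermediate choice
```

For the 1024-element input, `grain=1` produces 1024 chunks (high scheduling overhead), `grain=1024` produces one chunk (no parallelism), and `grain=64` produces 16 chunks. The total is the same in every case; only the amount of schedulable work changes, so a good `grain` depends on the program and the platform.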
Towards a Mobile Haskell
, 2003
Cited by 3 (2 self)
This paper proposes a set of communication primitives for Haskell, to be used in open distributed systems, i.e. systems where multiple executing programs can interact using a predefined protocol. Functions are "first class citizens" in a functional language, hence it would be natural to transfer them between programs in a distributed system. However, existing distributed Haskell extensions are limited to closed systems or restrict the set of expressions that can be communicated. The former effectively prohibits large-scale distribution, whereas the latter sacrifices key abstraction constructs. A functional language that allows the communication of functions in an open system can be seen as a mobile computation language, hence we call our extension mHaskell (Mobile Haskell). We demonstrate how the proposed communication primitives can be used to implement more powerful abstractions, such as remote evaluation, and that common patterns of communication can be encoded as higher order functions or mobile skeletons. The design has been validated by constructing a prototype in Glasgow Distributed Haskell, and a compiled implementation is under construction.
Controlling Parallelism and Data Distribution in Eden
, 2000
Cited by 3 (2 self)
The parallel functional language Eden uses explicit processes to export computations to other processor elements and to achieve parallelism. As Eden is based on the non-strict functional language Haskell, this raises the question in which way and to which degree the lazy evaluation strategy of Haskell should be transferred to the parallel setting. A modification is needed, as a completely demand driven evaluation would not lead to real parallelism but only to distributed sequentiality. The non-existence of a global shared memory for all processes raises a second question, namely, how shared data should and can be distributed across the available processing elements. In general, one has to choose between data duplication via communication and work duplication by recomputation. In total we will discuss the interaction of laziness, parallelism and data distribution in Eden. We explain the evaluation model underlying Eden's parallel implementation and justify corresponding design decisions.
Capturing and Composing Parallel Patterns with Intel CnC
Cited by 3 (1 self)
The most accessible and successful parallel tools today are those that ask programmers to write only isolated serial kernels, hiding parallelism behind a library interface. Examples include Google’s Map-Reduce [5], CUDA [13], and STAPL [12]. This encapsulation approach applies to a wide range of structured, well-understood algorithms, which we call parallel patterns. Today’s high-level systems tend to encapsulate only a single pattern. Thus we explore the use of Intel CnC as a single framework for capturing and composing multiple patterns.
Functional Programs on Clusters
- In: Striegnitz, Jörg; Davis, Kei (Eds.): Proceedings of the Workshop on Parallel/High-Performance Object-Oriented Scientific Computing (POOSC’03), Technical Report
, 2003
Cited by 2 (2 self)
Abstract. The implemented Clean-CORBA and Haskell-CORBA interfaces open a way for developing parallel and distributed applications on clusters consisting of components written in functional programming languages, like Clean and Haskell. We focus on a specific application of this tool in this paper. We design and implement an abstract communication layer based on CORBA server objects. Using this layer we can build up computations in the form of distributed process-networks consisting of components written in several programming languages: some components are written in functional style in Clean, while others are written in an object-oriented language like Java or C++. The speed-up of computations is investigated using a simple example.

