Parallel Programming using Functional Languages (1991)

by Paul Roe

Results 1 - 10 of 30

A Provable Time and Space Efficient Implementation of NESL

by Guy E. Blelloch, John Greiner - In International Conference on Functional Programming, 1996
Cited by 60 (7 self)
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
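The two bounds quoted in this abstract can be made concrete with a small calculation. The Python sketch below evaluates the expressions inside the O-notation for chosen values of w, d, s, and p. It assumes hidden constants of 1 and base-2 logarithms, and the function names are hypothetical; none of this comes from the paper itself.

```python
import math


def time_bound(w: int, d: int, p: int) -> float:
    """Expression inside O(w/p + d log p), with constant 1 and log base 2 (assumed)."""
    return w / p + d * math.log2(p)


def space_bound(s: int, d: int, p: int) -> float:
    """Expression inside O(s + d p log p), with constant 1 and log base 2 (assumed)."""
    return s + d * p * math.log2(p)


def speedup(w: int, d: int, p: int) -> float:
    """Ratio of sequential work w to the parallel time expression."""
    return w / time_bound(w, d, p)
```

For example, with w = 2^20, d = 10, and p = 16, the w/p term (65536) dominates the d log p term (40), so `speedup` is close to 16. This is the "sufficient parallelism" regime in which the abstract describes the bounds as giving linear speedup.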

Algorithm + Strategy = Parallelism

by P.W. Trinder, K. Hammond, H.-W. Loidl, S.L. Peyton Jones - Journal of Functional Programming, 1998
Cited by 51 (18 self)
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divide-and-conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000 line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance, while incurring minimal overhead for using evaluation strategies.
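The separation between algorithmic and behavioural code described in this abstract can be illustrated with a loose analogue in Python. This is not the Haskell strategies interface from the paper: the names `seq_list`, `par_list`, and `sum_of_squares` are hypothetical, laziness is imitated with zero-argument functions (thunks), and a thread pool stands in for parallel evaluation. In standard CPython builds, threads may give no speed-up for CPU-bound code, so the sketch shows only the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Thunk = Callable[[], Any]                      # a delayed computation
Strategy = Callable[[List[Thunk]], List[Any]]  # decides how thunks are evaluated


def seq_list(thunks: List[Thunk]) -> List[Any]:
    """Behavioural code: evaluate every thunk in order, on one thread."""
    return [t() for t in thunks]


def par_list(thunks: List[Thunk]) -> List[Any]:
    """Behavioural code: submit every thunk to a thread pool, keep result order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t) for t in thunks]
        return [f.result() for f in futures]


def sum_of_squares(xs, strategy: Strategy) -> int:
    """Algorithmic code: says what to compute, not how to schedule it."""
    thunks = [lambda x=x: x * x for x in xs]  # default arg binds each x now
    return sum(strategy(thunks))
```

Calling `sum_of_squares(range(5), seq_list)` and `sum_of_squares(range(5), par_list)` gives the same result; only the evaluation behaviour changes, and the algorithm is untouched.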

Visualising Granularity in Parallel Programs: A Graphical Winnowing System for Haskell

by Kevin Hammond, Hans Wolfgang Loidl, Andrew Partridge - In HPFC'95 --- High Performance Functional Computing, 1995
Cited by 21 (9 self)
To take advantage of distributed-memory parallel machines it is essential to have good control of task granularity. This paper describes a fairly accurate parallel simulator for Haskell, based on the Glasgow compiler, and complementary tools for visualising task granularities. Together these tools allow us to study the effects of various annotations on task granularity on a variety of simulated parallel architectures. They also provide a more precise tool for the study of parallel execution than has previously been available for Haskell programs. These tools have already confirmed that thread migration is essential in parallel systems, demonstrated a close correlation between thread execution times and total heap allocations, and shown that fetching data synchronously normally gives better overall performance than asynchronous fetching, if data is fetched on demand. 1 Introduction Our aim is to produce fast, cost-effective implementations of lazy functional languages. One way to impro...

Parallelization via Context Preservation

by Wei-Ngan Chin, Akihiko Takano, Zhenjiang Hu - In IEEE Intl Conference on Computer Languages, 1998
Cited by 17 (16 self)
Abstract program schemes, such as scan or homomorphism, can capture a wide range of data parallel programs. While versatile, these schemes are of limited practical use on their own. A key problem is that the more natural sequential specifications may not have associative combine operators required by these schemes. As a result, they often fail to be immediately identified. To resolve this problem, we propose a method to systematically derive parallel programs from sequential definitions. This method is special in that it can automatically invent auxiliary functions needed by associative combine operators. Apart from a formalisation, we also provide new theorems, based on the notion of context preservation, to guarantee parallelization for a precise class of sequential programs. 1 Introduction It is well-recognised that a key problem of parallel computing remains the development of efficient and correct parallel software. This task is further complicated by the variety of parallel arc...
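The problem this abstract describes, a natural sequential definition whose combine step is not associative, can be seen in a standard textbook case. The Python sketch below uses Horner's rule for polynomial evaluation, chosen here for illustration and not claimed to be the paper's method or example. The sequential step `acc * x + a` cannot be split across segments, but pairing each partial value with the auxiliary quantity x^(segment length) yields an associative `combine`, which permits divide-and-conquer evaluation. All function names are hypothetical.

```python
def horner_seq(coeffs, x):
    """Sequential definition: coefficients are given highest power first."""
    acc = 0
    for a in coeffs:
        acc = acc * x + a
    return acc


def combine(left, right):
    """Associative combine on (value, x ** segment_length) pairs."""
    v1, p1 = left
    v2, p2 = right
    # The left value is shifted up by the right segment's power of x.
    return (v1 * p2 + v2, p1 * p2)


def horner_dc(coeffs, x):
    """Divide-and-conquer evaluation; the two halves are independent."""
    if not coeffs:
        return (0, 1)            # identity element of combine
    if len(coeffs) == 1:
        return (coeffs[0], x)    # one coefficient: value a, power x ** 1
    mid = len(coeffs) // 2
    return combine(horner_dc(coeffs[:mid], x), horner_dc(coeffs[mid:], x))
```

The first component of `horner_dc(coeffs, x)` equals `horner_seq(coeffs, x)`. Because `combine` is associative, the two recursive calls could be evaluated in parallel or regrouped freely; the second component is the auxiliary value that the sequential definition never needed.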

On the Runtime Complexity of Type-Directed Unboxing

by Yasuhiko Minamide, Jacques Garrigue - In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, 1998
Cited by 15 (4 self)
Avoiding boxing when representing native objects is essential for the efficient compilation of any programming language. For polymorphic languages this task is difficult, but several schemes have been proposed that remove boxing on the basis of type information. Leroy's type-directed unboxing transformation is one of them. One of its nicest properties is that it relies only on visible types, which makes it compatible with separate compilation. However, it has been noticed that it is not safe both in terms of time and space complexity, i.e. transforming a program may raise its complexity. We propose a refinement of this transformation, still relying only on visible types, and prove that it satisfies the safety condition for time complexity. The proof is an extension of the usual logical relation method, in which correctness and safety are proved simultaneously. 1 Introduction Compared to explicitly typed first order traditional languages, polymorphically typed functional programming languages...

A Provably Time-Efficient Parallel Implementation of Full Speculation

by John Greiner, Guy E. Blelloch - In Proceedings of the 23rd ACM Symposium on Principles of Programming Languages, 1996
Cited by 15 (4 self)
Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Existing speculative implementations, however, may serialize computation because of their implementation of queues of suspended threads. We give a provably efficient parallel implementation of a speculative functional language on various machine models. The implementation includes proper parallelization of the necessary queuing operations on suspended threads. Our target machine models are a butterfly network, hypercube, and PRAM. To prove the efficiency of our implementation, we provide a cost model using a profiling semantics and relate the cost model to implementations on the parallel machine models. 1 Introduction Futures, lenient languages, and several implementations of graph reduction for lazy languages all use speculative evaluation (call-by-speculation [15]) to expose parallelism. The basic idea of speculative evaluation, in this context, is that the evaluation of a...

Feedback Directed Implicit Parallelism

by Tim Harris, Satnam Singh
Cited by 15 (0 self)
In this paper we present an automated way of using spare CPU resources within a shared memory multi-processor or multi-core machine. Our approach is (i) to profile the execution of a program, (ii) from this to identify pieces of work which are promising sources of parallelism, (iii) recompile the program with this work being performed speculatively via a work-stealing system and then (iv) to detect at run-time any attempt to perform operations that would reveal the presence of speculation. We assess the practicality of the approach through an implementation based on GHC 6.6 along with a limit study based on the execution profiles we gathered. We support the full Concurrent Haskell language compiled with traditional optimizations and including I/O operations and synchronization as well as pure computation. We use 20 of the larger programs from the ‘nofib’ benchmark suite. The limit study shows that programs vary a lot in the parallelism we can identify: some have none, 16 have a potential 2x speed-up, 4 have 32x. In practice, on a 4-core processor, we get 10-80% speed-ups on 7 programs. This is mainly achieved at the addition of a second core rather than beyond this. This approach is therefore not a replacement for manual parallelization, but rather a way of squeezing extra performance out of the threads of an already-parallel program or out of a program that has not yet been parallelized.

Realtime Signal Processing -- Dataflow, Visual, and Functional Programming

by Hideki John Reekie, 1995
Cited by 13 (1 self)
This thesis presents and justifies a framework for programming real-time signal processing systems. The framework extends the existing "block-diagram" programming model; it has three components: a very high-level textual language, a visual language, and the dataflow process network model of computation. The dataflow process network model, although widely-used, lacks a formal description, and I provide a semantics for it. The formal work leads into a new form of actor. Having established the semantics of dataflow processes, the functional language Haskell is layered above this model, providing powerful features---notably polymorphism, higher-order functions, and algebraic program transformation---absent in block-diagram systems. A visual equivalent notation for Haskell, Visual Haskell, ensures that this power does not exclude the "intuitive" appeal of visual interfaces; with some intelligent layout and suggestive icons, a Visual Haskell program can be made to look very like a block dia...

A Combinational Framework For Parallel Programming Using Algorithmic Skeletons

by Mohammad M. Hamdan, 2000
Cited by 12 (0 self)
Abstract not found

Profiling Parallel Functional Computations (Without Parallel Machines)

by Colin Runciman, David Wakeling - Glasgow Workshop on Functional Programming, 1993
Cited by 10 (1 self)
This paper describes the design and use of a new tool for profiling the parallelism present in annotated functional programs. One component of the tool is a compiler modified to produce programs that run in quasi-parallel on an ordinary workstation. A quasi-parallel implementation has several advantages for profiling parallelism. Firstly, it allows the programmer to concentrate solely on the details of parallel program design by abstracting away from the details of parallel machine design. No attempt is made to simulate the organisation or workings of any particular parallel computer. Secondly, a parallel program's behaviour is reproducible at different times and on different computers --- a very important property when investigating the causes of poor performance. Finally, it is comparatively easy to experiment with different parallel primitives and evaluation strategies, and to generate different kinds of profiling information. The other component of the tool aids understanding of profiling information by converting it to various different graphical forms. We are specifically interested in lazy parallel functional programs. It can be hard to write an efficient lazy sequential program, and it is at least as hard to write a lazy parallel program as it is to write a lazy sequential one. So, by using a quasi-parallel implementation and ignoring the extra constraints imposed by a parallel computer, we are by no means avoiding "all the hard problems".