Results 1 - 7 of 7
Visualising Granularity in Parallel Programs: A Graphical Winnowing System for Haskell
- In HPFC'95: High Performance Functional Computing, 1995
Cited by 21 (9 self)
To take advantage of distributed-memory parallel machines it is essential to have good control of task granularity. This paper describes a fairly accurate parallel simulator for Haskell, based on the Glasgow compiler, and complementary tools for visualising task granularities. Together these tools allow us to study the effects of various annotations on task granularity on a variety of simulated parallel architectures. They also provide a more precise tool for the study of parallel execution than has previously been available for Haskell programs. These tools have already confirmed that thread migration is essential in parallel systems, demonstrated a close correlation between thread execution times and total heap allocations, and shown that fetching data synchronously normally gives better overall performance than asynchronous fetching, if data is fetched on demand.
1 Introduction
Our aim is to produce fast, cost-effective implementations of lazy functional languages. One way to impro...
Space Profiling for Parallel Functional Programs
Cited by 9 (2 self)
This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies yet abstracts away from many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.
A Parallel Functional Language Compiler for Message-Passing Multicomputers
- 1998
Cited by 4 (2 self)
The research presented in this thesis is about the design and implementation of Naira, a parallel, parallelising compiler for a rich, purely functional programming language. The source language of the compiler is a subset of Haskell 1.2. The front end of Naira is written entirely in the Haskell subset being compiled. Naira has been successfully parallelised and it is the largest successfully parallelised Haskell program having achieved good absolute speedups on a network of SUN workstations. Having the same basic structure as other production compilers of functional languages, Naira's parallelisation technology should carry forward to other functional language compilers. The back end of Naira is written in C and generates parallel code in the C language which is envisioned to be run on distributed-memory machines. The code generator is based on a novel compilation scheme specified using a restricted form of Milner's π-calculus which achieves asynchronous communication. We present the f...
Thread Migration in a Parallel Graph Reducer
- In IFL’02: Intl. Workshop on the Implementation of Functional Languages. Springer-Verlag, LNCS 2670, 2002
Cited by 3 (1 self)
To support high-level coordination, parallel functional languages need effective and automatic work distribution mechanisms. Many implementations distribute potential work, i.e. sparks or closures, but there is good evidence that the performance of certain classes of program can be improved if current work, or threads, are also distributed.
The Virtual Shared Memory Performance of a Parallel Graph Reducer
- In CCGrid 2002: Intl. Symp. on Cluster Computing and the Grid, 2002
Cited by 3 (1 self)
This paper assesses the costs of maintaining a virtual shared heap in our parallel graph reducer (GUM), which implements a parallel functional language. GUM performs automatic and dynamic resource management for both work and data. We introduce extensions to the original design of GUM, aiming at a more flexible memory management and communication mechanism to deal with high-latency systems. We then present measurements of running GUM on a Beowulf cluster, evaluating the overhead of dynamic distributed memory management and the effectiveness of the new memory management and communication mechanisms. © IEEE; "CCGrid 2002", Berlin, May 2002.
Implementing High-Level Parallelism on Computational GRIDs
- 2006
Cited by 2 (1 self)
This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognise that the copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author or the university (as may be appropriate). I hereby declare that the work presented in this thesis was carried out by myself at Heriot-Watt University, Edinburgh, except where due acknowledgement is made, and has not been submitted for any other degree.
Scheduling Deterministic Parallel Programs
- 2009
Cited by 1 (0 self)
Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the correctness of these programs. However, the performance of parallel programs still depends upon this assignment of tasks, as determined by a part of the language implementation called the scheduling policy. In this thesis, I define a novel cost semantics for a parallel language that enables programmers to reason formally about different scheduling policies. This cost semantics forms a basis for a suite of prototype profiling tools. These tools allow programmers to simulate and visualize program execution under different scheduling policies and understand how the choice of policy affects application memory use. My cost semantics also provides a specification for implementations of the language. As an example of such an implementation, I have extended MLton, a compiler

