Results 1 -
8 of
8
Harnessing the Multicores: Nested Data Parallelism in Haskell
, 2008
"... ABSTRACT. If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has p ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
ABSTRACT. If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is data parallelism. Blelloch’s pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this paper we describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. We focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism. 1
Toward a parallel implementation of Concurrent ML
"... Abstract. Concurrent ML (CML) is a high-level message-passing language that supports the construction of first-class synchronous abstractions called events. This mechanism has proven quite effective over the years and has been incorporated in a number of other languages. While CML provides a concurr ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Abstract. Concurrent ML (CML) is a high-level message-passing language that supports the construction of first-class synchronous abstractions called events. This mechanism has proven quite effective over the years and has been incorporated in a number of other languages. While CML provides a concurrent programming model, its implementation has always been limited to uniprocessors. This limitation is exploited in the implementation of the synchronization protocol that underlies the event mechanism, but with the advent of cheap parallel processing on the desktop (and laptop), it is time for Parallel CML. We are pursuing such an implementation as part of the Manticore project. In this paper, we describe a parallel implementation of Asymmetric CML (ACML), which is a subset of CML that does not support output guards. We describe an optimistic concurrency protocol for implementing CML synchronization. This protocol has been implemented as part of the Manticore system. 1
Space Profiling for Parallel Functional Programs
"... This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provi ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies yet abstracts away from many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.
Partial Vectorisation of Haskell Programs
"... Abstract. Vectorisation for functional programs, also called the flattening transformation, relies on drastically reordering computations and restructuring the representation of data types. As a result, it only applies to the purely functional core of a fully-fledged functional language, such as Has ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
Abstract. Vectorisation for functional programs, also called the flattening transformation, relies on drastically reordering computations and restructuring the representation of data types. As a result, it only applies to the purely functional core of a fully-fledged functional language, such as Haskell or ML. A concrete implementation needs to apply vectorisation selectively and integrate vectorised with unvectorised code. This is challenging, as vectorisation alters the data representation, which must be suitably converted between vectorised and unvectorised code. In this paper, we present an approach to partial vectorisation that selectively vectorises sub-expressions and data types, and also, enables linking vectorised with unvectorised modules.
Scheduling light-weight parallelism in ArTCoP
"... Abstract. We present the design and prototype implementation of the scheduling component in ArTCoP (architecture transparent control of parallelism), a novel run-time environment (RTE) for parallel execution of high-level languages. A key feature of ArTCoP is its support for deep process and memory ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract. We present the design and prototype implementation of the scheduling component in ArTCoP (architecture transparent control of parallelism), a novel run-time environment (RTE) for parallel execution of high-level languages. A key feature of ArTCoP is its support for deep process and memory hierarchies, shown in the scheduler by supporting light-weight threads. To realise a system with easily exchangeable components, the system defines a micro-kernel, providing basic infrastructure, such as garbage collection. All complex RTE operations, including the handling of parallelism, are implemented at a separate system level. By choosing Concurrent Haskell as high-level system language, we obtain a prototype in the form of an executable specification that is easier to maintain and more flexible than conventional RTEs. We demonstrate the flexibility of this approach by presenting implementations of a scheduler for light-weight threads in ArTCoP, based on GHC Version 6.6.
unknown title
, 2010
"... The goal of my research is to make it easier to write correct and efficient programs through advances in the design and implementation of declarative languages. Declarative languages provide programmers with such essential services as automatic thread scheduling and memory management. My research fo ..."
Abstract
- Add to MetaCart
The goal of my research is to make it easier to write correct and efficient programs through advances in the design and implementation of declarative languages. Declarative languages provide programmers with such essential services as automatic thread scheduling and memory management. My research focuses on improving the effectiveness and flexibility of these services. Declarative languages, such as PML [FRRS08] and Data-Parallel Haskell [CLP + 07], provide implicit and explicit threading mechanisms. An implicit thread is a linguistic construct that acts as a hint to the scheduler for where parallel evaluation may be profitable. Explicit threads provide a mechanism for concurrent programming and coarse-grain parallel programming. My thesis research presents the design of an effective system for a language that supports implicit threading and runs on a shared-memory multiprocessor. An effective system is scalable and robust. A system is scalable if performance improves in proportion to the number of processing elements. A system is robust when performance is consistently good under changing conditions, such as a change of input data set, number of processors, or hardware platform. Robust systems are predictable across programs generally, not just those tuned for a particular set of conditions. Research on thread scheduling provides evidence that no single scheduling policy is suitable for every application
Workshop on Declarative Aspects of Multicore Programming
, 2008
"... Parallelism is going mainstream. Many chip manufactures are turning to multicore processor designs rather than scalar-oriented frequency increases as a way to get performance in their desktop, enterprise, and mobile processors. This endeavor is not likely to succeed long term if mainstream applicati ..."
Abstract
- Add to MetaCart
Parallelism is going mainstream. Many chip manufactures are turning to multicore processor designs rather than scalar-oriented frequency increases as a way to get performance in their desktop, enterprise, and mobile processors. This endeavor is not likely to succeed long term if mainstream applications cannot be parallelized to take advantage of tens and eventually hundreds of hardware threads. Multicore architectures will differ in significant ways from their multisocket predecessors. For example, the communication to compute bandwidth ratio is likely to be higher, which will positively impact performance. More generally, multicore architectures introduce several new dimensions of variability in both performance guarantees and architectural contracts, such as the memory model, that may not stabilize for several generations of product. Programs written in functional or (constraint-)logic programming languages, or even in other languages with a controlled use of side effects, can greatly simplify parallel programming. Such declarative programming allows for a deterministic semantics even when the underlying implementation might be highly

