Results 1 - 10 of 12
Harnessing the Multicores: Nested Data Parallelism in Haskell
, 2008
Cited by 17 (6 self)
ABSTRACT. If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is data parallelism. Blelloch’s pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this paper we describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. We focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism.
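As a rough illustration of the nested-to-flat idea mentioned in this abstract, the sketch below stores a nested array as one flat data array plus a list of segment lengths, then computes a per-segment sum over the flat form. It is only a data-representation sketch in Python, not the vectorisation transformation of Data Parallel Haskell, which rewrites programs; the names `flatten` and `segmented_sum` are hypothetical.

```python
from itertools import accumulate


def flatten(nested):
    """Split a nested list into (flat data, segment lengths)."""
    data = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return data, lengths


def segmented_sum(data, lengths):
    """Sum each segment of the flat array, giving one result per sub-array."""
    # Prefix sums of the lengths give the start/end offset of every segment.
    offsets = [0] + list(accumulate(lengths))
    return [sum(data[start:end]) for start, end in zip(offsets, offsets[1:])]


# Irregular nested input: sub-arrays of different sizes, including an empty one.
nested = [[1, 2, 3], [], [4, 5]]
data, lengths = flatten(nested)
sums = segmented_sum(data, lengths)
```

In this form the irregular nesting lives entirely in `lengths`, so the element data can be processed as a single flat array.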
A scheduling framework for general-purpose parallel languages
- In Proc. of the Int. Conf. on Funct. Program
Cited by 13 (3 self)
The trend in microprocessor design toward multicore and manycore processors means that future performance gains in software will largely come from harnessing parallelism. To realize such gains, we need languages and implementations that can enable parallelism at many different levels. For example, an application might use both explicit threads to implement coarse-grain parallelism for independent tasks and implicit threads for fine-grain data-parallel computation over a large array. An important aspect of this requirement is supporting a wide range of different scheduling mechanisms for parallel computation. In this paper, we describe the scheduling framework that we have designed and implemented for Manticore, a strict parallel functional language. We take a micro-kernel approach in our design: the compiler and runtime support a small collection of scheduling primitives upon which complex scheduling policies can be implemented. This framework is extremely flexible and can support a wide range of different scheduling policies. It also supports the nesting of schedulers, which is key to both supporting multiple scheduling policies in the same application and to hierarchies of speculative parallel computations. In addition to describing our framework, we also illustrate its expressiveness with several popular scheduling techniques. We present a (mostly) modular approach to extending our schedulers to support cancellation. This mechanism is essential for implementing eager and speculative parallelism. We finally evaluate our framework with a series of benchmarks and an analysis.
Lazy Tree Splitting
Cited by 4 (0 self)
Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
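The decomposition trade-off described in this abstract can be made concrete with a small sketch. The Python code below uses eager, fixed-size chunking, which is the kind of scheme that needs per-program tuning; it is not Lazy Binary Splitting or Lazy Tree Splitting. The names `chunk_ranges` and `chunked_sum` are hypothetical, and the chunks are evaluated sequentially here purely to count how many units of work each chunk size produces.

```python
def chunk_ranges(n, chunk_size):
    """Return half-open (lo, hi) index ranges covering 0..n in fixed-size chunks."""
    return [(start, min(start + chunk_size, n)) for start in range(0, n, chunk_size)]


def chunked_sum(xs, chunk_size):
    """Sum xs chunk by chunk; return the total and the number of chunks (tasks)."""
    # Each chunk stands in for one unit of parallel work.
    partials = [sum(xs[lo:hi]) for lo, hi in chunk_ranges(len(xs), chunk_size)]
    return sum(partials), len(partials)


xs = list(range(1000))
# Tiny chunks: many tasks, so per-task overhead would dominate.
total_small, tasks_small = chunked_sum(xs, 1)
# Huge chunks: few tasks, so most processors would sit idle.
total_large, tasks_large = chunked_sum(xs, 500)
```

Both calls compute the same total; only the number of tasks differs, which is the quantity a fixed chunk size forces the programmer to tune by hand.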
Scheduling Deterministic Parallel Programs
, 2009
Cited by 1 (0 self)
Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the correctness of these programs. However, the performance of parallel programs still depends upon this assignment of tasks, as determined by a part of the language implementation called the scheduling policy. In this thesis, I define a novel cost semantics for a parallel language that enables programmers to reason formally about different scheduling policies. This cost semantics forms a basis for a suite of prototype profiling tools. These tools allow programmers to simulate and visualize program execution under different scheduling policies and understand how the choice of policy affects application memory use. My cost semantics also provides a specification for implementations of the language. As an example of such an implementation, I have extended MLton, a compiler
unknown title
, 2010
The goal of my research is to increase the correctness and efficiency of high-level language implementations. I am particularly interested in the areas of automated scheduling and memory management and in the application of domain-specific languages to improve the reliability and flexibility of complex software systems. My current research is on problems of scheduling in a high-level parallel language. High-level parallel languages, such as Data-Parallel Haskell [CLP+07] and Manticore [FRRS08], provide implicit and explicit threading mechanisms. An implicit thread is a linguistic construct that acts as a hint to the scheduler for where parallel evaluation may be profitable. An explicit thread is a true thread of control that provides a mechanism for concurrent programming and coarse-grain parallel programming. A major focus of my work is addressing the problems of scheduling in a language that has both types of threads. The combination of implicit and explicit threads creates scheduling requirements that go beyond the capabilities of existing language implementations. For example, an explicit thread may need one of many real-time scheduling policies to provide responsiveness or, alternatively, a resource-aware policy [vCZ+03] to maximize throughput. An implicit thread needs a work-distribution policy
General Terms Languages, Performance
Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
unknown title
, 2010
The goal of my research is to make it easier to write correct and efficient programs through advances in the design and implementation of declarative languages. Declarative languages provide programmers with such essential services as automatic thread scheduling and memory management. My research focuses on improving the effectiveness and flexibility of these services. Declarative languages, such as PML [FRRS08] and Data-Parallel Haskell [CLP+07], provide implicit and explicit threading mechanisms. An implicit thread is a linguistic construct that acts as a hint to the scheduler for where parallel evaluation may be profitable. Explicit threads provide a mechanism for concurrent programming and coarse-grain parallel programming. My thesis research presents the design of an effective system for a language that supports implicit threading and runs on a shared-memory multiprocessor. An effective system is scalable and robust. A system is scalable if performance improves in proportion to the number of processing elements. A system is robust when performance is consistently good under changing conditions, such as a change of input data set, number of processors, or hardware platform. Robust systems are predictable across programs generally, not just those tuned for a particular set of conditions. Research on thread scheduling provides evidence that no single scheduling policy is suitable for every application
General Terms
In a parallel, shared-memory language with a garbage-collected heap, it is desirable for each processor to perform minor garbage collections independently. Although obvious, it is difficult to make this idea pay off in practice, especially in languages where mutation is common. We present several techniques that substantially improve the state of the art. We describe these techniques in the context of a full-scale implementation of Haskell, and demonstrate that our local-heap collector substantially improves scaling, peak performance, and robustness.

