Results 1 - 10
of
51
CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
"... The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many tasks, including debugging, testing, and automatic ..."
Abstract
-
Cited by 34 (5 self)
- Add to MetaCart
The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many tasks, including debugging, testing, and automatic replication. In this work, we avoid these complications by eliminating their root cause: we develop a compiler and runtime system that runs arbitrary multithreaded C/C++ POSIX Threads programs deterministically. A trivial non-performant approach to providing determinism is simply deterministically serializing execution. Instead, we present a compiler and runtime infrastructure that ensures determinism but resorts to serialization rarely, for handling interthread communication and synchronization. We develop two basic approaches, both of which are largely dynamic with performance improved by some static compiler optimizations. First, an ownership-based approach detects interthread communication via an evolving table that tracks ownership of memory regions by threads. Second, a buffering approach uses versioned memory and employs a deterministic commit protocol to make changes visible to other threads. While buffering has larger single-threaded overhead than ownership, it tends to scale better (serializing less often). A hybrid system sometimes performs and scales better than either approach individually. Our implementation is based on the LLVM compiler infrastructure. It needs neither programmer annotations nor special hardware. Our empirical evaluation uses the PARSEC and SPLASH2 benchmarks and shows that our approach scales comparably to nondeterministic execution.
Harnessing the Multicores: Nested Data Parallelism in Haskell
, 2008
"... ABSTRACT. If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has p ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
ABSTRACT. If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is data parallelism. Blelloch’s pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this paper we describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. We focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism. 1
Implicitlythreaded parallelism in Manticore
- In ICFP ’08
, 2008
"... The increasing availability of commodity multicore processors is making parallel computing available to the masses. Traditional parallel languages are largely intended for large-scale scientific computing and tend not to be well-suited to programming the applications one typically finds on a desktop ..."
Abstract
-
Cited by 13 (5 self)
- Add to MetaCart
The increasing availability of commodity multicore processors is making parallel computing available to the masses. Traditional parallel languages are largely intended for large-scale scientific computing and tend not to be well-suited to programming the applications one typically finds on a desktop system. Thus we need new parallel-language designs that address a broader spectrum of applications. In this paper, we present Manticore, a language for building parallel applications on commodity multicore hardware including a diverse collection of parallel constructs for different granularities of work. We focus on the implicitly-threaded parallel constructs in our high-level functional language. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data parallel language designs, which has focused largely on parallel problems with regular structure and the compiler transformations — most notably, flattening — that make such designs feasible. We describe our implementation strategies and present some detailed examples utilizing various mechanisms of our language.
A scheduling framework for general-purpose parallel languages
- In Proc. of the Int. Conf. on Funct. Program
"... The trend in microprocessor design toward multicore and manycore processors means that future performance gains in software will largely come from harnessing parallelism. To realize such gains, we need languages and implementations that can enable parallelism at many different levels. For example, a ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
The trend in microprocessor design toward multicore and manycore processors means that future performance gains in software will largely come from harnessing parallelism. To realize such gains, we need languages and implementations that can enable parallelism at many different levels. For example, an application might use both explicit threads to implement course-grain parallelism for independent tasks and implicit threads for fine-grain data-parallel computation over a large array. An important aspect of this requirement is supporting a wide range of different scheduling mechanisms for parallel computation. In this paper, we describe the scheduling framework that we have designed and implemented for Manticore, a strict parallel functional language. We take a micro-kernel approach in our design: the compiler and runtime support a small collection of scheduling primitives upon which complex scheduling policies can be implemented. This framework is extremely flexible and can support a wide range of different scheduling policies. It also supports the nesting of schedulers, which is key to both supporting multiple scheduling policies in the same application and to hierarchies of speculative parallel computations. In addition to describing our framework, we also illustrate its expressiveness with several popular scheduling techniques. We present a (mostly) modular approach to extending our schedulers to support cancellation. This mechanism is essential for implementing eager and speculative parallelism. We finally evaluate our framework with a series of benchmarks and an analysis.
Constructor specialisation for Haskell programs
, 2007
"... User-defined data types, pattern-matching, and recursion are ubiquitous features of Haskell programs. Sometimes a function is called with arguments that are statically known to already be in constructor form, so that the work of pattern-matching is wasted. Even worse, the argument is sometimes fres ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
User-defined data types, pattern-matching, and recursion are ubiquitous features of Haskell programs. Sometimes a function is called with arguments that are statically known to already be in constructor form, so that the work of pattern-matching is wasted. Even worse, the argument is sometimes freshly-allocated, only to be immediately decomposed by the function. In this paper we describe a simple, modular transformation that specialises recursive functions according to their argument “shapes”. We show that such a transformation has a simple, modular implementation, and that it can be extremely effective in practice, eliminating both pattern-matching and heap allocation. We describe our implementation of this constructor specialisation transformation in the Glasgow Haskell Compiler, and give measurements of its effectiveness.
Abstract Lightweight Concurrency Primitives for GHC
"... The Glasgow Haskell Compiler (GHC) has quite sophisticated support for concurrency in its runtime system, which is written in lowlevel C code. As GHC evolves, the runtime system becomes increasingly complex, error-prone, difficult to maintain and difficult to add new concurrency features. This paper ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
The Glasgow Haskell Compiler (GHC) has quite sophisticated support for concurrency in its runtime system, which is written in lowlevel C code. As GHC evolves, the runtime system becomes increasingly complex, error-prone, difficult to maintain and difficult to add new concurrency features. This paper presents an alternative approach to implement concurrency in GHC. Rather than hard-wiring all kinds of concurrency features, the runtime system is a thin substrate providing only a small set of concurrency primitives, and the remaining concurrency features are implemented in software libraries written in Haskell. This design improves the safety of concurrency support; it also provides more customizability of concurrency features, which can be developed as Haskell library packages and deployed modularly. Categories and Subject Descriptors D.1.1 [Programming Techniques]:
Dependent Types for Distributed Arrays
"... Abstract. Locality-aware algorithms over distributed arrays can be very difficult to write. Yet such algorithms are becoming more and more important as desktop machines boast more and more processors. We show how a dependently-typed programming language can help develop such algorithms by hosting a ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
Abstract. Locality-aware algorithms over distributed arrays can be very difficult to write. Yet such algorithms are becoming more and more important as desktop machines boast more and more processors. We show how a dependently-typed programming language can help develop such algorithms by hosting a domain-specific embedded type system that ensures every well-typed program will only ever access local data. Such static guarantees can help catch programming errors early on in the development cycle and maximise the potential speedup that multicore machines offer. At the same time, the functional specification of effects we provide facilitates the testing of and reasoning about algorithms on distributed arrays. 1
Partial Vectorisation of Haskell Programs
"... Abstract. Vectorisation for functional programs, also called the flattening transformation, relies on drastically reordering computations and restructuring the representation of data types. As a result, it only applies to the purely functional core of a fully-fledged functional language, such as Has ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
Abstract. Vectorisation for functional programs, also called the flattening transformation, relies on drastically reordering computations and restructuring the representation of data types. As a result, it only applies to the purely functional core of a fully-fledged functional language, such as Haskell or ML. A concrete implementation needs to apply vectorisation selectively and integrate vectorised with unvectorised code. This is challenging, as vectorisation alters the data representation, which must be suitably converted between vectorised and unvectorised code. In this paper, we present an approach to partial vectorisation that selectively vectorises sub-expressions and data types, and also, enables linking vectorised with unvectorised modules.
Avalanche-Safe LINQ Compilation
- In Proc. VLDB
, 2010
"... We report on a query compilation technique that enables the construction of alternative efficient query providers for Microsoft’s Language Integrated Query (LINQ) framework. LINQ programs are mapped into an intermediate algebraic form, suitable for execution on any SQL:1999-capable relational databa ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
We report on a query compilation technique that enables the construction of alternative efficient query providers for Microsoft’s Language Integrated Query (LINQ) framework. LINQ programs are mapped into an intermediate algebraic form, suitable for execution on any SQL:1999-capable relational database system. This compilation technique leads to query providers that (1) faithfully preserve list order and nesting, both being core features of the LINQ data model, (2) support the complete family of LINQ’s Standard Query Operators, (3) bring database support to LINQ to XML where the original provider performs in-memory query evaluation, and, most importantly, (4) emit SQL statement sequences whose size is only determined by the input query’s result type (and thus
Seq no more: Better Strategies for Parallel Haskell
"... We present a complete redesign of Evaluation Strategies, a key abstraction for specifying pure, deterministic parallelism in Haskell. Our new formulation preserves the compositionality and modularity benefits of the original, while providing significant new benefits. First, we introduce an evaluatio ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
We present a complete redesign of Evaluation Strategies, a key abstraction for specifying pure, deterministic parallelism in Haskell. Our new formulation preserves the compositionality and modularity benefits of the original, while providing significant new benefits. First, we introduce an evaluation-order monad to provide clearer, more generic, and more efficient specification of parallel evaluation. Secondly, the new formulation resolves a subtle space management issue with the original strategies, allowing parallelism (sparks) to be preserved while reclaiming heap associated with superfluous parallelism. Related to this, the new formulation provides far better support for speculative parallelism as the garbage collector now prunes unneeded speculation. Finally, the new formulation provides improved compositionality: we can directly express parallelism embedded within lazy data structures, producing more compositional strategies, and our basic strategies are parametric in the coordination combinator, facilitating a richer set of parallelism combinators. We give measurements over a range of benchmarks demonstrating that the runtime overheads of the new formulation relative to the original are low, and the new strategies even yield slightly better speedups on average than the original strategies.

