Results 1 -
9 of
9
Runtime Support for Multicore Haskell
"... Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describ ..."
Abstract
-
Cited by 18 (5 self)
- Add to MetaCart
Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describe just such an implementation, and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build. Our measurements are necessarily detailed and specific, but they are reproducible, and we believe that they offer some general insights. 1.
A consistent semantics of self-adjusting computation
, 2006
"... Abstract. This paper presents a semantics of self-adjusting computation and proves that the semantics is correct and consistent. The semantics integrates change propagation with the classic idea of memoization to enable reuse of computations under mutation to memory. During evaluation, reuse of a co ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
Abstract. This paper presents a semantics of self-adjusting computation and proves that the semantics is correct and consistent. The semantics integrates change propagation with the classic idea of memoization to enable reuse of computations under mutation to memory. During evaluation, reuse of a computation via memoization triggers a change propagation that adjusts the reused computation to reflect the mutated memory. Since the semantics combines memoization and change-propagation, it involves both non-determinism and mutation. Our consistency theorem states that the non-determinism is not harmful: any two evaluations of the same program starting at the same state yield the same result. Our correctness theorem states that mutation is not harmful: self-adjusting programs are consistent with purely functional programming. We formalized the semantics and its meta-theory in the LF logical framework and machine-checked the proofs in Twelf. 1
Higher-Order Functional Reactive Programming in Bounded Space
"... Functional reactive programming (FRP) is an elegant and successful approach to programming reactive systems declaratively. The high levels of abstraction and expressivity that make FRP attractive as a programming model do, however, often lead to programs whose resource usage is excessive and hard to ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Functional reactive programming (FRP) is an elegant and successful approach to programming reactive systems declaratively. The high levels of abstraction and expressivity that make FRP attractive as a programming model do, however, often lead to programs whose resource usage is excessive and hard to predict. In this paper, we address the problem of space leaks in discretetime functional reactive programs. We present a functional reactive programming language that statically bounds the size of the dataflow graph a reactive program creates, while still permitting use of higher-order functions and higher-type streams such as streams of streams. We achieve this with a novel linear type theory that both controls allocation and ensures that all recursive definitions are well-founded. We also give a denotational semantics for our language by combining recent work on metric spaces for the interpretation of higher-order causal functions with length-space models of spacebounded computation. The resulting category is doubly closed and hence forms a model of the logic of bunched implications.
Beyond Nested Parallelism: Tight Bounds on Work-Stealing Overheads for Parallel Futures
"... Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but primarily for the restricted class of nested-parallel (or fully strict) computations. We extend this prior work by consi ..."
Abstract
- Add to MetaCart
Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but primarily for the restricted class of nested-parallel (or fully strict) computations. We extend this prior work by considering a broader class of programs that also supports pipelined parallelism through the use of parallel futures. Though the overhead of work-stealing schedulers is often quantified in terms of the number of steals, we show that a broader metric, the number of deviations, is a better way to quantify work-stealing overhead for less restrictive forms of parallelism, including parallel futures. For such parallelism, we prove bounds on work-stealing overheads—scheduler time and cache misses—as a function of the number of deviations. Deviations can occur, for example, when work is stolen or when a future is touched. We also show instances where deviations can occur independently of steals and touches. Next, we prove that, under work stealing, the expected number of deviations is O(P d+td) in a P-processor execution of a computation with span d and t touches of futures. Moreover, this bound is existentially tight for any work-stealing scheduler that is parsimonious (those where processors steal only when their queues are empty); this class includes all prior work-stealing schedulers. We also present empirical measurements of the number of deviations incurred by a classic application of futures, Halstead’s quicksort, using our parallel implementation of ML. Finally, we identify a family of applications that use futures and, in contrast to quicksort, incur significantly smaller overheads.
Work Efficient Higher-Order Vectorisation
"... Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which ..."
Abstract
- Add to MetaCart
Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which limits the practicality of this method. We discuss why this problem arises, identify the mis-handling of index space transforms as the root cause, and present a solution using a refined representation of nested arrays. We have implemented this solution in Data Parallel Haskell (DPH) and present benchmarks showing that realistic programs, which used to suffer the blow-up, now have the correct asymptotic work complexity. In some cases, the asymptotic complexity of the vectorised program is even better than the original. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming structures; Polymorphism; Abstract data types
Semantic Solutions to Program Analysis Problems
"... Problems in program analysis can be solved by developing novel program semantics and deriving abstractions conventionally. For over thirty years, higher-order program analysis has been sold as a hard problem. Its solutions have required ingenuity and complex models of approximation. We claim that th ..."
Abstract
- Add to MetaCart
Problems in program analysis can be solved by developing novel program semantics and deriving abstractions conventionally. For over thirty years, higher-order program analysis has been sold as a hard problem. Its solutions have required ingenuity and complex models of approximation. We claim that this difficulty is due to premature focus on abstraction and propose a new approach that emphasizes semantics. Its simplicity enables new analyses that are beyond the current state of the art. Current Thoughts, New Ideas Higher-order program analysis has been an important and recurring topic at PLDI, starting with Shivers ’ seminal paper [1] and continuing through the present [2]. However, past approaches are limited in the language features they can handle, require intricate formal models that are difficult to develop, verify, and maintain, and do
ADMINISTERING ORGANISATION
, 2011
"... Hårdvaruacceleration av algoritmer genom funktionell programmering ..."
Cache and I/O Efficient Functional Algorithms
"... The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two level memory hierarchy with a fixed size primary memory (cache) of size M, an unbounded secondary mem ..."
Abstract
- Add to MetaCart
The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two level memory hierarchy with a fixed size primary memory (cache) of size M, an unbounded secondary memory organized in blocks of size B. The cost measure is based purely on the number of block transfers between the primary and secondary memory. All other operations are free. Many algorithms have been analyzed in these models and indeed these models predict the relative performance of algorithms much more accurately than the standard RAM model. The models, however, require specifying algorithms at a very low level requiring the user to carefully lay out their data in arrays in memory and manage their own memory allocation. In this paper we present a cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language. We show how some algorithms written in standard forms using just lists and trees (no arrays) and requiring no explicit memory layout or memory management are efficient in the model. We then describe an implementation of the language and show provable bounds for mapping the cost in our model to the cost in the idealcache model. These bound imply that purely functional programs based on lists and trees with no special attention to any details of memory layout can be as asymptotically as efficient as the carefully designed imperative I/O efficient algorithms. For example we describe an O ( n B logM/B n) cost sorting algorithm, which is
Vectorisation Avoidance
"... Flattening nested parallelism is a vectorising code transform that converts irregular nested parallelism into flat data parallelism. Although the result has good asymptotic performance, flattening thoroughly restructures the code. Many intermediate data structures and traversals are introduced, whic ..."
Abstract
- Add to MetaCart
Flattening nested parallelism is a vectorising code transform that converts irregular nested parallelism into flat data parallelism. Although the result has good asymptotic performance, flattening thoroughly restructures the code. Many intermediate data structures and traversals are introduced, which may or may not be eliminated by subsequent optimisation. We present a novel program analysis to identify parts of the program where flattening would only introduce overhead, without appropriate gain. We present empirical evidence that avoiding vectorisation in these cases leads to more efficient programs than if we had applied vectorisation and then relied on array fusion to eliminate intermediates from the resulting code. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming structures; Polymorphism; Abstract data types

