Results 1 - 5 of 5
Implementing a High-level Distributed-Memory Parallel Haskell in Haskell
Cited by 2 (0 self)
Abstract. We present the initial design, implementation and preliminary evaluation of a new distributed memory parallel Haskell, HdpH. The language is a shallowly embedded parallel extension of Haskell that supports high-level semi-explicit parallelism, is scalable, and has the potential for fault tolerance. The HdpH implementation is designed for maintainability without compromising performance too severely. To provide maintainability the implementation is modular and layered and, crucially, coded in vanilla Concurrent Haskell. Initial performance results are promising for three simple data parallel or divide-and-conquer programs, e.g. an absolute speedup of 135 on 168 cores of a Beowulf cluster.
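To put the quoted speedup in context, parallel efficiency is commonly defined as speedup divided by the number of cores. The sketch below applies that standard definition to the two figures given in the abstract; the function name `parallel_efficiency` is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core (1.0 would be perfect linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: absolute speedup 135 on 168 cores.
efficiency = parallel_efficiency(135, 168)
print(f"{efficiency:.1%}")  # prints 80.4%
```

Under this definition the reported result corresponds to roughly 80% efficiency on the 168-core run.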
Architecture Aware Parallel Programming in
Cited by 1 (0 self)
General purpose computing architectures are evolving quickly to become many-core and hierarchical: i.e. a core can communicate more quickly locally than globally. To be effective on such architectures programming models must be aware of the communication hierarchy, and yet preserve performance portability. We propose four new architecture-aware constructs for the parallel Haskell extension GpH that exploit information about task size and aim to reduce communication for small tasks, preserve data locality, or to distribute large units of work. We report a preliminary investigation of architecture-aware programming models that abstract over the new constructs. In particular we propose architecture-aware evaluation strategies and skeletons. We investigate three common parallel paradigms on hierarchical architectures with up to 224 cores. The results show that the architecture-aware constructs consistently deliver better speedup and scalability than existing constructs together with dramatically reduced variability. In some experiments speedup is improved by an order of magnitude.
A Contextual Semantics for Concurrent Haskell with Futures
, 2011
Abstract. In this paper we analyze the semantics of a higher-order functional language with concurrent threads, monadic IO and synchronizing variables as in Concurrent Haskell. To assure declarativeness of concurrent programming we extend the language by implicit, monadic, and concurrent futures. As a semantic model we introduce and analyze the process calculus CHF, which represents a typed core language of Concurrent Haskell extended by concurrent futures. Evaluation in CHF is defined by a small-step reduction relation. Using contextual equivalence based on may- and should-convergence as program equivalence, we show that various transformations preserve program equivalence. We establish a context lemma easing those correctness proofs. An important result is that call-by-need and call-by-name evaluation are equivalent in CHF, since they induce the same program equivalence. Finally we show that the monad laws hold in CHF under mild restrictions on Haskell’s seq-operator, which for instance justifies the use of the do-notation.
Comparing UPC and one-sided MPI: A distributed hash table for GAP
The GAP (Groups, Algebra and Programming) software is an interpreted programming language for symbolic algebra computation. It also provides a library of mathematical functionality. A key computational pattern for the GAP community is the orbit problem, that of a group acting upon a set. Computationally this maps onto the graph discovery problem. The enumeration of very large orbits corresponds to the traversal of a graph with billions of vertices. A hash table is used to check whether a vertex has been visited before during the computation. The large memory requirements of such a computation necessitate using a distributed memory machine. Building a parallel version of GAP is the goal of the HPC-GAP project. Message passing (MPI) and PGAS (UPC) are considered as the models for parallelisation. UPC has some advantages over MPI, as some of the data structures anticipated in a parallel implementation of GAP can be simply constructed as shared objects in a PGAS model. Moreover, some of the communication patterns are not suited to the synchronous send and receive model of message passing. For example, in a parallel implementation of a hash table, the task or thread which computes the hash of an object then knows the table entry and thus whether hash table access is remote or local. For MPI, the usual send-receive mechanism is compromised because the receiving rank cannot determine when, and from whom, a message is to be passed. One-sided MPI communications can be used to circumvent the problem. Windows of remote access memory are created, and guarded by locks. In UPC, the natural shared arrays are used, again guarded by locks. However, the locking strategy for MPI and UPC is different.
In this paper, the per-
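The abstract's point that the hashing task alone knows whether a table access is local or remote can be illustrated with a small sketch. It assumes a block distribution of table slots across ranks and uses CRC32 as a stand-in hash; neither choice is taken from the paper, and the names `owner_and_slot` and `is_local` are hypothetical.

```python
import zlib
from typing import Tuple


def owner_and_slot(key: bytes, num_ranks: int, slots_per_rank: int) -> Tuple[int, int]:
    """Map a key to (owning rank, slot within that rank's block).

    Assumption: the table is split into equal contiguous blocks, one per rank.
    """
    global_slot = zlib.crc32(key) % (num_ranks * slots_per_rank)
    return divmod(global_slot, slots_per_rank)


def is_local(key: bytes, my_rank: int, num_ranks: int, slots_per_rank: int) -> bool:
    """True if the entry for `key` lives in the calling rank's own block."""
    owner, _ = owner_and_slot(key, num_ranks, slots_per_rank)
    return owner == my_rank


# Only the rank that hashes the key learns the owner; the owner itself
# receives no advance notice, which is why a matching receive is hard to post.
owner, slot = owner_and_slot(b"vertex-42", num_ranks=4, slots_per_rank=1024)
print(owner, slot, is_local(b"vertex-42", my_rank=0, num_ranks=4, slots_per_rank=1024))
```

In a one-sided scheme the hashing rank would then lock and access the owner's memory window directly, rather than waiting for the owner to take part in the exchange.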
Comparing Low-Pain and No-Pain Multicore Haskells
Abstract. Multicore and NUMA architectures are becoming the dominant processor technology and functional languages are theoretically well suited to exploit them. In practice, however, implementing effective high level parallel functional languages is extremely challenging. This paper is a systematic programming and performance comparison of four parallel Haskell implementations on a common multicore architecture. It provides a detailed analysis of the performance, and contrasts the programming effort that each language requires with the parallel performance delivered. The study uses 15 ‘typical’ programs to compare a ‘no pain’, i.e. entirely implicit, parallel implementation with three ‘low pain’, i.e. semi-explicit, language implementations. We report detailed studies comparing the parallel performance delivered. The comparative performance metric is speedup, which normalises against sequential performance. We ground the speedup comparisons by reporting both sequential and parallel runtimes and efficiencies for three of the languages. To measure the programming effort

