Results 1 - 10
of
13
Algorithm + Strategy = Parallelism
- JOURNAL OF FUNCTIONAL PROGRAMMING
, 1998
"... The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of ..."
Abstract
-
Cited by 51 (18 self)
- Add to MetaCart
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divideand -conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000 line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance, while incurring minimal overhead for using evaluation strategies.
The Design, Implementation, and Evaluation of Jade
- ACM Transactions on Programming Languages and Systems
, 1998
"... this article we discuss the design goals and decisions that determined the final form of Jade and present an overview of the Jade implementation. We also present our experience using Jade to implement several complete scientific and engineering applications. We use this experience to evaluate how th ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
this article we discuss the design goals and decisions that determined the final form of Jade and present an overview of the Jade implementation. We also present our experience using Jade to implement several complete scientific and engineering applications. We use this experience to evaluate how the different Jade language features were used in practice and how well Jade as a whole supports the process of developing parallel applications. We find that the basic idea of preserving the serial semantics simplifies the program development process, and that the concept of using data access specifications to guide the parallelization offers significant advantages over more traditional control-based approaches. We also find that the Jade data model can interact poorly with concurrency patterns that write disjoint pieces of a single aggregate data structure, although this problem arises in only one of the applications. Categories and Subject Descriptors: D.1.3 [Programming Te
Feedback Directed Implicit Parallelism
"... In this paper we present an automated way of using spare CPU resources within a shared memory multi-processor or multi-core machine. Our approach is (i) to profile the execution of a program, (ii) from this to identify pieces of work which are promising sources of parallelism, (iii) recompile the pr ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
In this paper we present an automated way of using spare CPU resources within a shared memory multi-processor or multi-core machine. Our approach is (i) to profile the execution of a program, (ii) from this to identify pieces of work which are promising sources of parallelism, (iii) recompile the program with this work being performed speculatively via a work-stealing system and then (iv) to detect at run-time any attempt to perform operations that would reveal the presence of speculation. We assess the practicality of the approach through an implementation based on GHC 6.6 along with a limit study based on the execution profiles we gathered. We support the full Concurrent Haskell language compiled with traditional optimizations and including I/O operations and synchronization as well as pure computation. We use 20 of the larger programs from the ‘nofib ’ benchmark suite. The limit study shows that programs vary a lot in the parallelism we can identify: some have none, 16 have a potential 2x speed-up, 4 have 32x. In practice, on a 4-core processor, we get 10-80 % speed-ups on 7 programs. This is mainly achieved at the addition of a second core rather than beyond this. This approach is therefore not a replacement for manual parallelization, but rather a way of squeezing extra performance out of the threads of an already-parallel program or out of a program that has not yet been parallelized.
Lazy Threads: Compiler and Runtime Structures for Fine-Grained Parallel Programming
, 1997
"... Many modern parallel languages support dynamic creation of threads or require multithreading in their implementations. The threads describe the logical parallelism in the program. For ease of expression and better resource utilization, the logical parallelism in a program often exceeds the physical ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Many modern parallel languages support dynamic creation of threads or require multithreading in their implementations. The threads describe the logical parallelism in the program. For ease of expression and better resource utilization, the logical parallelism in a program often exceeds the physical parallelism of the machine and leads to applications with many fine-grained threads. In practice, however, most logical threads need not be independent threads. Instead, they could be run as sequential calls, which are inherently cheaper than independent threads. The challenge is that one cannot generally predict which logical threads can be implemented as sequential calls. In lazy multithreading systems each logical thread begins execution sequentially (with the attendant effic...
Process Semantics of Graph Reduction
- Proc. CONCUR '95, volume 962 of Lecture Notes in Computer Science
, 1995
"... This paper introduces an operational semantics for call-by-need reduction in terms of Milner's ß-calculus. The functional programming interest lies in the use of ß-calculus as an abstract yet realistic target language. The practical value of the encoding is demonstrated with an outline for a paralle ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
This paper introduces an operational semantics for call-by-need reduction in terms of Milner's ß-calculus. The functional programming interest lies in the use of ß-calculus as an abstract yet realistic target language. The practical value of the encoding is demonstrated with an outline for a parallel code generator. From a theoretical perspective, the ß-calculus representation of computational strategies with shared reductions is novel and solves a problem posed by Milner [13]. The compactness of the process calculus presentation makes it interesting as an alternative definition of call-by-need. Correctness of the encoding is proved with respect to the call-by-need -calculus of Ariola et al. [3]. 1 Introduction Graph reduction of extended -calculi has become a mature field of applied research. The efficiency of the implementations is due in great measure to a technique known as `sharing', whereby argument values are computed (at most) once and then memoized for future reference. Both...
Interpreter Prototypes From Formal Language Definitions
, 1993
"... Denotational semantics is now used widely for the formal definition of programming languages but there is a lack of appropriate tools to support language development. General purpose language implementation systems are oriented to syntax with poor support for semantics. Specialised denotational sema ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
Denotational semantics is now used widely for the formal definition of programming languages but there is a lack of appropriate tools to support language development. General purpose language implementation systems are oriented to syntax with poor support for semantics. Specialised denotational semantics based systems correspond closely to the formalism but are rendered inflexible for language experimentation by their monolithic multiple stages Exploratory language development with formal definitions is better served by a unitary notation, encompassing syntax and semantics, which is close to but simpler than denotational semantics. Interactive implementation of the notation then facilitates language investigation through the direct execution of a formal definition as an interpreter for the defined language. This thesis presents Navel, a run-time typed, applicative order, pure functional programming language with integrated context free grammar rules. Navel has been used to develop prot...
Partitioning Non-strict Languages for Multi-threaded Code Generation
- Master's thesis, Dept. of EECS, MIT
, 1994
"... In a non-strict language, functions may return values before their arguments are available, and data structures may be defined before all their components are defined. Compiling such languages to conventional hardware is not straightforward; instructions do not have a fixed compile time ordering. Su ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
In a non-strict language, functions may return values before their arguments are available, and data structures may be defined before all their components are defined. Compiling such languages to conventional hardware is not straightforward; instructions do not have a fixed compile time ordering. Such an ordering is necessary to execute programs efficiently on current microprocessors. Partitioning is the process of compiling a non-strict program into threads (i.e., a sequence of instructions). This process involves detecting data dependencies at compile time and using these dependencies to "sequentialize" parts of the program. Previous work on partitioning did not propagate dependence information across recursive procedure boundaries. Using a representation known as Paths we are able to represent dependence information of recursive functions. Also, we incorporate them into a known partitioning algorithm. However, this algorithm fails to make use of all the information contained in pat...
Partitioning Non-strict Functional Languages for Multi-threaded Code Generation
- In Proceedings of Static Analysis Symposium '95
, 1995
"... In this paper, we present a new approach to partitioning, the problem of generating sequential threads for programs written in a non-strict functional language. The goal of partitioning is to generate threads as large as possible, while retaining the non-strict semantics of the program. We define p ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
In this paper, we present a new approach to partitioning, the problem of generating sequential threads for programs written in a non-strict functional language. The goal of partitioning is to generate threads as large as possible, while retaining the non-strict semantics of the program. We define partitioning as a program transformation and design algorithms for basic block partitioning and inter-procedural partitioning. The inter-procedural algorithm presented here is more powerful than the ones previously known and is based on abstract interpretation, enabling the algorithm to handle recursion in a straightforward manner. We prove the correctness of these algorithms in a denotational semantic framework. Keywords: Partitioning, abstract interpretation, demand and tolerance sets, inter-procedural analysis, non-strict functional languages. 1 Introduction Functional programming languages can be divided into two classes: strict and non-strict. In a non-strict language, functions may r...
An Overview of the Parallel Language Id (a foundation for pH, a parallel dialect of Haskell)
- Digital Equipment Corporation, Cambridge Research Laboratory
, 1993
"... : Id is an architecture-independent, general-purpose parallel programming language that has evolved and been in use for a number of years. Id does not have a sequential core; rather, it is implicitly parallel, with the programmer introducing sequencing explicitly only if necessary. Id is a mostly ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
: Id is an architecture-independent, general-purpose parallel programming language that has evolved and been in use for a number of years. Id does not have a sequential core; rather, it is implicitly parallel, with the programmer introducing sequencing explicitly only if necessary. Id is a mostly-functional language, in the family of non-strict functional languages with a HindleyMilner static type system. Unlike other non-strict functional languages, it uses lenient, not lazy evaluation, for reasons of parallelism, as well as to give meaning to non-functional constructs. The non-functional constructs come in two layers: I-structures (which preserve determinacy, but not referential transparency, and are closely related to logic variables), and M-structures, which are side-effects with implicit synchronization. The layers are distinguished by syntax and types, so that it is possible for an implementation to force a program to be within a desired layer (e.g., purely functional)....
How Much Non-strictness do Lenient Programs Require?
- In Conf. on Func. Prog. Languages and Computer Architecture
, 1995
"... Lenient languages, such as Id90, have been touted as among the best functional languages for massively parallel machines [AHN88]. Lenient evaluation combines non-strict semantics with eager evaluation [Tra91]. Non-strictness gives these languages more expressive power than strict semantics, while ea ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Lenient languages, such as Id90, have been touted as among the best functional languages for massively parallel machines [AHN88]. Lenient evaluation combines non-strict semantics with eager evaluation [Tra91]. Non-strictness gives these languages more expressive power than strict semantics, while eager evaluation ensures the highest degree of parallelism. Unfortunately, non-strictness incurs a large overhead, as it requires dynamic scheduling and synchronization. As a result, many powerful program analysis techniques have been developed to statically determine when non-strictness is not required [CPJ85, Tra91, Sch94]. This paper studies a large set of lenient programs and quantifies the degree of non-strictness they require. We identify several forms of non-strictness, including functional, conditional, and data structure non-strictness. Surprisingly, most Id90 programs require neither functional nor conditional non-strictness. Many benchmark programs, however, make use of a limited fo...

