The HDG-Machine: A Highly Distributed Graph-Reducer for a Transputer Network
- The Computer Journal, 1991
Cited by 28 (0 self)
Distributed implementations of programming languages with implicit parallelism hold out the prospect that the parallel programs are immediately scalable. This paper presents some of the results of our part of Esprit 415, in which we considered the implementation of lazy functional programming languages on distributed architectures. A compiler and abstract machine were designed to achieve this goal. The abstract parallel machine was formally specified using Miranda. Each instruction of the abstract machine was then implemented as a macro in the Transputer Assembler. Although macro expansion of the code results in non-optimal code generation, use of the Miranda specification makes it possible to validate the compiler before the Transputer code is generated. The hardware currently available consists of five T800-25s, each board having 16 Mbytes of memory. Benchmark timings using this hardware are given. In spite of the straightforward code generation, the resulting system compar...
Using Projection Analysis in Compiling Lazy Functional Programs
- In Proceedings of the 1990 ACM Conference on Lisp and Functional Programming, 1990
Cited by 15 (6 self)
Projection analysis is a technique for finding out information about lazy functional programs. We show how the information obtained from this analysis can be used to speed up sequential implementations, and introduce parallelism into parallel implementations. The underlying evaluation model is evaluation transformers, where the amount of evaluation that is allowed of an argument in a function application depends on the amount of evaluation allowed of the application. We prove that the transformed programs preserve the semantics of the original programs. Compilation rules, which encode the information from the analysis, are given for sequential and parallel machines. 1 Introduction A number of analyses have been developed which find out information about programs. The methods that have been developed fall broadly into two classes, forwards analyses such as those based on the ideas of abstract interpretation (e.g. [9, 18, 19, 7, 17, 12, 4, 20]), and backward analyses such as those based...
Parallel Graph Reduction with the ⟨ν,G⟩-machine
- 1989
Cited by 10 (1 self)
We have implemented a parallel graph reducer on a commercially available shared memory multiprocessor (a Sequent Symmetry™) that achieves real speedup compared to a fast compiled implementation of the conventional G-machine. Using 15 processors, this speedup ranges between 5 and 11, depending on the program. Underlying the implementation is an abstract machine called the ⟨ν,G⟩-machine. We describe the sequential and the parallel ⟨ν,G⟩-machine, and our implementation of them. We provide performance and speedup figures and graphs. 1 Introduction Compiled graph reduction, as embodied in the G-machine and the Lazy ML compiler [Aug84, Joh84] has proved to be rather an efficient way to implement lazy functional languages on conventional machines. In this paper we report our results on extending these compilation techniques for parallel computers. We have implemented a parallel graph reduction system, a modified parallel G-machine, in a commercially available shared memory multicomput...
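As a rough reading of the figures quoted in the abstract above, parallel efficiency can be taken as speedup divided by processor count. That textbook definition is an assumption here, not something the abstract states, and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Textbook parallel efficiency: speedup per processor (assumed definition)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: speedup between 5 and 11 on 15 processors.
low = parallel_efficiency(5, 15)
high = parallel_efficiency(11, 15)
print(f"efficiency between {low:.2f} and {high:.2f}")
# prints "efficiency between 0.33 and 0.73"
```

Under that definition, the reported speedups correspond to roughly one third to three quarters of ideal linear scaling.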
Comparison of Dynamic Load Balancing Strategies
- Parallel and Distributed Processing, 1990
Cited by 7 (0 self)
Parallel implementations of functional programming languages are often based on graph reduction. Tasks represented by task nodes in the graph are generated during run time. These tasks have to be distributed among the processors dynamically. For a prototype implementation [LK89] of such a language on a loosely coupled multiprocessor system consisting of transputers [In88], we have tested several load balancing strategies. 1 Introduction In functional programs, the subexpressions of an expression can be computed in parallel (if they are needed), since there are no side effects like modifications of global variables. This implicit parallelism can be exploited by implementations on parallel computers. One generally accepted strategy (see for example [Jo84][Bu88]) to implement functional programming languages is to use graph reduction [Wa71]. [LK89] describes a graph reduction based implementation of a functional language on a loosely coupled multiprocessor system, i.e. a system, where al...
Implementing the Evaluation Transformer Model of Reduction on Parallel Machines
- 1991
Cited by 7 (1 self)
The evaluation transformer model of reduction generalises lazy evaluation in two ways: it can start the evaluation of expressions before their first use, and it can evaluate expressions further than weak head normal form. Moreover, the amount of evaluation required of an argument to a function may depend on the amount of evaluation required of the function application. It is a suitable candidate model for implementing lazy functional languages on parallel machines. In this paper we explore the implementation of lazy functional languages on parallel machines, both shared and distributed memory architectures, using the evaluation transformer model of reduction. We will see that the same code can be produced for both styles of architecture, and the definition of the instruction set is virtually the same for each style. The essential difference is that a distributed memory architecture has one extra node type for non-local pointers, and instructions which involve the value of such nodes need their definitions extended to cover this new type of node. To make our presentation accessible, we base our description on a variant of the well-known G-machine, a machine for executing lazy functional programs.
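The point about one extra node type for non-local pointers can be illustrated with a minimal sketch. Everything named here (`IntNode`, `NonLocalNode`, `get_value`, and the dictionary-of-heaps layout) is a hypothetical illustration, not the paper's instruction set.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class IntNode:
    """An evaluated integer; present on both styles of architecture."""
    value: int


@dataclass
class NonLocalNode:
    """Hypothetical extra node type: a pointer into another processor's heap."""
    processor: int
    address: int


Node = Union[IntNode, NonLocalNode]
Heaps = Dict[int, Dict[int, Node]]  # processor id -> (address -> node)


def get_value(node: Node, heaps: Heaps) -> int:
    """Read an integer, extended to follow non-local pointers.

    On shared memory only the IntNode case is needed. The loop is the
    distributed-memory extension; a real machine would send a message
    rather than index another processor's heap directly.
    """
    while isinstance(node, NonLocalNode):
        node = heaps[node.processor][node.address]
    return node.value


# Processor 0 holds a pointer to address 7 on processor 1.
heaps: Heaps = {0: {0: NonLocalNode(1, 7)}, 1: {7: IntNode(42)}}
print(get_value(heaps[0][0], heaps))  # prints 42
```

The code that calls `get_value` is the same in both settings; only the definition of the operation grows a case for the new node type.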
Using Strictness Information in the STG-machine
- In Proceedings of the 4th International Workshop on the Parallel Implementation of Functional Programming Languages, 1995
Cited by 2 (0 self)
The paper presents an attempt at exploiting strictness information for parallel evaluation of functional programs. A simple evaluation model, which uses strictness in a limited way, is suggested. It has been applied in a parallel version of the STG-machine; special attention has been paid to avoiding the creation of useless tasks. Some results from the simulation of the parallel STG-machine are provided, throwing light on the amount and granularity of parallelism. 1 Introduction Pure functional languages are in principle well suited to parallel evaluation. However, implementing them efficiently on today's highly scalable MIMD architectures is a challenging task. Only sufficiently coarse-grain parallelism is likely to be successfully utilized. Parallel implementation of nonstrict languages, in particular, presents additional problems. Care must be taken not to affect termination properties. Task creation is perhaps the most important issue in parallel implementation. Task creation d...
The TRANSPOSE Machine - A Global Implementation of a Parallel Graph Reducer
This paper describes a new concept for the parallel implementation of functional languages on a network of processors. The implementation uses a special variant of annotated graph reduction [3]. Its main features are the following: We employ active waiting [6] to avoid complicated runtime data structures. We use a global address space and a random distribution of the graph nodes over the local memories of the processors, in order to overcome the problems of load-balancing and scheduling. The reduction is organized in cycles during which all annotated redices are reduced. This notion of "cycles" enables us to restrict communication between the processors to the execution of a global permutation, defined by an array of messages M = [L LocalMessages × P processors]. This two-dimensional (2D) permutation is realized by a simple and fast algorithm that permutes all messages of M in 2L + 6L log(P) steps, for any L sufficiently large. This algorithm actually maps any 2D-pe...
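The step bound quoted above, 2L + 6L log(P), can be evaluated for concrete sizes. The abstract does not state the base of the logarithm; the sketch below assumes base 2, and the function name `permutation_steps` is hypothetical.

```python
import math


def permutation_steps(local_messages: int, processors: int) -> float:
    """Evaluate 2L + 6L log(P), assuming a base-2 logarithm."""
    if local_messages < 1 or processors < 1:
        raise ValueError("L and P must be at least 1")
    return 2 * local_messages + 6 * local_messages * math.log2(processors)


# Example: L = 1000 local messages on P = 16 processors.
print(permutation_steps(1000, 16))  # prints 26000.0
```

Under this reading the cost grows linearly in L and only logarithmically in P, so doubling the processor count adds 6L steps.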
The ν-STG machine: a parallelized Spineless Tagless Graph Reduction Machine in a distributed memory architecture
- 1992
This paper describes the ν-STG machine, a parallelized Spineless Tagless Graph Reduction (STG) machine in a distributed memory architecture. In the ν-STG machine, a stack for each task is distributed by allocating a context frame for each tail-call sequence on the heap. Two sparking mechanisms, BSpark and ISpark, are supported to introduce parallelism, which are similar to those in the HDG machine. The BSpark cannot be ignored and should be synchronized with the parent task to prevent it being blocked and resumed too frequently, while ISpark may be ignored. Even though a variable is sparked by BSpark or ISpark, it may be evaluated in-line by the parent task. Creating a new task to evaluate a boxed variable according to the spark annotation is delayed until it is really needed. A message passing mechanism is used to distribute jobs and synchronize them in a machine. A lazy task creation mechanism is supported to exploit parallelism in unboxed arithmetic expression by boxing up on d...
Future Work
- 1992
this report relies, and who has provided many helpful comments. Bibliography
A New Framework for Strictness Analysis Using Demand Propagation
This paper presents a novel approach to strictness analysis called abstract demand propagation, an approach developed for the implementation of lazy functional programming languages on parallel machines. Although some work on strictness analysis using demand propagation has been done before, the present work is original in that it gives a precise interpretation of the notions of demands and demand propagation using an exact non-standard denotational semantics. The intuition behind this semantics is that it represents a form of inverse computation, i.e., it determines the least amount of information needed to produce at least some required (demanded) result. Viewing demand propagation as a form of inverse computation allows us to establish the soundness of our non-standard semantics by formally relating it to the standard semantics. In order to define a compile-time analysis based on demand propagation, safety and termination must be ensured. This is done by defining an abstract interpreta...

