## Parallel graph reduction for divide-and-conquer applications -- Part I: program transformations (2001)

### BibTeX

```bibtex
@MISC{Vree01parallelgraph,
  author = {Willem G. Vree and Pieter H. Hartel},
  title  = {Parallel graph reduction for divide-and-conquer applications -- Part I: program transformations},
  year   = {2001}
}
```


### Abstract

A proposal is made to base parallel evaluation of functional programs on graph reduction combined with a form of string reduction that avoids duplication of work. Pure graph reduction poses some rather difficult problems for implementation on a parallel reduction machine, but with certain restrictions, parallel evaluation becomes feasible. The restrictions manifest themselves in the class of application programs that may benefit from a speedup due to parallel evaluation. Two transformations are required to obtain a suitable version of such programs for the class of architectures considered. It is conceivable that programming tools can be developed to assist the programmer in applying the transformations, but we have not investigated such possibilities. To demonstrate the viability of the method we present four application programs with a complexity ranging from quicksort to a simulation of the tidal waves in the North Sea.

### Citations

320 |
Bounds for certain multiprocessing anomalies
- Graham
- 1966
Citation Context: ...rs for which the efficiency E of the system stays above a certain cost-effective value. To present the performance data of the scheduling application, two sets of curves are drawn in figures (10) and (11). In both figures the speed-up is plotted against different values of the threshold. For the scheduling application the threshold value represents a specific depth in the search tree beyond which no m... |

191 |
A new implementation technique for applicative languages
- Turner
- 1979
Citation Context: ...date2 (Left2 : (Right2 : BorderOfRight2)) = ‹ UpdateXleft Left2 BorderOfRight2 › : ‹ UpdateXright Right2 › Figure 17: The two phases of the updating with annotations. The illustration of figure (18) shows the desired communication structure of Update1 and Update2. The dashed arrows represent the transmission of remote names, whereas the solid arrows denote communication of real data... |

171 | Semantics of Programming Languages - Tennent - 1991 |

142 |
Problem-solving methods in artificial intelligence
- Nilsson
- 1971
Citation Context: ...processors for which the efficiency E of the system stays above a certain cost-effective value. To present the performance data of the scheduling application, two sets of curves are drawn in figures (10) and (11). In both figures the speed-up is plotted against different values of the threshold. For the scheduling application the threshold value represents a specific depth in the search tree beyond w... |

122 | Efficient compilation of lazy evaluation
- Johnsson
- 1984
Citation Context: ...s has the advantage that the implementation of the sandwich function does not have to search for remote names throughout the graph that represents its second argument. In the example shown in figure (14) the own function marks the head of the result list, which is returned by the function H. The latter reuses the value of newhead during its next application... |

78 |
Executing functional programs on a virtual tree of processors
- Burton, Sleep
- 1981
Citation Context: ...of QuickSort in figure (5) should contain enough work to outweigh their communication cost (job condition 3). This may be achieved by imposing a lower limit on the length of the lists m and n. Figure (6) shows the final version of the QuickSort program, with controlled application of the sandwich strategy as obtained by a second transformation step. We call this step the grain size transformation. Th... |
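
The grain size transformation described in this snippet can be illustrated in Haskell (a hypothetical rendering; the paper uses SASL, and the parallel sandwich jobs are modelled here by ordinary recursive calls). The threshold `t` is an assumed tuning parameter: the divide-and-conquer split is only taken when both sublists are long enough to outweigh communication cost; otherwise the plain sequential program is used.

```haskell
import Data.List (partition)

-- Plain sequential QuickSort, as in the untransformed program.
quickSort :: Ord a => [a] -> [a]
quickSort []    = []
quickSort (a:x) = quickSort m ++ (a : quickSort n)
  where (m, n) = partition (< a) x

-- Grain-size-controlled version: only keep the divide-and-conquer
-- split (which the parallel machine would evaluate as two jobs) when
-- both halves exceed the threshold t; fall back to the sequential
-- version for small lists.
quickSortG :: Ord a => Int -> [a] -> [a]
quickSortG _ [] = []
quickSortG t (a:x)
  | length m > t && length n > t = quickSortG t m ++ (a : quickSortG t n)
  | otherwise                    = quickSort (a:x)
  where (m, n) = partition (< a) x

main :: IO ()
main = print (quickSortG 2 [5,3,8,1,9,2,7,4,6])
```

Choosing `t` too low recreates the original fine-grained program; choosing it too high means no parallel jobs are generated at all.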

60 |
Clean: A language for functional graph rewriting
- Brus, Eekelen, et al.
- 1987
Citation Context: ...n (repeat 0 0 3). There are two processes involved in this reduction sequence. These have been named parent and child. The steps carried out by the child process are shown offset to the right in figure (15). [Figure 15: interleaved trace; parent steps: repeat 0 0 3, sandwich′ G (remote 002), G (“remote name” : 7), repeat “remote name” 14 2, sandwich′ G (remote “remote name” 141), G (20 : 21); child steps: remote 0...] |

43 |
Tim: a simple, lazy abstract machine to execute supercombinators
- Fairbairn, Wray
- 1987
Citation Context: ...rogram the matrix can be split into as many blocks as the degree of parallelism requires. We only present a partitioning of the matrix into two blocks, to concentrate on the annotation issues. Figure (16) shows the main recursion of the program, which is started with two partitions called Left and Right. These partitions will be updated in parallel. main Left Right n = repeat Update1 (Left : (Right : (... |

41 | Super Combinators: A New Implementation Method for Applicative Languages - Hughes - 1982 |

37 |
SASL language manual
- Turner
- 1976
Citation Context: ...ons (without a capital U) in figure (20) perform the actual updating of the matrices. The function to force the transmission of the remote matrices at the end of the main recursion is shown in figure (21): GetRemoteData (Left : Right) = ‹ ILeft › : ‹ IRight › Figure 21: Retrieval of both matrices. Both Left and Right will always be remote names during the iterati... |

37 | A parallel method for tridiagonal equations
- Wang
- 1981
Citation Context: ...he first equation and the variable a matches all tokens until the equals symbol (=) in the equation with the job brackets (‹ and ›). Similarly the variable g matches all remaining equations. Figure (24) shows the functions AF, LF and RF to be used for the transformation of the application programs presented in the previous sections. Together with the transformation schemes of figure (23) they generat... |

35 |
Control of parallelism in the Manchester dataflow machine
- Ruggiero, Sargeant
- 1987
Citation Context: ...pdateYHright. The only real data to be returned and retransmitted is the BorderOfLeft1, which travels from the “left” processor to the “right” processor. Figure (19) shows the annotation that is necessary to obtain the desired behaviour: UpdateXleft Left2 BorderOfRight2 = (own Left1) : RightBorderOf Left1 WHERE Left1 = updateXleft Left2 BorderOfRight2 Updat... |

16 |
Evaluating functional programs on the FLAGSHIP machine
- WATSON, WATSON
- 1987
Citation Context: ...n figure 12) it can be observed that it is possible to eliminate all the fill-in of a block locally, only using the bottom row of the next higher block. This final elimination step is shown in figure (12) and again all blocks can be processed in parallel. [Figure 12: Final elimination; matrix rows elided] |

16 | Annotations to control parallelism and reduction order in the distributed evaluation of functional languages - Burton - 1984 |

13 | Serial Combinators: Optimal Grains of Parallelism - Goldberg, Hudak - 1985 |

9 |
A Loosely-coupled Applicative Multi-processing System
- Keller, Lindstrom, et al.
- 1979
Citation Context: ...[Figure 8: Partitioning of a tri-diagonal matrix (u ≠ 0)] Each block can now be eliminated in parallel. Figure (9) illustrates the effect of this part of the algorithm on one block (in this case the centre block of figure 8). [Figure 9: matrix rows elided] |

9 |
Performance analysis of storage management in combinator graph reduction
- Hartel
- 1988
Citation Context: ...ted under the normal lazy strategy or with the sandwich strategy. Using the sandwich strategy, the net execution time of the program is less, due to parallel evaluation of jobs. The diagram of figure (5) schematises this difference. The horizontal line segments represent the number of reduction steps required by the different branches in the evaluation. [Figure 5: segments of b steps, s1 steps, e steps and s2 steps] |

8 |
Distributed Execution of functional programs using Serial Combinators
- Hudak, Goldberg
- 1985
Citation Context: ...lly sized blocks and to try elimination of these blocks in parallel. The two edge blocks (top left and bottom right) are extended by a zero column, to obtain the same size as the other blocks. Figure (8) shows how a 12 × 12 matrix can be split into three blocks. [Figure 8: matrix rows elided] |
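
The partitioning described here, splitting the rows of the matrix into equally sized blocks so each block can be eliminated in parallel, amounts to plain list chunking. A minimal illustrative Haskell fragment (the zero-column padding of the two edge blocks is omitted):

```haskell
-- Split the rows of a matrix into blocks of k rows each, so each
-- block can be eliminated in parallel; the snippet's example cuts a
-- 12 x 12 tridiagonal system into three 4-row blocks.
chunkRows :: Int -> [a] -> [[a]]
chunkRows _ []   = []
chunkRows k rows = take k rows : chunkRows k (drop k rows)

main :: IO ()
main = print (chunkRows 4 [1..12 :: Int])
```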

8 | The grain size of parallel computations in a functional program - Vree - 1987 |

7 |
Simulated performance of a reduction based multiprocessor
- Keller, Lin
- 1984
Citation Context: ...d to the first and fifth columns of the partition. The reason for this confinement becomes apparent when two adjacent blocks that have been processed are shown together, like blocks A and B in figure (10). [Figure 10: Elimination at the border of blocks A and B; matrix rows elided] |

6 |
Cooperating Reduction Machines
- Kluge
- 1983
Citation Context: ...M is needed in a context C[M] if and only if M is reduced to normal form when C[M] is reduced to normal form. ... becomes: ∀ i ∈ 1..n : c_i < Σ_{k=1, k≠i}^{n} s_k (1). What we want to prove is that the longest job (communication included) takes less time than all jobs in sequence (without communication), i.e.: Σ_{k=1}^{n} s_k > max_{k=1}^{n} (s_k + c_k) (2). From (1) it follow... |

6 |
ALICE: A Multiple-processor Reduction Machine for the Parallel Evaluation of Applicative Languages. in FPCA'81
- Darlington, Reeve
- 1981
Citation Context: ...ld allow for an optimal processor utilisation. Using a free mixture of conventional mathematical notation and SASL syntax, the essential part of the program with the job annotation is shown in figure (7): fft 1 r →d = →d; fft n r →d = ‹ fft halfn halfr →u › ++ ‹ fft halfn (halfr + 128) →v › WHERE halfn = n/2, halfr = r/2, →u = →x + →z, →v = →x − →z, →x ... |

6 |
Transputer based experiments with ZAPP architecture
- McBurney, Sleep
- 1987
Citation Context: ...Update1 to its own output, one can see that the results of UpdateYHleft and UpdateYHright are also redirected without any modification into the next iteration of UpdateXleft and UpdateXright. Figure (20) shows the annotation that is necessary to retain the matrices in their respective processors and to return the actual data of the border of Right1: UpdateYHleft Left2 = own (updateYHleft Left2) Updat... |

6 |
Fast Fourier transform hardware implementations–an overview,” Audio and Electroacoustics
- Bergland
- 1969
Citation Context: ...RUE whenever the grain size of the jobs is above an application dependent threshold. The transformation functions JL, GS and SQ are defined independently of the application by the equations of figure (23). The job lifting function (JL) transforms a given function definition into a version where the two jobs are lifted from a general expression into a single function application. JL also generates a se... |

5 |
A Network of Microprocessors to Execute Reduction Languages
- Mago
- 1979
Citation Context: ...s_k (1). What we want to prove is that the longest job (communication included) takes less time than all jobs in sequence (without communication), i.e.: Σ_{k=1}^{n} s_k > max_{k=1}^{n} (s_k + c_k) (2). From (1) it follows that: ∀ i ∈ 1..n : c_i + s_i < Σ_{k=1}^{n} s_k, and therefore (2). The intuitive version of job condition (3), namely ∀ c_i < s_i, is not sufficient to prove (2). Counter example: two i... |
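
The proof step quoted in this snippet can be written out as a short chain (a reconstruction from the surrounding text, using the snippet's own symbols: s_k for the work of job k and c_k for its communication cost):

```latex
% Job condition (1): each job's communication cost is less than the
% combined work of all *other* jobs.
\forall i \in 1..n:\quad c_i < \sum_{k=1,\,k\neq i}^{n} s_k \tag{1}
% Adding s_i to both sides gives, for every i,
c_i + s_i < \sum_{k=1}^{n} s_k
% and since this holds for every i, it holds for the maximum, so
\max_{k=1}^{n}\,(s_k + c_k) < \sum_{k=1}^{n} s_k \tag{2}
% i.e. the longest job, communication included, takes less time than
% all jobs executed in sequence without communication.
```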

5 | Data transport in Wang's partition method - Michielse, van der Vorst - 1988 |

3 |
Garbage collection of linked structures
- Cohen
- 1981
Citation Context: ...with stack overflow properly. 2.3. Cooperation of functional units. Having exposed the functionality of the components in the architecture, we can now show with an example how they cooperate. Figure (4) represents a configuration with three processing elements dedicated to reduction and the conductor. Graphs reside in overlapping stores. The life cycle of a single job is traced by describing, in chr... |

3 | Performance of lazy combinator graph reduction,” PRM project internal report - Hartel - 1989 |

1 |
A recursive computer for VLSI,” Computer architecture news 10(3
- Treleaven, Hopkins
- 1982
Citation Context: ...must be annotated. To achieve this we use angular brackets (‹ and ›), which obey the same syntactic rules as the normal parentheses. An expression between matching angular brackets is a job. Figure (4) shows the version of the QuickSort function after annotation with job brackets. The annotation has to be provided by the programmer. QuickSort () = () QuickSort ... |

1 |
A new parallel graph reduction model and its machine architecture,” pp. 1-24 in Programming of future generation computers
- Amamiya
- 1986
Citation Context: ...ires two steps. The first step, which we call job lifting, recognises expressions between job brackets. Job lifting generates an auxiliary function G that satisfies the sandwich conditions. In figure (5) job lifting has replaced the body of QuickSort by a sandwich expression of G. QuickSort () = () QuickSort (a : x) = sandwich′ G (QuickSort m) (QuickSort n) WHERE G P Q = P ++ (a : Q) m : n = Split a x ()... |
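
As a rough illustration (in Haskell rather than the paper's SASL, and with `sandwich'` modelled as plain application, since the real primitive ships its argument jobs to remote processors before combining the results), the job-lifted QuickSort quoted above could look like:

```haskell
import Data.List (partition)

-- Sequential stand-in for the paper's sandwich' primitive: on the
-- real machine the two argument jobs are evaluated in parallel on
-- remote processors before g is applied to their results.
sandwich' :: (a -> b -> c) -> a -> b -> c
sandwich' g p q = g p q

-- Job-lifted QuickSort: the recursive calls are lifted into the two
-- argument positions of sandwich', and the auxiliary function g
-- (the snippet's G, with G P Q = P ++ (a : Q)) combines the results.
quickSort :: Ord a => [a] -> [a]
quickSort []    = []
quickSort (a:x) = sandwich' g (quickSort m) (quickSort n)
  where
    g p q  = p ++ (a : q)
    (m, n) = partition (< a) x   -- the snippet's Split a x

main :: IO ()
main = print (quickSort [4,1,3,2])
```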

1 |
Using Futurebus in a fifth-generation computer,” Microprocessors and microsystems 10(2
- Jones
- 1986
Citation Context: ...he result in figure 10). If the same elimination is performed on all pairs of border rows of adjacent blocks, the resulting bottom rows of all blocks together constitute a tri-diagonal matrix. Figure (11) shows this subsystem for the example matrix and the result of the elimination. This can be achieved either directly with Gauss elimination or, if the system is large enough, by recursive application of... |

1 |
Directions in functional programming research,” pp. 220-249 in SERC conf. ondistributed computing systems programme
- Jones
- 1984
Citation Context: ...[Figure 12: Final elimination] The SASL program that implements the algorithm is shown in figure (13): Partition matrix = ParMap SecondElimination matrix2 WHERE matrix2 = SequentialPart matrix1 matrix1 = ParMap FirstElimination matrix ParMap f (a : ()) = (f a) : () ParMap f (a : x) = ‹ f a › : ‹ Pa... |
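
The ParMap in this snippet maps a function over a list while bracketing every application as a job; the job brackets only direct the parallel machine, so stripped of them it is an ordinary map over a non-empty list. A sequential Haskell rendering (the name `parMap'` is assumed):

```haskell
-- Sequential rendering of the snippet's ParMap: in the SASL version
-- each ‹ f a › and ‹ ParMap f x › is a job shipped to another
-- processor; without the brackets it is a plain recursive map.
parMap' :: (a -> b) -> [a] -> [b]
parMap' f [a]   = [f a]
parMap' f (a:x) = f a : parMap' f x
parMap' _ []    = []

main :: IO ()
main = print (parMap' (*2) [1,2,3 :: Int])
```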

1 |
Functional programs as executable specifications,” pp. 29-54 in Mathematical logic and programming languages
- Turner
- 1984
Citation Context: ...(see below). Therefore a special function GetRemoteData is provided, to force the transmission of the actual matrices at the end of the main recursion. Figure (17) presents the function Update1. The process of the updating itself is split into two phases, after each of which communication of one border of the matrices takes place. The first phase updates the x-v... |

1 |
Parallel graph reduction for communicating sequential processes,” PRM project internal report
- Vree
- 1988
Citation Context: ...ed to be used frequently and the generation of parallel jobs in the program only depends on geographical data, which are not likely to change often. 4.1. Scheduling of jobs. The illustration of figure (7) shows two jobs (fork1 and fork2) that have executed a sandwich primitive and three jobs that remain sequential (mid1, mid2 and mid3). The horizontal axis represents the elapsed time as measured in re... |

1 | Distributed graph reduction from first principles,” pp. 1-14 in Implementation of functional languages - Keller - 1985 |

1 |
An on-the-fly scheduling algorithm for an experimental parallel reduction machine,” PRM project internal report, Dept
- Hofman
- 1988
Citation Context: ...epresent the numbers of transported nodes in respectively the i-th job and the i-th result. The communication cost pertaining to the i-th job/result is: c_i = ⌈j_i/T⌉ + ⌈r_i/T⌉ + 4C (3). The gross number of reduction steps of the whole family of m jobs is defined as: R_g = b + max_{i=1}^{m} (c_i + s_i) + e + 2C (4). The ratio S = R_s/R_g gives the maximum speedup that can be attained. If th... |
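
Formulas (3) and (4) of this snippet can be sketched numerically (a reconstruction; `T` is taken to be the number of graph nodes per message and `C` a fixed per-message overhead, as the surrounding text suggests, and the parameter names follow the snippet):

```haskell
-- Formula (3): communication cost of the i-th job/result,
-- c_i = ceil(j_i / T) + ceil(r_i / T) + 4C, for j_i transported
-- job nodes and r_i transported result nodes.
commCost :: Int -> Int -> Int -> Int -> Int
commCost t c ji ri = ceilDiv ji t + ceilDiv ri t + 4 * c
  where ceilDiv x y = (x + y - 1) `div` y

-- Formula (4): gross reduction steps of a family of m jobs,
-- R_g = b + max_i (c_i + s_i) + e + 2C, given the begin part b,
-- the end part e, and (c_i, s_i) pairs for each job.
grossSteps :: Int -> Int -> Int -> [(Int, Int)] -> Int
grossSteps c b e jobs = b + maximum [ci + si | (ci, si) <- jobs] + e + 2 * c

main :: IO ()
main = print (commCost 10 1 25 5)  -- ceil(25/10) + ceil(5/10) + 4*1 = 8
```

The ratio S = R_s / R_g of sequential to gross parallel steps then bounds the attainable speedup, as the snippet states.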

1 |
The G-machine: A fast, graph-reduction evaluator,” pp. 400-413 in 2nd Functional programming languages and computer architecture
- Kieburtz
- 1985
Citation Context: ...inimises the total execution time. [Figure 8: An optimal schedule with two processors; jobs fork1, fork2, mid1, mid2, mid3, join1 and join2 plotted against elapsed time t] As an example, figure (8) illustrates a schedule of the jobs involved in the application of figure (7) on a two processor system. The dashed lines represent the time periods that a processor is idle. When the job fork2 wishes... |

1 | A distributed real-time operating system,” Software−practice and experience 16(5 - Tuijnman, Hertzberger - 1986 |