## A New Parallel Skeleton for General Accumulative Computations (2004)

Venue: International Journal of Parallel Programming

Citations: 2 (2 self)

### BibTeX

@ARTICLE{Iwasaki04anew,

author = {Hideya Iwasaki and Zhenjiang Hu},

title = {A New Parallel Skeleton for General Accumulative Computations},

journal = {International Journal of Parallel Programming},

year = {2004},

volume = {32},

pages = {389--414}

}

### Abstract

In this paper, we propose a powerful and general parallel skeleton called accumulate and describe its efficient implementation in C++ with MPI (Message Passing Interface) (18) as a solution to the above problems. Unlike approaches that apply optimizations such as loop restructuring to the target program, our approach provides a general recursive computation with accumulation as a library function (skeleton) with an optimized implementation. Our work is based on the data parallel programming model of BMF, which provides a concise way to describe and manipulate parallel programs. The main advantages of accumulate can be summarized as follows

### Citations

946 | High Performance Fortran language specification
- High Performance Fortran Forum
- 1994
Citation Context ...uses several workers. Data parallel skeletons capture the simultaneous computations on the data partitioned among processors. Examples of this kind of skeleton are forall in High Performance Fortran, (15) apply-tocall in NESL, (7, 8) and a fixed set of higher order functions such as map, reduce, scan, and zip in the Bird–Meertens Formalism (16, 17) (BMF for short). Skeletal programming is not restricte...

301 | Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
- Meijer, Fokkinga, et al.
- 1991
Citation Context ... lists with a single accumulative parameter, it is general enough to capture many algorithms. (23) As a matter of fact, when the accumulative parameter is unnecessary, h is the so-called catamorphism (24, 22) or foldr, one of the standard functions provided in most functional language systems. Many useful functions can be expressed in the form of a catamorphism. (21) As an example, consid...

216 | An introduction to the theory of lists
- Bird
- 1987
Citation Context ...s kind of skeleton are forall in High Performance Fortran, (15) apply-tocall in NESL, (7, 8) and a fixed set of higher order functions such as map, reduce, scan, and zip in the Bird–Meertens Formalism (16, 17) (BMF for short). Skeletal programming is not restricted to a specific application area; (14) it provides general patterns of parallel programming to help programmers write higher-level and structured ...

215 | Introduction to Functional Programming using Haskell
- Bird
- 1998
Citation Context ... 2.1. Notations We use the functional notation to describe the definitions of skeletons and programs because of its conciseness and clarity. Those who are familiar with the functional language Haskell (20) should have no problem understanding our notation. Function application is denoted by a space and the argument is written without brackets. Thus, f a means f(a). Functions are curried, and applicati...

195 | A short cut to deforestation
- Gill, Launchbury, et al.
- 1993
Citation Context ...lled catamorphism (24, 22) or foldr, one of the standard functions provided in most functional language systems. Many useful functions can be expressed in the form of a catamorphism. (21) As an example, consider the elimination of all smaller elements from a list. An element is said to be smaller if it is less than a previous element in the list. For example, for the list [1, 4, 2, 3...

146 | Parallel Programming Using Skeleton Functions
- Darlington, Field, et al.
- 1993
Citation Context ...and thus $\oplus$) can be carried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

136 | NESL: A nested data-parallel language
- Blelloch
- 1992
Citation Context ...arried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

65 | P3L: A structured high level programming language and its structured support. Concurrency: Practice and Experience
- Bacci, Danelutto, et al.
- 1995
Citation Context ...an be carried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

59 | A skeleton library
- Kuchen
- 2002
Citation Context ...systems have ways to describe data distribution. For example, P3L (5, 6) provides collective operations such as scatter for data parallel computations; Skil (9, 10) and the skeleton library by Kuchen (11) have distributed data structures. While we directly use MPI primitives such as [...] for data distribution, it would be better to provide abstracted libraries (or skeletons), which is left ...

55 | Algorithmic skeletons: a structured approach to the management of parallel computation
- Cole
- 1988
Citation Context ...and thus $\oplus$) can be carried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

47 | Deriving structural hylomorphisms from recursive definitions
- Hu, Iwasaki, et al.
- 1996
Citation Context ...ulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

41 | The Bird-Meertens Formalism as a Parallel Model
- Skillicorn
- 1993
Citation Context ...s kind of skeleton are forall in High Performance Fortran, (15) apply-tocall in NESL, (7, 8) and a fixed set of higher order functions such as map, reduce, scan, and zip in the Bird–Meertens Formalism (16, 17) (BMF for short). Skeletal programming is not restricted to a specific application area; (14) it provides general patterns of parallel programming to help programmers write higher-level and structured ...

37 | Skil: An Imperative Language with Algorithmic Skeletons for Efficient Distributed Programming
- Botorog, Kuchen
- 1996
Citation Context ...entation of accumulate more portable and practical. The accumulate skeleton is a polymorphic function that can accept various data types without any special syntax. This is in sharp contrast to Skil, (9, 10) in which enhanced syntax for describing polymorphism and functional features is introduced into a C-based language. The organization of this paper is as follows. We review existing parallel skeletons...

33 | Parallel skeletons for structured composition
- Darlington, Guo, et al.
- 1995
Citation Context ...s $\oplus$) can be carried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

23 | Scans as primitive operations
- Blelloch
- 1989
Citation Context ...arried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

18 | Parallelization via context preservation
- Chin, Takano, et al.
- 1998
Citation Context ... element of an input list in the recursive definition of interest. The only possible difficulty is to find suitable associative operators $\oplus$ and $\otimes$ together with q. The context preservation transformation (26) may provide a systematic way to deal with this difficulty, but a detailed discussion on this point is beyond the scope of this paper. To see how powerful and practical the new skeleton is for describ...

14 | An accumulative parallel skeleton for all
- Hu, Iwasaki, et al.
- 2002
Citation Context ...umulative parameter. Since function h of the above form represents the most natural recursive definition on lists with a single accumulative parameter, it is general enough to capture many algorithms. (23) As a matter of fact, when the accumulative parameter is unnecessary, h is the so-called catamorphism (24, 22) or foldr, one of the standard functions provided in most functional lang...

11 | The Integration of Task and Data Parallel Skeletons
- Kuchen, Cole
- 2002

9 | Diffusion: Calculating Efficient Parallel Programs
- Hu, Takeichi, et al.
Citation Context ...rameter. It provides a more natural way of describing algorithms with complicated dependencies than existing skeletons, like scan. (7) In fact, since accumulate is derived from the diffusion theorem, (19) most useful recursive algorithms can be captured using this skeleton. – The accumulate skeleton has an architecture-independent and efficient parallel implementation. It effectively eliminates both m...

5 | Skeletons for Data Parallelism
- Danelutto, Pasqualetti, Pelagatti
- 1997
Citation Context ...an be carried out in constant time. To sum up, our implementation of accumulate uses $O(N/P + \log P)$ parallel time. [Figure: data blocks distributed over PID 0 to PID 3, panels (a) and (b)]

2 | Diff: A powerful parallel skeleton
- Adachi, Iwasaki, et al.
- 2000
Citation Context ...termined by g, p, q, $\oplus$, and $\otimes$, we can parameterize them and use the special notation $[\![g, (p, \oplus), (q, \otimes)]\!]$ for accumulate. The accumulate skeleton was previously called diff, (25) but it has been renamed so as to reflect its characteristic feature of describing data dependencies. It is a direct consequence of the diffusion theorem that accumulate can be rewritten as ...

1 |
Diffusion after Fusion — Deriving Efficient Parallel Algorithms
- Shirasawa, Hu, et al.
- 2001
Citation Context ...s [True, True, False, False, True, False], where True indicates that the corresponding point is visible. We simplified the problem to be to find the number of visible points. The function lineofsight (27) to solve this simplified line-of-sight problem can be described in terms of accumulate: lineofsight xs c = $[\![g, (p, +), (q, \max)]\!]$ xs c where g c = 0, p(x, c) = if c $\le$ angle x then 1 else 0, q x = angl...

1 | A PC Cluster System Employing the
- Hyoudou, Ozaki, et al.
- 2002
Citation Context ...t holds the maximum value of angle for the points investigated. The C++ program we used was based on the above form of lineofsight. The parallel environment was a PC cluster system called FireCluster (28) with eight processors connected by an IEEE 1394 serial bus. The IEEE 1394 is becoming a standard for connecting PCs because it provides high-speed (400 Mbps) and low-latency (7.5 μs) communications. ...
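
The citation contexts above describe lineofsight as an instance of accumulate, parameterized by g, p, q and two associative operators. The following is a minimal sequential sketch of that idea, not the authors' C++/MPI implementation. It assumes the recursive form `h [] c = g c` and `h (x:xs) c = p(x, c) ⊕ h xs (c ⊗ q x)`, which the excerpts only allude to. It also assumes that ⊕ is addition, that ⊗ is max, that the visibility test is `c <= angle x`, and that each point is given directly by its angle. The Python names `accumulate`, `oplus`, `otimes`, and the sample input are hypothetical.

```python
import math


def accumulate(g, p, oplus, q, otimes, xs, c):
    """Sequential reference for the assumed recursive form

        h []     c = g(c)
        h (x:xs) c = p(x, c) (+) h xs (c (x) q(x))

    where (+) is `oplus` and (x) is `otimes`.
    """
    # Walk left to right, recording p(x, c) and threading the accumulator c.
    terms = []
    for x in xs:
        terms.append(p(x, c))
        c = otimes(c, q(x))
    # Combine from the right, starting from g applied to the final accumulator.
    result = g(c)
    for t in reversed(terms):
        result = oplus(t, result)
    return result


def lineofsight(angles, c):
    """Count visible points (assumption: a point is its own angle)."""
    return accumulate(
        g=lambda c: 0,                      # no points left: contribute 0
        p=lambda x, c: 1 if c <= x else 0,  # visible if not below the running max
        oplus=lambda a, b: a + b,           # sum the visibility flags
        q=lambda x: x,                      # angle of the current point
        otimes=max,                         # running maximum angle
        xs=angles,
        c=c,
    )


if __name__ == "__main__":
    # Hypothetical input: angles of six points seen from the observer.
    print(lineofsight([1, 4, 2, 3, 5, 5], -math.inf))
```

Because both operators in this sketch are associative, the same computation could in principle be split into per-processor blocks and combined, which is the property the parallel implementation described in the paper relies on; the sketch itself makes no attempt to do so.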