## A Tutorial Implementation of the Diffusion Algorithmic Skeleton with the BSMLlib Library (2004)

Citations: 1 (0 self)

### BibTeX

@MISC{Loulergue04atutorial,

author = {Frédéric Loulergue and Zhenjiang Hu and Kazuhiko Kakehi},

title = {A Tutorial Implementation of the Diffusion Algorithmic Skeleton with the BSMLlib Library},

year = {2004}

}

### Abstract

Skeleton programming lets programmers build parallel programs more easily by providing efficient, ready-made parallel algorithms. The diffusion skeleton was proposed, together with a method for program derivation, to abstract a good combination of primitive skeletons such as map, parallel reduction and parallel prefix sum (scan).

### Citations

1130 | A Bridging Model for Parallel Computation
- Valiant
- 1990
Citation Context ... Appendix B explains the installation and basic use of the BSMLLIB library. 2 Functional Bulk Synchronous Parallelism 2.1 The Bulk Synchronous Parallel Model The Bulk Synchronous Parallel (BSP) model [35, 29, 33] describes: an abstract parallel computer, a model of execution and a cost model. A BSP computer has three components: a homogeneous set of processor-memory pairs, a communication network allowing int...

624 | MPI: The Complete Reference
- Snir, Otto, et al.
- 1996
Citation Context ...itional. There is currently no full implementation of BSML but there is a partial implementation as a library. The BSMLLIB library [1] implements the BSML primitives using Objective Caml [20] and MPI [34]. BSMLLIB can be taught to BSc. students due to the small number of basic operations. There are additional modules which provide several usual parallel algorithms. They constitute what is called th...

215 | An introduction to the theory of lists
- Bird
- 1987
Citation Context ... 'a) par *) let totex vv = parfun (compose noSome) (put(parfun (fun v dst->Some v) vv)) 3 The Diffusion Parallel Skeleton and the Diffusion Theorem We will use the BMF data parallel programming model [4, 32] to describe the diffusion skeleton. We choose BMF because it can provide us a concise way to describe both programs and transformation of programs. In this section we present the BMF notation and the...

165 | Direct bulk-synchronous parallel algorithms
- Gerbessiotis, Valiant
- 1994
Citation Context ...rget architecture. Bulk Synchronous Parallel ML or BSML is our extension of ML for programming direct-mode parallel BSP algorithms as functional programs. A BSP algorithm is said to be in direct mode [14] when its physical process structure is made explicit. Such algorithms offer predictable and scalable performance and BSML expresses them with a small set of primitives taken from the confluent BSλ-ca...

83 | Universal computing
- McColl
Citation Context ... Appendix B explains the installation and basic use of the BSMLLIB library. 2 Functional Bulk Synchronous Parallelism 2.1 The Bulk Synchronous Parallel Model The Bulk Synchronous Parallel (BSP) model [35, 29, 33] describes: an abstract parallel computer, a model of execution and a cost model. A BSP computer has three components: a homogeneous set of processor-memory pairs, a communication network allowing int...

70 | Scientific computing on bulk synchronous parallel architectures
- Bisseling, McColl
- 1994
Citation Context ...ation. Bulk Synchronous Parallelism (and the Coarse-Grained Multicomputer, CGM, which can be seen as a special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent c...

25 | A Calculus of
- Loulergue, Hains, et al.
- 2000
Citation Context ...raction and yet, allows portable and predictable performance on a wide variety of architectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is ...

22 | Scalability, portability and predictability: The BSP approach to parallel programming
- McColl
- 1996
Citation Context ...ode (in this context, cost means the estimate of parallel execution time), we use explicit processes corresponding to the processors of the parallel machine. Bulk Synchronous Parallel (BSP) computing [28, 33] is a parallel programming model which uses explicit processes, offers a high degree of abstraction and yet, allows portable and predictable performance on a wide variety of architectures. An operatio...

19 | Implementation of a Functional Bulk Synchronous Parallel Programming Library
- Loulergue
- 2002
Citation Context ... BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is a good trade-off for parallel programming because: • the defined calculus is a confluent calculus so: – one can design purely functional parallel languages from it. Without side-...

15 | Développement d'applications avec Objective Caml. O'Reilly
- Chailloux, Manoury, et al.
- 2000
Citation Context ...hitectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is a good trade-off for parallel programming because: • the defined calculus is a confl...

14 | Using, Understanding, and Unraveling the OCaml Language
- Rémy
- 2002
Citation Context ...hitectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is a good trade-off for parallel programming because: • the defined calculus is a confl...

12 | Speeding up genetic programming: A parallel BSP implementation
- Dracopoulos, Kent
- 1996
Citation Context ...ned Multicomputer, CGM, which can be seen as a special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent conference in the field, the ACM Symposium on Paralle...

12 | A Polymorphic Type System for Bulk Synchronous Parallel ML
- Gava, Loulergue
- 2003
Citation Context ...stract polymorphic type 'a par represents the type of p-wide parallel vectors of objects of type 'a, one per process. The nesting of par types is prohibited. Our type system enforces this restriction [13, 12]. This is very different from SPMD programming (Single Program Multiple Data) where the programmer must use a sequential language and a communication library (like MPI [34]). A parallel program is the...

11 | Foundations of Parallel Programming. Number 6
- Skillicorn
- 1994
Citation Context ... 'a) par *) let totex vv = parfun (compose noSome) (put(parfun (fun v dst->Some v) vv)) 3 The Diffusion Parallel Skeleton and the Diffusion Theorem We will use the BMF data parallel programming model [4, 32] to describe the diffusion skeleton. We choose BMF because it can provide us a concise way to describe both programs and transformation of programs. In this section we present the BMF notation and the...

10 | Distributed evaluation of functional BSP programs
- Loulergue
Citation Context ...phases are programmed with mkpar and with: apply: ('a -> 'b) par -> 'a par -> 'b par apply (mkpar f) (mkpar e) stores (f i) (e i) on process i. Neither the implementation of BSMLLIB, nor its semantics [22] prescribe a synchronization barrier between two successive uses of apply. Example 2 Let us consider the following expression: let vf = mkpar(fun i->(+) i) and vv = mkpar(fun i->2*i+1) in apply vf vv The ...

9 | A parallel genetic algorithm based on the BSP model
- Braud, Vrain
- 1999
Citation Context ...llelism (and the Coarse-Grained Multicomputer, CGM, which can be seen as a special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent conference in the field, ...

9 | A system for the high-level parallelization and cooperation of constraint solvers
- Granvilliers, Hains, et al.
- 1998
Citation Context ... model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent conference in the field, the ACM Symposium on Parallel Algorithms and Architectures, between the late eighties and the time ...

9 | Diffusion: Calculating Efficient Parallel Programs
- Hu, Takeichi, et al.
Citation Context ...apt to become a process with much trial and error. To overcome this problem, we proposed a parallel skeleton, namely diffusion skeleton diff [2]. This skeleton is derived from the Diffusion Theorem [19] and is defined in terms of primitive skeletons map, reduce and scan. It abstracts a ‘good’ combination of parallel primitives, and thanks to the underlying theorem, recursive functions defined natura...

9 | BSλp: Functional BSP Programs on Enumerated Vectors
- Loulergue
- 2000
Citation Context ...raction and yet, allows portable and predictable performance on a wide variety of architectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is ...

9 | Parallel Juxtaposition for Bulk Synchronous Parallel ML
- Loulergue
- 2003
Citation Context ...number of processes of the parallel machine. The value of this variable does not change during execution (for “flat” programming, this is not true if a parallel juxtaposition is added to the language [25]). It also offers the bsp_g and bsp_l functions which both have type unit->float. In BSMLLIB, these parameters are read from the ~/.bsmllibrc file (which should contain lines of the form: p,g,l for exampl...

9 | Parallel Superposition for Bulk Synchronous Parallel ML
- Loulergue
- 2003
Citation Context ...ave not been called before. The core library is contained in the module Bsmllib. All these primitives are used for “flat” programming. There also exist three other primitives for parallel composition [25, 26]. These primitives are not yet available in the current distribution. When one starts to program using the BSMLLIB library, it appears that some other functions ease the programming. These functions ar...

7 | Special issue on coarse-grained parallel algorithms
- Dehne
- 1999
Citation Context ...hties and the time from the mid nineties to today reveals a startling change in research focus. Today, the majority of research in parallel algorithms is within the coarse-grained, BSP style, domain” [8]. 2.2 The Bulk Synchronous Parallel ML Library There is currently no implementation of a full Bulk Synchronous Parallel ML language but rather a partial implementation: a library for Objective Caml. T...

7 | Using the BSP cost model to optimise parallel neural network training. Future Generation Computer Systems
- Rogers, Skillicorn
- 1998
Citation Context ...M, which can be seen as a special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent conference in the field, the ACM Symposium on Parallel Algorithms and Archi...

6 | An efficient scalable parallel view maintenance algorithm for shared nothing multi-processor machines
- Bamha, Bentayeb, et al.
Citation Context ... special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent conference in the field, the ACM Symposium on Parallel Algorithms and Architectures, between the la...

6 | Designing a BSP version of ScaLAPACK
- Horvitz, Bisseling
- 1999
Citation Context ...ation. Bulk Synchronous Parallelism (and the Coarse-Grained Multicomputer, CGM, which can be seen as a special case of the BSP model) is used for a large variety of applications: scientific computing [5, 18], genetic algorithms [6] and genetic programming [9], neural networks [31], parallel databases [3], constraint solvers [15], etc. It is worth noting that “A comparison of the proceedings of the eminent c...

4 | Synthèse de types pour Bulk Synchronous Parallel ML
- Gava, Loulergue
- 2003
Citation Context ...stract polymorphic type 'a par represents the type of p-wide parallel vectors of objects of type 'a, one per process. The nesting of par types is prohibited. Our type system enforces this restriction [13, 12]. This is very different from SPMD programming (Single Program Multiple Data) where the programmer must use a sequential language and a communication library (like MPI [34]). A parallel program is the...

2 | Diff: A Powerful Parallel Skeleton
- Adachi, Iwasaki, et al.
- 2000
Citation Context ...itable way. It is not an easy task, since programming is apt to become a process with much trial and error. To overcome this problem, we proposed a parallel skeleton, namely diffusion skeleton diff [2]. This skeleton is derived from the Diffusion Theorem [19] and is defined in terms of primitive skeletons map, reduce and scan. It abstracts a ‘good’ combination of parallel primitives, and thanks to ...

1 | Une bibliothèque certifiée de programmes fonctionnels BSP
- Gava
- 2004
Citation Context ...or parallel programming because: • the defined calculus is a confluent calculus so: – one can design purely functional parallel languages from it. Without side-effects, programs are easier to prove [10, 11], and to re-use (the semantics is compositional) – we can choose any evaluation strategy for the language. An eager language allows good performances. • this calculus is based on BSP operations, so pr...

1 | The Objective Caml System 3.07, 2003. web pages at www.ocaml.org
- Leroy
Citation Context ...hitectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is a good trade-off for parallel programming because: • the defined calculus is a confl...

1 | A Calculus of Functional BSP Programs with Explicit Substitution
- Loulergue
- 2003
Citation Context ...raction and yet, allows portable and predictable performance on a wide variety of architectures. An operational approach has led to a BSP λ-calculus that is confluent and universal for BSP algorithms [27, 21, 24], and to a library of bulk synchronous primitives for the Objective Caml [20, 7, 30] language which is sufficiently expressive and allows the prediction of execution times [16, 23]. This framework is ...