## Towards Efficiency and Portability: Programming with the BSP Model (1996)

Venue: In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures

Citations: 35 (3 self)

### BibTeX

@INPROCEEDINGS{Goudreau96towardsefficient,

author = {Mark Goudreau and Kevin Lang and Satish Rao and Torsten Suel and Thanasis Tsantilas},

title = {Towards Efficiency and Portability: Programming with the BSP Model},

booktitle = {In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures},

year = {1996},

pages = {1--12}

}

### Abstract

The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a model for general-purpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of the BSP model have been presented, the degree to which the model can be efficiently utilized on existing parallel machines remains unclear. To explore this question, we implemented a small library of BSP functions, called the Green BSP library, on several parallel platforms. We also created a number of parallel applications based on this library. Here, we report on the performance of six of these applications on three different parallel platforms. Our preliminary results suggest that the BSP model can be used to develop efficient and portable programs for a range of machines and applications.

### Citations

1184 | A bridging model for parallel computation
- Valiant
- 1990

Citation context: ...e software, parallel programs developed on one machine often require major modifications before they can be efficiently employed on other parallel machines. The Bulk-Synchronous Parallel or BSP model [34] was proposed by Valiant as a "bridging model" that provides a stan... [Footnotes: ¹ Department of Computer Science, University of Central Florida, Orlando, FL 32816-2362. Email: goudreau@cs.ucf.edu. ² NEC Research I...]

1138 | Using MPI: Portable Parallel Programming with the Message-Passing Interface
- Lusk, Skjellum
- 1994

Citation context: ...only contention arising at the processor-network interface. A somewhat different approach to portable parallel programming is based on standardized message-passing libraries such as PVM [12] and MPI [16]. While these libraries provide a common set of functions on a variety of parallel machines, they do not offer any cost function (in the strict sense) that could guide the programmer in the design of...

994 | Active Messages: A Mechanism for Integrated Communication and Computation
- Eicken, Culler, et al.
- 1992

Citation context: ...ce of point-to-point messages with three parameters representing software overhead, network latency, and communication bandwidth. The LogP model has been used as a performance model for active messages [36] and the Split-C language [10], where it has been applied to the analysis of several algorithms. Other related models are the Postal Model [2], the Atomic Model [22], and several models for end-point...

726 | SPLASH: Stanford Parallel Applications for Shared-Memory
- Singh, Weber, et al.
- 1992

Citation context: ...mented on a number of parallel platforms. The applications are: an N-body simulation using the Barnes-Hut algorithm; an ocean eddy simulation program adapted from the SPLASH application suite [31]; a minimum spanning tree algorithm; a shortest paths algorithm; a multiple shortest paths algorithm; and a dense matrix multiplication algorithm. In all of our applications, we used o...

667 | PVM: Parallel Virtual Machine, A User's Guide and Tutorial for Networked Parallel Computing
- Geist
- 1994

Citation context: ...ork, with the only contention arising at the processor-network interface. A somewhat different approach to portable parallel programming is based on standardized message-passing libraries such as PVM [12] and MPI [16]. While these libraries provide a common set of functions on a variety of parallel machines, they do not offer any cost function (in the strict sense) that could guide the programmer in t...

505 | Introduction to Parallel Computing: Design and Analysis of Algorithms
- Kumar, Grama, et al.
- 1994

Citation context: ...the prospect of distributed data applications on networks of workstations. 3.6 Matrix Multiplication. This program multiplies two dense $n \times n$ matrices $A$ and $B$ using Cannon's algorithm (e.g., see [19]). The input matrices are assumed to be initially partitioned into blocks of size $n/\sqrt{p} \times n/\sqrt{p}$, such that processor $i$ holds the block with index $(x, (x+y) \bmod \sqrt{p})$ of $A$, and the block with index...
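Cannon's algorithm, as described in this excerpt, can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not the Green BSP code: it assumes a $\sqrt{p} \times \sqrt{p}$ processor grid (written `q` below), that processor $(x, y)$ starts with block $(x, (x+y) \bmod q)$ of $A$ and, because the excerpt is cut off, block $((x+y) \bmod q, y)$ of $B$ (the usual initial skew). The helper names `block`, `matmul_add`, and `cannon` are hypothetical. Each of the $q$ rounds plays the role of one superstep: every processor multiplies its current pair of blocks, then all $A$ blocks shift one position left and all $B$ blocks one position up.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def block(M: Matrix, i: int, j: int, b: int) -> Matrix:
    """Return the b x b block (i, j) of the square matrix M."""
    return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]


def matmul_add(C: Matrix, A: Matrix, B: Matrix) -> None:
    """In place: C += A * B for square blocks stored as lists of lists."""
    b = len(A)
    for i in range(b):
        for k in range(b):
            a = A[i][k]
            for j in range(b):
                C[i][j] += a * B[k][j]


def cannon(A: Matrix, B: Matrix, q: int) -> Matrix:
    """Simulate Cannon's algorithm on a q x q grid (assumed p = q * q processors)."""
    n = len(A)
    if n % q != 0:
        raise ValueError("n must be divisible by sqrt(p)")
    b = n // q
    grid = [(x, y) for x in range(q) for y in range(q)]
    # Assumed initial skewed distribution of the A and B blocks.
    a_loc: Dict[Tuple[int, int], Matrix] = {(x, y): block(A, x, (x + y) % q, b) for x, y in grid}
    b_loc: Dict[Tuple[int, int], Matrix] = {(x, y): block(B, (x + y) % q, y, b) for x, y in grid}
    c_loc: Dict[Tuple[int, int], Matrix] = {xy: [[0] * b for _ in range(b)] for xy in grid}
    for _ in range(q):
        # Local computation phase: multiply the blocks currently held.
        for xy in grid:
            matmul_add(c_loc[xy], a_loc[xy], b_loc[xy])
        # Communication phase: A blocks move one step left, B blocks one step up.
        a_loc = {(x, y): a_loc[(x, (y + 1) % q)] for x, y in grid}
        b_loc = {(x, y): b_loc[((x + 1) % q, y)] for x, y in grid}
    # Gather the distributed C blocks into one matrix.
    C = [[0] * n for _ in range(n)]
    for (x, y), blk in c_loc.items():
        for i in range(b):
            C[x * b + i][y * b:(y + 1) * b] = blk[i]
    return C
```

After round $k$, processor $(x, y)$ holds the $A$ and $B$ blocks with inner index $(x + y + k) \bmod q$, so the $q$ rounds together cover every term of $C_{xy} = \sum_k A_{xk} B_{ky}$.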

385 | A Rapid Hierarchical Radiosity Algorithm
- Hanrahan, Salzman, et al.

Citation context: ...tly working on the implementation of some additional application programs, including the adaptive Fast Multipole Method [7] and a hierarchical algorithm for the radiosity problem in computer graphics [17]. Finally, much algorithmic and experimental work is still needed in the implementation of optimized BSP libraries on different parallel machines. Acknowledgements: We thank Andrew Goldberg and Marios...

208 | General purpose parallel architectures
- Valiant
- 1990

Citation context: ...ogram. In practice, these objectives can conflict, and trade-offs must be made. The correct trade-offs can be selected by taking into account the $g$ and $L$ parameters of the underlying machine. Valiant [34, 35, 33] argues that, at least in theory, this approach is sufficient for portability and efficiency, by showing that many other programming styles can be automatically and efficiently transformed into a BSP...
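The role of $g$ and $L$ can be made concrete with the standard BSP cost function: a superstep in which the busiest processor performs $w$ local operations and the largest number of words sent or received by any processor is $h$ (an $h$-relation) costs $w + g \cdot h + L$, and a program costs the sum over its supersteps. The sketch below assumes this additive form (some presentations use $\max(w, g h, L)$ instead) and that $h$ is counted in words; the `Superstep` record and the function names are illustrative and are not part of any BSP library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]    # local operations performed by each processor
    sent: List[int]      # words sent by each processor
    received: List[int]  # words received by each processor


def superstep_cost(s: Superstep, g: float, L: float) -> float:
    """Assumed BSP cost of one superstep: w + g*h + L."""
    w = max(s.work)
    # h-relation: the largest number of words any single processor sends or receives.
    h = max(max(s.sent), max(s.received))
    return w + g * h + L


def bsp_cost(steps: List[Superstep], g: float, L: float) -> float:
    """Total predicted cost: the sum of the superstep costs."""
    return sum(superstep_cost(s, g, L) for s in steps)


if __name__ == "__main__":
    steps = [
        Superstep(work=[100, 80], sent=[10, 4], received=[4, 10]),
        Superstep(work=[30, 30], sent=[0, 0], received=[0, 0]),
    ]
    print(bsp_cost(steps, g=2.0, L=50.0))  # prints 250.0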

171 | Direct Bulk-Synchronous Parallel Algorithms
- Gerbessiotis, Valiant
- 1994

Citation context: ...ch is sufficient for portability and efficiency, by showing that many other programming styles can be automatically and efficiently transformed into a BSP style. Furthermore, Gerbessiotis and Valiant [13] point out that a direct implementation on the BSP model will often lead to even better performance. We briefly discuss two aspects of the BSP model. One is that the BSP model views the interconnectio...

156 | Parallel programming in Split-C
- Culler, Arpaci-Dusseau, et al.
- 1993

Citation context: ...ith three parameters representing software overhead, network latency, and communication bandwidth. The LogP model has been used as a performance model for active messages [36] and the Split-C language [10], where it has been applied to the analysis of several algorithms. Other related models are the Postal Model [2], the Atomic Model [22], and several models for end-point contention (e.g., see [1]) ins...

137 | A fast adaptive multipole algorithm for particle simulations
- Carrier, Greengard, et al.
- 1988

Citation context: ..., and we plan to extend our study to several larger machines. We are also currently working on the implementation of some additional application programs, including the adaptive Fast Multipole Method [7] and a hierarchical algorithm for the radiosity problem in computer graphics [17]. Finally, much algorithmic and experimental work is still needed in the implementation of optimized BSP libraries on d...

113 | Designing broadcast algorithms in the postal model for message-passing systems
- Bar-Noy, Kipnis
- 1994

Citation context: ...has been used as a performance model for active messages [36] and the Split-C language [10], where it has been applied to the analysis of several algorithms. Other related models are the Postal Model [2], the Atomic Model [22], and several models for end-point contention (e.g., see [1]) inspired by the prospect of optical communication in parallel machines. Like BSP and LogP, these models do not refe...

107 | LogP: Towards a realistic model of parallel computation
- Culler, Karp, et al.
- 1993

Citation context: ...er models for general-purpose parallel computing have been proposed in recent years; see [24] for an overview. An important example of a model based on asynchronous message passing is the LogP model [11], which models the performance of point-to-point messages with three parameters representing software overhead, network latency, and communication bandwidth. The LogP model has been used as a performan...
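For comparison with the BSP cost function, a simple LogP estimate for point-to-point traffic can be sketched. With latency $L$, per-message overhead $o$, and gap $g$, one message takes $2o + L$ from the start of sending to the end of receiving, and a processor can inject a new message only every $\max(g, o)$ time units, so $k$ back-to-back messages to one destination finish after $2o + L + (k-1)\max(g, o)$. This sketch assumes one-word messages and that LogP's capacity limit of $\lceil L/g \rceil$ messages in transit is respected; the processor count $P$ plays no role for a single sender/receiver pair, and the function name is hypothetical.

```python
def logp_send_time(k: int, L: float, o: float, g: float) -> float:
    """Time until the last of k back-to-back messages is absorbed by the receiver.

    Assumes small messages and that the ceil(L/g) capacity constraint holds.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return 0.0
    # First message: send overhead o, network latency L, receive overhead o.
    # Each further message is injected max(g, o) after the previous one.
    return 2 * o + L + (k - 1) * max(g, o)


if __name__ == "__main__":
    print(logp_send_time(1, L=6.0, o=2.0, g=4.0))  # prints 10.0
    print(logp_send_time(5, L=6.0, o=2.0, g=4.0))  # prints 26.0
```

Unlike the BSP sketch, which charges a whole superstep by its largest $h$-relation plus a barrier cost $L$, this formula prices an individual message stream with no global synchronization.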

96 | A hierarchical O(N log N) force-calculation algorithm. The Institute for Advanced Study
- Barnes, Hut
- 1986

Citation context: ...type of force. The problem has numerous applications in astrophysics, molecular dynamics, fluid dynamics, and even computer graphics. The N-body code in this study is based on the Barnes-Hut algorithm [3], which uses an irregular oct-tree structure, called BH tree, to hierarchically group bodies into clusters according to their distribution in three-dimensional space. Our parallel implementation is si...

90 | Parallel Visualization Algorithms: Performance and Architectural Implications
- Singh, Gupta, et al.
- 1994

Citation context: ...s to be implemented on top of these functions. Finally, our choice of the application programs and presentation of the results is influenced by the SPLASH application suite for shared-memory machines [30]. Also, our BSP code for the ocean simulation was obtained by modifying the corresponding SPLASH program. The remainder of the paper is organized as follows. Section 2 describes the versions of the Gr...

88 | Astrophysical N-body simulations using hierarchical tree data structures
- Warren, Salmon
- 1992

Citation context: ...tructure, called BH tree, to hierarchically group bodies into clusters according to their distribution in three-dimensional space. Our parallel implementation is similar to those of Warren and Salmon [37] and Liu and Bhatt [23]. In particular, we use the ORB partitioning scheme to partition the bodies among the processors. Instead of repartitioning the bodies after each iteration as in [37], we only d...
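ORB (orthogonal recursive bisection) divides space recursively with axis-aligned cuts so that each half receives the same share of the bodies, which gives each processor a compact spatial region. A minimal sketch, assuming a power-of-two number of processors, unit weight per body, and cuts perpendicular to the axis of largest extent; N-body codes typically balance an estimated per-body work rather than raw body counts, which this simplification omits. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def orb_partition(bodies: List[Point], parts: int) -> List[List[Point]]:
    """Split bodies into `parts` groups (a power of two) by orthogonal recursive bisection."""
    if parts < 1 or parts & (parts - 1):
        raise ValueError("parts must be a power of two")
    if parts == 1 or not bodies:
        # Nothing left to split: keep the bodies together and pad with empty groups.
        return [list(bodies)] + [[] for _ in range(parts - 1)]
    # Cut perpendicular to the coordinate axis with the largest spatial extent.
    axis = max(range(3), key=lambda d: max(b[d] for b in bodies) - min(b[d] for b in bodies))
    ordered = sorted(bodies, key=lambda b: b[axis])
    mid = len(ordered) // 2  # median cut: both halves get (nearly) equal counts
    half = parts // 2
    return orb_partition(ordered[:mid], half) + orb_partition(ordered[mid:], half)
```

With 64 bodies and 8 processors, every group ends up with exactly 8 bodies; a weighted variant would replace the median index with the point where the cumulative work reaches half the total.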

77 | General purpose parallel computing
- McColl
- 1993

Citation context: ...s, but are not included here. 1.3 Related Work. Since the introduction of the BSP model, a number of papers have considered the design and analysis of algorithms under the BSP model; see, for example, [4, 6, 13, 25, 33]. Several groups of researchers are currently exploring the use of the BSP model on existing parallel machines. The Oxford BSP library, developed by Miller [27] while at Oxford University, allows a pr...

72 | Scientific computing on bulk synchronous parallel architectures
- Bisseling, McColl
- 1994

Citation context: ...el is useful for designing efficient and portable parallel programs. Another question that we investigate is the accuracy of the BSP cost function in comparison to the actual running times. Following [6], we provide data for our applications that can be used to predict the execution times on each machine under the BSP cost model. Our results demonstrate that the model was able to predict execution ti...

57 | A library for bulk-synchronous parallel programming. British computer society parallel processing http://www.comlab.ox.uk/oucl/oxpara/bsplib.html
- Miller
- 1993

Citation context: ...model; see, for example, [4, 6, 13, 25, 33]. Several groups of researchers are currently exploring the use of the BSP model on existing parallel machines. The Oxford BSP library, developed by Miller [27] while at Oxford University, allows a processor to directly access the memory of another processor. This makes the library very efficient to implement on shared-memory machines. Moreover, it is well s...

54 | Models of parallel computation: A survey and synthesis
- Maggs, Matheson, et al.
- 1995

Citation context: ...recent implementation of a plasma simulation using the Oxford BSP library is described in [28]. A number of other models for general-purpose parallel computing have been proposed in recent years; see [24] for an overview. An important example of a model based on asynchronous message passing is the LogP model [11], which models the performance of point-to-point messages with three parameters representi...

46 | Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model
- Bäumker, Dittrich, et al.
- 1998

Citation context: ...s, but are not included here. 1.3 Related Work. Since the introduction of the BSP model, a number of papers have considered the design and analysis of algorithms under the BSP model; see, for example, [4, 6, 13, 25, 33]. Several groups of researchers are currently exploring the use of the BSP model on existing parallel machines. The Oxford BSP library, developed by Miller [27] while at Oxford University, allows a pr...

37 | Communication-efficient parallel algorithms for distributed random-access machines. Algorithmica
- Leiserson, Maggs
- 1988

Citation context: ...that computes the local components of the minimum spanning tree. The program then enters a parallel phase that uses a simplification of a conservative DRAM algorithm developed by Leiserson and Maggs [21]. Once the number of components becomes small, the program switches to a mixed parallel/sequential phase that first uses all the processors to find subforests of the remaining components using edges t...

29 | Bulk synchronous parallel computing – a paradigm for transportable software
- Cheatham, Fahmy, et al.
- 1995

Citation context: ...and industrial applications [18, 20, 26]. A group at Harvard University led by T. Cheatham and L. Valiant is studying higher-level programming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of Utrecht is studying the use of the BSP model in the implementation of scientific computations [5, 6]. A recent implementation of a plasma simulation using the Oxfor...

27 | An atomic model for message-passing
- Liu, Aiello, et al.
- 1993

Citation context: ...rformance model for active messages [36] and the Split-C language [10], where it has been applied to the analysis of several algorithms. Other related models are the Postal Model [2], the Atomic Model [22], and several models for end-point contention (e.g., see [1]) inspired by the prospect of optical communication in parallel machines. Like BSP and LogP, these models do not refer to the topology of th...

24 | BSP programming
- McColl
- 1994

Citation context: ...or the dynamic applications that we have experimented with. Also at Oxford University, W. McColl's group is working on the development of several BSP programming languages and industrial applications [18, 20, 26]. A group at Harvard University led by T. Cheatham and L. Valiant is studying higher-level programming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of...

21 | Experiences with Parallel N-Body Simulation
- Liu, Bhatt
- 2000

Citation context: ...e, to hierarchically group bodies into clusters according to their distribution in three-dimensional space. Our parallel implementation is similar to those of Warren and Salmon [37] and Liu and Bhatt [23]. In particular, we use the ORB partitioning scheme to partition the bodies among the processors. Instead of repartitioning the bodies after each iteration as in [37], we only do so if the load imbala...

19 | Plasma Simulation on Networks of Workstations Using the Bulk-Synchronous Parallel Model
- Nibhanupudi, Norton, et al.
- 1995

Citation context: ...ty of Utrecht is studying the use of the BSP model in the implementation of scientific computations [5, 6]. A recent implementation of a plasma simulation using the Oxford BSP library is described in [28]. A number of other models for general-purpose parallel computing have been proposed in recent years; see [24] for an overview. An important example of a model based on asynchronous message passing i...

17 | The Green BSP Library
- Goudreau, Lang, et al.
- 1995

Citation context: ...o wish to give a basis for a comparison with asynchronous models such as LogP and certain shared-memory models. In particular, we designed several parallel applications that use the Green BSP library [15], a small library of BSP message-passing functions that we have implemented on a number of parallel platforms. The applications are: an N-body simulation using the Barnes-Hut algorithm; an oce...
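The superstep discipline behind BSP message passing can be imitated in a few lines: in the BSP model, a message sent during one superstep becomes available to its destination only after the barrier synchronization that ends that superstep. The toy simulator below runs each simulated processor's step function in turn and delivers all messages at the barrier. `run_bsp`, its callback signature, and the example `total_step` are hypothetical and are not the Green BSP interface.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Tuple[int, Any]  # (destination processor id, payload)
StepFn = Callable[[int, int, List[Any], Dict[str, Any]], List[Message]]


def run_bsp(nprocs: int, step: StepFn, supersteps: int) -> List[Dict[str, Any]]:
    """Sequentially simulate `supersteps` BSP supersteps on `nprocs` processors."""
    states: List[Dict[str, Any]] = [{} for _ in range(nprocs)]
    inboxes: List[List[Any]] = [[] for _ in range(nprocs)]
    for s in range(supersteps):
        outboxes: List[List[Any]] = [[] for _ in range(nprocs)]
        for pid in range(nprocs):
            # step(pid, superstep index, messages delivered at the last barrier, local state)
            for dest, payload in step(pid, s, inboxes[pid], states[pid]):
                outboxes[dest].append(payload)
        # Barrier: every message sent in superstep s is delivered together.
        # Messages sent in the final superstep are simply dropped.
        inboxes = outboxes
    return states


NPROCS = 4


def total_step(pid: int, s: int, inbox: List[Any], state: Dict[str, Any]) -> List[Message]:
    """Example: every processor learns the sum of all local values in two supersteps."""
    if s == 0:
        state["value"] = pid + 1
        return [(dest, state["value"]) for dest in range(NPROCS)]
    state["total"] = sum(inbox)
    return []


if __name__ == "__main__":
    print([st["total"] for st in run_bsp(NPROCS, total_step, 2)])  # prints [10, 10, 10, 10]
```

In BSP cost terms, the first superstep of this example is an $h$-relation with $h = p$, since every processor sends and receives one word per processor.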

16 | Scheduling parallel communication: The h-relation problem
- Adler, Byer, et al.
- 1995

Citation context: ...age [10], where it has been applied to the analysis of several algorithms. Other related models are the Postal Model [2], the Atomic Model [22], and several models for end-point contention (e.g., see [1]) inspired by the prospect of optical communication in parallel machines. Like BSP and LogP, these models do not refer to the topology of the underlying machine, but assume that the interconnection ne...

11 | Program development and performance prediction on BSP machines using Opal
- Knee
- 1994

Citation context: ...or the dynamic applications that we have experimented with. Also at Oxford University, W. McColl's group is working on the development of several BSP programming languages and industrial applications [18, 20, 26]. A group at Harvard University led by T. Cheatham and L. Valiant is studying higher-level programming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of...

9 | Sparse matrix computations on bulk synchronous parallel computers
- Bisseling
- 1995

Citation context: ...gramming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of Utrecht is studying the use of the BSP model in the implementation of scientific computations [5, 6]. A recent implementation of a plasma simulation using the Oxford BSP library is described in [28]. A number of other models for general-purpose parallel computing have been proposed in recent years;...

8 | An Object-Oriented Programming Model for BSP Computations
- Lecomber
- 1994

Citation context: ...or the dynamic applications that we have experimented with. Also at Oxford University, W. McColl's group is working on the development of several BSP programming languages and industrial applications [18, 20, 26]. A group at Harvard University led by T. Cheatham and L. Valiant is studying higher-level programming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of...

5 | Why BSP Computers
- Valiant
- 1993

Citation context: ...ogram. In practice, these objectives can conflict, and trade-offs must be made. The correct trade-offs can be selected by taking into account the $g$ and $L$ parameters of the underlying machine. Valiant [34, 35, 33] argues that, at least in theory, this approach is sufficient for portability and efficiency, by showing that many other programming styles can be automatically and efficiently transformed into a BSP...

5 | A Hierarchical O(N log N) Force-Calculation Algorithm
- Barnes, Hut
- 1986

Citation context: ...type of force. The problem has numerous applications in astrophysics, molecular dynamics, fluid dynamics, and even computer graphics. The N-body code in this study is based on the Barnes-Hut algorithm [3], which uses an irregular oct-tree structure, called BH tree, to hierarchically group bodies into clusters according to their distribution in three-dimensional space. Our parallel implementation is si...

4 | Data locality and memory system performance in the parallel simulation of ocean eddy currents
- Singh
- 1991

Citation context: ...gram from the Stanford Parallel Library for Shared Memory Applications (SPLASH) [31] to our BSP system. The program computes ocean eddy currents using a multigrid technique on an underlying grid; see [29] for details. The conversion to BSP was fairly straightforward, due to the fact that the SPLASH code for this application was basically already in a BSP style. 3.1.1 Discussion. The performance of the...

3 | General purpose optimization technology
- Cheatham, Fahmy, et al.
- 1994

Citation context: ...and industrial applications [18, 20, 26]. A group at Harvard University led by T. Cheatham and L. Valiant is studying higher-level programming languages and compilation techniques for the BSP model [9, 8]. R. Bisseling at the University of Utrecht is studying the use of the BSP model in the implementation of scientific computations [5, 6]. A recent implementation of a plasma simulation using the Oxfor...

2 | Computing minimum spanning tree with the Green BSP library
- Goldberg, Lang, et al.
- 1996

Citation context: ...essors to find subforests of the remaining components using edges that are guaranteed to be in the minimum spanning tree, and then uses a single processor to assemble the forests into components. See [14] for more details. The input graphs are generated as follows. Nodes are assigned uniformly at random to points on the unit square. Now construct a graph $G(r)$ on the nodes by adding an edge between all...
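The input construction and the problem being solved can be prototyped sequentially. The excerpt is cut off after "adding an edge between all...", so this sketch assumes the usual random geometric graph rule (an edge between every pair of nodes at Euclidean distance at most $r$) and uses that distance as the edge weight; both choices are assumptions. It then computes a minimum spanning forest with Kruskal's algorithm as a serial reference for checking a parallel result. It is not the paper's parallel MST algorithm, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def random_geometric_graph(n: int, r: float, seed: int = 0) -> Tuple[List[Tuple[float, float]], List[Edge]]:
    """Assumed G(r): n uniform points in the unit square, edges between points within distance r."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges: List[Edge] = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pts[u], pts[v])
            if d <= r:
                edges.append((d, u, v))
    return pts, edges


def kruskal_msf(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning forest by Kruskal's algorithm with a union-find structure."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    forest: List[Edge] = []
    for w, u, v in sorted(edges):  # consider edges in order of increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            forest.append((w, u, v))
    return forest
```

For small $r$ the graph may be disconnected, in which case the result is a spanning forest with one tree per connected component, rather than a single spanning tree.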

2 | Communication-efficient parallel algorithms for distributed random-access machines
- Leiserson, Maggs
- 1988

Citation context: ...that computes the local components of the minimum spanning tree. The program then enters a parallel phase that uses a simplification of a conservative DRAM algorithm developed by Leiserson and Maggs [21]. Once the number of components becomes small, the program switches to a mixed parallel/sequential phase that first uses all the processors to find subforests of the remaining components using edges t...

1 | Programming Parallel N-Body Simulations with the Bulk-Synchronous Parallel Model
- Suel
- 1996

Citation context: ...mpute the forces on its bodies, and whose structure is consistent with that of the global BH tree constructed by the sequential algorithm. A detailed description of our implementation can be found in [32]. 3.2.1 Discussion. As input for our experiments we used the Plummer model generated by the SPLASH code [31]. The timing and speedup results in Figures 3.1 and C.4 show that for large enough input size...