Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Towards Efficiency and Portability: Programming with the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a model for general-purpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of the BSP model have been presented, the degree to which the model can be efficiently utilized on existing parallel machines remains unclear. To explore this question, we implemented a small library of BSP functions, called the Green BSP library, on several parallel platforms. We also created a number of parallel applications based on this library. Here, we report on the performance of six of these applications on three different parallel platforms. Our preliminary results suggest that the BSP model can be used to develop efficient and portable programs for a range of machines and applications.
Packet Routing in Fixed-Connection Networks: A Survey
, 1998
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Load Balancing Strategies For Distributed Memory Machines
 MultiScale Phenomena and Their Simulation
, 1997
Load balancing in large parallel systems with distributed memory is a difficult task that often substantially influences the overall efficiency of applications. A number of efficient distributed load balancing strategies have been developed in recent years. Although they are currently not generally available as part of parallel operating systems, it is often not difficult to integrate them into applications. This paper gives a classification of different load balancing problems based on application characteristics. For applications from the field of scientific computing, useful methods are described in more detail.
PRO: A Model for Parallel Resource-Optimal Computation
 In 16th Annual International Symposium on High Performance Computing Systems and Applications. IEEE
, 2002
We present a new parallel computation model that enables the design of resource-optimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of granularity.
The Design and Analysis of Bulk-Synchronous Parallel Algorithms
, 1998
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. This thesis presents a systematic approach to the design and analysis of BSP algorithms. We introduce an extension of the BSP model, called BSPRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. The BSPRAM model can be optimally simulated by a BSP computer for a broad range of algorithms possessing certain characteristic properties: obliviousness, slackness, granularity. We use BSPRAM to design BSP algorithms for problems from three large, partially overlapping domains: combinatorial computation, dense matrix computation, graph computation. Some of the presented algorithms are adapted from known BSP algorithms (butterfly dag computation, cube dag computation, matrix multiplication). Other algorithms are obtained by application of established non-BSP techniques (sorting, randomised list contraction, Gaussian elimination without pivoting and with column pivoting, algebraic path computation), or use original techniques specific to the BSP model (deterministic list contraction, Gaussian elimination with nested block pivoting, communication-efficient multiplication of Boolean matrices, synchronisation-efficient shortest paths computation). The asymptotic BSP cost of each algorithm is established, along with its BSPRAM characteristics. We conclude by outlining some directions for future research.
A Quantitative Measure of Portability with Application to Bandwidth-Latency Models for Parallel Computing (Extended Abstract)
 In Proc. of Euro-Par '99, LNCS 1685
, 1999
We introduce a novel methodology for the quantitative assessment of the effectiveness and portability of models of parallel computation. Specifically, we relate the effectiveness of a model M, adopted for algorithm design, with respect to a platform M', where algorithms developed for M are ultimately executed, to the product of cross-simulation slowdowns between M and M'. The portability of M with respect to a class of platforms can be estimated by its minimum effectiveness over the platforms in the class. We apply our methodology to assess the portability of enhanced variants of the BSP model with respect to processor networks, with particular emphasis on multidimensional arrays.
PRO: A Model for the Design and Analysis of Efficient and Scalable Parallel Algorithms
 Nordic Journal of Computing
, 2006
We present a new parallel computation model called the Parallel Resource-Optimal computation model. PRO is a framework being proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a specific sequential algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
Towards a Scalable Parallel Object Database - The Bulk Synchronous Parallel Approach
, 1996
Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing cost-effective, the requirements for non-numerical (or symbolic) applications, and the previous attempts to develop parallel databases. The central theme of the Bulk Synchronous Parallel model is to provide a high-level abstraction of parallel computing hardware whilst providing a realisation of a parallel programming model that enables architecture-independent programs to deliver scalable performance on diverse hardware platforms. Therefore, the primary objective of this report is to investigate the feasibility of developing a portable, scalable, parallel object database, based on the Bulk Synchronous Parallel model of computation. In particular, we devise a way of providing high-level abstra...
Efficient Use of Parallel & Distributed Systems: From Theory to Practice
, 1995
This article focuses on principles for the design of efficient parallel algorithms for distributed memory computing systems. We describe the general trend in the development of architectural properties and evaluate the state-of-the-art in a number of basic primitives like graph embedding, partitioning, dynamic load distribution, and communication which are used, to some extent, within all parallel applications. We discuss possible directions for future work on the design of universal basic primitives, able to perform efficiently on a broad range of parallel systems and applications, and we also give certain examples of specific applications which demand specialized basic primitives in order to obtain efficient parallel implementations. Finally, we show that programming frames can offer a convenient way to encapsulate algorithmic know-how on applications and basic primitives and to offer this knowledge to non-specialist users in a very effective way.