Results 1  10
of
56
Scalable Computing
 Computer Science Today: Recent Trends and Developments
, 1996
"... . Scalable computing will, over the next few years, become the normal form of computing. In this paper we present a unified framework, based on the BSP model, which aims to serve as a foundation for this evolutionary development. A number of important techniques, tools and methodologies for the desi ..."
Abstract

Cited by 83 (3 self)
 Add to MetaCart
. Scalable computing will, over the next few years, become the normal form of computing. In this paper we present a unified framework, based on the BSP model, which aims to serve as a foundation for this evolutionary development. A number of important techniques, tools and methodologies for the design of sequential algorithms and programs have been developed over the past few decades. In the transition from sequential to scalable computing we will find that new requirements such as universality and predictable performance will necessitate significant changes of emphasis in these areas. Programs for scalable computing, in addition to being fully portable, will have to be efficiently universal, offering high performance, in a predictable way, on any general purpose parallel architecture. The BSP model provides a discipline for the design of scalable programs of this kind. We outline the approach and discuss some of the issues involved. 1 Introduction For fifty years, sequential computin...
BSPlib: The BSP Programming Library
, 1998
"... BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming ..."
Abstract

Cited by 82 (6 self)
 Add to MetaCart
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access using put or get operations, and bulk synchronous message passing. Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms, sorting, and molecular dynamics.
A TwoDimensional Data Distribution Method For Parallel Sparse MatrixVector Multiplication
 SIAM REVIEW
"... A new method is presented for distributing data in sparse matrixvector multiplication. The method is twodimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipar ..."
Abstract

Cited by 68 (9 self)
 Add to MetaCart
A new method is presented for distributing data in sparse matrixvector multiplication. The method is twodimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to onedimensional methods, and in general a good balance in the communication work.
Towards Efficient and Portability: Programming with the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
"... The BulkSynchronous Parallel (BSP) model was proposed by Valiant as a model for generalpurpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of ..."
Abstract

Cited by 34 (3 self)
 Add to MetaCart
The BulkSynchronous Parallel (BSP) model was proposed by Valiant as a model for generalpurpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of the BSP model have been presented, the degree to which the model can be efficiently utilized on existing parallel machines remains unclear. To explore this question, we implemented a small library of BSP functions, called the Green BSP library, on several parallel platforms. We also created a number of parallel applications based on this library. Here, we report on the performance of six of these applications on three different parallel platforms. Our preliminary results suggest that the BSP model can be used to develop efficient and portable programs for a range of machines and applications. 1
A Quantitative Comparison of Parallel Computation Models
, 1996
"... This paper experimentally validates performance related issues for parallel computation models on several parallel platforms (a MasPar MP1 with 1024 processors, a 64node GCel and a CM5 of 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate ..."
Abstract

Cited by 32 (5 self)
 Add to MetaCart
This paper experimentally validates performance related issues for parallel computation models on several parallel platforms (a MasPar MP1 with 1024 processors, a 64node GCel and a CM5 of 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate whether the models correctly predict the execution time of an algorithm implementation. Unlike previous work, which mostly demonstrated a close match between the measured and predicted running times, this paper shows that there are situations in which the models do not precisely predict the actual execution time of an algorithm implementation. Second, there is a comparison part in which the models are contrasted with each other in order to determine which model induces the fastest algorithms. Finally, there is an efficiency validation part in which the performance of the model derived algorithms are compared with the performance of highly optimized library routines to show the effectiveness ...
Accounting for memory bank contention and delay in highbandwidth multiprocessors
 In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1997
"... Abstract—For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several sharedmemory multiprocessors consist of more memory banks than processors. The object ..."
Abstract

Cited by 32 (5 self)
 Add to MetaCart
Abstract—For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several sharedmemory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant’s bulksynchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)BSP. We show experimentally that the (d, x)BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machinespecific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)BSP as a bridging model for emulating a very highlevel abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)BSP.
BSP Programming
, 1994
"... . The Bulk Synchronous Parallel (BSP) model provides a unified framework for the design and programming of general purpose parallel computing systems. In this paper we describe some programming language developments which are currently being pursued as part of this new, unified approach to scalable ..."
Abstract

Cited by 24 (0 self)
 Add to MetaCart
. The Bulk Synchronous Parallel (BSP) model provides a unified framework for the design and programming of general purpose parallel computing systems. In this paper we describe some programming language developments which are currently being pursued as part of this new, unified approach to scalable parallel computing. 1. The BSP Model A parallel random access machine (PRAM) [4] consists of a collection of processors which compute synchronously in parallel and which communicate with a common global random access memory. A major issue in theoretical computer science since the late 1970s has been to determine the extent to which the idealised PRAM model can be efficiently implemented on physically realistic distributed memory architectures. A number of new routing and memory management techniques have been developed which show that efficient implementation is indeed possible in many cases [5, 10, 11]. The efficient implementation of a single address space on a distributed memory architec...
Scalable Parallel Computing: A Grand Unified Theory and its Practical Development
"... this paper we describe the BSP model and discuss some of the developments in architecture, algorithms and programming languages which are currently being pursued as part of this new, unified approach to scalable parallel computing. Keyword Codes: C.1.2; D.1.3; F.1.1 Keywords: Multiprocessors; Concur ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
this paper we describe the BSP model and discuss some of the developments in architecture, algorithms and programming languages which are currently being pursued as part of this new, unified approach to scalable parallel computing. Keyword Codes: C.1.2; D.1.3; F.1.1 Keywords: Multiprocessors; Concurrent Programming; Models of Computation
Fully Dynamic Search Trees for an Extension of the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
"... We present parallel algorithms that maintain a 23 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e. better use of bandwidth of routers and reduction of the overhead involved in communication. Th ..."
Abstract

Cited by 19 (4 self)
 Add to MetaCart
We present parallel algorithms that maintain a 23 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e. better use of bandwidth of routers and reduction of the overhead involved in communication. The BSP*model is introduced by Baumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: We use Valiant's notion of coptimality. Intuitively coptimal algorithms tend to speedup p=c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1optimal searching and amortized coptimal insertion and deletion for a small constant c. The second one allows 2optimal searching, and coptimal deletion and insertion for a small constant c. Both results hold with probability 1 \Gamma o(1) for wide ranges of BSP* parameters, where the ranges beco...
Communication Efficient Data Structures on the BSP model with Applications
 IN PROCEEDINGS OF EUROPAR'96
, 1996
"... The implementation of data structures on distributed memory models such as the BulkSynchronous Parallel (BSP) model, rather than shared memory ones such as the Parallel Random Access Machine (PRAM), offers a serious challenge. In this work we undertake the architecture independent study of the comp ..."
Abstract

Cited by 18 (8 self)
 Add to MetaCart
The implementation of data structures on distributed memory models such as the BulkSynchronous Parallel (BSP) model, rather than shared memory ones such as the Parallel Random Access Machine (PRAM), offers a serious challenge. In this work we undertake the architecture independent study of the computation and communication requirements of searching ordered hlevel graphs, which include many of the standard data structures. We propose multiway search as a general tool for the design, analysis and implementation of BSP algorithms. This technique allows elegant highlevel design and analysis of algorithms, using data structures similar to those of sequential models. Applications to computational geometry and sorting are also presented. In particular, our new randomized sorting algorithm improves previously known BSP randomized sorting algorithms upon the amount of parallel slackness required to achieve optimality. Moreover, our methods are within a 1 + o(1) multiplicative factor of the ...