Results 1 - 10 of 38
Lessons Learned from Implementing BSP
, 1996
Cited by 31 (13 self)
We focus on two criticisms of Bulk Synchronous Parallelism (BSP): that delaying communication until specific points in a program causes poor performance, and that frequent barrier synchronisations are too expensive for high-performance parallel computing. We show that these criticisms are misguided, not just about BSP but about parallel programming in general, because they are based on misconceptions about the origins of poor performance. The main implication for parallel programming is that higher levels of abstraction not only make software construction easier but also make high-performance implementation easier.
1 Introduction
Bulk Synchronous Parallelism [4, 5], or BSP, is a model of parallel computation whose aim is general-purpose parallel programming. It provides a level of abstraction that makes programs portable across the full range of parallel architectures, while at the same time providing implementations that are efficient [3]. Unlike most parallel programming environment...
The Paderborn University BSP (PUB) Library - Design, Implementation and Performance
- In Proc. of 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP
, 1999
Cited by 26 (5 self)
The Paderborn University BSP (PUB) library is a parallel C library based on the BSP model. The basic library supports buffered and unbuffered asynchronous communication between any pair of processors, and a mechanism for synchronizing the processors in a barrier style. In addition, it provides routines for collective communication on arbitrary subsets of processors, partition operations, and a zero-cost synchronization mechanism. Furthermore, some techniques used in its implementation deviate significantly from the techniques used in other BSP libraries.
1. Introduction
Most message-passing libraries, like PVM [6] and MPI [12], are based on pairwise sends and receives. This means that for each send operation, a matching receive has to be issued on the destination processor. This approach, however, is very error-prone because deadlocks can easily be introduced if a message is never accepted. Furthermore, it is difficult to determine the correctness and complexity of programs implemented ...
Systematic Parallel Programming
, 2000
Cited by 11 (0 self)
reasoning, trace semantics. Parallel computers have not yet had the expected impact on mainstream computing. Parallelism adds a level of complexity to the programming task that makes it very error-prone. Moreover, a large variety of very different parallel architectures exists. Porting an implementation from one machine to another may require substantial changes. This thesis addresses some of these problems by developing a formal basis for the design of parallel programs in the form of a refinement calculus. The calculus allows the stepwise formal derivation of an abstract, low-level implementation from a trusted, high-level specification. The calculus thus helps in structuring and documenting the development process. Portability is increased, because the introduction of a machine-dependent feature can be located in the refinement tree. Development efforts above this point in the tree are independent of that feature and are thus reusable. Moreover, the discovery of new, possibly more
miniBSP: A BSP Language and Transformation System
, 1996
Cited by 8 (0 self)
We define a small BSP-based language that is simpler and more flexible than standard BSP. Its composition rules allow supersteps to be composed sequentially and in parallel, so it is a subset-synchronising language. We give a cost-neutral transformation of BSP to miniBSP, showing how it can be used to compile BSP for clustered architectures. miniBSP also shows that BSP can be regarded as one end of a spectrum of languages whose other end is dataflow.
Keywords: Bulk synchronous parallelism, program transformation, cluster computing, dataflow.
1 Introduction
Bulk Synchronous Parallelism (BSP) [2--5] is a general-purpose parallel programming model which has been successfully applied to a wide range of numerical problems, and on a very wide range of parallel computers. BSP programs consist of a sequence of supersteps, each of which contains three phases:
• execution of a fixed number, p, of processes, each local to a processor and using only locally held variables;
• global communica...
JBSP: A BSP Programming Library In Java
Cited by 8 (0 self)
In this paper, we introduce a Java implementation of the Bulk Synchronous Parallel (BSP) model. JBSP (a Java-based BSP system) uses a two-daemon architecture which makes a clear separation of the computation and communication involved in parallel programs. Java threads are used in the implementation of the JBSP system to realize user-defined JBSP tasks as well as to carry out system activities.
Distributed query processing using suffix arrays
- In Int. Conf. on String Processing and Information Retrieval, Lecture Notes in Computer Science
, 2003
Cited by 6 (2 self)
Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or musical data or with texts written in oriental languages, and when searching for phrases, approximate patterns and, in general, regular expressions involving separators. In this paper we propose algorithms for processing in parallel batches of queries upon distributed text databases. We present efficient alternatives for speeding up query processing using distributed realizations of suffix arrays. Empirical results obtained from natural language text on a cluster of PCs show that the proposed algorithms are efficient in practice.
Multiprogramming BSP Programs
, 1996
Cited by 5 (0 self)
We explore the problem of transforming a BSP program for execution on a multiprogramming architecture, where it has to share resources with other BSP programs executing at the same time.
1 The BSP model
The Bulk Synchronous Parallelism (BSP) [2] model is a general-purpose model that is both architecture-independent and efficient for most problems on today's architectures. A BSP program consists of a set of supersteps, each of which consists of:
• a set of threads, involving local computation on locally-held variables;
• a global communication in which data is moved between threads; and
• a barrier synchronisation, which ends the superstep, and defines the moment at which moved data becomes locally visible.
BSP does not exploit locality, so programmers may not make any assumptions about how threads will be mapped to processors. In practice, BSP implementations randomise this placement so that the set of messages to be delivered at any moment during the communication phase will ...
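The three-phase superstep described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from the paper: the function name `run_bsp`, the ring communication pattern, and the use of Python threads are all hypothetical choices made for illustration.

```python
import threading


def run_bsp(p, steps):
    """Simulate `steps` BSP supersteps on `p` threads (hypothetical helper)."""
    values = list(range(1, p + 1))       # one locally held variable per thread
    inbox = [[] for _ in range(p)]       # data already visible to each thread
    in_flight = [[] for _ in range(p)]   # data moved during the current superstep
    barrier = threading.Barrier(p)

    def worker(pid):
        for _ in range(steps):
            # Phase 1: local computation on locally held data only.
            values[pid] += sum(inbox[pid])
            # Phase 2: global communication (assumed pattern: send to the
            # right-hand neighbour on a ring).
            in_flight[(pid + 1) % p].append(values[pid])
            # Phase 3: barrier synchronisation ends the superstep.
            barrier.wait()
            # Moved data becomes locally visible only after the barrier.
            inbox[pid], in_flight[pid] = in_flight[pid], []
            # Second barrier (a detail of this simulation only): no thread
            # starts sending again until every in-flight buffer is emptied.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values
```

With three threads and two supersteps, `run_bsp(3, 2)` returns `[4, 3, 5]`: values sent in the first superstep are only added by their receivers in the second, after the barrier has made them visible.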
Efficient Personalized Communication on Wormhole Networks
- 1997 International Conference on Parallel Architectures and Compilation Techniques, PACT'97
, 1997
Cited by 4 (4 self)
Bridging models, such as BSP, tend to abstract the characteristics of the interconnection network using a small set of parameters, by dividing the computation into supersteps and organizing the communication in global patterns called h-relations. In this paper we evaluate, through experimental results conducted on a wormhole-routed bi-dimensional torus and a quaternary fat-tree with 256 processing nodes, the execution time of three families of h-relations with varying degrees of imbalance. We also prove a strong result that links the communication performance of the fat-tree with the BSP abstraction of the interconnection network. Given a generic h-relation, we can provide a value of g that, in the worst case, slightly overestimates the completion time and is very close to optimality.
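For readers unfamiliar with the terms, the following sketch shows how an h-relation's degree and the usual BSP communication estimate h·g might be computed. It assumes the standard BSP convention that h is the largest number of messages any single processor sends or receives. The function names and the representation of messages as (source, destination) pairs are hypothetical, and this is not the paper's method for choosing g.

```python
def h_relation_degree(messages, p):
    """Return h: the maximum number of messages sent or received by one processor.

    `messages` is an assumed representation: a list of (source, destination)
    processor indices in the range 0..p-1.
    """
    sent = [0] * p
    received = [0] * p
    for src, dst in messages:
        sent[src] += 1
        received[dst] += 1
    return max(sent + received) if p > 0 else 0


def estimated_comm_time(messages, p, g):
    """Standard BSP estimate of communication time for one superstep: h * g."""
    return h_relation_degree(messages, p) * g
```

For example, with messages `[(0, 1), (0, 2), (1, 2)]` on three processors, processor 0 sends two messages and processor 2 receives two, so h is 2, and with g = 1.5 the estimate is 3.0.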
Parallel text query processing using Composite Inverted Lists
- In Second International Conference on Hybrid Intelligent Systems (Web Computing Session). IO
, 2003
Cited by 4 (1 self)
The inverted lists strategy is frequently used as an index data structure for very large textual databases. Its implementation and comparative performance have been studied in sequential and parallel applications. In the latter, with relatively few studies, there has been a sort of "which-is-better" discussion about two alternative parallel realizations of the basic data structure and algorithms. We suggest that a mix of the two is actually a better alternative. Depending on the workload generated by the users, the composite inverted lists algorithm we propose in this paper can operate either as a local or global inverted list, or both at the same time.
Building BSP Programs Using the Refinement Calculus
- In Formal Methods for Parallel Programming and Applications, IPPS/SPDP'98, Volume 1388 of LNCS
, 1996
Cited by 4 (2 self)
We extend the Refinement Calculus to permit the derivation of programs in the Bulk Synchronous Parallelism (BSP) style. This provides a mechanism for constructing correct programs in this portable and efficient style.

