Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms
, 2003
Cited by 46 (11 self)
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
Cited by 40 (12 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Towards Efficiency and Portability: Programming with the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Cited by 34 (3 self)
The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a model for general-purpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of the BSP model have been presented, the degree to which the model can be efficiently utilized on existing parallel machines remains unclear. To explore this question, we implemented a small library of BSP functions, called the Green BSP library, on several parallel platforms. We also created a number of parallel applications based on this library. Here, we report on the performance of six of these applications on three different parallel platforms. Our preliminary results suggest that the BSP model can be used to develop efficient and portable programs for a range of machines and applications.
BSP vs LogP
, 1996
Cited by 32 (4 self)
A quantitative comparison of the BSP and LogP models of parallel computation is developed. We concentrate on a variant of LogP that disallows the so-called stalling behavior, although issues surrounding the stalling phenomenon are also explored. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis that is mainly of an asymptotic nature, BSP and (stall-free) LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to its greater simplicity and portability, and slightly greater power. LogP lends itself more naturally to multiuser mode.
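For orientation, the BSP side of this bandwidth-latency framework is commonly summarized by charging each superstep w + h·g + l, where w is the maximum local work of any processor, h is the maximum number of words any processor sends or receives, g is the per-word communication cost, and l is the synchronization cost. The sketch below only illustrates that textbook formula; the function name `bsp_cost` and the sample values are assumptions for illustration and are not taken from the paper.

```python
from typing import Iterable, Tuple


def bsp_cost(supersteps: Iterable[Tuple[float, float]], g: float, l: float) -> float:
    """Sum the standard BSP charge w + h*g + l over all supersteps.

    Each superstep is given as a pair (w, h):
      w = maximum local computation of any processor in that superstep
      h = maximum number of words sent or received by any processor
    g (cost per word) and l (barrier cost) are machine parameters.
    """
    return sum(w + h * g + l for w, h in supersteps)


# Hypothetical two-superstep program on a hypothetical machine.
steps = [(1000, 50), (400, 200)]
print(bsp_cost(steps, g=4, l=100))
```

Because g and l enter only as multipliers and addends, the same list of supersteps can be re-costed for a different machine by changing those two arguments.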
BSP-Like External-Memory Computation
 IN PROC. 3RD ITALIAN CONFERENCE ON ALGORITHMS AND COMPLEXITY
Cited by 24 (0 self)
In this paper we present a paradigm for solving external-memory problems, and illustrate it by algorithms for matrix multiplication, sorting, list ranking, transitive closure and FFT. Our paradigm is based on the use of BSP algorithms. The correspondence is almost perfect, and especially the notion of x-optimality carries over to algorithms designed according to our paradigm. The advantages of the approach are similar to the advantages of BSP algorithms for parallel computing: scalability, portability, predictability. The performance measure here is the total work, not only the number of I/O operations as in previous approaches. The predicted performances are therefore more useful for practical applications.
Practical Parallel Algorithms for Minimum Spanning Trees
 In Workshop on Advances in Parallel and Distributed Systems
, 1998
Cited by 21 (0 self)
We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m/n ≥ p, where p is the number of processors. For this case, we show that simple algorithms with data-independent communication patterns are efficient, both in theory and in practice. The algorithms are evaluated theoretically using Valiant's BSP model of parallel computation and empirically through implementation results.
Ultimate Parallel List Ranking?
 Journal of Parallel and Distributed Computing
, 2000
Cited by 21 (4 self)
Two improved list-ranking algorithms are presented. The "peeling-off" algorithm leads to an optimal PRAM algorithm, but was designed with application on a real parallel machine in mind. It is simpler than earlier algorithms, and in a range of problem sizes, where previously several algorithms were required for the best performance, now this single algorithm suffices. If the problem size is much larger than the number of available processors, then the "sparse-ruling-sets" algorithm is even better. In previous versions this algorithm had very restricted practical application because of the large number of communication rounds it was performing. The main weakness of this algorithm is overcome by adding two new ideas, each of which reduces the number of communication rounds by a factor of two. 1 Introduction A list is a basic data structure: it consists of nodes which are linked together, so that every node has precisely one predecessor and one successor, except for the initial n...
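To make the list-ranking problem concrete, the sketch below computes each node's distance to the end of the list using classic pointer jumping. This is a swapped-in baseline, not the "peeling-off" or "sparse-ruling-sets" algorithm of the paper; the function name `list_rank` and the array-of-successors representation are assumptions for illustration. Each iteration of the outer loop simulates one synchronous parallel round sequentially.

```python
from typing import List, Optional


def list_rank(succ: List[Optional[int]]) -> List[int]:
    """Return, for every node, its distance to the last node of the list.

    succ[i] is the index of node i's successor, or None for the last node.
    Pointer jumping: in each round every node adds its successor's rank to
    its own and then skips over that successor, so the number of rounds is
    logarithmic in the list length.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if s is None else 1 for s in succ]
    while any(s is not None for s in nxt):
        # Read only the old arrays, write only the new ones, as a
        # synchronous parallel round would.
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):
            j = nxt[i]
            if j is not None:
                new_rank[i] = rank[i] + rank[j]
                new_nxt[i] = nxt[j]
        rank, nxt = new_rank, new_nxt
    return rank


# Hypothetical list stored in scrambled order: 2 -> 0 -> 3 -> 1
print(list_rank([3, None, 0, 1]))
```

Pointer jumping performs more total work than a sequential traversal and uses a communication round per doubling step, which is the kind of cost that algorithms with fewer communication rounds aim to reduce.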
Communication Efficient Data Structures on the BSP model with Applications
 IN PROCEEDINGS OF EUROPAR'96
, 1996
Cited by 21 (9 self)
The implementation of data structures on distributed memory models such as the Bulk-Synchronous Parallel (BSP) model, rather than shared memory ones such as the Parallel Random Access Machine (PRAM), offers a serious challenge. In this work we undertake the architecture-independent study of the computation and communication requirements of searching ordered h-level graphs, which include many of the standard data structures. We propose multi-way search as a general tool for the design, analysis and implementation of BSP algorithms. This technique allows elegant high-level design and analysis of algorithms, using data structures similar to those of sequential models. Applications to computational geometry and sorting are also presented. In particular, our new randomized sorting algorithm improves upon previously known BSP randomized sorting algorithms in the amount of parallel slackness required to achieve optimality. Moreover, our methods are within a 1 + o(1) multiplicative factor of the ...
Fully Dynamic Search Trees for an Extension of the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Cited by 19 (4 self)
We present parallel algorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e. better use of bandwidth of routers and reduction of the overhead involved in communication. The BSP* model is introduced by Bäumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: We use Valiant's notion of c-optimality. Intuitively c-optimal algorithms tend to speedup p/c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1 - o(1) for wide ranges of BSP* parameters, where the ranges beco...
Better Tradeoffs for Parallel List Ranking
, 1997
Cited by 17 (4 self)
An earlier parallel list ranking algorithm performs well for problem sizes N that are extremely large in comparison to the number of PUs P. However, no existing algorithm gives good performance for reasonable loads. We present a novel family of algorithms that achieve a better tradeoff between the number of start-ups and the routing volume. We have implemented them on an Intel Paragon, and they turn out to considerably outperform all earlier algorithms: with P = 2 the sequential algorithm is already beaten for N = 25,000; for P = 100 and N = 10^7, the speedup is 21, and for N = 10^8 it even reaches 30. A modification of one of our algorithms solves a theoretical question: we show that on one-dimensional processor arrays, list ranking can be solved with a number of steps equal to the diameter of the network. 1 Introduction A linked list, hereafter just list, is a basic data structure: it consists of nodes which are linked together, such that every node has precisely one predec...