The Design and Analysis of Bulk-Synchronous Parallel Algorithms
, 1998
Cited by 19 (1 self)
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. This thesis presents a systematic approach to the design and analysis of BSP algorithms. We introduce an extension of the BSP model, called BSPRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. The BSPRAM model can be optimally simulated by a BSP computer for a broad range of algorithms possessing certain characteristic properties: obliviousness, slackness, granularity. We use BSPRAM to design BSP algorithms for problems from three large, partially overlapping domains: combinatorial computation, dense matrix computation, graph computation. Some of the presented algorithms are adapted from known BSP algorithms (butterfly dag computation, cube dag computation, matrix multiplication). Other algorithms are obtained by application of established non-BSP techniques (sorting, randomised list contraction, Gaussian elimination without pivoting and with column pivoting, algebraic path computation), or use original techniques specific to the BSP model (deterministic list contraction, Gaussian elimination with nested block pivoting, communication-efficient multiplication of Boolean matrices, synchronisation-efficient shortest paths computation). The asymptotic BSP cost of each algorithm is established, along with its BSPRAM characteristics. We conclude by outlining some directions for future research.
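For readers unfamiliar with how a BSP cost is counted, the sketch below uses the standard BSP cost model, in which a superstep with local work w and an h-relation costs w + h*g + l. This is an illustration only: the abstract does not show the thesis's own notation, and the function name `bsp_cost` and its parameters are assumptions made for this example.

```python
def bsp_cost(supersteps, g, l):
    """Total cost of a BSP computation under the standard cost model.

    supersteps: iterable of (w, h) pairs, where w is the maximum local
                work and h is the maximum number of words sent or
                received by any processor in that superstep.
    g:          communication cost per word (machine parameter).
    l:          synchronisation cost per superstep (machine parameter).
    """
    total = 0
    for w, h in supersteps:
        # Each superstep pays for computation, communication, and one barrier.
        total += w + h * g + l
    return total


# Hypothetical two-superstep computation on a machine with g = 4, l = 20.
example = bsp_cost([(100, 10), (50, 5)], g=4, l=20)
```

Keeping w, h and the number of supersteps separate reflects the three quantities a BSP analysis typically tries to minimise: computation, communication and synchronisation.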
Sorting Large Data Sets on a Massively Parallel System
 IN PROCEEDINGS OF THE 6TH IEEE SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING (SPDP
, 1994
Cited by 12 (2 self)
This paper presents a performance study for many of today's popular parallel sorting algorithms. It is the first to present a comparative study on a large-scale MIMD system. The machine, a Parsytec GCel, contains 1024 processors connected as a two-dimensional grid. To justify the experimental results, we develop a theoretical model to predict the performance in terms of communication and computation times. We get a very close relation between the experiments and the theoretical model as long as the edge congestion caused by the algorithms is predicted precisely. We compare: Bitonicsort, Shearsort, Gridsort, Samplesort, and Radixsort. Experiments were performed using random instances according to a well-known benchmark problem. Results show that for the machine we used, Bitonicsort performs best for smaller numbers of keys per processor (< 2048) and Samplesort outperforms all other methods for larger instances.
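As background on one of the compared algorithms, here is a minimal sketch of the bitonic sorting network, simulated sequentially. It is not the paper's implementation: it assumes one key per position and a power-of-two input length, and the name `bitonic_sort` is chosen for this example. In a parallel setting, each pass of the inner `for` loop corresponds to one round of compare-exchange operations between partner positions.

```python
def bitonic_sort(keys):
    """Return a sorted copy of keys using the bitonic sorting network.

    Assumes len(keys) is a power of two (or zero).
    """
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compare-exchange partners
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


data = [3, 1, 4, 1, 5, 9, 2, 6]
result = bitonic_sort(data)
```

The network performs the same sequence of comparisons regardless of the input, which is what makes its communication pattern predictable on a fixed processor grid.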
Summary
• The importance of communication in hardware
  Area-time bounds for VLSI chips
• Linear algebra
Architecture
In this paper, we present a comparative performance analysis of two parallel sorting algorithms: Bitonic Sort and Parallel Radix Sort. In order to study the interaction between the algorithms and the architecture, we implemented both algorithms in OpenCL and compared their performance with that of the Quick Sort algorithm, the fastest algorithm. In our simulation, we used an Intel Core2Duo CPU at 2.67 GHz and an NVidia Quadro FX 3800 as the graphics processing unit.