Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
 in Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97)
, 1997
Cited by 63 (23 self)
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) chordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2 hold for arbitrary ratios n/p, i.e. they are fully scalable, and for Problems 3-8 it is assumed that n/p ≥ p^ε, ε > 0, which is true for all commercially
Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism
 Proc. 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA)
, 1998
Cited by 39 (14 self)
The paper envisions an extension to a standard instruction set which efficiently implements PRAM algorithms using explicit multi-threaded instruction-level parallelism (ILP); that is, Explicit Multi-Threading (XMT), a fine-grained computational paradigm covering the spectrum from algorithms through architecture to implementation is introduced; new elements are added where needed. The more detailed presentation is by way of a bridging model. Among other things, a bridging model provides a design space for algorithm designers and programmers, as well as a design space for computer architects. It is convenient to describe our wider vision regarding "parallel-computing-on-a-chip" as a two-stage development and therefore two bridging models are presented: Spawn-based multi-threading (Spawn-MT) and Elastic multi-threading (EMT). The case for Spawn-MT (or, alternatively, EMT) as a bridging model relies on the following evidence. (1) Spawn-MT comprises an "instruction set level", wh...
CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters
 International Journal of High Performance Computing Applications
, 2003
Cited by 25 (2 self)
In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on CGM algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our library also includes CGMlib, a library of basic CGM tools such as sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. Both libraries are available for download at http://cgm.dehne.net.
Prefix computations on symmetric multiprocessors
 Journal of Parallel and Distributed Computing
, 1998
Cited by 23 (2 self)
We introduce a new prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on four symmetric multiprocessors: the HP-Convex Exemplar (S-Class), the IBM SP2 (High Node), the SGI Power Challenge, and the DEC AlphaServer. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems,
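As a point of reference for the problem this abstract addresses, a prefix computation on a linked list can be sketched sequentially as below. This is only an illustration of the problem statement, not the sparse ruling set algorithm of the paper. The successor-array representation (`succ`, with `-1` marking the tail) and the function name `list_prefix` are assumptions made for the example.

```python
from operator import add


def list_prefix(succ, values, head, op=add):
    """Sequential prefix computation on a linked list (illustrative only).

    succ[i] is the index of the node following node i, or -1 for the tail.
    Returns prefix[i] = values[head] op ... op values[i], in list order.
    """
    prefix = [None] * len(succ)
    node = head
    acc = values[node]
    prefix[node] = acc
    node = succ[node]
    # Walk the list once, carrying the running result along the links.
    while node != -1:
        acc = op(acc, values[node])
        prefix[node] = acc
        node = succ[node]
    return prefix


# Hypothetical list stored out of order in memory: 2 -> 0 -> 3 -> 1
example = list_prefix([3, -1, 0, 1], [10, 20, 30, 40], head=2)
```

The nodes are deliberately stored out of list order, which is what makes the memory access pattern irregular and the parallel version non-trivial.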
Better Tradeoffs for Parallel List Ranking
, 1997
Cited by 17 (4 self)
An earlier parallel list ranking algorithm performs well for problem sizes N that are extremely large in comparison to the number of PUs P. However, no existing algorithm gives good performance for reasonable loads. We present a novel family of algorithms that achieve a better tradeoff between the number of startups and the routing volume. We have implemented them on an Intel Paragon, and they turn out to considerably outperform all earlier algorithms: with P = 2 the sequential algorithm is already beaten for N = 25,000; for P = 100 and N = 10^7, the speedup is 21, and for N = 10^8 it even reaches 30. A modification of one of our algorithms solves a theoretical question: we show that on one-dimensional processor arrays, list ranking can be solved with a number of steps equal to the diameter of the network. 1 Introduction A linked list, hereafter just list, is a basic data structure: it consists of nodes which are linked together, such that every node has precisely one predec...
PRO: A Model for the Design and Analysis of Efficient and Scalable Parallel Algorithms
Cited by 11 (1 self)
We present a new parallel computation model that enables the design of resource-optimal and scalable parallel algorithms and simplifies their analysis. The model rests on the following novel ideas: it incorporates optimality relative to a specific sequential algorithm as an integral part, and it measures the quality of a parallel algorithm in terms of granularity. Inspired by the BSP model, an algorithm in the PRO model is organized as a sequence of supersteps. The supersteps are not, however, required to be separated by synchronization barriers.
Experiments with List Ranking for Explicit Multi-Threaded (XMT) Instruction Parallelism
 Proc. 3rd Workshop on Algorithms Engineering (WAE99)
, 1999
Cited by 6 (4 self)
Algorithms for the problem of list ranking are empirically studied for the Explicit Multi-Threaded (XMT) platform for instruction parallelism. The main goal of this study is to understand the differences between XMT and more traditional parallel computing implementation platforms/models as they pertain to the well studied list ranking problem. The main two findings are: (i) Good speedups for much smaller inputs are possible. (ii) In part, this finding is based on competitive performance by a new variant of [Vi84], called the No-Cut algorithm. The paper incorporates analytic (non-asymptotic) performance analysis into experimental performance analysis for relatively small inputs. This provides an interesting example where experimental research and theoretical analysis complement one another. Explicit Multi-Threading (XMT) is a fine-grained computation framework. XMT covers the spectrum from algorithms through architecture to implementation; the main innovation in XMT (in...
An experimental validation of the PRO model for parallel and distributed computation
 in 14th Euromicro Conference on Parallel, Distributed and Network based Processing, B. Di Martino, Ed. IEEE, The Institute of Electrical and Electronics Engineers
, 2006
Cited by 6 (4 self)
The Parallel Resource-Optimal (PRO) computation model was introduced by Gebremedhin et al. [2002] as a framework for the design and analysis of efficient parallel algorithms. The key features of the PRO model that distinguish it from previous parallel computation models are the full integration of resource-optimality into the design process and the use of a granularity function as a parameter for measuring quality. In this paper we present experimental results on parallel algorithms, designed using the PRO model, for two representative problems: list ranking and sorting. The algorithms are implemented using SSCRAP, our environment for developing coarse-grained algorithms. The experimental performance results observed agree well with analytical predictions using the PRO model. Moreover, by using different platforms to run our experiments, we have been able to provide an integrated view of the modeling of an underlying architecture and the design and implementation of scalable parallel algorithms.
Parallel and External List Ranking and Connected Components
, 1999
Cited by 4 (2 self)
Improved parallel, external and parallel-external algorithms for list ranking and computing the connected components of a graph are presented. These algorithms are implemented and tested on a cluster of workstations using the C programming language and mpich, a portable implementation of the MPI (Message-Passing Interface) standard. Key Words: algorithms, parallel, external, list ranking, connected components, cluster 1 Introduction Lists and Graphs. A list is a basic data structure: it consists of nodes which are linked together, such that every node has precisely one predecessor and one successor, except for the initial node, which has no predecessor, and the final node, which has no successor. List ranking is a basic problem on lists. The task is to compute the position of each node i by giving a distinguished node j of its list (for example, the last node) and the number of links between i and j. An undirected graph, hereafter just graph, is a basic data structure which consist...
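The list ranking task defined above (the number of links between each node and the last node of its list) can be illustrated with classic pointer jumping, simulated round by round in a single process. This is a textbook technique shown for orientation only; it is not the improved algorithm of the paper. The successor-array encoding (`-1` for the final node) and the name `list_rank` are assumptions.

```python
def list_rank(succ):
    """Rank every node by pointer jumping (single-process simulation).

    succ[i] is the successor of node i, or -1 for the final node.
    Returns rank[i] = number of links between node i and the final node.
    """
    n = len(succ)
    # Let the final node point to itself so that jumping past it is harmless.
    nxt = [i if s == -1 else s for i, s in enumerate(succ)]
    rank = [0 if s == -1 else 1 for s in succ]
    # Each round doubles the distance covered by every pointer, so
    # n.bit_length() rounds (at least ceil(log2 n)) are sufficient.
    for _ in range(max(1, n).bit_length()):
        # Build new arrays from the old ones, mimicking one synchronous
        # parallel step in which all nodes update at the same time.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# Hypothetical list 2 -> 0 -> 3 -> 1, with node 1 as the final node
ranks = list_rank([3, -1, 0, 1])
```

Pointer jumping performs O(n log n) total work, which is one reason work-efficient and coarse-grained variants such as those listed on this page are of interest.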
Routing with Finite Speeds of Memory and Network
 Proc. 22nd Symposium on the Mathematical Foundations of Computer Science, LNCS 1295
, 1997
Cited by 3 (3 self)
On practical parallel computers, the time for routing a distribution of sufficiently large packets can be approximated by max{T_f, T_b}. Here T_f is proportional to the maximum number of bytes a PU sends and receives, and T_b is proportional to the maximum number of bytes a connection in the network has to transfer. We show that several important routing patterns can be performed by a sequence of balanced all-to-all routings and analyze how to optimally perform these under the above cost model. We concentrate on dimension-order routing on meshes, and assume that the routing pattern must be decomposed into a sequence of permutations. The developed strategy has been implemented on the Intel Paragon. In comparison with the trivial strategy, in which PU i routes to PU (i+t) mod P in permutation t, 1 ≤ t < P, one gains between 10 and 20%. 1 Introduction 1.1 Importance of Balanced All-to-All Routing On parallel computers, communication is essential: the processing units, PUs, need to exchan...
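The trivial strategy and the cost model mentioned in this abstract can be written out in a few lines. The sketch below only enumerates the permutations of the trivial schedule and evaluates the max{T_f, T_b} approximation for given inputs. It does not model the mesh, dimension-order routing, or the improved strategy of the paper, and the function names are assumptions.

```python
def trivial_schedule(P):
    """Trivial all-to-all schedule as a list of permutations.

    Entry t-1 is permutation t (1 <= t < P); its i-th element is the
    destination of PU i, namely (i + t) mod P.
    """
    return [[(i + t) % P for i in range(P)] for t in range(1, P)]


def routing_time(t_f, t_b):
    """Approximate routing time under the cost model max{T_f, T_b}."""
    return max(t_f, t_b)


schedule = trivial_schedule(4)
# Every ordered pair of distinct PUs appears in exactly one permutation.
pairs = [(i, dest) for perm in schedule for i, dest in enumerate(perm)]
```

For P = 4 the schedule consists of three permutations, the first of which sends PU i to PU i+1 (wrapping around).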