Results 1-10 of 21
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
Abstract

Cited by 45 (13 self)
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs ($n > p^2$). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
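For orientation, the kind of sequential baseline that a parallel spanning tree algorithm is measured against can be sketched as a plain breadth-first traversal. This is an illustrative assumption, not the authors' code: the function name `bfs_spanning_tree` and the edge-list input format are hypothetical.

```python
from collections import deque

def bfs_spanning_tree(n, edges, root=0):
    """Return the tree edges (parent, child) of a BFS spanning tree.

    Only the connected component containing `root` is covered.
    `n` is the number of vertices, `edges` a list of undirected (u, v) pairs.
    """
    # Build an adjacency list for the undirected graph.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    visited = [False] * n
    visited[root] = True
    tree = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not visited[v]:
                # First time v is reached: (u, v) becomes a tree edge.
                visited[v] = True
                tree.append((u, v))
                queue.append(v)
    return tree
```

The traversal touches each vertex and edge a constant number of times, which is why a sequential baseline of this kind is hard for a parallel implementation to beat on irregular graphs.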
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
, 2006
The Design and Analysis of Bulk-Synchronous Parallel Algorithms
, 1998
Abstract

Cited by 18 (1 self)
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. This thesis presents a systematic approach to the design and analysis of BSP algorithms. We introduce an extension of the BSP model, called BSPRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. The BSPRAM model can be optimally simulated by a BSP computer for a broad range of algorithms possessing certain characteristic properties: obliviousness, slackness, granularity. We use BSPRAM to design BSP algorithms for problems from three large, partially overlapping domains: combinatorial computation, dense matrix computation, graph computation. Some of the presented algorithms are adapted from known BSP algorithms (butterfly dag computation, cube dag computation, matrix multiplication). Other algorithms are obtained by application of established non-BSP techniques (sorting, randomised list contraction, Gaussian elimination without pivoting and with column pivoting, algebraic path computation), or use original techniques specific to the BSP model (deterministic list contraction, Gaussian elimination with nested block pivoting, communication-efficient multiplication of Boolean matrices, synchronisation-efficient shortest paths computation). The asymptotic BSP cost of each algorithm is established, along with its BSPRAM characteristics. We conclude by outlining some directions for future research.
Fast Minimum Spanning Tree for Large Graphs on the GPU
Abstract

Cited by 14 (4 self)
Graphics Processor Units are used for much general-purpose processing due to the high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPUs. Irregular algorithms on discrete structures like graphs are harder to map to them. Efficient data-mapping primitives can play a crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Borůvka’s approach for undirected graphs. We implement it using scalable primitives such as scan, segmented scan and split. The irregular steps of supervertex formation and recursive graph construction are mapped to primitives like split to categories involving vertex ids and edge weights. We obtain 30 to 50 times speedup over the CPU implementation on most graphs and 3 to 10 times speedup over our previous GPU implementation. We construct the minimum spanning tree on a 5 million node and 30 million edge graph in under 1 second on one quarter of the Tesla S1070 GPU.
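As background for the approach named in the abstract, a minimal sequential sketch of Borůvka's algorithm follows. It is not the paper's GPU formulation (no scan, segmented scan or split primitives are used); the function name `boruvka_mst`, the `(weight, u, v)` edge format and the union-find representation of supervertices are assumptions made for illustration.

```python
def boruvka_mst(n, edges):
    """Sequential Boruvka sketch.

    `edges` is a list of (weight, u, v) tuples over vertices 0..n-1.
    Returns (total_weight, chosen_edges). On a disconnected graph the
    result is a minimum spanning forest.
    """
    parent = list(range(n))  # union-find forest; each root is a supervertex

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen, total, components = [], 0, n
    while components > 1:
        # Step 1: each supervertex picks its cheapest outgoing edge.
        # Ties are broken by edge index so the choice is a strict order.
        cheapest = [None] * n
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                j = cheapest[r]
                if j is None or (w, i) < (edges[j][0], j):
                    cheapest[r] = i

        # Step 2: contract along the selected edges.
        merged = False
        for r in range(n):
            i = cheapest[r]
            if i is None:
                continue
            w, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                chosen.append((w, u, v))
                total += w
                components -= 1
                merged = True
        if not merged:
            break  # no outgoing edges left: the graph is disconnected
    return total, chosen
```

Each round at least halves the number of supervertices in a connected graph, and the per-supervertex minimum in step 1 is the kind of bulk operation that data-parallel primitives are suited to.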
Communication-Optimal Parallel Minimum Spanning Tree Algorithms
, 1998
Abstract

Cited by 13 (1 self)
Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first nontrivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of the input graph, and m the number of edges of the input graph. We show that in the worst case, a total of $\Omega(\mu \cdot \min(m, pn))$ bits need to be communicated in order to solve the MST problem, where $\mu$ is the number of bits required to represent a single edge weight. This implies that if each message communicates at most $\mu$ bits, any BSP algorithm for finding an MST requires communication time $\Omega(g \cdot \min(m/p, n))$, where g is the gap parameter of the BSP model. In addition, we present two algorithms with communication requirements that match our lower bound in different situations. Both algorithms perform linear work for appropriate values of n, m and p, and use a numbe...
Optimizing Graph Algorithms on Pregel-like Systems
Abstract

Cited by 8 (2 self)
We study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large diameters or skew in component sizes. We describe several optimization techniques to address these inefficiencies. Our most general technique is based on the idea of performing some serial computation on a tiny fraction of the input graph, complementing Pregel’s vertex-centric parallelism. We base our study on thorough implementations of several fundamental graph algorithms, some of which have, to the best of our knowledge, not been implemented on Pregel-like systems before. The algorithms and optimizations we describe are fully implemented in our open-source Pregel implementation. We present detailed experiments showing that our optimization techniques improve runtime significantly on a variety of very large graph datasets.
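To illustrate why a large diameter slows convergence in a vertex-centric setting, the sketch below simulates synchronous supersteps of minimum-label propagation for connected components. It is a single-process toy under stated assumptions, not Pregel's API and not the paper's implementation: the name `min_label_components` is hypothetical, and message passing and vote-to-halt are replaced by a simple fixed-point check.

```python
def min_label_components(n, edges):
    """Simulate vertex-centric min-label propagation.

    In every superstep each vertex adopts the smallest label among itself
    and its neighbours, using only the labels from the previous superstep.
    Returns (labels, supersteps), where `supersteps` counts the rounds in
    which at least one label changed.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = list(range(n))  # every vertex starts with its own id
    supersteps = 0
    while True:
        new_labels = [
            min([labels[v]] + [labels[u] for u in adj[v]])
            for v in range(n)
        ]
        if new_labels == labels:
            return labels, supersteps  # fixed point: nothing left to send
        labels = new_labels
        supersteps += 1
```

On a path of four vertices the smallest label needs three supersteps to reach the far end, while a star with the same number of vertices converges in one; the superstep count tracks the diameter rather than the graph size.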
Single-Source Shortest Paths with the Parallel Boost Graph Library
Abstract

Cited by 6 (2 self)
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra’s single-source shortest paths algorithm in the Parallel BGL. We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.
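For reference, the sequential algorithm that the two parallel variants build on can be sketched with a binary heap. This is a generic textbook version in Python, not Parallel BGL code; the function name `dijkstra` and the dictionary-of-lists graph format are assumptions, and edge weights are assumed to be non-negative.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths with non-negative edge weights.

    `adj` maps a vertex to a list of (neighbour, weight) pairs.
    Returns a dict of shortest distances for every reachable vertex.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The strict ordering imposed by the priority queue is what makes the algorithm hard to distribute: only the vertex with the globally smallest tentative distance is known to be final at each step.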
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
Abstract

Cited by 4 (0 self)
Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretic parallel algorithms but no known efficient parallel implementations that achieve speedup without serious restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving large-scale graph problems.
Scalable Coarse Grained Parallel Interval Graph Algorithms
In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas (2000)
, 2000
Abstract

Cited by 1 (1 self)
We present scalable coarse grained parallel algorithms for solving interval graph problems on a BSP-like model, Coarse Grained Multicomputers (CGM). The problems we consider include: finding maximum independent set, maximum weighted clique, minimum coloring and cut vertices and bridges. With scalability at [...], $\forall \epsilon > 0$ (here n denotes the total input size and p the number of processors), our algorithms for maximum independent set and minimum coloring use optimal computation time and O(log p) communication rounds, which is independent of the input size and grows slowly only with the number of processors. Equally scalable are our algorithms for finding maximum weighted clique, cut vertices and bridges, which use O(1) communication rounds and optimal local computation time, achieving both communication and computation optimality.
Parallel Hierarchical Clustering on Shared Memory Platforms
Abstract

Cited by 1 (0 self)
Abstract. Hierarchical clustering has many advantages over traditional clustering algorithms like k-means, but it suffers from higher computational costs and a less obvious parallel structure. Thus, in order to scale this technique up to larger datasets, we present SHRINK, a novel shared-memory algorithm for single-linkage hierarchical clustering based on merging the solutions from overlapping subproblems. In our experiments, we find that SHRINK provides a speedup of 18–20 on 36 cores on both real and synthetic datasets of up to 250,000 points. Source code for SHRINK is available for download on our website,
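As a point of reference for what single-linkage clustering computes, the sketch below merges the two closest clusters repeatedly by scanning all pairwise distances in sorted order. It is a naive sequential baseline and not the SHRINK algorithm; the name `single_linkage`, the tuple point format and the stopping rule (a target cluster count `k`) are assumptions made for illustration.

```python
import math
from itertools import combinations

def single_linkage(points, k):
    """Naive single-linkage clustering down to k clusters.

    `points` is a list of coordinate tuples. Returns one cluster label
    per point; points in the same cluster share a label.
    """
    n = len(points)
    parent = list(range(n))  # union-find over point indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs sorted by Euclidean distance (quadratic in n).
    pairs = sorted(
        combinations(range(n), 2),
        key=lambda p: math.dist(points[p[0]], points[p[1]]),
    )

    clusters = n
    for i, j in pairs:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            # The closest pair spanning two clusters defines their linkage.
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(n)]
```

The quadratic number of pairs is the computational cost the abstract refers to, and the sequential merge order is what makes the parallel structure less obvious than in k-means.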