Results 1-10 of 13
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract

Cited by 46 (13 self)
 Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem both because it is the building block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that, for the first time, achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p^2). Since achieving reasonable parallel speedup on the spanning tree problem is notoriously hard, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are (1) a new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies, and (2) an experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
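The abstract measures speedup against the best sequential spanning tree implementation. As a point of reference only, the following Python sketch shows one common linear-time sequential baseline, a breadth-first search that records a parent for every vertex; it is not the paper's randomized parallel algorithm. The function name `bfs_spanning_forest`, the adjacency-list input, and the convention that a root is its own parent are assumptions made for illustration.

```python
from collections import deque


def bfs_spanning_forest(n, adj):
    """Return parent[] describing a spanning forest; roots have parent[v] == v.

    adj[u] is a list of u's neighbours (hypothetical input format).
    """
    parent = [-1] * n                # -1 marks "not yet visited"
    for root in range(n):
        if parent[root] != -1:
            continue
        parent[root] = root          # start a new tree in this connected component
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if parent[w] == -1:
                    parent[w] = u    # (u, w) becomes a tree edge
                    queue.append(w)
    return parent
```

Starting a new search from every unvisited vertex yields one tree per connected component, so the same routine covers disconnected graphs.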
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
, 2006
"... ..."
On the architectural requirements for efficient execution of graph algorithms
 In Proc. 34th Int’l Conf. on Parallel Processing (ICPP)
, 2005
"... Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, conti ..."
Abstract

Cited by 26 (10 self)
 Add to MetaCart
(Show Context)
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementations on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures
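Connected components is one of the two kernels the abstract studies. A minimal sketch of a simple PRAM-style approach, synchronous minimum-label propagation, is shown below in Python; it is an assumed illustration, not necessarily the algorithm measured in the paper, and `min_label_components` is a hypothetical name. The reads `label[u]` and `label[v]` follow the edge list rather than memory order, which is the kind of noncontiguous access pattern the abstract describes.

```python
def min_label_components(n, edges):
    """Connected components by synchronous minimum-label propagation.

    Each round, every vertex adopts the smallest label among itself and its
    neighbours, reading only the previous round's labels (as in one PRAM step).
    Stops when no label changes; each vertex then holds the smallest vertex id
    in its component. The number of rounds can grow with the graph's diameter.
    """
    label = list(range(n))
    changed = True
    while changed:
        changed = False
        new_label = label[:]          # writes go to a copy; reads see the old round
        for u, v in edges:
            lo = min(label[u], label[v])
            if lo < new_label[u]:
                new_label[u] = lo
                changed = True
            if lo < new_label[v]:
                new_label[v] = lo
                changed = True
        label = new_label
    return label
```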
Single-Source Shortest Paths with the Parallel Boost Graph Library
"... The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributedmemory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary verte ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
(Show Context)
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra’s single-source shortest paths algorithm in the Parallel BGL. We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.
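Both parallel variants described in the paper start from Dijkstra's sequential algorithm. For reference, a minimal Python sketch of the sequential algorithm with a binary heap and lazy deletion of stale entries follows; it does not use or imitate the Parallel BGL's C++ interface, and the function name and `(neighbour, weight)` adjacency format are assumptions.

```python
import heapq


def dijkstra(n, adj, source):
    """Single-source shortest paths; adj[u] is a list of (v, w) with w >= 0."""
    dist = [float("inf")] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                  # stale entry: u was already settled with a shorter distance
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:          # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Pushing a new heap entry instead of decreasing a key keeps the code short, at the cost of some stale entries that are skipped when popped.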
The Euler tour technique and parallel rooted spanning tree
 In Proc. Int’l Conf. on Parallel Processing (ICPP)
, 2004
"... Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find an unrooted spanning tree and then root the spanni ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
(Show Context)
Many parallel algorithms for graph problems start by finding a spanning tree and rooting it to define a structural relationship on the vertices, which subsequent problem-specific computations then use. The generic procedure is to find an unrooted spanning tree and then root it using the Euler tour technique. With a randomized work-time optimal unrooted spanning tree algorithm and work-time optimal list ranking, rooted spanning trees can be found work-time optimally on an EREW PRAM w.h.p. Yet the Euler tour technique assumes that a circular adjacency list is “given”, and constructing that list for a spanning tree found on the fly by a spanning tree algorithm is not without cost. In fact, our experiments show that this “hidden” step of constructing a circular adjacency list could take as much time as the spanning tree and list ranking steps combined. In this paper we present new efficient algorithms that find rooted spanning trees without using the Euler tour technique and incur little or no overhead over the underlying spanning tree algorithms. We also present two new approaches that construct Euler tours efficiently when the circular adjacency list is not given. One is a deterministic PRAM algorithm and the other is a randomized algorithm in the symmetric multiprocessor (SMP) model. The randomized algorithm takes a novel approach to the problems of constructing the Euler tour and rooting a tree. It computes a rooted spanning tree first, then constructs an Euler tour directly for the tree using depth-first traversal. The tour constructed is cache-friendly with adjacent edges in the
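To make the role of the circular adjacency list concrete, here is a minimal Python sketch of the classic Euler tour technique for rooting a tree. It assumes the order of each Python adjacency list is the circular order, so the successor of arc (u, v) is (v, w) where w follows u in v's list, and it walks the tour sequentially where a PRAM version would use list ranking. It is not one of the paper's new algorithms; `euler_tour` and `root_tree` are hypothetical names.

```python
def euler_tour(adj, root):
    """Return the Euler tour of a tree as a list of directed arcs (u, v).

    adj[u] lists u's tree neighbours; its order is taken as the circular
    adjacency list, so succ(u, v) = (v, neighbour of v that follows u).
    """
    # pos[(v, u)] = index of u in adj[v], so "next after u" is an O(1) lookup
    pos = {(v, u): i for v in range(len(adj)) for i, u in enumerate(adj[v])}
    if not adj[root]:
        return []                     # single-vertex tree: empty tour
    start = (root, adj[root][0])
    tour, arc = [], start
    while True:
        tour.append(arc)
        u, v = arc
        i = pos[(v, u)]               # where u sits in v's circular list
        arc = (v, adj[v][(i + 1) % len(adj[v])])
        if arc == start:
            return tour


def root_tree(adj, root):
    """parent[v] from the tour: the first arc that enters v comes from its parent."""
    parent = [None] * len(adj)
    parent[root] = root
    for u, v in euler_tour(adj, root):
        if parent[v] is None:
            parent[v] = u
    return parent
```

Building the `pos` index roughly corresponds to the "hidden" preparation step the abstract points out: the tour cannot be followed until each arc can locate its reverse arc in the neighbour's list.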
Lock-free parallel algorithms: An experimental study
 In Proceedings of the 11th International Conference on High Performance Computing
, 2004
"... Abstract. Lockfree shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lockfree data structures include increasing fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations for lock-free data structures include increasing the fault tolerance of a (possibly heterogeneous) system and eliminating the problems associated with critical sections, such as priority inversion and deadlock. For parallel computers with closely coupled processors and shared memory, these issues are no longer major concerns. While many of the results are applicable, especially when the model used is shared-memory multiprocessors, no prior studies have considered improving the performance of a parallel implementation by way of lock-free programming. In practice, lock-free data structures in a distributed setting often do not perform as well as those that use locks. As the data structures and algorithms for parallel computing are often drastically different from those in distributed computing, it is possible that lock-free programs perform better there. In this paper we compare the similarities and differences of lock-free programming in distributed and parallel computing environments and explore the possibility of adapting lock-free programming to parallel computing to improve performance. Lock-free programming also provides a new way of simulating PRAM and asynchronous PRAM algorithms on current parallel machines.
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
"... Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Graph problems are finding increasing applications in high-performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors, with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretical parallel algorithms but no known efficient parallel implementations that achieve speedup without seriously restrictive assumptions about the inputs. We believe our techniques will have practical impact on solving large-scale graph problems.
A Simple and Practical Linear-Work Parallel Algorithm for Connectivity
"... Graph connectivity is a fundamental problem in computer science with many important applications. Sequentially, connectivity can be done in linear work easily using breadthfirst search or depthfirst search. There have been many parallel algorithms for connectivity, however the simpler parallel al ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
Graph connectivity is a fundamental problem in computer science with many important applications. Sequentially, connectivity can easily be computed in linear work using breadth-first search or depth-first search. There have been many parallel algorithms for connectivity; however, the simpler parallel algorithms require superlinear work, and the linear-work, polylogarithmic-depth parallel algorithms are very complicated and not amenable to implementation. In this work, we address this gap by describing a simple and practical expected linear-work, polylogarithmic-depth parallel algorithm for graph connectivity. Our algorithm is based on a recent parallel algorithm for generating low-diameter graph decompositions by Miller et al. [44], which uses parallel breadth-first searches. We discuss a (modest) variant of their decomposition algorithm which preserves the theoretical complexity while leading to simpler and faster implementations. We experimentally compare the connectivity algorithms using both the original decomposition algorithm and our modified decomposition algorithm. We also experimentally compare against the fastest existing parallel connectivity implementations (which are not theoretically linear-work and polylogarithmic-depth) and show that our implementations are competitive for various input graphs. In addition, we compare our implementations to sequential connectivity algorithms and show that on 40 cores we achieve good speedup relative to the sequential implementations for many input graphs. We discuss the various optimizations used in our implementations and present an extensive experimental analysis of the performance. Our algorithm is the first parallel connectivity algorithm that is both theoretically and practically efficient.
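The overall shape of decomposition-based connectivity can be sketched sequentially: split the graph into low-diameter clusters, contract each cluster to a single vertex, and repeat on the edges that cross clusters until none remain. The Python sketch below assumes the exponentially shifted BFS style of decomposition associated with Miller et al., simulated round by round; it is not the authors' modified decomposition or their parallel code, and `beta=0.2`, the first-come tie-breaking, and all function names are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def low_diameter_decomposition(n, adj, beta, rng):
    """Map each vertex to a cluster center using BFS with exponentially shifted starts.

    Vertex v draws a shift d_v ~ Exp(beta) and, if still unclaimed, starts its
    own BFS at round floor(max_shift - d_v). In this sequential simulation a
    vertex joins whichever BFS reaches it first.
    """
    shifts = [rng.expovariate(beta) for _ in range(n)]
    max_shift = max(shifts, default=0.0)
    starts = defaultdict(list)
    for v, s in enumerate(shifts):
        starts[math.floor(max_shift - s)].append(v)
    cluster = [-1] * n
    frontier, assigned, rnd = [], 0, 0
    while assigned < n:
        for v in starts.get(rnd, []):     # unclaimed vertices whose start time has come
            if cluster[v] == -1:
                cluster[v] = v
                frontier.append(v)
                assigned += 1
        next_frontier = []
        for u in frontier:                # one BFS step for every active cluster
            for w in adj[u]:
                if cluster[w] == -1:
                    cluster[w] = cluster[u]
                    next_frontier.append(w)
                    assigned += 1
        frontier, rnd = next_frontier, rnd + 1
    return cluster


def connected_components(n, edges, beta=0.2, seed=0):
    """Label vertices by component: decompose, contract clusters, recurse on cut edges."""
    rng = random.Random(seed)
    label = list(range(n))                # original vertex -> current super-vertex
    cur_n, cur_edges = n, list(edges)
    while cur_edges:
        adj = [[] for _ in range(cur_n)]
        for u, v in cur_edges:
            adj[u].append(v)
            adj[v].append(u)
        cluster = low_diameter_decomposition(cur_n, adj, beta, rng)
        remap = {}
        for c in cluster:                 # renumber cluster centers as 0..k-1
            remap.setdefault(c, len(remap))
        new_id = [remap[c] for c in cluster]
        label = [new_id[x] for x in label]
        # Keep only inter-cluster edges, without duplicates or self-loops
        cur_edges = list({(min(new_id[u], new_id[v]), max(new_id[u], new_id[v]))
                          for u, v in cur_edges if new_id[u] != new_id[v]})
        cur_n = len(remap)
    return label
```

Each cluster is grown by BFS along graph edges, so contracting it never merges vertices from different components; the loop ends once no edge crosses between super-vertices.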
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs (Extended Abstract)
"... Minimum Spanning Tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical imaging, and proteomics, and national security and bio ..."
Abstract
 Add to MetaCart
(Show Context)
Minimum Spanning Tree (MST) is one of the most studied combinatorial problems, with practical applications in VLSI layout, wireless communication, and distributed networks; in recent problems in biology and medicine such as cancer detection, medical imaging, and proteomics; and in national security and bioterrorism, such as detecting the spread of toxins through populations in the case of biological/chemical warfare. Most previous attempts to improve the speed of MST using parallel computing are too complicated to implement or perform well only on special graphs with regular structure. In this paper we design and implement four parallel MST algorithms (three variations of Borůvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm. In fact, our algorithms also solve the minimum spanning forest problem. We provide an experimental study of our algorithms on symmetric multiprocessors such as IBM’s p690/Regatta and Sun’s Enterprise servers. Our new implementation achieves good speedups over a wide range of input graphs with regular and irregular structures, including the graphs used by previous parallel MST studies. For example, on an arbitrary random graph with 1M vertices and 20M edges, our new approach achieves a speedup of 5 using 8 processors. The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
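The abstract compares three variations of Borůvka's algorithm with a new approach. As background, a minimal sequential Python sketch of plain Borůvka with a union-find structure follows; it is not any of the paper's four implementations. Each round, every component picks its cheapest outgoing edge and all picked edges are merged; a parallel variant would typically distribute the per-edge selection loop across processors, which is not shown here. The `(weight, u, v)` edge list, tie-breaking by edge index, and the name `boruvka_msf` are assumptions.

```python
def boruvka_msf(n, edges):
    """Minimum spanning forest by Boruvka rounds.

    edges: list of (weight, u, v). Ties are broken by edge index so the
    per-component choices never form a cycle. Returns chosen edge indices.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    chosen = []
    while True:
        best = [None] * n                   # cheapest outgoing edge index per component root
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                    # edge inside one component
            for r in (ru, rv):
                if best[r] is None or (w, i) < (edges[best[r]][0], best[r]):
                    best[r] = i
        merged = False
        for r in range(n):
            i = best[r]
            if i is None:
                continue
            _, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:                    # both endpoints may have picked the same edge
                parent[ru] = rv
                chosen.append(i)
                merged = True
        if not merged:
            return chosen
```

Components with no outgoing edge simply stop merging, so on a disconnected graph the result is a minimum spanning forest rather than a tree.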
Abstract
, 2004
"... Irregular problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous accesses to global data structures with low degrees of locality. Few parallel graph algorithms on distributed or sharedmemory machines can outperform their best sequential impleme ..."
Abstract
 Add to MetaCart
(Show Context)
Irregular problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous accesses to global data structures with low degrees of locality. Few parallel graph algorithms on distributed or shared-memory machines can outperform their best sequential implementations due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two important combinatorial algorithms, list ranking and connected components, on two types of shared-memory computers: symmetric multiprocessors (SMP) such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. Previous studies show that for SMPs performance is primarily a function of noncontiguous memory accesses, whereas for the MTA it is primarily a function of the number of concurrent operations. We present a performance model for each machine and use it to analyze the performance of the two algorithms. We compare the models for SMPs and the MTA and discuss how the difference affects algorithm development, ease of programming, performance, and scalability.
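List ranking is the other kernel studied. One classical PRAM approach is pointer jumping (Wyllie's algorithm); the following minimal Python sketch simulates its synchronous rounds and is not claimed to be the implementation evaluated in the paper. The `succ` encoding (the tail points to itself) and the name `list_rank` are assumptions. Each round reads `nxt[nxt[i]]`, an access whose address depends on the data, which illustrates why the abstract ties SMP performance to noncontiguous memory accesses.

```python
def list_rank(succ):
    """Wyllie-style pointer jumping.

    succ[i] is the next node in the list, or i itself for the tail.
    Returns rank[i] = number of links from i to the tail.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    # Repeat until every node points directly at its tail
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Synchronous step: both updates read only the previous round's arrays
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

Pointer jumping finishes in O(log n) rounds but performs O(n log n) total work, so it is not work-optimal.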