A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
Abstract

Cited by 30 (11 self)
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem both because it is the building block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that, for the first time, achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs ($n > p^2$). As reasonable parallel speedup for the spanning tree problem is notoriously hard to achieve, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are:
1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and
2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.
The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
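A standard sequential way to find a spanning tree is a linear-time graph traversal. The sketch below is one such baseline, a breadth-first search that builds a spanning forest as a parent array. It is an illustrative assumption only: it is neither the paper's parallel algorithm nor the sequential code it was benchmarked against, and the names `bfs_spanning_forest` and `tree_edges` are hypothetical.

```python
from collections import deque

def bfs_spanning_forest(n, edges):
    """Return parent[], where parent[v] is v's tree parent (a root is its own parent)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = [-1] * n  # -1 marks a vertex not yet reached
    for root in range(n):  # one BFS tree per connected component
        if parent[root] != -1:
            continue
        parent[root] = root
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if parent[w] == -1:
                    parent[w] = u  # (u, w) becomes a tree edge
                    queue.append(w)
    return parent

def tree_edges(parent):
    """List the (parent, child) tree edges encoded in a parent array."""
    return [(p, v) for v, p in enumerate(parent) if p != v]
```

A graph with n vertices and c components yields exactly n - c tree edges; for example, a triangle plus one separate edge on 5 vertices gives 3 tree edges.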
Fast Connected Components Algorithms For The EREW PRAM
SIAM J. Comput.
, 1999
Abstract

Cited by 26 (3 self)
We present fast and efficient parallel algorithms for finding the connected components of an undirected graph. These algorithms run on the exclusive-read, exclusive-write (EREW) PRAM. On a graph with $n$ vertices and $m$ edges, our randomized algorithm runs in $O(\log n)$ time using $(m + n^{1+\epsilon})/\log n$ EREW processors (for any fixed $\epsilon > 0$). A variant uses $(m + n)/\log n$ processors and runs in $O(\log n \log\log n)$ time. A deterministic version of the algorithm runs in $O(\log^{1.5} n)$ time using ...
A Comparison of Parallel Algorithms for Connected Components
In the Symposium on Parallel Algorithms and Architectures
, 1994
Abstract

Cited by 17 (1 self)
This paper presents a comparison of the pragmatic aspects of some parallel algorithms for finding connected components, together with optimizations on these algorithms. The algorithms being compared are two similar algorithms by Shiloach-Vishkin [22] and Awerbuch-Shiloach [2], a randomized contraction algorithm based on algorithms by Reif [21] and Phillips [20], and a hybrid algorithm [11]. For the first two, improvements are given that significantly increase performance, although without improving their asymptotic complexity. The hybrid combines features of the others and is generally the fastest of those tested. Timings were made using NESL [4] code as executed on a Connection Machine 2 and a Cray Y-MP/C90.
1 Introduction
The complexity of various PRAM algorithms has received much attention, but there has been relatively little work on the implementation and pragmatic efficiency of many of these algorithms. Moreover, much of this work has been for algorithms having regular communication ...
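Algorithms in the Shiloach-Vishkin family alternate two steps: hooking (attaching one tree's root to another tree across an edge) and shortcutting (pointer jumping until every tree is a star). The sketch below simulates a simplified min-label variant of that loop sequentially. It is an assumption for illustration: it omits the unconditional hooking of stagnant trees from the original Shiloach-Vishkin algorithm and the optimizations discussed in the paper, and `connected_components` is a hypothetical name.

```python
def connected_components(n, edges):
    """Label each vertex with the smallest vertex id in its component."""
    parent = list(range(n))  # every vertex starts as its own single-node tree
    changed = True
    while changed:
        changed = False
        # Hooking: attach the root with the larger id to the smaller label.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru == rv:
                continue
            hi, lo = max(ru, rv), min(ru, rv)
            if parent[hi] == hi:  # only roots are hooked
                parent[hi] = lo
                changed = True
        # Shortcutting (pointer jumping): flatten every tree into a star.
        jumped = True
        while jumped:
            jumped = False
            for x in range(n):
                grandparent = parent[parent[x]]
                if parent[x] != grandparent:
                    parent[x] = grandparent
                    jumped = True
    return parent
```

Because pointers only ever move to smaller ids, no cycles can form, and the loop stops only when every edge joins two vertices of the same star.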
On the architectural requirements for efficient execution of graph algorithms
In Proc. 34th Int’l Conf. on Parallel Processing (ICPP)
, 2005
Abstract

Cited by 15 (7 self)
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures
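List ranking, one of the two kernels named in the abstract, is classically parallelized by pointer jumping (Wyllie's algorithm): in each synchronous round every node adds its successor's rank to its own and then skips to its successor's successor, so about log2 n rounds suffice. The sketch below simulates those rounds sequentially by building new arrays each round. It is illustrative only, not the implementation evaluated in the paper; the name `list_rank` and the convention that the tail's successor is itself are assumptions.

```python
def list_rank(succ):
    """succ[i] is the next node after i (the tail points to itself).
    Returns each node's distance to the tail."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    # Repeat until every node points directly at the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous round: all reads use the previous round's arrays.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the list 3 → 0 → 4 → 1 → 2, encoded as `succ = [4, 2, 2, 0, 1]`, the ranks are `[3, 1, 0, 4, 2]` after three rounds. The irregular `rank[nxt[i]]` reads are exactly the low-locality accesses the abstract describes.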
A randomized time-work optimal parallel algorithm for finding a minimum spanning forest
SIAM J. Comput.
, 1999
Abstract

Cited by 12 (3 self)
We present a randomized algorithm to find a minimum spanning forest (MSF) in an undirected graph. With high probability, the algorithm runs in logarithmic time and linear work on an exclusive-read exclusive-write (EREW) PRAM. This result is optimal w.r.t. both work and parallel time, and is the first provably optimal parallel algorithm for this problem under both measures. We also give a simple, general processor allocation scheme for tree-like computations.
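Randomized parallel MSF algorithms are commonly built from Borůvka steps, in which every component selects its lightest incident edge and all selected edges are contracted at once; each step at least halves the number of components. The abstract does not describe its construction, so the sketch below is only a plain sequential Borůvka loop, not the paper's algorithm. It assumes edges are `(u, v, w)` tuples and breaks weight ties by edge index so that the selected edges never form a cycle; `boruvka_msf` is a hypothetical name.

```python
def boruvka_msf(n, edges):
    """edges: list of (u, v, weight). Returns sorted indices of MSF edges."""
    parent = list(range(n))  # union-find forest over the components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    msf = []
    while True:
        # Each component picks its lightest outgoing edge, ties broken by index.
        cheapest = {}
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                j = cheapest.get(r)
                if j is None or (w, i) < (edges[j][2], j):
                    cheapest[r] = i
        if not cheapest:  # no edge leaves any component: done
            break
        # Contract all selected edges (an edge may be picked by both endpoints).
        for i in set(cheapest.values()):
            u, v, _ = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                msf.append(i)
    return sorted(msf)
```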
Optimal randomized EREW PRAM algorithms for finding spanning forests
 J. Algorithms
, 2000
Abstract

Cited by 10 (1 self)
We present the first randomized O(log n) time and O(m+n) work EREW PRAM algorithm for finding a spanning forest of an undirected graph G = (V, E) with n vertices and m edges. Our algorithm is optimal with respect to time, work and space. As a consequence we get optimal randomized EREW PRAM algorithms for other basic connectivity problems such as finding a bipartite partition, finding bridges and biconnected components, finding Euler tours in Eulerian graphs, finding an ear decomposition, finding an open ear decomposition, finding a strong orientation, and finding an st-numbering.
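To see why a spanning forest yields a bipartite partition, note that 2-coloring each tree by depth parity is forced, and the graph is bipartite exactly when every non-tree edge also joins opposite colors. The sequential sketch below illustrates that reduction with a BFS forest; it is an assumption for illustration and not the paper's EREW construction, and `bipartition` is a hypothetical name.

```python
from collections import deque

def bipartition(n, edges):
    """Return a 0/1 side for each vertex, or None if the graph is not bipartite."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n  # -1 marks a vertex not yet in the spanning forest
    for root in range(n):
        if color[root] != -1:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if color[w] == -1:
                    color[w] = 1 - color[u]  # tree edge: opposite parity
                    queue.append(w)
    # Tree edges are consistent by construction; check every edge anyway.
    if all(color[u] != color[v] for u, v in edges):
        return color
    return None
```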
A Randomized Linear Work EREW PRAM Algorithm to Find a Minimum Spanning Forest
, 1997
Abstract

Cited by 9 (2 self)
We present a randomized EREW PRAM algorithm to find a minimum spanning forest in a weighted undirected graph. On an n-vertex graph the algorithm runs in $o((\log n)^{1+\epsilon})$ expected time for any $\epsilon > 0$ and performs linear expected work. This is the first linear-work, polylog-time algorithm on the EREW PRAM for this problem. This also gives parallel algorithms that perform expected linear work on two more realistic models of parallel computation, the QSM and the BSP.
1 Introduction
The design of efficient algorithms to find a minimum spanning forest (MSF) in a weighted undirected graph is a fundamental problem that has received much attention. There have been many algorithms designed for the MSF problem that run in close to linear time (see, e.g., [CLR91]). Recently a randomized linear-time algorithm for this problem was presented in [KKT95]. Based on this work, [CKT94] presented a randomized parallel algorithm on the CRCW PRAM which runs in $O(2^{\log^* n} \log n)$ expected time whil...
A Parallel Algorithm for Connected Components On Distributed Memory Machines
, 2001
Abstract

Cited by 1 (0 self)
Finding connected components (CC) of an undirected graph is a fundamental computational problem. Various CC algorithms exist for PRAM models. An implementation of a PRAM CC algorithm on a coarse-grain MIMD machine with distributed memory brings many problems, since the communication overhead is substantial compared to the local computation. Several implementations of CC algorithms on distributed memory machines have been described in the literature, all in Split-C. We have designed and implemented a CC algorithm in C++ and MPI by combining the ideas of the previous PRAM and distributed memory algorithms. Our main optimization is based on replacing the conditional hooking by rules for reducing nontrivial cycles during the contraction of components. We have also implemented a method for reducing the number of exchanged messages which is based on buffering messages and on deferred processing of answers.
Another PRAM Algorithm for Finding Connected Components of Sparse Graphs
, 1999
Abstract
We present an algorithm which exploits a new approach to the problem of finding the connected components of an undirected graph (CCug for short) with v vertices and e edges. The algorithm has depth O(log² e) on a CREW PRAM using e processors, hence its cost is not affected by the number v of graph vertices. This makes it the algorithm with the best speedup and best cost for CCug on highly sparse graphs. On dense graphs, conversely, its performance is comparable to that of the algorithm in [12] and slightly worse than that of the algorithm in [5]. A variant of the algorithm with the same bound but running on the EREW model is also included. The algorithm can be used to find the transitive closure of binary, symmetric relations; in this case e is the number of axioms and v is the range of the relation.
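The link to transitive closure can be made concrete: treat each axiom (a, b) as an edge, and two elements are related in the closure exactly when they lie in the same connected component. The sketch below computes those classes sequentially with union-find. It illustrates the reduction only, not the paper's CREW/EREW algorithm, and `closure_classes` is a hypothetical name.

```python
def closure_classes(axioms):
    """Group the elements named in symmetric axioms (a, b) into the
    equivalence classes of the relation's transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in axioms:  # each axiom is an undirected edge
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())
```

For instance, the axioms a~b, b~c and d~e produce the two classes {a, b, c} and {d, e}, so a and c are related even though no axiom states it directly.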
Hierarchical Parallelization of Molecular Fragment Analysis on Multicore Cluster
Abstract
Molecular fragment analysis, using a connected component identification algorithm, is of great significance for structural and chemical analysis in computer-aided material design. However, accelerating molecular fragment analysis is a great challenge due to the scale, diversity and irregularity of molecular graphs. To address this challenge, we propose a hierarchical parallelization approach consisting of: (1) inter-node parallelization via spatial decomposition and a hook-and-contract algorithm; (2) inter-core parallelization via a master-and-worker scheme; and (3) locality optimization based on a space-filling curve to improve memory access. Experiments show that the proposed scheme achieves nearly linear inter-node strong scalability, up to a 50-million-vertex molecular graph on 32 computing nodes, and over 13-fold inter-core speedup on 16 cores. The experiments also demonstrate the effectiveness of the locality optimization in enhancing performance.