## On the architectural requirements for efficient execution of graph algorithms (2005)

Venue: Proc. 34th Int’l Conf. on Parallel Processing (ICPP)

Citations: 15 (7 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Bader05onthe,
  author    = {David A. Bader and John Feo and Guojing Cong},
  title     = {On the architectural requirements for efficient execution of graph algorithms},
  booktitle = {Proc. 34th Int'l Conf. on Parallel Processing (ICPP)},
  year      = {2005},
  pages     = {547--556},
  publisher = {IEEE Computer Society}
}
```

### Abstract

Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures ...
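A minimal C sketch of sequential list ranking makes the access pattern described above concrete. It assumes a successor-array layout (`next[i] == -1` at the tail) and uses the hypothetical helpers `list_rank_seq` and `make_random_list`; it illustrates the problem and is not the paper's code. When the list order is a random permutation of memory order, consecutive steps of the traversal rarely touch neighboring addresses, so for large lists most steps are likely to miss in cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Sequential list ranking. next[i] is the successor of node i, or -1 at
   the tail. Writes rank[i] = position of node i counted from the head
   (head = 1). O(n) work, but each step jumps to an unpredictable address
   when the list order is unrelated to memory order. */
void list_rank_seq(const long *next, long head, long *rank) {
    long r = 1;
    for (long i = head; i != -1; i = next[i])
        rank[i] = r++;
}

/* Lay out a list over nodes 0..n-1 whose traversal order is a random
   permutation (Fisher-Yates shuffle), so consecutive list elements are
   rarely adjacent in memory. Fills next[0..n-1] and returns the head,
   or -1 on allocation failure. rand() is used only for brevity; the
   shuffle is not exactly uniform (modulo bias, RAND_MAX limits). */
long make_random_list(long n, long *next, unsigned seed) {
    assert(n > 0);
    long *perm = malloc((size_t)n * sizeof *perm);
    if (perm == NULL) return -1;
    for (long i = 0; i < n; i++) perm[i] = i;
    srand(seed);
    for (long i = n - 1; i > 0; i--) {
        long j = rand() % (i + 1);
        long t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (long k = 0; k + 1 < n; k++) next[perm[k]] = perm[k + 1];
    next[perm[n - 1]] = -1;
    long head = perm[0];
    free(perm);
    return head;
}
```

Comparing the running time on a list from `make_random_list` against an in-order list (`next[i] = i + 1`) is one way to observe the locality effect the abstract describes.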

### Citations

642 | An introduction to parallel algorithms
- JáJá
- 1992
- Citation Context: ...networks than clusters, and direct access to all memory locations avoids the overhead of message passing. Fast parallel algorithms for graph problems have been developed for such systems. List ranking [11, 31, 32, 23] is a key technique often needed in efficient parallel algorithms for solving many graph-theoretic problems; for example, computing the centroid of a tree, expression evaluation, minimum spanning forest...
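Pointer jumping (Wyllie's algorithm) is the textbook PRAM approach to list ranking. The sketch below simulates its synchronous rounds sequentially with double buffering; `list_rank_pointer_jump` is a hypothetical name, and this is neither the Helman-JáJá algorithm nor the paper's MTA code. It computes each node's distance to the tail in O(log n) rounds and O(n log n) work; the position counted from the head (head = 1) is then `n - dist[i]`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pointer-jumping list ranking, simulated sequentially. Each outer round
   stands for one synchronous PRAM step in which every node updates at
   once; writing into d2/nx2 and copying back keeps the rounds synchronous.
   next[i] is i's successor or -1 at the tail. On return dist[i] is the
   number of links from i to the tail (tail = 0). Returns the number of
   rounds, or -1 on allocation failure. */
int list_rank_pointer_jump(const long *next, long n, long *dist) {
    assert(n > 0);
    long *nx  = malloc((size_t)n * sizeof *nx);
    long *nx2 = malloc((size_t)n * sizeof *nx2);
    long *d2  = malloc((size_t)n * sizeof *d2);
    if (!nx || !nx2 || !d2) { free(nx); free(nx2); free(d2); return -1; }
    for (long i = 0; i < n; i++) {
        nx[i] = next[i];
        dist[i] = (next[i] == -1) ? 0 : 1;   /* one link to the successor */
    }
    int rounds = 0;
    for (;;) {
        int changed = 0;
        for (long i = 0; i < n; i++) {       /* "for all i in parallel" */
            if (nx[i] != -1) {
                d2[i]  = dist[i] + dist[nx[i]];  /* add the jumped-over distance */
                nx2[i] = nx[nx[i]];              /* jump twice as far next time */
                changed = 1;
            } else {
                d2[i]  = dist[i];
                nx2[i] = -1;
            }
        }
        if (!changed) break;                 /* every pointer reached the tail */
        memcpy(dist, d2, (size_t)n * sizeof *dist);
        memcpy(nx, nx2, (size_t)n * sizeof *nx);
        rounds++;
    }
    free(nx); free(nx2); free(d2);
    return rounds;
}
```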

636 | LEDA: A platform for combinatorial and geometric computing
- Mehlhorn, Näher
- 1995
- Citation Context: ...connected components, we create a random graph of n vertices and m edges by randomly adding m unique edges to the vertex set. Several software packages generate random graphs this way, including LEDA [27]. The running times for connected components on the SMP and MTA are given in Fig. 2 for a random graph with n = 1M vertices and from m = 4M to 20M edges. (Note that throughout this paper M = 2^20.) ...
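The generator described in this context (add m unique random edges to n vertices) can be sketched with rejection sampling plus a hash set of edges already chosen. The xorshift64 generator, the multiplicative hash, and the names `random_graph` and `edge_t` are assumptions for illustration; this is not LEDA's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t u, v; } edge_t;

/* xorshift64 PRNG, chosen for brevity (an assumption, not LEDA's generator). */
static uint64_t xs64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill edges[0..m-1] with m distinct undirected edges on vertices 0..n-1,
   without self-loops: draw a random pair, store it as (min, max), and
   redraw if it was already chosen. Chosen pairs live in an open-addressing
   hash set keyed by u * n + v. Returns 0 on success, -1 if n < 2,
   m > n(n-1)/2, or allocation fails. */
int random_graph(uint32_t n, uint64_t m, uint64_t seed, edge_t *edges) {
    if (n < 2 || m > (uint64_t)n * (n - 1) / 2) return -1;
    assert(m == 0 || edges != NULL);
    uint64_t cap = 1;
    while (cap < 2 * m + 1) cap <<= 1;              /* load factor below 1/2 */
    uint64_t *table = malloc(cap * sizeof *table);
    if (table == NULL) return -1;
    for (uint64_t i = 0; i < cap; i++) table[i] = UINT64_MAX;   /* empty slot */
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;       /* must be nonzero */
    uint64_t k = 0;
    while (k < m) {
        uint32_t a = (uint32_t)(xs64(&state) % n);
        uint32_t b = (uint32_t)(xs64(&state) % n);
        if (a == b) continue;                        /* reject self-loops */
        uint32_t u = a < b ? a : b, v = a < b ? b : a;
        uint64_t key = (uint64_t)u * n + v;
        uint64_t h = (key * 0x9E3779B97F4A7C15ULL) & (cap - 1);
        while (table[h] != UINT64_MAX && table[h] != key)
            h = (h + 1) & (cap - 1);                 /* linear probing */
        if (table[h] == key) continue;               /* duplicate: redraw */
        table[h] = key;
        edges[k].u = u;
        edges[k].v = v;
        k++;
    }
    free(table);
    return 0;
}
```

Rejection is cheap while m is far below n(n-1)/2, as it is for the sparse graphs in this context (m = 4M to 20M edges on 1M vertices).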

117 | An O(log n) parallel connectivity algorithm
- Shiloach, Vishkin
- 1982
- Citation Context: ...Connectivity is a fundamental graph problem with a range of applications and can be a building block for higher-level algorithms. The research community has produced a rich collection of theoretic deterministic [28, 21, 30, 26, 8, 9, 7, 18, 24, 34, 1, 12, 14] and randomized [17, 29] parallel algorithms for connected components. Yet for implementations and experimental studies, although several fast PRAM algorithms exist, to our knowledge there is no parallel implementation of ...

63 | Efficient parallel algorithms for some graph problems
- Lam, Chen
- 1982

47 | Computing connected components on parallel computers
- Hirschberg, Chandra, et al.

46 | An Optimal Randomized Parallel Algorithm for Finding Connected Components in a Graph
- Gazit
- 1991

34 | Efficient Parallel Algorithms for Graph Problems
- Kruskal
- 1986

31 | A Comparison of Data-Parallel Algorithms for Connected Components
- Greiner
- Citation Context: ...parallel speedup on sparse, irregular graphs when compared against the best sequential implementation. Prior experimental studies of connected components implement the Shiloach-Vishkin algorithm [16, 22, 25, 15] due to its simplicity and efficiency. However, these parallel implementations of the Shiloach-Vishkin algorithm do not achieve any parallel speedups over arbitrary, sparse graphs against the best sequential implementation...
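Speedups here are measured against the best sequential implementation. One common sequential baseline for connected components is union-find; the excerpt does not say which sequential code the paper used, so the choice of union-find (rather than, say, BFS or DFS), its union-by-size and path-halving details, and the name `cc_union_find` are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Root of x's tree, halving the path on the way up. */
static long uf_find(long *parent, long x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Sequential connected components with union-find. eu/ev hold m undirected
   edges on vertices 0..n-1; label[] doubles as the parent array. On return
   label[v] is the representative vertex of v's component. Returns the number
   of components, or -1 on allocation failure. */
long cc_union_find(long n, long m, const long *eu, const long *ev, long *label) {
    assert(n >= 0 && m >= 0);
    long *size = malloc((size_t)(n > 0 ? n : 1) * sizeof *size);
    if (size == NULL) return -1;
    long comps = n;
    for (long v = 0; v < n; v++) { label[v] = v; size[v] = 1; }
    for (long e = 0; e < m; e++) {
        long a = uf_find(label, eu[e]), b = uf_find(label, ev[e]);
        if (a == b) continue;                  /* already connected */
        if (size[a] < size[b]) { long t = a; a = b; b = t; }
        label[b] = a;                          /* attach smaller tree under larger */
        size[a] += size[b];
        comps--;
    }
    for (long v = 0; v < n; v++) label[v] = uf_find(label, v);  /* flatten */
    free(size);
    return comps;
}
```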

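30 | New connectivity and MSF algorithms for shuffle-exchange network and PRAM
- Awerbuch, Shiloach
- 1987
- Citation Context (matched on the array index `head[1]` in an excerpt of MTA C code for list ranking, not on a citation):

```c
...
noalias *rank, head, tail, lnth, next, tmp1, tmp2
/* list[i] is the successor of node i (0 at the tail); the head is the only
   node that is no node's successor, so it equals 1 + 2 + ... + NLIST
   minus the sum of all successors. */
first = 0;
#pragma mta use 100 streams
for (i = 1; i <= NLIST; i++) first += list[i];
first = ((NLIST * NLIST + NLIST) / 2) - first;
/* walk 1 starts at the head, which gets rank 1 */
head[0] = 0; head[1] = first;
tail[0] = 0; tail[1] = 0;
lnth[0] = 0; lnth[1] = 0;
rank[0] = 0; rank[first] = 1;
/* walks 2..NWALK start at evenly spaced nodes */
for (i = 2; i <= NWALK; i++) {
  int node = i * (NLIST / NWALK);
  head[i] = node; tail[i] = 0; lnth[i] = 0;
  ...
```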
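The first loop of the excerpt finds the list head without any traversal. A standalone C sketch of the same summation trick follows, assuming (as the 1-based loop suggests) nodes numbered 1..n and a successor of 0 at the tail; `find_list_head` is a hypothetical name.

```c
#include <assert.h>

/* Find the head of a list stored as a successor array, using the summation
   trick from the excerpt. Assumes nodes 1..n, list[i] = successor of node i,
   list[i] == 0 at the tail, and list[0] unused. Every node except the head
   appears exactly once as a successor, so
   head = (1 + 2 + ... + n) - (sum of all successors).
   The loop is a plain reduction with independent iterations. */
long find_list_head(const long *list, long n) {
    assert(n >= 1);
    long long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += list[i];
    return (long)((long long)n * (n + 1) / 2 - sum);
}
```

For the list 3 → 1 → 4 → 2 (`list = {0, 4, 0, 1, 2}`), the successors sum to 7 and 1 + 2 + 3 + 4 = 10, so the head is 3.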

30 | A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)
- Bader, Cong
- Citation Context: ...algorithms and demonstrated speedups compared with the best sequential implementation for graph-theoretic problems such as ear decomposition [2], tree contraction and expression evaluation [3], spanning tree [4], rooted spanning tree [13], and minimum spanning forest [5]. Many of these algorithms achieve good speedups due to algorithmic techniques for efficient design and better cache performance. For some of ...

28 | An optimal parallel algorithm for integer sorting
- Reif
- 1985
- Citation Context: ...sparse graphs against the best sequential implementation. Greiner [16] implemented several connected components algorithms (Shiloach-Vishkin, Awerbuch-Shiloach, “random-mating” based on the work of Reif [33] and Phillips [30], and a hybrid of the previous three) using NESL on the Cray Y-MP/C90 and TMC CM-2. On random graphs Greiner reports a maximum speedup of 3.5 using the hybrid algorithm when compared...
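A random-mating round can be sketched as follows: every current root flips a coin, and a "child" root adjacent to a "parent" root hooks under it, after which one pointer-jumping step restores star-shaped trees. This is the generic random-mate scheme simulated sequentially, not Greiner's NESL code; `cc_random_mate` is hypothetical, and C's `rand()` stands in for a proper per-processor random source.

```c
#include <assert.h>
#include <stdlib.h>

/* Random-mating connected components, simulated sequentially.
   eu/ev hold m undirected edges on vertices 0..n-1. On return parent[v]
   is the root of v's component. Returns the number of hooking rounds, or
   -1 on allocation failure.
   Invariant: at the start of every round each tree is a star, so
   parent[v] is v's root. */
int cc_random_mate(long n, long m, const long *eu, const long *ev,
                   long *parent, unsigned seed) {
    assert(n >= 1 && m >= 0);
    long *root = malloc((size_t)n * sizeof *root);
    unsigned char *coin = malloc((size_t)n);
    if (root == NULL || coin == NULL) { free(root); free(coin); return -1; }
    for (long v = 0; v < n; v++) parent[v] = v;
    srand(seed);
    int rounds = 0;
    for (;;) {
        for (long v = 0; v < n; v++) {
            root[v] = parent[v];                /* snapshot of the stars */
            coin[v] = rand() > RAND_MAX / 2;    /* 1 = "parent", 0 = "child"; only roots' coins are read */
        }
        int active = 0;
        for (long e = 0; e < m; e++) {
            long ru = root[eu[e]], rv = root[ev[e]];
            if (ru == rv) continue;             /* edge already inside one tree */
            active = 1;
            /* A child root hooks under an adjacent parent root. Parent roots
               never move this round, so no cycles form; if several edges hook
               the same child, the last write wins (an arbitrary-CRCW choice). */
            if (!coin[ru] && coin[rv]) parent[ru] = rv;
            else if (!coin[rv] && coin[ru]) parent[rv] = ru;
        }
        if (!active) break;                     /* no edge joins two trees */
        /* Trees now have height at most 2; one pointer-jumping step
           turns them back into stars. */
        for (long v = 0; v < n; v++) parent[v] = parent[parent[v]];
        rounds++;
    }
    free(root);
    free(coin);
    return rounds;
}
```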

25 | Parallel Implementation of Algorithms for Finding Connected Components in Graphs
- Hsu, Ramachandran, et al.
- 1997

23 | Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)
- Bader, Sreshta, et al.
- 2002
- Citation Context: ...fast parallel algorithms and demonstrated speedups compared with the best sequential implementation for graph-theoretic problems such as ear decomposition [2], tree contraction and expression evaluation [3], spanning tree [4], rooted spanning tree [13], and minimum spanning forest [5]. Many of these algorithms achieve good speedups due to algorithmic techniques for efficient design and better cache performance...

22 | Connected components on distributed memory machines
- Krishnamurthy, Lumetta, et al.
- 1994

21 | Using PRAM algorithms on a uniform-memory-access shared-memory architecture
- Bader, Illendula, et al.
- 2001
- Citation Context: ...implementation of list ranking, Bader et al. have designed fast parallel algorithms and demonstrated speedups compared with the best sequential implementation for graph-theoretic problems such as ear decomposition [2], tree contraction and expression evaluation [3], spanning tree [4], rooted spanning tree [13], and minimum spanning forest [5]. Many of these algorithms achieve good speedups due to algorithmic techniques ...

21 | Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs
- Bader, Cong
- Citation Context: ...sequential implementation for graph-theoretic problems such as ear decomposition [2], tree contraction and expression evaluation [3], spanning tree [4], rooted spanning tree [13], and minimum spanning forest [5]. Many of these algorithms achieve good speedups due to algorithmic techniques for efficient design and better cache performance. For some of the instances, e.g., arbitrary, sparse graphs, while we ...

21 | Finding connected components in O(log n log log n) time on the EREW PRAM
- Chong, Lam
- 1995

21 | An efficient and fast parallel-connected component algorithm
- Han, Wagner
- 1990

20 | Faster optimal prefix sums and list ranking
- Cole, Vishkin
- 1989

20 | Designing Practical Efficient Algorithms for Symmetric Multiprocessors
- Helman, JáJá
- 1999
- Citation Context: ...for solving many graph-theoretic problems; for example, computing the centroid of a tree, expression evaluation, minimum spanning forest, connected components, and planarity testing. Helman and JáJá [19, 20] present an efficient list ranking algorithm with implementation on SMP servers that achieves significant parallel speedup. Using this implementation of list ranking, Bader et al. have designed fast ...
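The sublist idea behind this family of SMP list-ranking algorithms can be sketched in C: split the list at s splitters, walk each sublist independently, rank the sublists in list order, then add each sublist's offset to its local ranks. The published algorithm chooses splitters at random; this sketch picks evenly spaced node ids, similar to `i * (NLIST / NWALK)` in the MTA excerpt on this page, and runs the parallel phases as ordinary loops. `list_rank_sublists` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Sublist-based list ranking in the spirit of Helman and JaJa.
   next[i] is the successor of node i (nodes 0..n-1) or -1 at the tail.
   On return rank[i] is i's position counted from the head (head = 1).
   Returns the number of sublists used, or -1 on allocation failure. */
long list_rank_sublists(const long *next, long n, long head, long s, long *rank) {
    assert(n >= 1 && s >= 1 && s <= n && head >= 0 && head < n);
    long *sub    = malloc((size_t)n * sizeof *sub);    /* sublist id of each node */
    long *start  = malloc((size_t)s * sizeof *start);  /* first node of each sublist */
    long *len    = malloc((size_t)s * sizeof *len);
    long *succ   = malloc((size_t)s * sizeof *succ);   /* following sublist, -1 at end */
    long *offset = malloc((size_t)s * sizeof *offset);
    long k = -1;
    if (sub && start && len && succ && offset) {
        /* Step 1: splitters are the head plus evenly spaced node ids. */
        for (long i = 0; i < n; i++) sub[i] = -1;
        k = 0;
        sub[head] = k; start[k++] = head;
        for (long j = 1; j < s; j++) {
            long c = j * (n / s);
            if (sub[c] == -1) { sub[c] = k; start[k++] = c; }
        }
        /* Step 2: walk each sublist to the next splitter or the tail,
           recording local ranks. The walks are independent (parallel in HJ).
           sub[x] == -1 means x is an unclaimed non-splitter. */
        for (long t = 0; t < k; t++) {
            long x = start[t], r = 1;
            rank[x] = r;
            while (next[x] != -1 && sub[next[x]] == -1) {
                x = next[x];
                sub[x] = t;
                rank[x] = ++r;
            }
            len[t] = r;
            succ[t] = (next[x] == -1) ? -1 : sub[next[x]];
        }
        /* Step 3: rank the k sublists sequentially, in list order. */
        long off = 0;
        for (long t = 0; t != -1; t = succ[t]) { offset[t] = off; off += len[t]; }
        assert(off == n);   /* every node was reached from the head */
        /* Step 4: add each sublist's offset to its local ranks (parallel). */
        for (long i = 0; i < n; i++) rank[i] += offset[sub[i]];
    }
    free(sub); free(start); free(len); free(succ); free(offset);
    return k;
}
```

Only step 3 is sequential, and it touches s entries rather than n, which is why a small number of long sublists per processor keeps the serial part cheap.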

17 | Parallel Implementation of Boruvka’s Minimum Spanning Tree Algorithm
- Chung, Condon
- 1996
- Citation Context: ...architectures, demonstrating that algorithms should be designed with the target architecture in consideration. For SMPs, we use appropriate optimizations described by Greiner [16], Chung and Condon [10], Krishnamurthy et al. [25], and Hsu et al. [22]. SV is sensitive to the labeling of vertices. For the same graph, different labelings of the vertices may require different numbers of iterations to terminate...
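The labeling sensitivity of SV can be observed with a small simplified variant: graft a root onto a neighbor's smaller label, then shortcut until every tree is a star, and repeat until nothing grafts. This sketch omits the original algorithm's unconditional hooking step and runs the parallel loops sequentially, so its iteration counts belong to this simulation, not to the paper's SMP or MTA code; `sv_components` is a hypothetical name.

```c
#include <assert.h>

/* Simplified Shiloach-Vishkin connected components. eu/ev hold m undirected
   edges on vertices 0..n-1. On return D[v] is the smallest label in v's
   component. Returns the number of graft-and-shortcut iterations, counting
   the final iteration in which nothing grafts. */
int sv_components(long n, long m, const long *eu, const long *ev, long *D) {
    assert(n >= 1 && m >= 0);
    for (long v = 0; v < n; v++) D[v] = v;
    int iters = 0, grafted;
    do {
        grafted = 0;
        iters++;
        for (long e = 0; e < m; e++) {
            for (int dir = 0; dir < 2; dir++) {      /* try (u,v), then (v,u) */
                long i = dir ? ev[e] : eu[e];
                long j = dir ? eu[e] : ev[e];
                /* Graft: if D[i] is a root and the neighbor's label is
                   smaller, hang that root under it. Labels only decrease
                   along parent pointers, so no cycles can form. */
                if (D[j] < D[i] && D[i] == D[D[i]]) {
                    D[D[i]] = D[j];
                    grafted = 1;
                }
            }
        }
        for (long v = 0; v < n; v++)                 /* shortcut to stars */
            while (D[v] != D[D[v]]) D[v] = D[D[v]];
    } while (grafted);
    return iters;
}
```

On the 4-vertex path 1-2-3-0 with edges listed as (3,2), (2,1), (3,0), this sketch needs 3 iterations; relabeling the same path as 0-1-2-3, with the edges in the same order becoming (2,1), (1,0), (2,3), needs only 2.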

17 | Prefix computations on symmetric multiprocessors
- Helman, JáJá

17 | Parallel algorithms for the connected components and minimal spanning tree problems
- Nath, Maheshwari
- 1982

16 | Concurrent threads and optimal parallel minimum spanning trees algorithm
- Chong, Lam, et al.
- 2001

15 | Connected components in O(log^{3/2} n) parallel time for the CREW
- Johnson, Metaxas
- 1997

14 | Approximate parallel scheduling. Part II: applications to logarithmic-time optimal graph algorithms
- Cole, Vishkin
- 1991

12 | Connected components algorithms for mesh-connected parallel computers
- Goddard, Kumar, et al.
- 1997

12 | A Randomized Time-Work Optimal Parallel Algorithm for Finding a Minimum Spanning Forest
- Pettie, Ramachandran
- 1999
- Citation Context: ...and can be a building block for higher-level algorithms. The research community has produced a rich collection of theoretic deterministic [28, 21, 30, 26, 8, 9, 7, 18, 24, 34, 1, 12, 14] and randomized [17, 29] parallel algorithms for connected components. Yet for implementations and experimental studies, although several fast PRAM algorithms exist, to our knowledge there is no parallel implementation of ...

12 | Parallel graph contraction
- Phillips
- 1989

10 | An optimal randomised logarithmic time connectivity algorithm for the EREW
- Halperin, Zwick
- 1996

7 | List Ranking and List Scan on the Cray C-90
- Reid-Miller
- 1994

5 | The Euler tour technique and parallel rooted spanning tree
- Cong, Bader
- 2004
- Citation Context: ...speedups compared with the best sequential implementation for graph-theoretic problems such as ear decomposition [2], tree contraction and expression evaluation [3], spanning tree [4], rooted spanning tree [13], and minimum spanning forest [5]. Many of these algorithms achieve good speedups due to algorithmic techniques for efficient design and better cache performance. For some of the instances, e.g., arbitrary, sparse graphs, ...