## Branch-avoiding graph algorithms (2015)

Venue: | Symposium on Parallelism in Algorithms and Architectures (SPAA) |

Citations: | 1 - 0 self |

### Citations

10596 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 2001
Citation Context ...ferent graph algorithms with respect to their branching behavior: connected components, based on the classic Shiloach-Vishkin (SV) algorithm [44], and the classical form of breadth-first search (BFS) =-=[18]-=-, sometimes referred to as the “top-down” algorithm [8]. SV is a propagation-based algorithm and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both famili... |

3928 | Emergence of scaling in random networks
- Barabási, Albert
- 1999
Citation Context ...ile-loop, which executes its body exactly n times. [Algorithm listing: Let n ≥ 0; i ← 0; while i < n do (i, n unmodified; no early exits) i ← i + 1] ... of power-law degree distributions and the small-world phenomenon =-=[39, 49, 3, 34, 6, 23, 11]-=-. Our analysis below is justified in part by some of these findings, such as the existence of a large connected component [11], which has implications for how our target graph computations will behave... |

3321 |
Collective dynamics of ‘small-world’ networks
- Watts, Strogatz
- 1998
Citation Context ...alytics. Such analytics include connected components itself [43, 36], as well as computing modularity [40], detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients =-=[49]-=-, computing centrality metrics (e.g., betweenness centrality [26, 10, 27], closeness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement ... |

1665 | On power-law relationships of the internet topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ...ile-loop, which executes its body exactly n times. [Algorithm listing: Let n ≥ 0; i ← 0; while i < n do (i, n unmodified; no early exits) i ← i + 1] ... of power-law degree distributions and the small-world phenomenon =-=[39, 49, 3, 34, 6, 23, 11]-=-. Our analysis below is justified in part by some of these findings, such as the existence of a large connected component [11], which has implications for how our target graph computations will behave... |

1483 |
Finding and evaluating community structure in networks.
- Newman, Girvan
- 2004
Citation Context ...th-first search (BFS), in part because they are primitive building blocks of higher-level graph analytics. Such analytics include connected components itself [43, 36], as well as computing modularity =-=[40]-=-, detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26, 10, 27], closeness centrality [42]),... |

1286 |
The small world problem
- Milgram
- 1967
Citation Context ...ile-loop, which executes its body exactly n times. [Algorithm listing: Let n ≥ 0; i ← 0; while i < n do (i, n unmodified; no early exits) i ← i + 1] ... of power-law degree distributions and the small-world phenomenon =-=[39, 49, 3, 34, 6, 23, 11]-=-. Our analysis below is justified in part by some of these findings, such as the existence of a large connected component [11], which has implications for how our target graph computations will behave... |

568 | Algorithm 97: Shortest path
- Floyd
- 1962
Citation Context ...is a propagation-based algorithm and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (APSP) =-=[24, 48]-=-, betweenness centrality [26, 10], and depth-first search [30], among numerous others. We quantify the effect of branch mispredictions for SV and BFS, both analytically and empirically. In our empiric... |

548 | A faster algorithm for betweenness centrality
- Brandes
- 2001
Citation Context ...and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (APSP) [24, 48], betweenness centrality =-=[26, 10]-=-, and depth-first search [30], among numerous others. We quantify the effect of branch mispredictions for SV and BFS, both analytically and empirically. In our empirical studies, we write and analyze ... |

533 |
A set of measures of centrality based on betweenness
- Freeman
- 1977
Citation Context ...and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (APSP) [24, 48], betweenness centrality =-=[26, 10]-=-, and depth-first search [30], among numerous others. We quantify the effect of branch mispredictions for SV and BFS, both analytically and empirically. In our empirical studies, we write and analyze ... |

495 | Pregel: A system for large-scale graph processing
- Malewicz, Austern, et al.
- 2010
Citation Context ...ness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT [1, 21], Ligra [45], Pregel =-=[35]-=-, and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28, 12, 15, 9], and massive... |

481 | A Study of Branch Prediction Strategies.
- Smith
- 1981
Citation Context ... to this large body of existing work. Branch predictors. The large body of prior work on branch predictors has focused on their design and implementation in hardware; see Smith’s survey of strategies =-=[46]-=- among other seminal references [33, 50, 51, 47, 32, 20]. Little is known publicly about the actual implementation of the branch predictors in modern processors, since these are vendor-specific and pr... |

327 | Alternative implementations of two level adaptive branch prediction
- Yeh, Patt
- 1992
Citation Context .... Branch predictors. The large body of prior work on branch predictors has focused on their design and implementation in hardware; see Smith’s survey of strategies [46] among other seminal references =-=[33, 50, 51, 47, 32, 20]-=-. Little is known publicly about the actual implementation of the branch predictors in modern processors, since these are vendor-specific and proprietary. As such, there is some ongoing empirical rese... |

305 | A theorem on Boolean matrices
- Warshall
- 1962
Citation Context ...is a propagation-based algorithm and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (APSP) =-=[24, 48]-=-, betweenness centrality [26, 10], and depth-first search [30], among numerous others. We quantify the effect of branch mispredictions for SV and BFS, both analytically and empirically. In our empiric... |

271 |
Prediction Strategies and Branch Target Buffer Design
- Lee, Smith
- 1984
Citation Context .... Branch predictors. The large body of prior work on branch predictors has focused on their design and implementation in hardware; see Smith’s survey of strategies [46] among other seminal references =-=[33, 50, 51, 47, 32, 20]-=-. Little is known publicly about the actual implementation of the branch predictors in modern processors, since these are vendor-specific and proprietary. As such, there is some ongoing empirical rese... |

266 | Graph evolution: Densification and shrinking diameters
- Leskovec, Kleinberg, et al.

186 |
Two-level adaptive training branch prediction.
- Yeh, Patt
- 1991
Citation Context .... Branch predictors. The large body of prior work on branch predictors has focused on their design and implementation in hardware; see Smith’s survey of strategies [46] among other seminal references =-=[33, 50, 51, 47, 32, 20]-=-. Little is known publicly about the actual implementation of the branch predictors in modern processors, since these are vendor-specific and proprietary. As such, there is some ongoing empirical rese... |

180 | The centrality index of a graph.
- Sabidussi
- 1966
Citation Context ...y [40], detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26, 10, 27], closeness centrality =-=[42]-=-), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT [1, 21], Ligra [45], Pregel [35], and the Combin... |

139 |
Diameter of the world-wide web
- Albert, Jeong, et al.
- 1999

137 | An O(log n) parallel connectivity algorithm
- Shiloach, Vishkin
- 1982
Citation Context ...eductions in energy-efficiency. This paper analyzes two different graph algorithms with respect to their branching behavior: connected components, based on the classic Shiloach-Vishkin (SV) algorithm =-=[44]-=-, and the classical form of breadth-first search (BFS) [18], sometimes referred to as the “top-down” algorithm [8]. SV is a propagation-based algorithm and BFS is a shortest-path algorithm. The findin... |

113 | The YAGS Branch Prediction Scheme.
- Eden, Mudge
- 1998

109 |
The bi-mode branch predictor.
- Lee, Chen, et al.
- 1997

102 | The agree predictor: A mechanism for reducing negative branch history interference.
- Sprangle, Chappell, et al.
- 1997

91 |
Algorithm 447: efficient algorithms for graph manipulation.
- Hopcroft, Tarjan
- 1973
Citation Context ...thm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (APSP) [24, 48], betweenness centrality [26, 10], and depth-first search =-=[30]-=-, among numerous others. We quantify the effect of branch mispredictions for SV and BFS, both analytically and empirically. In our empirical studies, we write and analyze highly-tuned assembly languag... |

81 | The parallel BGL: A generic library for distributed graph computations
- Gregor, Lumsdaine
- 2005
Citation Context ...[1, 21], Ligra [45], Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism =-=[52, 28, 12, 15, 9]-=-, and massively multithreaded systems [5, 7]. Thus, our study of low-level single-core behavior and instruction-level parallelism complements and should apply broadly to this large body of existing wor... |

79 | Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
- Bader
- 2006
Citation Context ...LAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28, 12, 15, 9], and massively multithreaded systems =-=[5, 7]-=-. Thus, our study of low-level single-core behavior and instruction-level parallelism complements and should apply broadly to this large body of existing work. Branch predictors. The large body of prio... |

64 | Scalable GPU Graph Traversal
- Merrill, Garland, et al.
- 2012
Citation Context .... present a shared-memory parallel BFS [16]. They focus on reducing cross-socket communication. Their implementation is lock-free. Merrill and Garland have developed a highly-tuned GPU implementation =-=[37]-=-. Beamer et al. have proposed algorithmic changes, which they refer to as being direction-optimizing [8]. Though there are many interesting ideas in this body of work, we are not aware of a detailed ... |

58 | The Combinatorial BLAS: Design, implementation, and applications.
- Buluç, Gilbert
- 2011
Citation Context ... computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT [1, 21], Ligra [45], Pregel [35], and the Combinatorial BLAS =-=[13]-=-. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28, 12, 15, 9], and massively multithreaded systems [5, 7]. ... |

58 | A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L.
- Yoo, Chow, et al.
- 2005
Citation Context ...[1, 21], Ligra [45], Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism =-=[52, 28, 12, 15, 9]-=-, and massively multithreaded systems [5, 7]. Thus, our study of low-level single-core behavior and instruction-level parallelism complements and should apply broadly to this large body of existing wor... |

51 |
Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0.
- Karypis, Kumar
- 1995
Citation Context ...building blocks of higher-level graph analytics. Such analytics include connected components itself [43, 36], as well as computing modularity [40], detecting communities [40, 41], partitioning graphs =-=[31]-=-, computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26, 10, 27], closeness centrality [42]), as well as computing a wide variety of distance based ana... |

44 |
Green-Marl: A DSL for easy and efficient graph analysis.
- Hong, Chafi, et al.
- 2012
Citation Context ...formance engineering of graph computations. There is some work on low-level performance engineering of graph computations. Green-Marl is a domain-specific language that targets shared-memory platforms =-=[29]-=-. It emits backend code that manages shared variables using, for instance, atomic instructions; from published code samples, its implementations are branch-based. Cong and Makarychev present cache-fri... |

40 | Ligra: A lightweight graph processing framework for shared memory.
- Shun, Blelloch
- 2013
Citation Context ...0, 27], closeness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT [1, 21], Ligra =-=[45]-=-, Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28, 12, 15, 9]... |

37 | An on-line edge-deletion problem
- Even, Shiloach
- 1981
Citation Context ...ses on connected components (CC) and breadth-first search (BFS), in part because they are primitive building blocks of higher-level graph analytics. Such analytics include connected components itself =-=[43, 36]-=-, as well as computing modularity [40], detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26... |

35 | Direction-Optimizing Breadth-First Search.
- Beamer, Asanovic, et al.
- 2012
Citation Context ...behavior: connected components, based on the classic Shiloach-Vishkin (SV) algorithm [44], and the classical form of breadth-first search (BFS) [18], sometimes referred to as the “top-down” algorithm =-=[8]-=-. SV is a propagation-based algorithm and BFS is a shortest-path algorithm. The findings of our paper can in principle be extended to both families of algorithms, including All-Pairs Shortest-Paths (A... |

33 | Parallel breadth-first search on distributed memory systems
- Buluç, Madduri
Citation Context ...[1, 21], Ligra [45], Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism =-=[52, 28, 12, 15, 9]-=-, and massively multithreaded systems [5, 7]. Thus, our study of low-level single-core behavior and instruction-level parallelism complements and should apply broadly to this large body of existing wor... |

30 | A quantitative study of irregular programs on GPUs
- Burtscher, Nasre, et al.
- 2012
Citation Context ... has implications for how our target graph computations will behave. At a lower-level, Burtscher et al. develop metrics to quantify irregularity, with respect to both memory accesses and control-flow =-=[14]-=-. They use these metrics to compare different computations, including graph computations, confirming some aspects of conventional wisdom about what we consider “regular” versus “irregular.” However, i... |

21 | The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. http://www.agner.org/optimize/microarchitecture.pdf
- Fog
- 2014
Citation Context ...ctors in modern processors, since these are vendor-specific and proprietary. As such, there is some ongoing empirical research that tries to demystify these implementations using synthetic benchmarks =-=[38, 25]-=-. However, with few exceptions, most of the other work on branch prediction evaluates against general benchmark suites, such as the SPECint2006 and SPECfp2006 benchmarks. Therefore, they do not provide ... |

19 |
A fast algorithm for streaming betweenness centrality.
- Green, McColl, et al.
- 2012
Citation Context ...36], as well as computing modularity [40], detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality =-=[26, 10, 27]-=-, closeness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT [1, 21], Ligra [45], ... |

16 |
Fast and efficient graph traversal algorithm for CPUs: Maximizing single-node efficiency
- Chhugani, Satish, et al.
- 2012
Citation Context ...ara2. Both systems support multiple threads per core, which can help in memory latency hiding. For BFS specifically, there are additional studies. Chhugani et al. present a shared-memory parallel BFS =-=[16]-=-. They focus on reducing cross-socket communication. Their implementation is lock-free. Merrill and Garland have developed a highly-tuned GPU implementation [37]. Beamer et al. have proposed algorithm... |

14 | Parallel community detection for massive graphs
- Riedy, Meyerhenke, et al.
- 2012
Citation Context ...rt because they are primitive building blocks of higher-level graph analytics. Such analytics include connected components itself [43, 36], as well as computing modularity [40], detecting communities =-=[40, 41]-=-, partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26, 10, 27], closeness centrality [42]), as well as computing a wide var... |

10 | Implementing a portable multi-threaded graph library: The mtgl on qthreads
- Barrett, Berry, et al.
- 2009
Citation Context ...LAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28, 12, 15, 9], and massively multithreaded systems =-=[5, 7]-=-. Thus, our study of low-level single-core behavior and instruction-level parallelism complements and should apply broadly to this large body of existing work. Branch predictors. The large body of prio... |

9 |
Breaking the speed and scalability barriers for graph exploration on distributed-memory machines
- Checconi, Petrini, et al.
- 2012

8 | Stinger: High performance data structure for streaming graphs.
- Ediger, McColl, et al.
- 2012
Citation Context ...., betweenness centrality [26, 10, 27], closeness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER =-=[4, 22]-=-, GraphCT [1, 21], Ligra [45], Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory pa... |

7 |
Optimizing large-scale graph analysis on a multi-threaded, multi-core platform.
- Cong, Makarychev
- 2011
Citation Context ...d variables using, for instance, atomic instructions; from published code samples, its implementations are branch-based. Cong and Makarychev present cache-friendly implementations of graph algorithms =-=[17]-=-. They quantify how software prefetching improves spatial locality on both the Power7 and the Sun Niagara2. Both systems support multiple threads per core, which can help in memory latency hiding. For... |

6 | Distributed memory breadth-first search revisited: Enabling bottom-up search
- Beamer, Buluç, et al.
- 2013

6 | Demystifying Intel branch predictors
- Milenkovic, Milenkovic, et al.
- 2002
Citation Context ...ctors in modern processors, since these are vendor-specific and proprietary. As such, there is some ongoing empirical research that tries to demystify these implementations using synthetic benchmarks =-=[38, 25]-=-. However, with few exceptions, most of the other work on branch prediction evaluates against general benchmark suites, such as the SPECint2006 and SPECfp2006 benchmarks. Therefore, they do not provide ... |

5 | GraphCT: Multithreaded algorithms for massive graph analysis. Parallel and Distributed Systems
- Ediger, Jiang, et al.
- 2013
Citation Context ...ntrality [26, 10, 27], closeness centrality [42]), as well as computing a wide variety of distance based analytics. A variety of packages implement these analytics, including STINGER [4, 22], GraphCT =-=[1, 21]-=-, Ligra [45], Pregel [35], and the Combinatorial BLAS [13]. However, the focus of these packages is on exploiting higher-level shared memory multicore, manycore, distributed memory parallelism [52, 28... |

1 | PeachPy: A Python framework for developing high-performance assembly kernels
- Dukhan
Citation Context ...e big performance penalty on Cortex-A15. For better control of generated code we implemented both connected components and breadth-first search algorithms in x86-64 and ARM assembly using the PeachPy =-=[19]-=- framework. We performed our experiments on 7 systems with different microarchitectures; these are presented in Table 1. On all systems the assembly implementations performed at least as well as C imp... |

1 |
Parallel streaming connected components using “parent-neighbor” subgraphs
- McColl, Green, et al.
- 2013
Citation Context ...ses on connected components (CC) and breadth-first search (BFS), in part because they are primitive building blocks of higher-level graph analytics. Such analytics include connected components itself =-=[43, 36]-=-, as well as computing modularity [40], detecting communities [40, 41], partitioning graphs [31], computing clustering coefficients [49], computing centrality metrics (e.g., betweenness centrality [26... |