## Graph Partitioning Algorithms for Distributing Workloads of Parallel Computations (1998)

Citations: | 14 - 1 self |

### BibTeX

@TECHREPORT{Chamberlain98graphpartitioning,
    author = {Bradford L. Chamberlain},
    title = {Graph Partitioning Algorithms for Distributing Workloads of Parallel Computations},
    institution = {},
    year = {1998}
}

### Abstract

This paper surveys graph partitioning algorithms used for parallel computing, with an emphasis on the problem of distributing workloads for parallel computations. Geometric, structural, and refinement-based algorithms are described and contrasted. In addition, multilevel partitioning techniques and issues related to parallel partitioning are addressed. All algorithms are evaluated qualitatively in terms of their execution speed and ability to generate partitions with small separators.

1 Introduction

In its most general form, the graph partitioning problem asks how best to divide a graph's vertices into a specified number of subsets such that: (i) the number of vertices per subset is equal and (ii) the number of edges straddling the subsets is minimized. Graph partitioning has several important applications in Computer Science, including VLSI circuit layout [8], image processing [43], solving sparse linear systems, computing fill-reducing orderings for sparse matrices, and distribu...
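The two criteria in the problem statement, equal subset sizes and few straddling edges, can be made concrete with a small sketch. The function names `edge_cut` and `subset_sizes` and the 2x4 grid graph are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def edge_cut(edges, part):
    """Count the edges whose endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def subset_sizes(part):
    """Number of vertices assigned to each subset."""
    return Counter(part.values())

# Hypothetical example graph: a 2x4 grid.
#   0 - 1 - 2 - 3
#   |   |   |   |
#   4 - 5 - 6 - 7
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

# Two balanced bisections of the same graph.
left_right = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
top_bottom = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

print(edge_cut(edges, left_right), subset_sizes(left_right))
print(edge_cut(edges, top_bottom), subset_sizes(top_bottom))
```

Both bisections satisfy criterion (i), but the left/right split cuts fewer edges, so it is the better partition under criterion (ii).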

### Citations

2590 | Normalized cuts and image segmentation
- Shi, Malik
- 1997
Citation Context ... equal and (ii) the number of edges straddling the subsets is minimized. Graph partitioning has several important applications in Computer Science, including VLSI circuit layout [8], image processing =-=[43]-=-, solving sparse linear systems, computing fill-reducing orderings for sparse matrices, and distributing workloads for parallel computation. Unfortunately, graph partitioning is an NP-hard problem [13... |

1046 |
An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context ... the late 60's, the algorithm strives to improve an initial (possibly random) partition of the graph by trading vertices from one subset to the other with the goal of reducing the number of cut edges =-=[31]-=-. This general approach of refining an existing solution can be considered a class of algorithms unto itself. The Kernighan-Lin (KL) algorithm is based on the notion of gain, a metric for quantifying... |
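The excerpt above is cut off before it defines gain. As a hedged sketch, the code below uses the standard textbook definition for unweighted graphs (edges to the other subset minus edges to the vertex's own subset), which may differ in detail from the survey's presentation. The names `gain`, `swap_gain`, and `cut_size`, and the example graph, are illustrative assumptions.

```python
def cut_size(adj, part):
    """Number of edges crossing the partition (each edge counted once)."""
    return sum(1 for v in adj for u in adj[v] if v < u and part[u] != part[v])

def gain(adj, part, v):
    """Reduction in cut size if v alone moved to the other subset."""
    external = sum(1 for u in adj[v] if part[u] != part[v])
    internal = len(adj[v]) - external
    return external - internal

def swap_gain(adj, part, a, b):
    """Reduction in cut size if a and b (in different subsets) trade places."""
    shared = 1 if b in adj[a] else 0
    return gain(adj, part, a) + gain(adj, part, b) - 2 * shared

# Hypothetical graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
part = {0: 0, 1: 0, 2: 1, 3: 1}

before = cut_size(adj, part)
predicted = swap_gain(adj, part, 1, 2)
swapped = dict(part)
swapped[1], swapped[2] = part[2], part[1]
print(before, predicted, cut_size(adj, swapped))
```

Trading a pair rather than moving a single vertex keeps the subset sizes unchanged, which is why the pairwise gain subtracts twice the edge shared by the two vertices.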

794 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context ...peated for $S_{i+1}$, starting at the boundary vertex of $S_i$ with the smallest number of unexplored edges. One other related approach is the greedy graph growing algorithm developed by Karypis and Kumar =-=[25]-=-. This is another algorithm for bisection that grows a subset of vertices around an arbitrary root. However, rather than walking the graph in a strict breadth-first manner, it adds vertices to the sub... |

764 |
The Symmetric Eigenvalue Problem
- Parlett
- 1980
Citation Context ...rtitioning. Hence, $x_2$ is traditionally referred to as the Fiedler vector. The recursive spectral bisection algorithm proceeds as follows: Compute the Fiedler vector for G using the Lanczos algorithm =-=[37]-=-, modified to avoid computing the non-Fiedler eigenvectors. Next, determine the median value of the Fiedler vector's components and use this to partition vertices of G into two subsets: those whose co... |
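A single spectral bisection step can be sketched as follows. Note the substitution: the survey names the Lanczos algorithm, whereas this dependency-free sketch uses shifted power iteration with the all-ones eigenvector projected out, which is far slower and only suitable for tiny graphs. The function names, the iteration count, and the starting vector are assumptions. Sorting the components and splitting at the middle stands in for the median split, and the start vector is assumed not to be orthogonal to the target eigenvector.

```python
import math

def fiedler_vector(adj, iterations=2000):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    adj is a list of neighbor lists, indexed by vertex. Power iteration is run
    on (shift*I - L), whose dominant eigenvector, once the all-ones direction
    is removed, is the target vector.
    """
    n = len(adj)
    shift = 2 * max(len(nbrs) for nbrs in adj) + 1  # exceeds L's largest eigenvalue
    x = [float(i) for i in range(n)]                # assumed non-degenerate start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]                 # project out the all-ones vector
        # (L x)_i = deg(i) * x_i - sum of x_j over neighbors j of i
        y = [(shift - len(adj[i])) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

def spectral_bisection(adj):
    """Split vertices into two halves by sorted eigenvector component."""
    x = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: x[i])
    half = len(adj) // 2
    return set(order[:half]), set(order[half:])

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
side_a, side_b = spectral_bisection(adj)
print(sorted(side_a), sorted(side_b))
```

On this graph the split separates the two triangles, cutting only the single bridging edge. Recursive spectral bisection would then repeat the step on each half.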

510 |
Optimization by simulated annealing: an experimental evaluation
- Johnson, Aragon, et al.
- 1989
Citation Context ...ften referred to interchangeably. Although KL/FM can be used to refine random partitions, there is significant evidence to indicate that they work best when given a reasonably good starting partition =-=[24, 39]-=-. For this reason, KL/FM is often used as a local postprocessing step to improve a partition computed by a more globally-oriented algorithm [4, 22, 25]. Many uses of KL/FM modify the base algorithm to... |

500 |
Computer Solution of Large Sparse Positive Definite Systems
- George, Liu
- 1981
Citation Context ...s connectivity. This section describes a number of structural approaches. 4.1 Graph-Walking Algorithms Recursive level-structure bisection is a combinatorial approach that is very intuitive in nature =-=[15]-=-. It is similar to the coordinate bisection algorithms described in the previous section, but defines the distance between two vertices as the length of their shortest connecting path, rather than the... |

486 |
Partitioning sparse matrices with eigenvectors of graphs
- Pothen, Simon, et al.
- 1990
Citation Context ...at a time. Pothen, Simon, and Liou introduce a spectral graph partitioning algorithm that addresses these shortcomings by considering a graph's global connectivity properties when computing a solution =-=[40]-=-. This technique is referred to as recursive spectral bisection (RSB). Recursive spectral bisection utilizes the Laplacian matrix, $L$, of the input graph: a $|V| \times |V|$ matrix that encodes inform... |

478 | Multilevel k-way Partitioning Scheme for Irregular Graphs
- Karypis, Kumar
- 1998
Citation Context ...ime of multilevel recursive bisection by developing a multilevel p-way partitioning algorithm in which coarsening and refinement are performed a single time rather than at every step of the bisection =-=[26]-=-. 7 Parallel Techniques Since the graph partitioning algorithms described in this paper are used to distribute computations across a processor set, it seems only natural to take advantage of that p... |

443 |
A multilevel algorithm for partitioning graphs
- Hendrickson, Leland
- 1995
Citation Context ...est when given a reasonably good starting partition [24, 39]. For this reason, KL/FM is often used as a local postprocessing step to improve a partition computed by a more globally-oriented algorithm =-=[4, 22, 25]-=-. Many uses of KL/FM modify the base algorithm to suit the programmer's specific needs [22, 25]. For instance, an implementation may only consider vertices that lie on the partition boundary since the... |

427 |
A linear-time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...en be repeated using this new partition as a starting point. Fiduccia and Mattheyses make some improvements to the base KL algorithm that utilize better data structures to improve the overall runtime =-=[10]-=-. For instance, the Fiduccia-Mattheyses (FM) algorithm minimizes the number of vertices whose gains need to be adjusted when a vertex is moved. These refinements are considered so fundamental to the b... |

376 |
Some simplified NP-complete graph problems
- Garey, Johnson, et al.
- 1976
Citation Context ...43], solving sparse linear systems, computing fill-reducing orderings for sparse matrices, and distributing workloads for parallel computation. Unfortunately, graph partitioning is an NP-hard problem =-=[13]-=-, and therefore all known algorithms for generating partitions merely return approximations to the optimal solution. In spite of this theoretical limitation, numerous algorithms for graph partitioning... |

367 | A Simple Parallel Algorithm for the Maximal Independent Set Problem
- Luby
- 1986
Citation Context ...is/Kumar to parallelize the coarsening stages of their respective multilevel algorithms [1, 28]. Both approaches use a clever parallel algorithm for maximal independent set creation developed by Luby =-=[32]-=-. Luby's algorithm assigns a random number to every vertex in the graph. Each vertex then checks its value against those of its neighbors, and if it has the smallest value, it includes itself in the m... |
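The round described in the excerpt above (random values, local minima join the set) can be simulated sequentially. The excerpt is truncated, so the remaining steps here (removing the chosen vertices and their neighbors, then repeating with fresh random values) follow the common formulation of Luby's algorithm and are an assumption about what the survey goes on to say. The name `luby_mis` and the `seed` parameter are illustrative.

```python
import random

def luby_mis(adj, seed=0):
    """Sequential simulation of Luby-style rounds for a maximal independent set.

    adj maps each vertex to a list of its neighbors (undirected graph).
    In a parallel setting, each round's comparisons would run concurrently.
    """
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        # Every remaining vertex draws a fresh random value.
        value = {v: rng.random() for v in remaining}
        # A vertex joins if its value beats all of its remaining neighbors.
        winners = {v for v in remaining
                   if all(value[v] < value[u] for u in adj[v] if u in remaining)}
        mis |= winners
        # Winners and their neighbors drop out of later rounds.
        removed = set(winners)
        for v in winners:
            removed.update(adj[v])
        remaining -= removed
    return mis

# Hypothetical example graph: a 2x4 grid.
grid = {0: [1, 4], 1: [0, 2, 5], 2: [1, 3, 6], 3: [2, 7],
        4: [0, 5], 5: [1, 4, 6], 6: [2, 5, 7], 7: [3, 6]}
print(sorted(luby_mis(grid)))
```

Two adjacent vertices can never both hold the strictly smallest value in their neighborhoods, so each round's winners are mutually independent. The loop ends only once every vertex is either chosen or adjacent to a chosen vertex, which makes the set maximal.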

289 |
Partitioning of unstructured problems for parallel processing
- Simon
- 1991
Citation Context ...ighboring triangles. (b) The mesh's dual graph, used to partition the computation. Each vertex represents a triangle's data. Edges connect triangles that need to refer to each other's values (source: =-=[44]-=-). angles. Thus, the workload induced by this mesh can be represented using a graph in which vertices represent mesh triangles and edges connect nodes whose triangles share a common edge (Figure 2(b))... |

277 | A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience
- Barnard, Simon
- 1994
Citation Context ... time required by the algorithm is significant enough to be a serious impediment to its practical use [25, 16, 46]. Since its original formulation, much work has been done to accelerate the algorithm =-=[41, 2, 1]-=-, some of which will be described in the following sections. One noteworthy extension to RSB is an implementation by Hendrickson and Leland [21] that uses additional eigenvectors to obtain simultaneou... |

274 | MeTiS A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0
- Karypis, Kumar
- 1998
Citation Context ...eral partitioning packages are available online, which implement many of the algorithms described in this paper. The leading contenders seem to be the METIS and ParMETIS packages by Karypis and Kumar =-=[29, 30]-=- and the Jostle package by Walshaw et al. [47]. METIS and ParMETIS implement Karypis and Kumar's multilevel, p-way multilevel, and parallel p-way multilevel algorithms. Jostle also supports parallel p... |

189 |
The Chaco user’s guide: Version 2.0
- Hendrickson, Leland
- 1995
Citation Context ...nt Karypis and Kumar's multilevel, p-way multilevel, and parallel p-way multilevel algorithms. Jostle also supports parallel partitioning, with Walshaw's emphasis on minimizing vertex movement. Chaco =-=[20]-=- is Hendrickson and Leland's package that contains implementations of their 4-way/8-way spectral algorithms, multilevel algorithms, and refinements to KL/FM. Other online packages worth investigating ... |

179 |
An improved spectral graph partitioning algorithm for mapping parallel computations
- Hendrickson, Leland
- 1995
Citation Context ...ch work has been done to accelerate the algorithm [41, 2, 1], some of which will be described in the following sections. One noteworthy extension to RSB is an implementation by Hendrickson and Leland =-=[21]-=- that uses additional eigenvectors to obtain simultaneous quadrisections and octasections of a graph. This often results in smaller partitions than those obtained by recursive calls to RSB, yet requir... |

178 |
Nested dissection of a regular finite element mesh
- George
- 1973
Citation Context ...o disjoint subsets. A related problem tries to break the graph into subsets using a vertex separator $V_s \subseteq V$ of minimum size (Figure 3(c)). Vertex separators are used for performing nested dissection =-=[14]-=-, a technique useful for reordering a matrix's rows and columns to benefit its parallel factorization. Since workload graphs use vertices to represent data, vertex separators are inappropriate since t... |

157 | Performance of dynamic load balancing algorithms for unstructured mesh calculations. Concurrency: Practice and Experience
- Williams
- 1991
Citation Context ... least. Studies comparing it with coordinate bisection, KL/FM, and spectral algorithms show that simulated annealing takes a significant amount of time to generate partitions of merely modest quality =-=[24, 48]-=-. Given the number of parameters that need to be set up and tuned for a simulated annealing run, not to mention ... |

148 | The Laplacian spectrum of graphs
- Mohar
- 1988
Citation Context ...tor $\vec{x}_1$ of all ones. In his studies of the Laplacian matrix, Mohar determined that for connected graphs, the magnitude of the second smallest eigenvalue, $\lambda_2$, serves as a measure of G's connectivity =-=[36]-=-. Moreover, the magnitude of the second eigenvector's $i$th element gives a rough indication of vertex i's distance from other vertices in G: the closer two values are numerically, the shorter the conn... |

116 |
ParMetis: Parallel Graph Partitioning and Sparse Matrix Ordering
- Karypis, Kumar
- 2003
Citation Context ...eral partitioning packages are available online, which implement many of the algorithms described in this paper. The leading contenders seem to be the METIS and ParMETIS packages by Karypis and Kumar =-=[29, 30]-=- and the Jostle package by Walshaw et al. [47]. METIS and ParMETIS implement Karypis and Kumar's multilevel, p-way multilevel, and parallel p-way multilevel algorithms. Jostle also supports parallel p... |

102 | Geometric mesh partitioning: Implementation and experiments
- Gilbert, Miller, et al.
Citation Context ...the more sophisticated techniques described in the sections that follow indicate that while hyperplane bisectors are fast to compute, they generally result in partitions of considerably worse quality =-=[44, 16, 5]-=-. 3.2 Circle Bisection Miller et al. describe an algorithm called recursive circle bisection that addresses the drawbacks of hyperplane-based algorithms [33]. It uses global information about the gra... |

97 |
A Procedure for Placement of Standard-cell VLSI Circuits
- Dunlop, Kernighan
- 1985
Citation Context ...vertices per subset is equal and (ii) the number of edges straddling the subsets is minimized. Graph partitioning has several important applications in Computer Science, including VLSI circuit layout =-=[8]-=-, image processing [43], solving sparse linear systems, computing fill-reducing orderings for sparse matrices, and distributing workloads for parallel computation. Unfortunately, graph partitioning is... |

92 |
A unified geometric approach to graph separators
- Miller, Teng, et al.
- 1991
Citation Context ...peed, running approximately an order of magnitude faster than the spectral algorithm. Circle bisection is built upon a body of theoretical work which characterizes graphs that have good separators =-=[35]-=- and which places theoretical lower bounds on geometric separator sizes [5, 34]. This work serves as a strong foundation for explaining why circle bisection tends to result in good separators. 4 Struc... |

91 | Analysis of multilevel graph partitioning
- Karypis, Kumar
- 1995
Citation Context ... has been vastly simplified. Karypis and Kumar argue that a good bisection of a coarse graph constructed using this method can only be worse than a good bisection of the finer graph by a small factor =-=[27]-=-. To try and minimize this factor, Hendrickson and Leland use a modified version of KL/FM to refine the partition at every third step of the interpolation. Their experiments demonstrate that the multi... |

86 | How good is recursive bisection?
- Simon, Teng
- 1997
Citation Context ...nsidering the resulting subgraphs (Figure 4). It has been shown that even if recursive bisection is performed using an optimal bisection algorithm, it can still result in a suboptimal p-way partition =-=[45]-=-. In spite of this theoretical limitation, recursive bisection remains the primary graph partitioning strategy due to its simplicity compared to computing p-way partitions directly. The majority of th... |

76 |
A simple and efficient automatic FEM domain decomposer
- Farhat
- 1988
Citation Context ... level-structure bisection. Although this algorithm is relatively straightforward and fast, it tends to result in relatively poor partitions [44]. A very similar approach is Farhat's greedy algorithm =-=[9]-=-. It also accumulates sets of vertices by traversing the graph in a breadth-first manner, but differs in that it computes its p subsets directly, without resorting to recursive bisection. The subse... |

75 |
Automatic Mesh Partitioning
- Miller, Teng, et al.
- 1993
Citation Context ...ions of considerably worse quality [44, 16, 5]. 3.2 Circle Bisection Miller et al. describe an algorithm called recursive circle bisection that addresses the drawbacks of hyperplane-based algorithms =-=[33]-=-. It uses global information about the graph's vertices to compute separators using circles and spheres rather than lines and planes. Since circle bisectors add an additional degree of freedom to the ... |

59 |
A heuristic for reducing fill-in in sparse matrix factorization
- Bui, Jones
- 1993
Citation Context ...est when given a reasonably good starting partition [24, 39]. For this reason, KL/FM is often used as a local postprocessing step to improve a partition computed by a more globally-oriented algorithm =-=[4, 22, 25]-=-. Many uses of KL/FM modify the base algorithm to suit the programmer's specific needs [22, 25]. For instance, an implementation may only consider vertices that lie on the partition boundary since the... |

53 | Approximating center points with iterative radon points
- Clarkson, Eppstein, et al.
- 1996
Citation Context ...olynomial time, it uses a large number of constraints, causing it to be too slow for practical use. Instead, the authors use a fast algorithm for computing approximate centerpoints using radon points =-=[6]-=-. Applying this algorithm to the sampled vertex set results in a vastly accelerated computation that produces good centerpoints in practice. The third difference comes in the selection of a partition.... |

43 | Parallel algorithms for dynamically partitioning unstructured grids
- Diniz, Plimpton, et al.
- 1995
Citation Context ...tes over a fraction of V , using occasional global communications to compare notes. For example, Diniz et al. found the inertial bisection algorithm to be fast and trivially implementable in parallel =-=[7]-=-. Similarly, Miller et al. predict that their circle bisection algorithm will be reasonably efficient in parallel [16]. Unfortunately, the only published results that describe a parallel implementatio... |

41 | Graph partitioning and parallel solvers: Has the emperor no clothes? In IRREGULAR’98: solving irregularly structured problems
- Hendrickson
- 1998
Citation Context ...ithm could only improve its solutions. Hendrickson addresses some of these issues as well as several others in a recent paper that casts doubt on whether traditional graph partitioning is adequate =-=[18]-=-. Future work would be well-served by continuing to think about real-world issues while solving abstract problems. The work of Walshaw et al. [46] provides an excellent example of this by repartitioni... |

40 | Graph Partitioning Algorithms with Applications to Scientific Computing
- Pothen
Citation Context ...ften referred to interchangeably. Although KL/FM can be used to refine random partitions, there is significant evidence to indicate that they work best when given a reasonably good starting partition =-=[24, 39]-=-. For this reason, KL/FM is often used as a local postprocessing step to improve a partition computed by a more globally-oriented algorithm [4, 22, 25]. Many uses of KL/FM modify the base algorithm to... |

36 | PMRSB: Parallel multilevel recursive spectral bisection
- Barnard
- 1995
Citation Context ... time required by the algorithm is significant enough to be a serious impediment to its practical use [25, 16, 46]. Since its original formulation, much work has been done to accelerate the algorithm =-=[41, 2, 1]-=-, some of which will be described in the following sections. One noteworthy extension to RSB is an implementation by Hendrickson and Leland [21] that uses additional eigenvectors to obtain simultaneou... |

35 |
Towards a fast implementation of spectral nested dissection
- Pothen, Simon, et al.
- 1992
Citation Context ... time required by the algorithm is significant enough to be a serious impediment to its practical use [25, 16, 46]. Since its original formulation, much work has been done to accelerate the algorithm =-=[41, 2, 1]-=-, some of which will be described in the following sections. One noteworthy extension to RSB is an implementation by Hendrickson and Leland [21] that uses additional eigenvectors to obtain simultaneou... |

22 |
The PARTY partitioning—library user guide—version 1.1
- Preis, Diekmann
- 1996
Citation Context ...'s package that contains implementations of their 4-way/8-way spectral algorithms, multilevel algorithms, and refinements to KL/FM. Other online packages worth investigating are Scotch [38] and Party =-=[42]-=-. Acknowledgements The author would like to thank Jim Fix for strategizing sessions and Wayne Wong for his editorial feedback. Additional thanks to Mike, Susannah, and Mom for providing encouragement ... |

19 |
Geometric separators for finite element meshes
- Miller, Teng, et al.
- 1995
Citation Context ...lgorithm. Circle bisection is built upon a body of theoretical work which characterizes graphs that have good separators [35] and which places theoretical lower bounds on geometric separator sizes =-=[5, 34]-=-. This work serves as a strong foundation for explaining why circle bisection tends to result in good separators. 4 Structural Algorithms The geometric algorithms of the previous section all share two... |

19 |
Mesh partitioning and load-balancing for distributed memory parallel systems
- Walshaw, Cross, et al.
Citation Context ...mputation enormously. Although partitions generated by RSB are typically of very high quality, the time required by the algorithm is significant enough to be a serious impediment to its practical use =-=[25, 16, 46]-=-. Since its original formulation, much work has been done to accelerate the algorithm [41, 2, 1], some of which will be described in the following sections. One noteworthy extension to RSB is an imple... |

15 |
An improved spectral load balancing method
- Hendrickson, Leland
- 1993
Citation Context ...values in the interval $[-\sqrt{n}, \sqrt{n}]$ rather than just $1$ and $-1$. Moreover, a minimum solution to the continuous problem is formed by the $\sqrt{n}$-length second eigenvectors of the Laplacian matrix =-=[19]-=-. This raises an obvious question: will a solution to the continuous problem have any bearing on the discrete problem? Fortunately, the answer turns out to be "yes." The construction of the Laplacian ... |

13 | Scotch 3.1 user’s guide
- Pellegrini
Citation Context ...kson and Leland's package that contains implementations of their 4-way/8-way spectral algorithms, multilevel algorithms, and refinements to KL/FM. Other online packages worth investigating are Scotch =-=[38]-=- and Party [42]. Acknowledgements The author would like to thank Jim Fix for strategizing sessions and Wayne Wong for his editorial feedback. Additional thanks to Mike, Susannah, and Mom for providing... |

9 | A Cartesian parallel nested dissection algorithm
- Heath, Raghavan
- 1995

8 | Partitioning meshes with lines and planes
- Cao, Gilbert, et al.
- 1996
Citation Context ...ursively using the same technique. Partitions generated by coordinate bisection are illustrated in Color Plate 1(a) and (b). Variations on coordinate bisection have been considered by several authors =-=[17, 5]-=-. One important improvement to coordinate bisection, recursive inertial bisection, does a better job of getting an overall gestalt for the graph by computing its principal axis of inertia, I . By defi... |

5 |
Algebraic connectivity of graphs
- Fiedler
Citation Context ...oser two values are numerically, the shorter the connecting path between their corresponding vertices. These special properties of the second eigenvector $\vec{x}_2$ were thoroughly investigated by Fiedler =-=[11, 12]-=-, whose work provided the theoretical justification for its use in graph partitioning. Hence, $x_2$ is traditionally referred to as the Fiedler vector. The recursive spectral bisection algorithm proceed... |

5 | A Data-Parallel Implementation of the Geometric Partitioning Algorithm
- Hu, Teng, et al.
- 1997
Citation Context ...onably efficient in parallel [16]. Unfortunately, the only published results that describe a parallel implementation of circle bisection give little indication of the algorithm's parallel performance =-=[23]-=-. 7.2 Challenges to Parallelism By way of contrast, edge-based algorithms tend to be difficult to parallelize. For example, KL/FM and the coarsening algorithms of multilevel techniques require vertice... |

4 |
The Jostle user manual: version 2.0
- Walshaw
- 1997
Citation Context ...hich implement many of the algorithms described in this paper. The leading contenders seem to be the METIS and ParMETIS packages by Karypis and Kumar [29, 30] and the Jostle package by Walshaw et al. =-=[47]-=-. METIS and ParMETIS implement Karypis and Kumar's multilevel, p-way multilevel, and parallel p-way multilevel algorithms. Jostle also supports parallel partitioning, with Walshaw's emphasis on minimi... |

2 |
A property of eigenvectors of non-negative symmetric matrices and its application to graph theory
- Fiedler
- 1975
Citation Context ...oser two values are numerically, the shorter the connecting path between their corresponding vertices. These special properties of the second eigenvector $\vec{x}_2$ were thoroughly investigated by Fiedler =-=[11, 12]-=-, whose work provided the theoretical justification for its use in graph partitioning. Hence, $x_2$ is traditionally referred to as the Fiedler vector. The recursive spectral bisection algorithm proceed... |

1 |
DSMC simulations of low-density fluid flow on MIMD supercomputers
- Bartel, Plimpton
- 1992
Citation Context ...gion. Other computations involve dynamic shifts in the amount of work at each vertex. For example, in particle simulations, the number of particles per vertex can fluctuate during the course of a run =-=[3]-=-. These runtime changes in the computation's workload result in the need for dynamic load balancing. Given the choice, it would be preferable to compute a repartitioning in-place rather than to ship t... |