## Multilevel Mesh Partitioning for Heterogeneous Communication Networks (2001)

### Download Links

- [www.gre.ac.uk]
- [staffweb.cms.gre.ac.uk]
- DBLP

Venue: Future Generation Comput. Syst.

Citations: 25 (9 self-citations)

### BibTeX

@ARTICLE{Walshaw01multilevelmesh,

author = {C. Walshaw and M. Cross},

title = {Multilevel Mesh Partitioning for Heterogeneous Communication Networks},

journal = {Future Generation Comput. Syst},

year = {2001},

volume = {17},

pages = {601--623}

}

### Abstract

Multilevel algorithms are a successful class of optimisation techniques which address the mesh partitioning problem for distributing unstructured meshes onto parallel computers. They usually combine a graph contraction algorithm with a local optimisation method which refines the partition at each graph level. To date these algorithms have been used almost exclusively to minimise the cut edge weight in the graph, with the aim of minimising the parallel communication overhead, but recently there has been a perceived need to take into account the communications network of the parallel machine. For example, the increasing use of SMP clusters (systems of multiprocessor compute nodes with very fast intra-node communications but relatively slow inter-node networks) suggests the use of hierarchical network models. Indeed, this requirement is exacerbated in the early experiments with meta-computers (multiple supercomputers combined together, in extreme cases over inter-continental networks). In this paper, therefore, we modify a multilevel algorithm in order to minimise a cost function based on a model of the communications network. Several network models and variants of the algorithm are tested, and we establish that it is possible to successfully guide the optimisation to reflect the chosen architecture. © 2001 Elsevier Science B.V. All rights reserved.
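The abstract motivates hierarchical network models for SMP clusters and meta-computers. As a minimal sketch, assuming a tree-shaped model in which the cost of a processor pair depends only on the innermost level of the hierarchy (SMP node, machine, whole system) that contains both processors, such a model can be tabulated as a processor cost matrix. The function name `hierarchical_cost_matrix`, its parameters and the example weights 1, 10 and 100 are illustrative assumptions, not values taken from the paper.

```python
from math import prod


def hierarchical_cost_matrix(fanouts, costs):
    """Processor cost matrix for a tree-shaped (hierarchical) network model.

    fanouts[k] is the number of members per group at level k, innermost first,
    e.g. [4, 8] means 8 SMP nodes of 4 processors each (P = 32).
    costs[k] is the (hypothetical) cost of a message between two processors
    whose smallest common group is at level k: costs[0] within a node,
    costs[1] between nodes, and so on.
    """
    if len(fanouts) != len(costs):
        raise ValueError("need exactly one cost per level")
    # spans[k] = number of processors in one group at level k
    spans = [prod(fanouts[: k + 1]) for k in range(len(fanouts))]
    p_total = spans[-1]
    matrix = [[0] * p_total for _ in range(p_total)]
    for p in range(p_total):
        for q in range(p_total):
            if p == q:
                continue  # no network cost for a processor talking to itself
            # innermost level whose group contains both p and q
            level = next(k for k, s in enumerate(spans) if p // s == q // s)
            matrix[p][q] = costs[level]
    return matrix


# SMP cluster: 2 nodes of 2 processors, fast intra-node, slow inter-node links
smp = hierarchical_cost_matrix([2, 2], [1, 10])
# Meta-computer: 2 machines, each with 2 SMP nodes of 2 processors
meta = hierarchical_cost_matrix([2, 2, 2], [1, 10, 100])
print(smp[0][1], smp[0][2], meta[0][7])  # 1 10 100
```

In the meta-computer example, processors 0 and 7 sit on different machines and so get the top-level cost of 100, while processors 6 and 7 share an SMP node and cost 1.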

### Citations

851 | A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Computing
- Karypis, Kumar
Citation Context ...Hendrickson and Leland [13] and Bui and Jones [2], who generalised it to encompass local refinement algorithms. Several algorithms for carrying out the matching have been devised by Karypis and Kumar [16], while Walshaw and Cross [25] describe a method for utilising imbalance in the coarsest graphs to enhance the final partition quality. Graph contraction: to create a coarser graph $G_{l+1}(V_{l+1}, E_{l+1})$ fro...
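The matching and contraction step referred to in this context can be sketched as follows. This is a minimal sketch assuming a greedy heavy-edge matching (one common choice among the matching schemes mentioned) with vertices visited in index order; the function names and the adjacency-dictionary representation are hypothetical. Matched pairs become single coarse vertices, vertex weights are summed, and parallel edges between merged groups are combined by summing their weights.

```python
from collections import defaultdict


def heavy_edge_matching(n, adj):
    """Greedy matching: each unmatched vertex pairs with its heaviest unmatched neighbour.

    adj[v] is a dict {neighbour: edge_weight}. Visiting vertices in index order
    is an assumption made for reproducibility; codes usually randomise it.
    Returns match, where match[v] == v means v stays unmatched.
    """
    match = [-1] * n
    for v in range(n):
        if match[v] != -1:
            continue
        best, best_w = v, None
        for u, w in adj[v].items():
            if u != v and match[u] == -1 and (best_w is None or w > best_w):
                best, best_w = u, w
        match[v] = best
        match[best] = v
    return match


def contract(n, adj, vwgt, match):
    """Collapse matched pairs into coarse vertices, summing vertex and edge weights."""
    fine_to_coarse = [-1] * n
    coarse_n = 0
    for v in range(n):
        if fine_to_coarse[v] == -1:
            fine_to_coarse[v] = fine_to_coarse[match[v]] = coarse_n
            coarse_n += 1
    coarse_vwgt = [0] * coarse_n
    coarse_adj = [defaultdict(int) for _ in range(coarse_n)]
    for v in range(n):
        cv = fine_to_coarse[v]
        coarse_vwgt[cv] += vwgt[v]
        for u, w in adj[v].items():
            cu = fine_to_coarse[u]
            if cu != cv:  # the edge inside a matched pair disappears
                coarse_adj[cv][cu] += w  # parallel edges are merged
    return coarse_n, [dict(a) for a in coarse_adj], coarse_vwgt, fine_to_coarse


# Path 0-1-2-3 with edge weights 1, 5, 1 (adjacency stored in both directions)
adj = [{1: 1}, {0: 1, 2: 5}, {1: 5, 3: 1}, {2: 1}]
match = heavy_edge_matching(4, adj)
print(match, contract(4, adj, [1, 1, 1, 1], match))
```

Here index-order visiting matches 0 with 1 and 2 with 3, so the coarse graph has two vertices of weight 2 joined by the heavy edge of weight 5; a randomised visiting order would sometimes match (1, 2) instead and hide that edge inside a coarse vertex.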

460 | A multilevel algorithm for partitioning graphs, in: Supercomputing '95
- Hendrickson, Leland
- 1995
Citation Context ...uch attention has been focused on developing suitable heuristics, and some powerful methods, many based on a graph corresponding to the communication requirements of the mesh, have been devised, e.g. [13]. A particularly popular and successful class of algorithms which address this mesh partitioning problem is known as multilevel algorithms. They usually combine a graph contraction algorithm which cr...

320 | A Linear Time Heuristic for Improving Network Partitions, DAC
- Fiduccia, Mattheyses
- 1982
Citation Context ... algorithm which includes a hill-climbing mechanism to enable it to escape from local minima. Our implementation uses bucket sorting, the linear time complexity improvement of Fiduccia and Mattheyses [9], and the buckets are accessed via a tree structure, which we refer to as a bucket tree. The algorithm is a partition optimisation formulation; in other words it optimises a partition of P subdomains ...
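Bucket sorting lets the refinement loop find the vertex with the best gain without sorting: vertices are filed in buckets indexed by their integer gain. The class below is a minimal sketch of a plain bucket array with a lazily lowered maximum pointer, assuming gains bounded by `max_gain`; it is not the bucket tree described in the paper, and `GainBuckets` is a hypothetical name.

```python
class GainBuckets:
    """Bucket array for integer gains in [-max_gain, max_gain] (Fiduccia-Mattheyses style).

    Each bucket is a set of vertices with the same gain. `top` points at the
    highest bucket that may be non-empty and is lowered lazily in pop_max.
    This is a plain array of buckets, not the paper's bucket tree.
    """

    def __init__(self, max_gain):
        self.offset = max_gain
        self.buckets = [set() for _ in range(2 * max_gain + 1)]
        self.gain = {}  # vertex -> current gain
        self.top = -1   # highest possibly non-empty bucket index (-1: empty)

    def insert(self, v, g):
        if not -self.offset <= g <= self.offset:
            raise ValueError("gain out of range")
        i = g + self.offset
        self.buckets[i].add(v)
        self.gain[v] = g
        self.top = max(self.top, i)

    def remove(self, v):
        self.buckets[self.gain.pop(v) + self.offset].discard(v)

    def update(self, v, new_gain):
        # a neighbour moved, so v's gain changed: re-file it in the right bucket
        self.remove(v)
        self.insert(v, new_gain)

    def pop_max(self):
        """Remove and return (vertex, gain) with the largest gain, or None if empty."""
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1
        if self.top < 0:
            return None
        v = self.buckets[self.top].pop()
        return v, self.gain.pop(v)


b = GainBuckets(max_gain=5)
for vertex, g in [("a", 2), ("b", -1), ("c", 4)]:
    b.insert(vertex, g)
print(b.pop_max())  # ('c', 4)
b.update("b", 3)
print(b.pop_max())  # ('b', 3)
```

Insert, remove and update each touch a single bucket, which is the property that makes bucket sorting attractive here.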

282 | A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems
- Barnard, Simon
- 1993
Citation Context ...Kernighan–Lin (KL) [17], and other optimisation algorithms. The multilevel idea was first proposed by Barnard and Simon [1] as a method of speeding up spectral bisection and improved by both Hendrickson and Leland [13] and Bui and Jones [2], who generalised it to encompass local refinement algorithms. Several algorithms f...

172 | Qaplib: a quadratic assignment problem library
- Burkard, Karisch, et al.
- 1991
Citation Context ...in the more general case there is also a linear term) of a well-known optimisation problem, the quadratic assignment problem (QAP) [3]. This has been extensively studied since 1957 and is NP-complete [4]. There are many heuristic algorithms which address the problem, some of which are available in a software library, QAPLIB.² For the results in this paper we use [footnote 2: Available from http://www.imm.dtu.d...]
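The QAP objective behind this formulation can be made concrete with a small sketch, assuming `comm[i][j]` holds the communication (cut weight) between subdomains i and j and `dist[p][q]` the network cost between processors p and q; the linear term of the general QAP is omitted, as in the context. The brute-force search is included only to show the size of the search space (P! assignments), which is why heuristics such as those in QAPLIB are used in practice. The function names are hypothetical.

```python
from itertools import permutations


def qap_cost(comm, dist, assign):
    """Sum of comm[i][j] * dist[assign[i]][assign[j]] over all subdomain pairs.

    comm[i][j]: communication between subdomains i and j (symmetric here, so
    each pair is counted in both directions); assign[i]: processor of subdomain i.
    """
    n = len(comm)
    return sum(comm[i][j] * dist[assign[i]][assign[j]] for i in range(n) for j in range(n))


def brute_force_qap(comm, dist):
    """Exhaustive search over all P! assignments: illustrative, only feasible for tiny P."""
    best = min(permutations(range(len(comm))), key=lambda a: qap_cost(comm, dist, a))
    return list(best), qap_cost(comm, dist, best)


comm = [[0, 5, 0], [5, 0, 1], [0, 1, 0]]                   # subdomains 0 and 1 talk most
dist = [[abs(p - q) for q in range(3)] for p in range(3)]  # 1D array of 3 processors
print(brute_force_qap(comm, dist))  # ([0, 1, 2], 12)
```

The search places subdomain 1, which communicates with both others, on the middle processor, giving a cost of 12 with each pair counted in both directions.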

112 | The Quadratic Assignment Problem
- Burkard, Cela, et al.
- 1998
Citation Context ...cost matrix (see §1.4). In fact this expression is a simplification (in the more general case there is also a linear term) of a well-known optimisation problem, the quadratic assignment problem (QAP) [3]. This has been extensively studied since 1957 and is NP-complete [4]. There are many heuristic algorithms which address the problem, some of which are available in a software library, QAPLIB.² For...

95 | Distributed computing in a heterogeneous computing environment
- Gabriel, Resch, et al.
- 1998
Citation Context ... a meta-computer is illustrated in Fig. 1(d). Such machines are not physically assembled as such but consist of two or more compute nodes (typically supercomputers) connected together. For example, in [10] experimentation was carried out on a meta-computer consisting of a Cray T3E in Stuttgart, Germany, connected to a Cray T3E in Pittsburgh, USA. In this respect they are more extreme examples of networ...

78 | Graph partitioning models for parallel computing, Parallel Computing 26
- Hendrickson, Kolda
- 2000
Citation Context ...ns volume in the underlying solver. This is an important goal in any parallel application in order to minimise the communications overhead; however, this edge cut model, in itself somewhat inadequate [12], assumes a flat or homogeneous communications network. In fact the trend for connecting together multi-processor machines results in architectures which exhibit significant network heterogeneities. F...

72 | A thermodynamically motivated simulation procedure for combinatorial optimization problems
- Burkard, Rendl
- 1984
Citation Context ...re available in a software library, QAPLIB.² For the results in this paper we use one such algorithm based on simulated annealing and described in [5]. [Footnote 2: Available from http://www.imm.dtu.dk/~sk/qaplib/.] 3.3. The gain function. Once the initial partition has been computed, the multilevel approach uses a modification of the optimisation algorithm (outlined in Section 2.2) successively on each of the c...
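The context states only that the authors use a simulated-annealing algorithm from [5]. The following is a generic simulated-annealing sketch for the same QAP objective, assuming pairwise swap moves, Metropolis acceptance and geometric cooling; the schedule parameters (`t0`, `alpha`, `moves_per_t`, `t_min`) are arbitrary illustrative defaults, and this is not the Burkard–Rendl procedure itself. For simplicity each move recomputes the full cost (O(P²) per move) instead of an incremental delta.

```python
import math
import random


def qap_cost(comm, dist, assign):
    """Sum of comm[i][j] * dist[assign[i]][assign[j]] over all subdomain pairs."""
    n = len(comm)
    return sum(comm[i][j] * dist[assign[i]][assign[j]] for i in range(n) for j in range(n))


def anneal_qap(comm, dist, t0=10.0, alpha=0.95, moves_per_t=50, t_min=1e-3, seed=0):
    """Generic simulated annealing for the QAP with swap moves (not Burkard-Rendl; needs P >= 2)."""
    rng = random.Random(seed)
    n = len(comm)
    assign = list(range(n))
    rng.shuffle(assign)  # random initial assignment
    cost = qap_cost(comm, dist, assign)
    best, best_cost = assign[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(n), 2)
            assign[i], assign[j] = assign[j], assign[i]  # propose a swap
            new_cost = qap_cost(comm, dist, assign)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = assign[:], cost
            else:
                assign[i], assign[j] = assign[j], assign[i]  # reject: undo the swap
        t *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


# Chain of 4 subdomains (weights 3, 2, 1) on a 1D array of 4 processors
comm = [[0, 3, 0, 0], [3, 0, 2, 0], [0, 2, 0, 1], [0, 0, 1, 0]]
dist = [[abs(p - q) for q in range(4)] for p in range(4)]
print(anneal_qap(comm, dist))
```

Every communicating pair is at least one hop apart, so no assignment in this example can cost less than 2 × (3 + 2 + 1) = 12; the sketch keeps the best assignment seen, so it returns one achieving that bound.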

61 | Graph partitioning for high-performance scientific simulations. Sourcebook of parallel computing
- Schloegel, Karypis, et al.
- 2003
Citation Context ... edges then the cut-weight $\Phi$ is given by $\Phi = |E_c| = \sum_{(v,w) \in E_c} |(v,w)|$. This metric approximates the total communication volume for the sort of homogeneous graphs which represent meshes (although see [12,22] for further discussion). However, it is not appropriate for heterogeneous networks since a cut edge between vertices on ‘neighbouring’ processors does not have the same impact on the runtime of the u...
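The cut-weight $\Phi$ can be computed directly from an edge list. The sketch below also adds a network-weighted variant, assuming (as a hypothetical model) that each cut edge is multiplied by a processor-pair cost `cost[p][q]`, here the hop distance on a 1D array of processors, with subdomain i placed on processor i. It shows how two partitions with the same $\Phi$ can differ once the network is taken into account, which is the point the context makes about heterogeneous networks. Function names are illustrative.

```python
def cut_weight(edges, part):
    """Phi = |E_c|: total weight of edges whose endpoints are in different subdomains."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def network_cut_cost(edges, part, cost):
    """Cut edges weighted by cost[p][q] of the processors holding each end (hypothetical model).

    Subdomain i is assumed to be placed on processor i.
    """
    return sum(w * cost[part[u]][part[v]] for u, v, w in edges if part[u] != part[v])


edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]                  # path 0-1-2-3, unit weights
hops = [[abs(p - q) for q in range(4)] for p in range(4)]  # 1D array of 4 processors
for part in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(part, cut_weight(edges, part), network_cut_cost(edges, part, hops))
```

Both partitions cut all three edges ($\Phi = 3$), but on the 1D array their network-weighted costs are 3 and 5 respectively.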

60 | Mesh partitioning: a multilevel balancing and refinement algorithm
- Walshaw, Cross
Citation Context ...its additional communication load. 1.6. Related issues. We do not address here the issues of inhomogeneous CPU performance. In fact this is a somewhat simpler problem to solve, and the software JOSTLE [25], in which we have implemented and tested the schemes presented here, is able to take this into account using its integral load-balancing capabilities. For example, given a graph of say 75 vertices and...

57 | A heuristic for reducing fill-in in sparse matrix factorization
- Bui, Jones
- 1993
Citation Context ...Barnard and Simon [1] as a method of speeding up spectral bisection and improved by both Hendrickson and Leland [13] and Bui and Jones [2], who generalised it to encompass local refinement algorithms. Several algorithms for carrying out the matching have been devised by Karypis and Kumar [16], while Walshaw and Cross [25] describe a met...

55 | Fast and effective algorithms for graph partitioning and sparse matrix ordering
- Gupta
- 1997
Citation Context ...til the number of vertices in the coarsest graph is smaller than some threshold, the normal practice of the multilevel strategy is to carry out an initial partition. Here, following the idea of Gupta [11], we contract until the number of vertices in the coarsest graph is the same as the number of subdomains, $P$, and then simply assign vertex $i$ to subdomain $S_i$. Unlike Gupta, however, we do not carry ou...
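With coarsening stopped at exactly $P$ vertices, the initial partition is the identity assignment (vertex $i$ to $S_i$), and the multilevel loop then projects it back level by level. A minimal sketch, assuming each level is recorded as a `fine_to_coarse` list (one coarse vertex index per fine vertex) and leaving the local optimisation step as a pluggable `refine` callback; all names here are hypothetical.

```python
def initial_partition(coarsest_n, p):
    """Gupta-style start: the coarsest graph has exactly P vertices, so vertex i -> S_i."""
    if coarsest_n != p:
        raise ValueError("coarsening should stop when the graph has P vertices")
    return list(range(p))


def project(coarse_part, fine_to_coarse):
    """Each fine vertex inherits the subdomain of the coarse vertex it was merged into."""
    return [coarse_part[c] for c in fine_to_coarse]


def uncoarsen(maps, p, refine=lambda part, level: part):
    """Project the initial partition back through every level, finest last.

    maps[l][v] is the level-(l+1) vertex that level-l vertex v was merged into.
    `refine` is a placeholder for the local optimisation applied at each level.
    """
    part = initial_partition(max(maps[-1]) + 1, p)
    for level in range(len(maps) - 1, -1, -1):
        part = refine(project(part, maps[level]), level)
    return part


maps = [[0, 0, 1, 1, 2, 2], [0, 1, 1]]  # 6 -> 3 -> 2 vertices
print(uncoarsen(maps, 2))  # [0, 0, 1, 1, 1, 1]
```

Without a real `refine`, the result simply inherits whatever imbalance the coarsening produced (two vertices versus four here), which is exactly what the per-level optimisation is meant to correct.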

53 | An efficient heuristic for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context ...ction followed by repeated expansion/optimisation loops is known as the multilevel paradigm and has been successfully developed as a strategy for overcoming the localised nature of Kernighan–Lin (KL) [17], and other optimisation algorithms. The multilevel idea was first proposed by Barnard and Simon [1] as a method of spee...

53 | Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs
- Pellegrini, Roman
- 1996
Citation Context ...r cut-weight and for certain experiments the efficiency of the 1D mapping even exceeded that of the 2D [18]. Another more general approach to the mapping problem was developed by Pellegrini and Roman [20,21] and Hendrickson et al. [14,15]. The technique uses recursive bisection of both the mesh (or source graph) and the processor graph (or target graph). This means that the partition of the mesh somehow reflects the natural partition ...
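The idea of bisecting the source graph and the target graph together, so that each half of the mesh goes to one half of the machine, can be illustrated with a deliberately simplified sketch. It assumes the mesh is bisected by sorting on a single coordinate and the processor graph is an ordered list (e.g. a 1D array) bisected by splitting the list. Scotch and the Hendrickson et al. methods use proper graph bisection on both sides, so this is only a toy instance of the recursion, and `dual_recursive_bisection` is a hypothetical name.

```python
def dual_recursive_bisection(vertices, coord, procs):
    """Toy dual recursive bipartitioning: bisect mesh and processor list together.

    vertices: vertex ids; coord[v]: a 1D coordinate used to bisect the mesh
    (a stand-in for real graph bisection); procs: processors ordered so that
    halving the list halves the target graph (true for a 1D array).
    Returns {vertex: processor}.
    """
    if len(procs) == 1:
        return {v: procs[0] for v in vertices}
    half = len(procs) // 2
    ordered = sorted(vertices, key=lambda v: coord[v])
    # give each half of the mesh a share proportional to its number of processors
    cut = round(len(ordered) * half / len(procs))
    mapping = dual_recursive_bisection(ordered[:cut], coord, procs[:half])
    mapping.update(dual_recursive_bisection(ordered[cut:], coord, procs[half:]))
    return mapping


coord = {v: float(v) for v in range(8)}  # a 1D "mesh" of 8 vertices
print(dual_recursive_bisection(list(range(8)), coord, [0, 1, 2, 3]))
```

Vertices 0 and 1 go to processor 0, 2 and 3 to processor 1, and so on, so neighbouring slabs of the mesh land on neighbouring processors of the array.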

49 | Parallel Optimisation Algorithms for Multilevel Mesh Partitioning
- Walshaw, Cross
Citation Context ... algorithm. In fact we do not believe this to be a very difficult task, given that we have described three parallel optimisation algorithms (for use in the context of a multilevel partitioner) in [24]. Although slightly different in nature to the serial multilevel algorithm outlined in Section 2, all three also rely on the same gain and preference functions and we believe that it should be easy en...

23 | A hierarchical partition model for adaptive finite element computation
- Teresco, Beall, et al.
Citation Context ...lthough, unlike here, as a recursive bisection based method) and generalise the idea (which they call skewed graph partitioning) to address other partitioning problems [15]. More recently, Teresco et al. [23] have discussed a hierarchical model of network performance within a dynamic load-balancing framework, although they do ...

19 | Partitioning & mapping of unstructured meshes to parallel machine topologies
- Walshaw, Cross, et al.
- 1995
Citation Context ...e processor graph, whilst minimising the cost is once again the QAP described in Section 3.2 and uses the algorithm outlined there. This type of two stage approach has been suggested previously (e.g. [26]), but since the network costs are not taken into account during the partitioning stage, the subdomains are not ‘shaped’ so as to take account of the processor topology and the overall combinatio...

14 | Skewed graph partitioning
- Hendrickson, Leland, Van Driessche
- 1997
Citation Context ...periments the efficiency of the 1D mapping even exceeded that of the 2D [18]. Another more general approach to the mapping problem was developed by Pellegrini and Roman [20,21] and Hendrickson et al. [14,15]. The technique uses recursive bisection of both the mesh (or source graph) and the processor graph (or target graph). This means that the partition of the mesh somehow reflects the natural partition ...

13 | Enhancing data locality by using terminal propagation
- Hendrickson, Leland, et al.
- 1996
Citation Context ...periments the efficiency of the 1D mapping even exceeded that of the 2D [18]. Another more general approach to the mapping problem was developed by Pellegrini and Roman [20,21] and Hendrickson et al. [14,15]. The technique uses recursive bisection of both the mesh (or source graph) and the processor graph (or target graph). This means that the partition of the mesh somehow reflects the natural partition ...

9 | Evaluation of the JOSTLE mesh partitioning code for practical multiphysics applications
- McManus, Walshaw, et al.
- 1996
Citation Context ...ould be extended to SMP clusters or meta-computers. Perhaps more interestingly, in tests with a solver using the resulting mappings on a parallel machine with 2D array type architecture, McManus et al. [19] showed that despite an increase in cut-weight the application scalability and efficiency were much increased using a 2D array mapping as compared to a partitioning/processor assignment approach (see S...

8 | Experimental analysis of the dual recursive bipartitioning algorithm for static mapping
- Pellegrini, Roman
- 1996
Citation Context ...r cut-weight and for certain experiments the efficiency of the 1D mapping even exceeded that of the 2D [18]. Another more general approach to the mapping problem was developed by Pellegrini and Roman [20,21] and Hendrickson et al. [14,15]. The technique uses recursive bisection of both the mesh (or source graph) and the processor graph (or target graph). This means that the partition of the mesh somehow reflects the natural partition ...

5 | Mesh Partitioning for Distributed Systems: Exploring Optimal Number
- Chen, Taylor
Citation Context ...ples of network heterogeneities. Even given relatively simple processor graphs such as those shown in Fig. 1, choosing the weighting of links to model the machine is by no means straightforward, e.g. [6]. However, for the example processor graphs shown here we might start by weighting all normal width edges by 1 and the thicker edges by 2. To weight a link between two processors without an explicit e...
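The context is truncated exactly where it explains how to weight a link between two processors without an explicit edge, so the rule below is an assumption, not the paper's: a common choice is the cost of the cheapest path through the weighted processor graph. This sketch computes that with the Floyd–Warshall algorithm, taking explicit links as `(a, b, weight)` triples, e.g. weight 1 for normal links and 2 for thicker ones as the context suggests; `processor_cost_matrix` is a hypothetical name.

```python
def processor_cost_matrix(p, links):
    """All-pairs cheapest-path costs over a weighted processor graph (Floyd-Warshall).

    links: (a, b, weight) triples for the explicit links. Pairs without a
    direct link get the cost of the cheapest path, which is an assumed rule.
    """
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(p)] for i in range(p)]
    for a, b, w in links:
        d[a][b] = d[b][a] = min(d[a][b], w)
    for k in range(p):  # allow paths through intermediate processor k
        for i in range(p):
            for j in range(p):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# 1D array of 4 processors; the middle link is drawn thicker and weighted 2
cost = processor_cost_matrix(4, [(0, 1, 1), (1, 2, 2), (2, 3, 1)])
print(cost[0][3], cost[1][3])  # 4 3
```

Processors 0 and 3 have no direct link, so under this assumption they are charged 1 + 2 + 1 = 4; unreachable pairs stay at infinity.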

3 | A scalable strategy for the parallelization of multiphysics unstructured mesh-iterative codes on distributed-memory systems
- McManus, Cross, et al.
Citation Context ...pproach (see Section 4.5). Indeed the same was true even for a 1D array mapping with a far greater cut-weight and for certain experiments the efficiency of the 1D mapping even exceeded that of the 2D [18]. Another more general approach to the mapping problem was developed by Pellegrini and Roman [20,21] and Hendrickson et al. [14,15]. The technique uses recursive bisection of both the mesh (or source ...

2 | Mapping Large-Scale FEM-Graphs to Highly Parallel Computers with Grid-Like Topology by Self-Organization
- Dormanns, Heiss
- 1994
Citation Context ...dered it the additional complexity of the problem have led to approaches which are either very limited in application or which focus on particular architectures such as the hypercube. For example, in [8] Dormanns and Heiss describe an approach to map onto grid-like networks (e.g. the 1D and 2D arrays that we consider) which uses self-organising maps to geometrically ‘fit’ the processor grid onto the ...

1 | The quadratic assignment problem, SFB-Report 126
- Burkard, Cela, et al.
- 1998
Citation Context ... NCM (see Section 1.4). In fact this expression is a simplification (in the more general case there is also a linear term) of a well-known optimisation problem, the quadratic assignment problem (QAP) [3]. This has been extensively studied since 1957 and is NP-complete [4]. There are many heuristic algorithms which address the problem, some of which are available in a software library, QAPLIB.² For t...

1 | ParaPART: parallel mesh partitioning for distributed systems, in
- Chen, Taylor
- 1999