## Hypergraph-Partitioning Based Decomposition for Parallel Sparse-Matrix Vector Multiplication

Venue: IEEE Trans. on Parallel and Distributed Computing

Citations: 63 (34 self)

### BibTeX

@ARTICLE{Catalyurek_hypergraph-partitioningbased,
    author = {Umit V. Catalyurek and Cevdet Aykanat},
    title = {Hypergraph-Partitioning Based Decomposition for Parallel Sparse-Matrix Vector Multiplication},
    journal = {IEEE Trans. on Parallel and Distributed Computing},
    year = {},
    volume = {10},
    pages = {673--693}
}

### Abstract

In this work, we show that the standard graph-partitioning based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool PaToH for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63% less communication volume (30%--38% less on the average) than the graph model using MeTiS, while PaToH is only 1.3--2.3 times slower than MeTiS on the average. ...
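The abstract's central claim can be illustrated with a minimal sketch that compares what the standard graph model charges for a rowwise decomposition of y = Ax with the number of vector entries that actually have to be sent. The sparsity pattern, the partition, and the function names below are made up for illustration and are not taken from the paper. The sketch also assumes a structurally symmetric pattern, that x_j lives on the processor owning row j, and that the graph model charges two words per cut edge.

```python
def graph_model_estimate(pattern, part):
    """Words predicted by the graph model: each cut edge {i, j} is charged
    two words (x_i one way, x_j the other). Assumes a symmetric pattern."""
    cut_edges = sum(
        1
        for i, cols in pattern.items()
        for j in cols
        if i < j and part[i] != part[j]
    )
    return 2 * cut_edges


def comm_volume(pattern, part):
    """Words actually sent in a rowwise decomposition: x_j is sent once to
    every part, other than its owner's, that holds a nonzero in column j."""
    needed_by = {}
    for i, cols in pattern.items():
        for j in cols:
            needed_by.setdefault(j, set()).add(part[i])
    return sum(len(parts - {part[j]}) for j, parts in needed_by.items())


# Hypothetical 4x4 symmetric sparsity pattern: row index -> nonzero columns.
pattern = {0: {0, 2, 3}, 1: {1, 2, 3}, 2: {0, 1, 2}, 3: {0, 1, 3}}
# Hypothetical 2-way rowwise partition: row index -> processor.
part = {0: 0, 1: 0, 2: 1, 3: 1}

print(graph_model_estimate(pattern, part))  # 8
print(comm_volume(pattern, part))           # 4
```

In this toy case every x entry is needed by two rows on the other processor, yet it only has to be sent there once. The graph model charges both cut edges separately, which is the kind of overestimate the abstract refers to.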

### Citations

1129 | An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context ...y used for graph/hypergraph partitioning because of their short run-times and good quality results. The KL algorithm is an iterative improvement heuristic originally proposed for graph bipartitioning [25]. The KL algorithm, starting from an initial bipartition, performs a number of passes until it finds a locally minimum partition. Each pass consists of a sequence of vertex swaps. The same swap strate...

874 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context ... is widely used in the representation of computational structures of various scientific applications, including repeated SpMxV computations, to decompose the computational domains for parallelization [5, 6, 20, 21, 27, 28, 31, 36]. In this model, the problem of sparse matrix decomposition for minimizing the communication volume while maintaining the load balance is formulated as the well-known K-way graph partitioning problem....

506 | Introduction to Parallel Computing: Design and Analysis of Algorithms. The Benjamin/Cummings Publishing Company
- Kumar
- 1994
Citation Context ... is widely used in the representation of computational structures of various scientific applications, including repeated SpMxV computations, to decompose the computational domains for parallelization [5, 6, 20, 21, 27, 28, 31, 36]. In this model, the problem of sparse matrix decomposition for minimizing the communication volume while maintaining the load balance is formulated as the well-known K-way graph partitioning problem....

466 | A multi-level algorithm for partitioning graphs
- Hendrickson, Leland
- 1995
Citation Context ...uced for the sake of efficient parallelization of a given problem. Hence, heuristics used for decomposition should run in low-order polynomial time. Recently, multilevel graph partitioning heuristics [4, 13, 21] have been proposed, leading to the fast and successful graph partitioning tools Chaco [14] and MeTiS [22]. We have exploited the multilevel partitioning methods for the experimental verification of the proposed...

451 | A Linear-Time Heuristic for Improving Network Partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...minimum partition. Each pass consists of a sequence of vertex swaps. The same swap strategy was applied to the hypergraph bipartitioning problem by Schweikert-Kernighan [38]. Fiduccia-Mattheyses (FM) [10] introduced a faster implementation of the KL algorithm for hypergraph partitioning. They proposed the vertex move concept instead of vertex swap. This modification, as well as proper data structures, ...
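The vertex move concept mentioned in this excerpt can be sketched as a gain computation: for each vertex, how much the number of cut nets would drop if that single vertex changed sides. This is a minimal illustration under assumptions, not the FM implementation itself. It uses unit net costs, ignores the balance constraint and the bucket data structures the excerpt alludes to, and all names are hypothetical.

```python
def move_gains(nets, part):
    """Gain of moving each vertex to the other side of a bipartition.

    nets: list of pin lists; part: vertex -> 0 or 1.
    Gain = (cut nets that become uncut) - (uncut nets that become cut).
    """
    gains = {v: 0 for v in part}
    for pins in nets:
        count = [0, 0]
        for v in pins:
            count[part[v]] += 1
        for v in pins:
            side = part[v]
            if count[side] == 1 and count[1 - side] > 0:
                gains[v] += 1  # v is the only pin on its side: moving it uncuts the net
            elif count[1 - side] == 0 and count[side] > 1:
                gains[v] -= 1  # net is internal to v's side: moving v cuts it
    return gains


# Hypothetical hypergraph with four vertices and four nets.
nets = [[0, 1], [1, 3], [2, 3], [1, 2, 3]]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(move_gains(nets, part))  # {0: -1, 1: 1, 2: -1, 3: 0}
```

Here moving vertex 1 has gain +1: it cuts net [0, 1] but uncuts [1, 3] and [1, 2, 3], so the number of cut nets drops from 2 to 1.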

445 | Combinatorial algorithms for integrated circuit layout
- Lengauer
- 1990
Citation Context ...$\mathcal{N}_E$ $c_j(\lambda_j - 1)$. (3) In (3.a), the cutsize is equal to the sum of the costs of the cut nets. In (3.b), each cut net $n_j$ contributes $c_j(\lambda_j - 1)$ to the cutsize. Hence, the hypergraph partitioning problem [29] can be defined as the task of dividing a hypergraph into two or more parts such that the cutsize is minimized, while a given balance criterion (1) among the part weights is maintained. Here, part wei...
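The two cutsize definitions described in this excerpt can be sketched side by side. The sketch assumes that the symbol lost in extraction is the connectivity of a net, meaning the number of distinct parts its pins touch, written `lam` below. The hypergraph, costs, and function name are hypothetical.

```python
def cutsizes(nets, costs, part):
    """Return (cut-net cutsize, connectivity-minus-one cutsize).

    nets: net name -> list of pins; costs: net name -> cost;
    part: vertex -> part index.
    """
    cut_net = 0
    connectivity = 0
    for name, pins in nets.items():
        lam = len({part[v] for v in pins})  # number of parts this net connects
        if lam > 1:  # the net is cut
            cut_net += costs[name]                  # as in (3.a): sum of cut-net costs
            connectivity += costs[name] * (lam - 1)  # as in (3.b): cost * (lam - 1)
    return cut_net, connectivity


# Hypothetical hypergraph with unit costs and a 3-way partition.
nets = {"n0": [0, 1], "n1": [1, 2, 3], "n2": [0, 2, 4], "n3": [3, 4]}
costs = {"n0": 1, "n1": 1, "n2": 1, "n3": 1}
part = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}
print(cutsizes(nets, costs, part))  # (2, 4)
```

For a bipartition the two metrics coincide, since a cut net then connects exactly two parts. They diverge only for K greater than 2, as in this 3-way example.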

393 | Some simplified NP-complete graph problems
- Garey, Johnson, et al.
- 1976
Citation Context ...arts such that the cutsize is minimized, while the balance criterion (1) on part weights is maintained. The graph partitioning problem is known to be NP-hard even for bipartitioning unweighted graphs [11]. 2.2 Standard Graph Model for Structurally Symmetric Matrices A structurally symmetric sparse matrix A can be represented as an undirected graph $G_A = (V, E)$, where the sparsity pattern of A corresponds ...

324 | The University of Florida sparse matrix collection - Davis, Hu |

314 | Metis: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes and Computing Fill-Reducing Orderings of Sparse Matrices
- Karypis, Kumar
- 1998
Citation Context ...ition should run in low-order polynomial time. Recently, multilevel graph partitioning heuristics [4, 13, 21] have been proposed, leading to the fast and successful graph partitioning tools Chaco [14] and MeTiS [22]. We have exploited the multilevel partitioning methods for the experimental verification of the proposed hypergraph models in two approaches. In the first approach, MeTiS graph partitioning tool is u...

295 | Sparse matrix test problems - Duff, Grimes, et al. - 1989 |

201 | Recent directions in netlist partitioning: A survey
- Alpert, Kahng
- 1995
Citation Context ...s. Random multi-start approach is used in VLSI layout design to alleviate this problem by running the FM algorithm many times starting from random initial partitions to return the best solution found [1]. However, this approach is not viable in parallel computing since decomposition is a preprocessing overhead introduced to increase the efficiency of the underlying parallel algorithm/program. Most us...

75 | A heuristic for reducing fill in sparse matrix factorization
- Bui, Jones
- 1993
Citation Context ...uced for the sake of efficient parallelization of a given problem. Hence, heuristics used for decomposition should run in low-order polynomial time. Recently, multilevel graph partitioning heuristics [4, 13, 21] have been proposed, leading to the fast and successful graph partitioning tools Chaco [14] and MeTiS [22]. We have exploited the multilevel partitioning methods for the experimental verification of the proposed...

54 | A proper model for the partitioning of electrical circuits
- Schweikert, Kernighan
- 1972
Citation Context ...asses until it finds a locally minimum partition. Each pass consists of a sequence of vertex swaps. The same swap strategy was applied to the hypergraph bipartitioning problem by Schweikert-Kernighan [38]. Fiduccia-Mattheyses (FM) [10] introduced a faster implementation of the KL algorithm for hypergraph partitioning. They proposed the vertex move concept instead of vertex swap. This modification, as w...

48 | hMeTiS: A Hypergraph Partitioning Package, Version 1.0.1
- Karypis, Kumar, et al.
- 1998
Citation Context ...PaToH implementation was to investigate the performance of multilevel approach in hypergraph partitioning as described in Section 4.2. Recently released multilevel hypergraph partitioning tool hMeTiS [24] is also used in the second approach. Experimental results presented in Section 5 confirm both the validity of our proposed hypergraph models and the appropriateness of the multilevel approach to hype...

45 | Graph partitioning and parallel solvers: Has the emperor no clothes? LNCS 1457
- Hendrickson
- 1998
Citation Context ...ct that the graph models (both standard and proposed ones) do not reflect the actual communication requirement as will be described in Section 2.4. These flaws are also mentioned in a concurrent work [16]. In this work, we propose two computational hypergraph models which avoid all deficiencies of the graph model. The proposed models enable the representation and hence the decomposition of rectangular...

43 | A hybrid multilevel/genetic approach for circuit partitioning
- Alpert, Kahng
- 1996
Citation Context ...odels in two approaches. In the first approach, multilevel graph partitioning tool MeTiS is used as a black box by transforming hypergraphs to graphs using the randomized clique-net model proposed in [2]. In the second approach, we have implemented a multilevel hypergraph partitioning tool PaToH, and tested both PaToH and multilevel hypergraph partitioning tool hMeTiS [23, 24] which was released very...

42 | A linear-time heuristic for improving network partitions - Fiduccia, Mattheyses - 1982 |

40 | Massively parallel methods for engineering and science problems
- Camp, Plimpton, et al.
- 1994
Citation Context ... is widely used in the representation of computational structures of various scientific applications, including repeated SpMxV computations, to decompose the computational domains for parallelization [5, 6, 20, 21, 27, 28, 31, 36]. In this model, the problem of sparse matrix decomposition for minimizing the communication volume while maintaining the load balance is formulated as the well-known K-way graph partitioning problem....

38 | An efficient parallel algorithm for matrix-vector multiplication
- Hendrickson, Leland, et al.
- 1995
Citation Context ...nd $(K-1)m$ words, and it occurs when each submatrix $A^r_k$ ($A^c_k$) has at least one nonzero in each column (row) in rowwise (columnwise) decomposition. The approach based on 2D checkerboard partitioning [15, 30] reduces the worst-case communication to $2K(\sqrt{K}-1)$ messages and $2(\sqrt{K}-1)m$ words. In this approach, the worst-case occurs when each row and column of each submatrix has at least one nonzero. The ...
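To get a feel for these worst-case bounds, here is a small sketch that evaluates them for a given processor count K and matrix order m. It reads the extraction-damaged formulas in the excerpt as (K-1)m words for 1D decomposition and 2K(√K-1) messages with 2(√K-1)m words for 2D checkerboard partitioning. That reading, the perfect-square requirement on K, and the function names are assumptions. The 1D message count is cut off in the excerpt, so it is not computed.

```python
import math


def worst_case_words_1d(K, m):
    """Worst-case words for 1D (rowwise or columnwise) decomposition."""
    return (K - 1) * m


def worst_case_2d(K, m):
    """Worst-case (messages, words) for 2D checkerboard partitioning.

    Assumes K is a perfect square so processors form a sqrt(K) x sqrt(K) mesh.
    """
    r = math.isqrt(K)
    if r * r != K:
        raise ValueError("K must be a perfect square for this sketch")
    return 2 * K * (r - 1), 2 * (r - 1) * m


# Hypothetical setting: 16 processors, matrix of order 1000.
print(worst_case_words_1d(16, 1000))  # 15000
print(worst_case_2d(16, 1000))        # (96, 6000)
```

These are worst-case figures only. The decomposition models in the paper aim to keep the actual volume well below them by exploiting sparsity.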

35 | An empirical evaluation of the KORBX algorithms for military airlift applications - Carolan, Hill, et al. - 1990 |

31 | Sparse Matrix Computations on Parallel Processor Array
- Ogielski, Aiello
- 1993
Citation Context ...roblem size. Our goal is to find a rowwise or columnwise partition of A that minimizes the total volume of communication while maintaining the computational load balance. The decomposition heuristics [32, 33, 37] proposed for computational load balancing may result in extensive communication volume, because they do not consider the minimization of the communication volume during the decomposition. In one-dime...

30 | A new mapping heuristic based on mean field annealing
- Bultan, Aykanat
- 1992
Citation Context ...ion. These are linear operations on dense vectors and sparse-matrix vector product (SpMxV) of the form y=Ax, where A is an $m \times m$ square matrix with the same sparsity structure as the coefficient matrix [3, 5, 8, 35], and y and x are dense vectors. Our goal is the parallelization of the computations in the iterative solvers through rowwise or columnwise decomposition of the A matrix into the row blocks $A^r_1, \ldots, A^r_k,$ ...

28 | Decomposing irregularly sparse matrices for parallel matrix-vector multiplications
- Çatalyürek, Aykanat
- 1996
Citation Context ...ion. These are linear operations on dense vectors and sparse-matrix vector product (SpMxV) of the form y=Ax, where A is an $m \times m$ square matrix with the same sparsity structure as the coefficient matrix [3, 5, 8, 35], and y and x are dense vectors. Our goal is the parallelization of the computations in the iterative solvers through rowwise or columnwise decomposition of the A matrix into the row blocks $A^r_1, \ldots, A^r_k,$ ...

27 | Iterative Algorithms for Solution of Large Sparse Systems of Linear Equations on Hypercubes
- Aykanat, Ercal
- 1988
Citation Context ...ion. These are linear operations on dense vectors and sparse-matrix vector product (SpMxV) of the form y=Ax, where A is an $m \times m$ square matrix with the same sparsity structure as the coefficient matrix [3, 5, 8, 35], and y and x are dense vectors. Our goal is the parallelization of the computations in the iterative solvers through rowwise or columnwise decomposition of the A matrix into the row blocks $A^r_1, \ldots, A^r_k,$ ...

25 | Distributed memory matrix-vector multiplication and conjugate gradient algorithms
- Lewis, van de Geijn
- 1993
Citation Context ...nd $(K-1)m$ words, and it occurs when each submatrix $A^r_k$ ($A^c_k$) has at least one nonzero in each column (row) in rowwise (columnwise) decomposition. The approach based on 2D checkerboard partitioning [15, 30] reduces the worst-case communication to $2K(\sqrt{K}-1)$ messages and $2(\sqrt{K}-1)m$ words. In this approach, the worst-case occurs when each row and column of each submatrix has at least one nonzero. The ...

23 | Partitioning of Unstructured Meshes for Load Balancing
- Martin, Otto
- 1995

22 | Modeling hypergraphs by graphs with the same mincut properties
- Ihler, Wagner, et al.
- 1993
Citation Context ...the contribution of a cut net to the cutsize should always be one in a bipartition. However, the deficiency of the clique-net model is that it is impossible to achieve such a perfect clique-net model [18]. Furthermore, the transformation may result in very large graphs since the number of clique edges induced by the nets increases quadratically with their sizes. Recently, a randomized clique-net model ...
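The quadratic growth noted in this excerpt is easy to see in a minimal sketch of a plain clique-net expansion, where every net of size s is replaced by a clique on its pins and so contributes s(s-1)/2 edges. The function name and the choice to give each clique edge unit weight per net are assumptions for illustration. This is not the randomized clique-net model the excerpt goes on to mention.

```python
from itertools import combinations


def clique_net_graph(nets):
    """Expand each net into a clique; return edge -> number of nets inducing it."""
    edges = {}
    for pins in nets:
        for u, v in combinations(sorted(set(pins)), 2):
            edges[(u, v)] = edges.get((u, v), 0) + 1
    return edges


# Hypothetical hypergraph: one net of size 4 and one of size 3 sharing pins 2 and 3.
edges = clique_net_graph([[0, 1, 2, 3], [2, 3, 4]])
print(len(edges))     # 8 distinct edges (6 + 3, with edge (2, 3) shared)
print(edges[(2, 3)])  # 2

# A single net with 100 pins already induces 100 * 99 / 2 clique edges.
print(len(clique_net_graph([list(range(100))])))  # 4950
```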

19 | Parallel Incremental Graph Partitioning
- Ou, Ranka
- 1997

18 | Heuristic improvement technique for bisection of VLSI networks
- Goldberg, Burnstein
- 1983
Citation Context ...ally on the worst and average decompositions than on just the best decomposition. These considerations have motivated the two-phase application of the move-based algorithms in hypergraph partitioning [12]. In this approach, a clustering is performed on the original hypergraph $H_0$ to induce a coarser hypergraph $H_1$. Clustering corresponds to coalescing highly interacting vertices to supernodes as a preproc...
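The clustering step described in this excerpt can be sketched as a contraction: given a map from each vertex of the original hypergraph to its supernode, every net is rewritten over supernodes. This is a minimal sketch under assumptions. How the clusters are chosen is not shown, identical coarse nets are not merged, and the names are hypothetical.

```python
def coarsen(nets, cluster_of):
    """Contract vertices into supernodes and return the coarse nets.

    nets: list of pin lists; cluster_of: vertex -> supernode id.
    Nets that collapse into a single supernode are dropped, since they
    can no longer be cut by any partition of the coarse hypergraph.
    """
    coarse = []
    for pins in nets:
        merged = sorted({cluster_of[v] for v in pins})
        if len(merged) > 1:
            coarse.append(merged)
    return coarse


# Hypothetical hypergraph and clustering: {0, 1} -> supernode 0, {2, 3} -> supernode 1.
nets = [[0, 1], [1, 2, 3], [2, 3], [0, 3]]
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(coarsen(nets, cluster_of))  # [[0, 1], [0, 1]]
```

The two surviving coarse nets are identical. An implementation could merge them into one net with a summed cost, which this sketch leaves out.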

18 | A set of new mapping and coloring heuristics for distributed-memory parallel processors
- Pommerell, Annaratone, et al.
- 1990

17 | Partitioning rectangular and structurally nonsymmetric sparse matrices for parallel processing
- Hendrickson, Kolda
Citation Context ...d Graph Model for Structurally Symmetric/Nonsymmetric Square Matrices The standard graph model is not suitable for the partitioning of nonsymmetric matrices. A recently proposed bipartite graph model [17, 26] enables the partitioning of rectangular as well as structurally symmetric/nonsymmetric square matrices. In this model, each row and column is represented by a vertex, and the sets of vertices represe...

12 | Partitioning sparse rectangular matrices for parallel processing
- Kolda
Citation Context ...d Graph Model for Structurally Symmetric/Nonsymmetric Square Matrices The standard graph model is not suitable for the partitioning of nonsymmetric matrices. A recently proposed bipartite graph model [17, 26] enables the partitioning of rectangular as well as structurally symmetric/nonsymmetric square matrices. In this model, each row and column is represented by a vertex, and the sets of vertices represe...

10 | Partitioning unstructured computational graphs for nonuniform and adaptive environments
- Kaddoura, Ou, et al.
- 1995

9 | Decomposing linear programs for parallel solution
- Pinar, Catalyurek, et al.
- 1995
Citation Context ...work, we propose two computational hypergraph models which avoid all deficiencies of the graph model. The proposed models enable the representation and hence the decomposition of rectangular matrices [34] as well as symmetric and nonsymmetric square matrices. Furthermore, they introduce an exact representation for the communication volume requirement as described in Section 3.2. The proposed hypergrap...

9 | Sparse matrix computations on the CM-5
- Saad, Wu, et al.
- 1993
Citation Context ...roblem size. Our goal is to find a rowwise or columnwise partition of A that minimizes the total volume of communication while maintaining the computational load balance. The decomposition heuristics [32, 33, 37] proposed for computational load balancing may result in extensive communication volume, because they do not consider the minimization of the communication volume during the decomposition. In one-dime...

9 | Decomposing irregularly sparse matrices for parallel matrix-vector multiplication - Catalyuerek, Aykanat - 1996 |

7 | Hypergraph partitioning using multilevel approach: application in VLSI domain - Karypis, Aggarwal, et al. - 1997 |

6 | Distributed Memory Matrix-vector Multiplication and Conjugate Gradient Algorithms - Lewis, van de Geijn - 1993 |

5 | Load-Balanced Sparse Matrix-Vector Multiplications on Highly Parallel Computers
- Nastea, Frieder, et al.
- 1997
Citation Context ...roblem size. Our goal is to find a rowwise or columnwise partition of A that minimizes the total volume of communication while maintaining the computational load balance. The decomposition heuristics [32, 33, 37] proposed for computational load balancing may result in extensive communication volume, because they do not consider the minimization of the communication volume during the decomposition. In one-dime...

4 | Mapping Molecular Dynamics Computations on to Hypercubes
- Lakamsani, Bhuyan, et al.
- 1995

1 | The Chaco user's guide, version 2.0, tech. rep
- Hendrickson, Leland
- 1995
Citation Context ...ed for decomposition should run in low-order polynomial time. Recently, multilevel graph partitioning heuristics [4, 13, 21] have been proposed, leading to the fast and successful graph partitioning tools Chaco [14] and MeTiS [22]. We have exploited the multilevel partitioning methods for the experimental verification of the proposed hypergraph models in two approaches. In the first approach, MeTiS graph partiti...