## Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies

Citations: | 35 - 22 self |

### BibTeX

@MISC{Ucar_encapsulatingmultiple,

author = {Bora Ucar and Cevdet Aykanat},

title = {Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies},

year = {}

}

### Abstract

This paper addresses the problem of one-dimensional partitioning of structurally unsymmetric square and rectangular sparse matrices for parallel matrix-vector and matrix-transpose-vector multiplies. The objective is to minimize the communication cost while maintaining the balance on computational loads of processors. Most of the existing partitioning models consider only the total message volume, hoping that minimizing this communication-cost metric is likely to reduce other metrics. However, the total message latency (start-up time) may be more important than the total message volume. Furthermore, the maximum message volume and latency handled by a single processor are also important metrics. We propose a two-phase approach that encapsulates all these four communication-cost metrics. The objective in the first phase is to minimize the total message volume while maintaining the computational-load balance. The objective in the second phase is to encapsulate the remaining three communication-cost metrics. We propose communication-hypergraph and partitioning models for the second phase. We then present several methods for partitioning communication hypergraphs. Experiments on a wide range of test matrices show that the proposed approach yields very effective partitioning results. A parallel implementation on a PC cluster verifies that the theoretical improvements shown by partitioning results hold in practice.
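The four communication-cost metrics named in the abstract can be illustrated with a minimal sketch for row-parallel y = Ax. This is not the authors' implementation: the function name `comm_metrics`, the dictionary-based ownership maps, and the choice to count only messages *sent* by each processor are assumptions made for illustration. The toy data mirrors the example quoted in the citation contexts, where P2 sends x[12:14] to P3 and x[12] to P4.

```python
from collections import defaultdict


def comm_metrics(nonzeros, row_owner, col_owner):
    """Return the four communication-cost metrics for row-parallel y = Ax.

    nonzeros  : iterable of (i, j) positions of nonzeros of A
    row_owner : dict mapping row i to the processor that computes y[i]
    col_owner : dict mapping column j to the processor that holds x[j]
    Assumption: only outgoing (send) traffic is counted per processor.
    """
    # needs[(src, dst)] = set of x indices that src must send to dst
    needs = defaultdict(set)
    for i, j in nonzeros:
        src, dst = col_owner[j], row_owner[i]
        if src != dst:  # a local nonzero needs no communication
            needs[(src, dst)].add(j)

    send_volume = defaultdict(int)  # words sent by each processor
    send_count = defaultdict(int)   # messages sent by each processor
    for (src, _dst), cols in needs.items():
        send_volume[src] += len(cols)
        send_count[src] += 1

    return {
        "total_volume": sum(send_volume.values()),
        "total_messages": sum(send_count.values()),
        "max_send_volume": max(send_volume.values(), default=0),
        "max_send_messages": max(send_count.values(), default=0),
    }


# Toy data (hypothetical): P2 owns x[12], x[13], x[14]; P3 owns rows 9, 10, 12;
# P4 owns row 20; row 8 is local to P2 and causes no communication.
nonzeros = [(9, 12), (10, 13), (12, 14), (20, 12), (8, 13)]
row_owner = {8: 2, 9: 3, 10: 3, 12: 3, 20: 4}
col_owner = {12: 2, 13: 2, 14: 2}
metrics = comm_metrics(nonzeros, row_owner, col_owner)
print(metrics)
```

In this toy case P2 sends two messages carrying four words in total, matching the figures in the quoted example. A first phase that minimizes only `total_volume` leaves the other three values unconstrained, which is the gap the second phase is meant to address.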

### Citations

1518 |
Iterative methods for sparse linear systems
- Saad
- 2003
Citation Context ... rectangular matrix are the kernel operations in various iterative algorithms. For example, iterative methods such as the conjugate gradient normal equation error and residual methods (CGNE and CGNR) =-=[15, 34]-=- and the standard quasi-minimal residual method (QMR) [14], used for solving unsymmetric linear systems, require computations of the form $y = Ax$ and $w = A^T z$ in each iteration, where A is an unsymmet... |

1048 |
An Efficient Heuristic Procedure for Partitioning Graphs
- Kernighan, Lin
- 1970
Citation Context ... FM, which is an iterative improvement method proposed for graph/hypergraph bipartitioning by Fiduccia and Mattheyses [13] as a faster implementation of the KL algorithm proposed by Kernighan and Lin =-=[27]-=-. In this work, we use the multilevel hypergraph-partitioning tool PaToH [9] for partitioning communication hypergraphs. Recall that the communication-hypergraph partitioning differs from the conventi... |

427 |
A linear-time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...rser graphs/hypergraphs using various heuristics. A common refinement heuristic is FM, which is an iterative improvement method proposed for graph/hypergraph bipartitioning by Fiduccia and Mattheyses =-=[13]-=- as a faster implementation of the KL algorithm proposed by Kernighan and Lin [27]. In this work, we use the multilevel hypergraph-partitioning tool PaToH [9] for partitioning communication hypergraph... |

419 |
Combinatorial Algorithms for Integrated Circuit Layout
- Lengauer
- 1990
Citation Context ...problem, the objective is to minimize the cutsize: (2.2) $\mathrm{cutsize}(\Pi) = \sum_{n_i \in N} (\lambda_i - 1)$. This objective function is widely used in the VLSI community, and it is referred to as the connectivity–1 metric =-=[28]-=-. The partitioning constraint is to maintain a balance on part weights, i.e., (2.3) $(W_{max} - W_{avg}) / W_{avg} \le \varepsilon$, where $W_{max}$ is the weight of the part with the maximum weight, $W_{avg}$ is the average part weight, and ... |

333 | LSQR: An algorithm for sparse linear equations and sparse least squares
- Paige
- 1982
Citation Context ...d for solving unsymmetric linear systems, require computations of the form $y = Ax$ and $w = A^T z$ in each iteration, where A is an unsymmetric square coefficient matrix. The least squares (LSQR) method =-=[31]-=-, used for solving the least squares problem, and the Lanczos method [15], used for computing the singular value decomposition, require frequent computations of the form $y = Ax$ and $w = A^T z$, where A is a... |

279 | Beowulf: A parallel workstation for scientific computation
- Becker, Sterling, et al.
- 1995
Citation Context ..., we have implemented row-parallel $y = Ax$ and row-column-parallel $y = AA^T z$ multiplies using the LAM/MPI 6.5.6 [5] message passing library. The parallel multiply programs were run on a Beowulf class =-=[38]-=- PC cluster with 24 nodes. Each node has a 400 MHz Pentium-II processor and 128 MB memory. The interconnection network is comprised of a 3COM SuperStack II 3900 managed switch connected to Intel Etherne... |

274 | MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices, Version 4.0
- Karypis, Kumar
- 1998
Citation Context ... into the MSN method. 4.1. PaToH-fix: Recursive bipartitioning with fixed vertices. The multilevel paradigm has been successfully used in graph and hypergraph partitioning leading to successful tools =-=[9, 17, 22, 24, 26]-=-. The multilevel heuristics consist of three steps: coarsening, initial partitioning, and uncoarsening. In the first step, a multilevel clustering is applied starting from the original graph/hypergrap... |

213 |
LAM: An open cluster environment for MPI
- Burns, Daoud, et al.
- 1994
Citation Context ...ained by our methods in the given performance metrics hold in practice. For this purpose, we have implemented row-parallel $y = Ax$ and row-column-parallel $y = AA^T z$ multiplies using the LAM/MPI 6.5.6 =-=[5]-=- message passing library. The parallel multiply programs were run on a Beowulf class [38] PC cluster with 24 nodes. Each node has a 400 MHz Pentium-II processor and 128 MB memory. The interconnection ne... |

184 | Parallel preconditioning with sparse approximate inverses
- Grote, Huckle
- 1997
Citation Context ...he respective matrix-vector multiplies (see [21] for such a method). The most notable cases are the preconditioned iterative methods that use an explicit preconditioner such as an approximate inverse =-=[3, 4, 16]-=- $M \approx A^{-1}$. These methods involve matrix-vector multiplies with M and A. The present work can be used in such cases by partitioning matrices independently. However, this approach would suffer from com... |

149 | Multilevel algorithms for multi-constraint graph partitioning
- Karypis, Kumar
- 1998
Citation Context ...d in the second phase are not simple functions of the cut edges or hyperedges or vertex weights defined in the existing graph and hypergraph models even in the multiobjective [37] and multiconstraint =-=[25]-=- frameworks. Besides, these metrics cannot be assessed before a partition is defined. Hence, we anticipate a two-phase approach. Pinar and Hendrickson [33] also adopt a multiphase approach for handlin... |

134 |
A transpose-free quasi-minimal residual algorithm for nonHermitian linear systems
- Freund, Nachtigal
- 1991
Citation Context ...tive algorithms. For example, iterative methods such as the conjugate gradient normal equation error and residual methods (CGNE and CGNR) [15, 34] and the standard quasi-minimal residual method (QMR) =-=[14]-=-, used for solving unsymmetric linear systems, require computations of the form $y = Ax$ and $w = A^T z$ in each iteration, where A is an unsymmetric square coefficient matrix. The least squares (LSQR) me... |

117 |
Multi-Way Network Partitioning
- Sanchis
- 1989
Citation Context ...be notable with PaToH-fix. In order to alleviate this problem, we have developed a multilevel direct K-way hypergraph partitioner (MSN) by integrating Sanchis’s direct K-way refinement (SN) algorithm =-=[35]-=- to the uncoarsening step of the multilevel framework. The coarsening step of MSN is essentially the same as that of PaToH. In the initial partitioning step, a K-way partition on the coarsest hypergra... |

101 | Preconditioning techniques for large linear systems: a survey
- Benzi
Citation Context ...he respective matrix-vector multiplies (see [21] for such a method). The most notable cases are the preconditioned iterative methods that use an explicit preconditioner such as an approximate inverse =-=[3, 4, 16]-=- $M \approx A^{-1}$. These methods involve matrix-vector multiplies with M and A. The present work can be used in such cases by partitioning matrices independently. However, this approach would suffer from com... |

73 | Graph partitioning models for parallel computing
- Hendrickson, Kolda
Citation Context ... its partitioning objective. Several recently proposed alternative partitioning models for parallel computing were discussed in the excellent survey by Hendrickson and Kolda =-=[20]-=-. As noted in the survey, most of the partitioning models mainly consider minimizing the total message volume. However, the communication overhead is a function of the message latency (start-up time) ... |

61 | Hypergraph-partitioning based decomposition for parallel sparse-matrix vector multiplication
- Çatalyürek, Aykanat
- 1999
Citation Context ...eme, first a two-way partition is obtained, and then this two-way partition is further bipartitioned recursively. The connectivity−1 cutsize metric (see (2.2)) is easily handled through net splitting =-=[8]-=- during recursive bisection steps. Although the recursive-bisection paradigm is successful in K-way partitioning in general, its performance degrades for hypergraphs with large net sizes. Since commun... |

57 | Permuting sparse rectangular matrices into block-diagonal form
- Aykanat, Pınar, et al.
Citation Context ...a limitation for iterative solvers on unsymmetric square or rectangular matrices when the x-space and y-space vectors do not undergo linear vector operations. Recently, Aykanat, Pinar, and Çatalyürek =-=[2]-=-, Çatalyürek and Aykanat [7, 8], and Pinar, Çatalyürek, Aykanat, and Pinar [32] proposed hypergraph models for partitioning unsymmetric square and rectangular matrices with the flexibility of producin... |

55 | A comparative study of sparse approximate inverse preconditioners
- Benzi, Tůma
- 1999
Citation Context ...he respective matrix-vector multiplies (see [21] for such a method). The most notable cases are the preconditioned iterative methods that use an explicit preconditioner such as an approximate inverse =-=[3, 4, 16]-=- $M \approx A^{-1}$. These methods involve matrix-vector multiplies with M and A. The present work can be used in such cases by partitioning matrices independently. However, this approach would suffer from com... |

51 | Message-Passing Performance of Various Computers. University of Tennessee and Knoxville, Tech report 95-299
- Dongarra, Dunigan
- 1995
Citation Context ...ell as the message volume. Depending on the machine architecture and problem size, the communication overhead due to the message latency may be much higher than the overhead due to the message volume =-=[12]-=-. None of the works listed in the survey address minimizing the total message latency. Furthermore, the maximum message volume and latency handled by a single processor are also crucial cost metrics t... |

46 |
PaToH: A Multilevel Hypergraph Partitioning Tool, Version 3.0
- Çatalyürek, Aykanat
- 1999
Citation Context ...rtual column stripe. In Figure 2.1(a), P2, holding x-vector block x2 = x[8:14], sends vector $\hat{x}_2^3 = x[12:14]$ to P3 because of nonzero columns 12, 13, and 14 in A32. P3 needs those entries to compute y[9], y[10], and y[12]. Similarly, P2 sends $\hat{x}_2^4 = x[12]$ to P4 because of the nonzero column 12 in A42. Hence, the number of messages sent by P2 is 2 with a total volume of four words. Note that P2 effe... |

45 |
hMeTiS: A Hypergraph Partitioning Package, Version 1.5.3
- Karypis, Kumar
- 1998
Citation Context ... 12, 19, 25, and 26 corresponding to the nonzero coupling columns in the fourth row stripe of ABL. These nonzeros summarize the need of processor P4 for xC-vector entries x[7], x[12], x[19], x[25], and x[26] in row-parallel $y = Ax$. Here, we exploit the row-net hypergraph model for sparse matrix representation [7, 8] to construct a communication hypergraph from matrix C. In this model, communication matrix ... |

41 | Graph partitioning and parallel solvers: Has the emperor no clothes? In IRREGULAR’98: solving irregularly structured problems
- Hendrickson
- 1998
Citation Context ...objective correspond to, respectively, maintaining the computational-load balance and minimizing the total message volume. In recent works, Çatalyürek [6], Çatalyürek and Aykanat [7], and Hendrickson =-=[19]-=- mentioned the limitations of this standard approach. First, it tries to minimize a wrong objective function since the edge-cut metric does not model the actual communication volume. Second, it can on... |

32 | A new algorithm for multi-objective graph partitioning
- Schloegel, Karypis, et al.
- 1999
Citation Context ...ble. The metrics minimized in the second phase are not simple functions of the cut edges or hyperedges or vertex weights defined in the existing graph and hypergraph models even in the multiobjective =-=[37]-=- and multiconstraint [25] frameworks. Besides, these metrics cannot be assessed before a partition is defined. Hence, we anticipate a two-phase approach. Pinar and Hendrickson [33] also adopt a multip... |

28 | Decomposing irregularly sparse matrices for parallel matrix-vector multiplication
- Çatalyürek, Aykanat
- 1996
Citation Context ...oning constraint and objective correspond to, respectively, maintaining the computational-load balance and minimizing the total message volume. In recent works, Çatalyürek [6], Çatalyürek and Aykanat =-=[7]-=-, and Hendrickson [19] mentioned the limitations of this standard approach. First, it tries to minimize a wrong objective function since the edge-cut metric does not model the actual communication vol... |

23 | Two novel multiway circuit partitioning algorithms using relaxed locking
- Dasdan, Aykanat
- 1997
Citation Context ...net which connects that part, and it is maintained later in the uncoarsening step. In the uncoarsening step, the SN algorithm, which is a generalization of the two-way FM paradigm to K-way refinement =-=[11, 36]-=-, is used. SN, starting from a K-way initial partition, performs a number of passes until it finds a locally optimum partition, where each pass consists of a sequence of vertex moves. The fundamental ... |

23 |
Multiple-way network partitioning with different cost functions
- Sanchis
- 1993
Citation Context ...net which connects that part, and it is maintained later in the uncoarsening step. In the uncoarsening step, the SN algorithm, which is a generalization of the two-way FM paradigm to K-way refinement =-=[11, 36]-=-, is used. SN, starting from a K-way initial partition, performs a number of passes until it finds a locally optimum partition, where each pass consists of a sequence of vertex moves. The fundamental ... |

22 | A hypergraph-partitioning approach for coarse-grain decomposition
- Çatalyürek, Aykanat
Citation Context ...e x-vector entries are the input, and the y-vector entries are the output of the reduction. The matrix A corresponds to the mapping from the input to the output vector entries. Çatalyürek and Aykanat =-=[10]-=- briefly list several practical problems that involve this correspondence. Hence, the proposed two-phase approach can also be used in reducing the communication overhead in such practical reduction pr... |

18 | Hypergraph models for sparse matrix partitioning and reordering
- Çatalyürek
- 1999
Citation Context ...ts is minimized. The partitioning constraint and objective correspond to, respectively, maintaining the computational-load balance and minimizing the total message volume. In recent works, Çatalyürek =-=[6]-=-, Çatalyürek and Aykanat [7], and Hendrickson [19] mentioned the limitations of this standard approach. First, it tries to minimize a wrong objective function since the edge-cut metric does not model ... |

14 | Partitioning for complex objectives
- Pinar, Hendrickson
- 2001
Citation Context ... the multiobjective [37] and multiconstraint [25] frameworks. Besides, these metrics cannot be assessed before a partition is defined. Hence, we anticipate a two-phase approach. Pinar and Hendrickson =-=[33]-=- also adopt a multiphase approach for handling complex partitioning objectives. Here, we focus on the second phase and do not go back and forth between the phases. Therefore, our contribution can be s... |

13 | Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing
- Hendrickson, Kolda
Citation Context ...res simultaneous partitioning of the participating matrices in a method that considers the complicated interaction among the efficient parallelizations of the respective matrix-vector multiplies (see =-=[21]-=- for such a method). The most notable cases are the preconditioned iterative methods that use an explicit preconditioner such as an approximate inverse [3, 4, 16] $M \approx A^{-1}$. These methods involve matr... |

9 | Decomposing linear programs for parallel solution
- Pınar, Çatalyürek, et al.
- 1996
Citation Context ...es when the x-space and y-space vectors do not undergo linear vector operations. Recently, Aykanat, Pinar, and Çatalyürek [2], Çatalyürek and Aykanat [7, 8], and Pinar, Çatalyürek, Aykanat, and Pinar =-=[32]-=- proposed hypergraph models for partitioning unsymmetric square and rectangular matrices with the flexibility of producing unsymmetric partitions on the input and output vectors. Hendrickson and Kolda... |

8 |
Watson graph partitioning package
- Gupta
- 1996
Citation Context ... into the MSN method. 4.1. PaToH-fix: Recursive bipartitioning with fixed vertices. The multilevel paradigm has been successfully used in graph and hypergraph partitioning leading to successful tools =-=[9, 17, 22, 24, 26]-=-. The multilevel heuristics consist of three steps: coarsening, initial partitioning, and uncoarsening. In the first step, a multilevel clustering is applied starting from the original graph/hypergrap... |

7 | Hypergraph Partitioning with Fixed Vertices
- Alpert, Caldwell, et al.
- 2000
Citation Context ...of the proposed model may be overcome by enforcing the consistency condition through exploiting the partitioning with fixed vertices feature, which exists in some of the hypergraph-partitioning tools =-=[1, 9]-=-. We discuss such a method in section 4.1. Partitioning xC-vector entries affects the message-volume requirement determined in the first phase. The messag... |

7 | Description and use of animal breeding data for large least squares problems
- Hegland
- 1993
Citation Context ... Sparse Matrix Collection are from the unsymmetric linear system application. The pig-large and pig-very matrices =-=[18]-=- are from the least squares problem. The remaining six matrices, which are obtained from Hungarian Academy of Sciences OR Lab, are from miscellaneous and stochastic linear programming problems. In t... |

5 |
New iterative methods for linear inequalities
- Yang, Murty
- 1992
Citation Context ... diagonal matrix. Rather than forming the coefficient matrix $AD^2A^T$, which may be quite dense, the above computation is performed as $w = A^T z$, $x = D^2 w$, and $y = Ax$. The surrogate constraint method =-=[29, 30, 39, 40]-=-, which is used for solving the linear feasibility problem, requires decoupled matrix-vector and matrix-transpose-vector multiplies involving the same rectangular matrix. ∗Received by the editors July... |

2 |
The parallel surrogate constraint approach to the linear feasibility problem
- Özaktaş, Pınar, et al.
- 1996
Citation Context ... diagonal matrix. Rather than forming the coefficient matrix $AD^2A^T$, which may be quite dense, the above computation is performed as $w = A^T z$, $x = D^2 w$, and $y = Ax$. The surrogate constraint method =-=[29, 30, 39, 40]-=-, which is used for solving the linear feasibility problem, requires decoupled matrix-vector and matrix-transpose-vector multiplies involving the same rectangular matrix. ∗Received by the editors July... |

2 | Permuting sparse rectangular matrices into block-diagonal form - Aykanat, Pinar, et al. - 2003 |


1 |
The Chaco User's Guide, Version 2.0
- Hendrickson, Leland
- 1995
Citation Context ... into the MSN method. 4.1. PaToH-fix: Recursive bipartitioning with fixed vertices. The multilevel paradigm has been successfully used in graph and hypergraph partitioning leading to successful tools =-=[9, 17, 22, 24, 26]-=-. The multilevel heuristics consist of three steps: coarsening, initial partitioning, and uncoarsening. In the first step, a multilevel clustering is applied starting from the original graph/hypergrap... |

1 |
Özaktaş, Algorithms for Linear and Convex Feasibility Problems: A Brief Study
- ester, UK
- 1990
Citation Context ... diagonal matrix. Rather than forming the coefficient matrix $AD^2A^T$, which may be quite dense, the above computation is performed as $w = A^T z$, $x = D^2 w$, and $y = Ax$. The surrogate constraint method =-=[29, 30, 39, 40]-=-, which is used for solving the linear feasibility problem, requires decoupled matrix-vector and matrix-transpose-vector multiplies involving the same rectangular matrix. ∗Received by the editors July... |

1 |
Parallel Algorithms for the Solution of Large Sparse Inequality Systems
- Turna
- 1998
