## Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing

Venue: SIAM J. Sci. Comput.

Citations: 13 - 0 self

### BibTeX

@ARTICLE{Hendrickson_partitioningrectangular,

author = {Bruce Hendrickson and Tamara G. Kolda},

title = {Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing},

journal = {SIAM J. Sci. Comput},

year = {2000},

volume = {21}

}

### Abstract

A common operation in scientific computing is the multiplication of a sparse, rectangular, or structurally unsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
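
As a rough illustration of the bipartite graph model named in the abstract, the sketch below treats every nonzero a_ij as an edge between row vertex i and column vertex j, then counts the edges cut by a given row and column assignment and the nonzeros held by each part. The function names, the tiny 4 x 3 pattern, and the two-part assignment are assumptions made for illustration; they are not taken from the paper, whose exact objective and constraints may differ.

```python
def edge_cut(nonzeros, row_part, col_part):
    """Count bipartite-graph edges (nonzeros) whose row vertex and
    column vertex are assigned to different parts."""
    return sum(1 for i, j in nonzeros if row_part[i] != col_part[j])


def nonzeros_per_part(nonzeros, row_part, num_parts):
    """Number of nonzeros owned by each part under a row-based
    distribution (a simple proxy for per-processor work)."""
    loads = [0] * num_parts
    for i, _ in nonzeros:
        loads[row_part[i]] += 1
    return loads


# Hypothetical sparsity pattern of a 4 x 3 matrix, as (row, column) pairs.
pattern = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 1), (3, 2)]
row_part = [0, 0, 1, 1]   # rows 0,1 -> part 0; rows 2,3 -> part 1
col_part = [0, 0, 1]      # columns 0,1 -> part 0; column 2 -> part 1

print(edge_cut(pattern, row_part, col_part))      # 1
print(nonzeros_per_part(pattern, row_part, 2))    # [4, 3]
```

In this toy case only the nonzero at (3, 1) joins vertices in different parts, so a single vector entry would have to be communicated, while the two parts hold 4 and 3 nonzeros.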

### Citations

10994 |
Computers and Intractability: A Guide to the Theory of NP-completeness
- Garey, Johnson
- 1979
Citation Context ...ved out of their preferred set. The question is how to move the minimum amount of gain value while aggregating sufficient weight. This is equivalent to the knapsack problem, which is known to be NP-hard [18]. Let |E| denote the number of edges in G or correspondingly the number of nonzeros in A, and let |R| and |C| denote the number of row and column vertices. An iteration ...

8581 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 2001
Citation Context ...for all vertices requires an addition or subtraction for each edge, at a cost of O(|E|). Finding the weighted median of a set of k values requires O(k) operations (see, for instance, problem 10.2 in [9]), and it is used on a set of |R| gains and then a set of |C| gains. Our implementation actually uses a simpler, binary search algorithm for median finding. Although it works well in practice, it is n...

1053 |
An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context ...tition sizes correspond to work load per processor. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, an...

797 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
Citation Context ...y. If the matrix is square and structurally symmetric, the problem can be expressed in terms of graph partitioning, and a number of good algorithms and software tools have been developed for this use [24, 29, 45]. These methods can be used for partitioning a square, structurally ... [Footnote: Received by the editors July 6, 1998; accepted for publication (in revised form) April 13, 1999; published electronically May 17, ...]

537 | Using linear algebra for intelligent information retrieval
- Berry, Dumas, et al.
- 1995
Citation Context ...puting the truncated SVD of a large sparse matrix A via a Lanczos procedure requires frequent multiplies by $A$ and $A^T$. This arises in, for example, latent semantic indexing for information retrieval [4], clustering for hypertext matrices [5], and geophysical applications [43]. Permuting A does not change its singular values, and the singular vectors of the original matrix are just permutations of th...

490 |
Partitioning sparse matrices with eigenvectors of graphs
- Pothen, Simon, et al.
- 1990
Citation Context ...onstraints on the partition sizes correspond to work load per processor. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by B...

445 |
A multilevel algorithm for partitioning graphs, Technical Report SAND93-1301, Sandia National Laboratories
- Hendrickson, Leland
Citation Context ...or. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, and Raghavan [5]. Further, the alternating partitioning metho...

413 |
Algebraic connectivity of graphs
- Fiedler
- 1973
Citation Context ...D - A is computed, where $D = \operatorname{diag}\{d_1, d_2, \ldots, d_{m+n}\}$ and $d_i = \sum_j a_{ij}$. The matrix L is symmetric and positive semidefinite. Furthermore, we have the following theorem. Theorem 5.3 (Fiedler [13]). If the graph of A is connected, then the multiplicity of the zero eigenvalue is one. Let w denote a Fiedler vector of L, that is, an eigenvector corresponding to the smallest positive eigenvalue of...

377 |
Some simplified NP-complete graph problems
- Garey, Johnson, et al.
- 1976
Citation Context ...itioning is to try to minimize these cross edges, while maintaining some balance on the number of rows (or the number of nonzeros) in the two sets. This graph bisection problem is known to be NP-hard [17]. This approach is not well suited to rectangular or structurally unsymmetric matrix partitioning. If the matrix is rectangular, then the graph model does not apply. If the matrix is square, the stand...

339 | LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Square Problems
- Paige, Saunders
- 1982
Citation Context ...ducts, and in this case, the matrices are rectangular. Consider a system of the form $\min \|Ax - b\|_2$, where A is an $m \times n$ matrix with $m > n$. This problem can be solved by iterative methods such as LSQR [38] that require computations of the form $Ar$ and $A^T s$ every iteration. Using the permuted matrix does not change the minimal value of the least squares objective function. Another situation in which A i...

338 | QMR: A quasi-minimal residual method for non-Hermitian linear systems
- Freund, Nachtigal
- 1991
Citation Context ...numerical methods. One very important example is the solution of an unsymmetric system $Ax = b$ with an iterative method such as biconjugate gradient (BiCG) [14] or standard quasiminimal residual (QMR) [16]. During each iteration, these methods require the computation of $Ar$ and $A^T s$ for some vectors r and s. (It is worth noting that there are solvers for unsymmetric systems that do not require the $A^T$ ...

305 |
A Linear Time Heuristic for Improving Network Partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...work load per processor. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, and Raghavan [5]. Further, th...

292 |
Partitioning of unstructured problems for parallel processing
- Simon
- 1991
Citation Context ...ntly reduce execution times [26]. 5.3. Spectral. A popular algorithm for standard graph partitioning is spectral bisection, which uses an eigenvector of the Laplacian matrix associated with the graph [25, 39, 41]. We can apply spectral partitioning to a rectangular or structurally unsymmetric problem by first symmetrizing it. Given a bipartite graph G = (R, C, E) of a matrix A, form the corresponding structur...

278 | A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems
- Barnard, Simon
- 1993
Citation Context ...ialized with a natural partition and AP with a random partition, so all further runs were performed in this way. The spectral method uses the multilevel Rayleigh quotient iteration/Symmlq eigensolver [1] from the Chaco partitioning software [24]. The multilevel (ML) algorithms divide the coarsest graph randomly and use various ... Table 6.1. Partitioning metho...

190 |
The Chaco user's guide: version 2.0
- Hendrickson, Leland
- 1994
Citation Context ...y. If the matrix is square and structurally symmetric, the problem can be expressed in terms of graph partitioning, and a number of good algorithms and software tools have been developed for this use [24, 29, 45]. These methods can be used for partitioning a square, structurally ...

185 | Parallel preconditioning with sparse approximate inverses
- Grote, Huckle
- 1997
(Show Context)
Citation Context ... removed.) The matrix is of size 17,758 with 99,147 nonzeros. We used research code provided by Benzi and Tuma [2] to generate an approximate inverse preconditioner via the method of Grote and Huckle =-=[20]-=-. The resulting preconditioner had 76,372 nonzeros. The two matrices were combined into a bipartite graph with weighted edges and vertices as described in section 4. The memplus matrix will be partiti... |

179 |
Conjugate gradient methods for indefinite systems
- Fletcher
- 1976
Citation Context ...ymmetric matrices occur in a wide variety of numerical methods. One very important example is the solution of an unsymmetric system $Ax = b$ with an iterative method such as biconjugate gradient (BiCG) [14] or standard quasiminimal residual (QMR) [16]. During each iteration, these methods require the computation of $Ar$ and $A^T s$ for some vectors r and s. (It is worth noting that there are solvers for uns...

179 |
An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations
- Hendrickson, Leland
- 1995
Citation Context ...ntly reduce execution times [26]. 5.3. Spectral. A popular algorithm for standard graph partitioning is spectral bisection, which uses an eigenvector of the Laplacian matrix associated with the graph [25, 39, 41]. We can apply spectral partitioning to a rectangular or structurally unsymmetric problem by first symmetrizing it. Given a bipartite graph G = (R, C, E) of a matrix A, form the corresponding structur...

134 |
A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems
- Freund, Nachtigal
- 1991
Citation Context ...ods require the computation of $Ar$ and $A^T s$ for some vectors r and s. (It is worth noting that there are solvers for unsymmetric systems that do not require the $A^T s$ product, e.g., transpose-free QMR [15]). To use the partitioned matrix, $PAQ$, we can solve $(PAQ)y = Pb$, where $Q^T x = y$. Note that permuting the rows and columns of a matrix changes its eigenvalues; however, because we do not know the exa...

89 | Analysis of multilevel graph partitioning
- Karypis, Kumar
- 1995
Citation Context ...or. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, and Raghavan [5]. Further, the alternating partitioning metho...

76 | A semidiscrete matrix decomposition for latent semantic indexing information retrieval
- Kolda, O’Leary
- 1998
Citation Context ...om the semidiscrete decomposition that was introduced by O'Leary and Peleg [37] for image compression and that was also used for latent semantic indexing in information retrieval by Kolda and O'Leary [32, 35, 34]. 5.2. Kernighan-Lin/Fiduccia-Mattheyses. The Kernighan-Lin [31] algorithm is a widely used method for improving a graph partition. As with alternating partitioning, the initial partition can be ra...

68 |
A heuristic for reducing fill in sparse matrix factorization
- Bui, Jones
- 1993
Citation Context ...or. In section 5, several algorithms for partitioning the bipartite graphs are presented. Modifications of the well-known spectral [39], Kernighan-Lin [31]/Fiduccia-Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, and Raghavan [5]. Further, the alternating partitioning metho...

37 | An efficient parallel algorithm for matrix-vector multiplication
- Hendrickson, Leland, et al.
- 1995
Citation Context ... for sparse matrices. For dense matrices or sparse matrices with nonzero patterns that are difficult to exploit, two-dimensional decompositions are typically used; see Hendrickson, Leland, and Plimpton [27] or Lewis and van de Geijn [36]. ... Fig. 1.1. Matrix before and after partitioning. ... unsymmetric matrix A by considering the sparsity pattern of the $A + A^T$ matrix. But this trick is a...

33 |
Digital image compression by outer product expansion
- O’Leary, Peleg
- 1983
Citation Context ... guaranteed finite [33]. Alternatively, a maximum allowable number of iterations can be specified. This method was derived from the semidiscrete decomposition that was introduced by O'Leary and Peleg [37] for image compression and that was also used for latent semantic indexing in information retrieval by Kolda and O'Leary [32, 35, 34]. 5.2. Kernighan-Lin/Fiduccia-Mattheyses. The Kernighan-Lin [31]...

30 | Limited-memory matrix methods with applications. Dissertation, Applied Mathematics
- Kolda
- 1997
Citation Context ...om the semidiscrete decomposition that was introduced by O'Leary and Peleg [37] for image compression and that was also used for latent semantic indexing in information retrieval by Kolda and O'Leary [32, 35, 34]. 5.2. Kernighan-Lin/Fiduccia-Mattheyses. The Kernighan-Lin [31] algorithm is a widely used method for improving a graph partition. As with alternating partitioning, the initial partition can be ra...

25 | Partitioning mathematical programs for parallel solution
- Ferris, Horn
- 1998
Citation Context ... or alternating partitioning combined with Fiduccia-Mattheyses refinement. Several authors have used partitioned bipartite graphs of matrices for different parallelization objectives. Ferris and Horn [11] find vertex separators in the bipartite graph to reorder the matrix into arrowhead form. This is useful in the parallelization of linear programs and other ...

24 |
Sparse matrix reordering schemes for browsing hypertext
- Berry, Hendrickson, et al.
- 1996
Citation Context ... Mattheyses [12], and multilevel [6, 26, 29, 30] methods are given for the bipartite graph model. The modification of the spectral method was previously introduced by Berry, Hendrickson, and Raghavan [5]. Further, the alternating partitioning method of Kolda [33] is presented; this method is specific to the bipartite case. Finally in section 6, we measure the performance of various methods for partit...

24 | Parallel Sparse Matrix-Vector Multiply Software for Matrices with Data Locality. Concurrency: Practice and Experience
- Tuminaro, Shadid, et al.
- 1998
Citation Context ... been chosen in such a way that the number of nonzeros per block row is nearly equal. For now we assume nothing about the $n_j$'s. The algorithm we describe for computing $Ax$ is widely used; see, e.g., [42]. Analogous algorithms exist for a column-based partitioning. Specifically, if we have a matrix that is partitioned into block columns, we can simply work with the transpose of the matrix that is part...

23 |
Distributed memory matrix-vector multiplication and conjugate gradient algorithms
- Lewis, van de Geijn
- 1993
Citation Context ... matrices or sparse matrices with nonzero patterns that are difficult to exploit, two-dimensional decompositions are typically used; see Hendrickson, Leland, and Plimpton [27] or Lewis and van de Geijn [36]. ... Fig. 1.1. Matrix before and after partitioning. ... unsymmetric matrix A by considering the sparsity pattern of the $A + A^T$ matrix. But this trick is a...

19 |
Analytis. private communication
- T
- 1997
Citation Context ...s available from MatrixMarket. (It contained 27,003 explicitly stored zeros, which were removed.) The matrix is of size 17,758 with 99,147 nonzeros. We used research code provided by Benzi and Tuma [2] to generate an approximate inverse preconditioner via the method of Grote and Huckle [20]. The resulting preconditioner had 76,372 nonzeros. The two matrices were combined into a bipartite graph with...

19 |
Mesh partitioning and load-balancing for distributed memory parallel systems
- Walshaw, Cross, et al.
Citation Context ...y. If the matrix is square and structurally symmetric, the problem can be expressed in terms of graph partitioning, and a number of good algorithms and software tools have been developed for this use [24, 29, 45]. These methods can be used for partitioning a square, structurally ...

15 | Adaptive use of iterative methods in interior point methods for linear programming
- Wang, O’Leary
- 1995
Citation Context ... $\cdots\,\Delta y\,\bigr] = \begin{bmatrix} w \\ v \end{bmatrix}$, (2.1) where y is the dual variable and D is a diagonal matrix that changes each iteration. Alternatively, we may solve the normal equations, $(A D^{-2} A^T)\,\Delta y = r$. See Wang and O'Leary [46] for an algorithm that solves these equations iteratively as well as an overview of other such methods. When iterative solvers are employed, frequent multiplications involving $A$ and $A^T$ are needed. Ev...

11 | Partitioning sparse rectangular matrices for parallel processing
- Kolda
- 1457
Citation Context ...ing matrix vector products. Previous attempts to address the general matrix partitioning problem for matrix vector multiplication include the work of Kolda [33] and an earlier report on this research [23]. The authors have recently become aware of a closely related work by Çatalyürek and Aykanat [7]. In their approach, the structure of the unsymmetric matrix is represented by a hypergraph in which ro...

8 |
Generalized block-tridiagonal matrix orderings for parallel computation in process flowsheeting
- Coon, Stadtherr
- 1995
Citation Context ...raph to reorder the matrix into arrowhead form. This is useful in the parallelization of linear programs and other optimization problems. Coon and Stadtherr [8] use a bipartite graph partitioning model to identify parallelism in sparse Gaussian elimination. Neither of these efforts addresses the problem of parallelizing matrix vector products. Previous attemp...

8 | Latent semantic indexing via a semi-discrete matrix decomposition
- Kolda, O'Leary
- 1999
Citation Context ...om the semidiscrete decomposition that was introduced by O'Leary and Peleg [37] for image compression and that was also used for latent semantic indexing in information retrieval by Kolda and O'Leary [32, 35, 34]. 5.2. Kernighan-Lin/Fiduccia-Mattheyses. The Kernighan-Lin [31] algorithm is a widely used method for improving a graph partition. As with alternating partitioning, the initial partition can be ra...

7 | Description and use of animal breeding data for large least squares problems
- Hegland
- 1993
Citation Context ...ke a factor of 2, except for the Amatrix which has 30,000 nonzeros per processor. 6.1. Least squares. The pig-large and pig-very matrices are from least squares problems relating to pig breeding data [21, 28] and were obtained from Duff [10]. The pig-large matrix is of size 28,254 × 17,264 with 75,018 nonzeros. The results of row-based partitioning the pig-large matrix over eight processors are given in Tabl...

5 |
Schätzung von Zuchtwerten feldgeprüfter Schweine mit einem Mehrmerkmals-Tiermodell
- Hofer
- 1990
Citation Context ...ke a factor of 2, except for the Amatrix which has 30,000 nonzeros per processor. 6.1. Least squares. The pig-large and pig-very matrices are from least squares problems relating to pig breeding data [21, 28] and were obtained from Duff [10]. The pig-large matrix is of size 28,254 × 17,264 with 75,018 nonzeros. The results of row-based partitioning the pig-large matrix over eight processors are given in Tabl...

5 |
Global Earth Structure: inference and assessment
- Vasco, Johnson, et al.
- 1999
Citation Context ...re requires frequent multiplies by $A$ and $A^T$. This arises in, for example, latent semantic indexing for information retrieval [4], clustering for hypertext matrices [5], and geophysical applications [43]. Permuting A does not change its singular values, and the singular vectors of the original matrix are just permutations of those for the permuted matrix. 3...

3 |
Private Communication
- Rothberg
- 1998
Citation Context ... the FM and ML methods although this was not enforced by any constraint. Again we can observe that edge cuts correspond to total message volume. The 123,221 × 141,344 Amatrix was obtained from Rothberg [40]. This matrix has 1,437,692 nonzeros and contains 72 dense rows. Tables 6.9 and 6.10 contain the results of a column-based partitioning of this matrix over 128 processors. This is an interesting parti...

2 |
Çatalyürek and C. Aykanat, Decomposing linear programs for parallel solution, in Parallel Algorithms for Irregularly Structured
- V
- 1996
Citation Context ...or matrix vector multiplication include the work of Kolda [33] and an earlier report on this research [23]. The authors have recently become aware of a closely related work by Çatalyürek and Aykanat [7]. In their approach, the structure of the unsymmetric matrix is represented by a hypergraph in which rows are vertices and columns are hyperedges. A distinct advantage of their approach over ours is t...

2 |
Private communication
- Duff
- 1998
Citation Context ...trix which has 30,000 nonzeros per processor. 6.1. Least squares. The pig-large and pig-very matrices are from least squares problems relating to pig breeding data [21, 28] and were obtained from Duff [10]. The pig-large matrix is of size 28,254 × 17,264 with 75,018 nonzeros. The results of row-based partitioning the pig-large matrix over eight processors are given in Tables 6.3 and 6.4. The natural part...

2 |
Graph partitioning and parallel solvers: Has the emperor no clothes?
- Hendrickson
- 1998
Citation Context ...cuss the appropriateness of this approximation further in section 7. It is worth noting that the same approximation is used (although not widely acknowledged) in the standard graph partitioning model [22]. By not constraining the partition of the columns, we allow for whatever partition leads to the minimal number of edge cuts. Other possible objectives are discussed in section 3.3. One alternative is...

2 |
Decomposing linear programs for parallel solution
- Pınar, Çatalyürek, et al.
- 1996
Citation Context ...m for matrix vector multiplication include the work of Kolda [33] and an earlier report on this research [23]. The authors have recently become aware of a closely related work by Çatalyürek and Aykanat [7]. In their approach, the structure of the unsymmetric matrix is represented by a hypergraph in which rows are vertices and columns are hyperedges. A distinct advantage of their approach over ours is t...

1 |
A Comparative Study of Sparse Approximate
- Benzi, Tuma
- 1998
Citation Context ...eordering keeps the eigenvalues intact.) Generally, iterative methods involve preconditioning. Suppose we have an explicit preconditioner such as an approximate inverse $M \approx A^{-1}$. (See Benzi and Tuma [3] for a survey of approximate inverse preconditioners.) In that case, we need to find P and Q such that both $PAQ$ and $Q^T M P^T \approx (PAQ)^{-1}$ are well partitioned. By well partitioned we mean that (1) the c...

1 |
Private communication
- Vasco, Marques
- 1998
Citation Context ...we1998 matrix with 27,546,437 nonzeros is used in a geophysical application where a truncated SVD must be computed (see Vasco, Johnson, and Marques [43]); the matrix was provided by Vasco and Marques [44]. This matrix has 1672 dense columns and so was partitioned row-wise. Because of the size of the matrix, the problem was run on an SGI Onyx with two processors and six gigabytes of memory, so the timi...
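
The spectral excerpts quoted in the citation contexts (symmetrize the bipartite graph, then form a Laplacian L = D - A with d_i equal to the i-th row sum) can be illustrated with a small sketch. It assumes unit edge weights, no repeated nonzeros, and a vertex numbering in which rows come first and column j becomes vertex m + j; these choices and the function names are illustrative assumptions, not the paper's implementation. The sketch only builds L and checks the standard identity that a ±1 partition vector x gives x^T L x equal to four times the edge cut; it does not compute a Fiedler vector.

```python
def bipartite_laplacian(m, n, nonzeros):
    """Dense Laplacian L = D - A_sym of the bipartite graph of an
    m x n sparsity pattern. Vertex i is row i; vertex m + j is column j.
    Assumes unit weights and no duplicate (row, column) pairs."""
    size = m + n
    lap = [[0] * size for _ in range(size)]
    for i, j in nonzeros:
        u, v = i, m + j
        lap[u][v] -= 1   # off-diagonal entries: minus the adjacency
        lap[v][u] -= 1
        lap[u][u] += 1   # diagonal entries: vertex degrees
        lap[v][v] += 1
    return lap


def quadratic_form(lap, x):
    """Evaluate x^T L x for a dense matrix stored as a list of lists."""
    size = len(x)
    return sum(x[a] * lap[a][b] * x[b] for a in range(size) for b in range(size))


# Hypothetical 4 x 3 pattern; rows 0,1 and columns 0,1 in one part (+1),
# rows 2,3 and column 2 in the other (-1).
pattern = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 1), (3, 2)]
lap = bipartite_laplacian(4, 3, pattern)
x = [1, 1, -1, -1, 1, 1, -1]

print(all(sum(row) == 0 for row in lap))   # True: each row of L sums to zero
print(quadratic_form(lap, x))              # 4, i.e. 4 * (one cut edge)
```

Because every row of L sums to zero, the all-ones vector is always in its null space, which is why spectral methods look at the eigenvector of the smallest positive eigenvalue instead.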