## A Parallel Matrix Scaling Algorithm

Citations: 4 (2 self)

### BibTeX

```bibtex
@MISC{Amestoy_aparallel,
  author = {Patrick R. Amestoy and Iain S. Duff and Daniel Ruiz and Bora Uçar},
  title  = {A Parallel Matrix Scaling Algorithm},
  year   = {}
}
```

### Abstract

We recently proposed an iterative procedure that asymptotically scales the rows and columns of a given matrix to unit length in a given norm. In this work, we briefly mention some of the properties of that algorithm and discuss its efficient parallelization. We report on a parallel performance study of our implementation in a few computing environments.

### Citations

538 | Direct Methods for Sparse Matrices
- Duff, Erisman, et al.
- 1986
Citation Context: ...bject of several scientific publications, with many different developments depending on the properties required from the scaling. It has given rise to several well-known algorithms; see, for example, [10, 17]. If we denote by Â the scaled matrix Â = D1AD2, we then solve the equation Âx̂ = b̂, where x̂ = D2⁻¹x and b̂ = D1b. A standard and well-known approach to scaling is to do a row or column scaling...
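The scaled-system identity in the context above (solve Âx̂ = b̂ with Â = D1AD2 and b̂ = D1b, then recover x = D2x̂) can be sketched in a few lines. This is our own illustration, not the authors' code; the helper names are invented, and dense Python lists stand in for the sparse matrices of the paper.

```python
# Two-sided diagonal scaling of a linear system A x = b (illustrative sketch).
# With A_hat = D1 * A * D2 and b_hat = D1 * b, a solution x_hat of
# A_hat x_hat = b_hat yields x = D2 * x_hat for the original system.

def scale_system(A, b, d1, d2):
    """Return (A_hat, b_hat) for row scaling d1 and column scaling d2."""
    n = len(A)
    A_hat = [[d1[i] * A[i][j] * d2[j] for j in range(n)] for i in range(n)]
    b_hat = [d1[i] * b[i] for i in range(n)]
    return A_hat, b_hat

def unscale_solution(x_hat, d2):
    """Recover x = D2 * x_hat, since x_hat = D2^{-1} x."""
    return [d2[j] * x_hat[j] for j in range(len(x_hat))]
```

Here D1 and D2 are represented by their diagonals `d1` and `d2`; any solver can be applied to the scaled pair, and the solution is mapped back by `unscale_solution`.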

303 | The University of Florida Sparse Matrix Collection
- Davis, et al.
- 1997
Citation Context: ...ectively. In both of the systems, the program is compiled with gcc using the optimization option -O3. We ran the program on a set of matrices from the University of Florida sparse matrix collection [8]. The characteristics of the matrices are shown in Table 1. Table 1. Matrices used in measuring the parallel performance, their size, number of nonzeros, and the number of iterations to converge in th...

281 | BEOWULF: A parallel workstation for scientific computation
- Sterling, Becker, et al.
- 1995
Citation Context: ...riments We have implemented a parallel program for the proposed matrix scaling algorithm in C using LAM/MPI [3]. The experiments were carried out on up to 16 nodes of two PC clusters of Beowulf class [19]. In the first cluster, the nodes are Intel Pentium IV 2.6 GHz processors with 1 GB of RAM, and they run Debian GNU/Linux. This cluster has a Gigabit Ethernet switch. The cluster has a measured latency...

247 | Benchmarking optimization software with performance profiles
- Dolan, Moré
Citation Context: ...We have observed that usually 25–30 iterations of the discussed scaling algorithm are sufficient to improve the condition number of the matrices. We used the performance profiles discussed in [9] to generate the plot shown in Fig. 1. The plot compares estimates of the condition numbers for the scaled matrices resulting from four different scaling algorithms and those of the original matrices...
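As a reading aid for how such plots are built, here is a minimal sketch of the Dolan–Moré performance-profile construction cited as [9] (our own code, not the authors' evaluation script): each method's measure on a problem is divided by the best measure for that problem, and ρ(τ) is the fraction of problems on which the method is within a factor τ of the best.

```python
# Minimal Dolan-More performance profile (illustrative sketch).
# measures: list of per-problem dicts {method_name: measured_value},
# smaller values being better (e.g. condition number estimates).

def performance_profile(measures, tau):
    """Return {method: fraction of problems with ratio-to-best <= tau}."""
    methods = list(measures[0].keys())
    counts = {m: 0 for m in methods}
    for row in measures:
        best = min(row.values())          # best measure on this problem
        for m in methods:
            if row[m] / best <= tau:      # within factor tau of the best
                counts[m] += 1
    return {m: counts[m] / len(measures) for m in methods}
```

Plotting ρ(τ) against τ for each method yields curves like those described in Fig. 1; higher curves dominate.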

213 | LAM: An Open Cluster Environment for MPI
- Burns, Daoud, et al.
- 1994
Citation Context: ...partitioning algorithm, as it is easy to implement and is fast to run in parallel. 4 Experiments We have implemented a parallel program for the proposed matrix scaling algorithm in C using LAM/MPI [3]. The experiments were carried out on up to 16 nodes of two PC clusters of Beowulf class [19]. In the first cluster, the nodes are Intel Pentium IV 2.6 GHz processors with 1 GB of RAM, and they run Deb...

73 | On algorithms for permuting large entries to the diagonal of a sparse matrix
- Duff, Koster
Citation Context: ...d be useful on a wide range of sparse matrices. There is also the routine MC30 in HSL, which is a variant of the MC29 routine for symmetric matrices. Scaling can also be combined with permutations; see [11] and the HSL routine MC64. In this approach, the matrix is first permuted so that the product of absolute values of entries on the diagonal of the permuted matrix is maximized (other measures such as...
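To make the objective concrete, here is a toy brute-force version of the idea behind [11] and MC64 (our own sketch; the real codes use weighted bipartite matching, not enumeration, and this exhaustive search is only feasible for tiny matrices):

```python
# Toy sketch: find a row permutation maximizing the product of absolute
# values on the diagonal. perm[j] gives the row assigned to column j.
# MC64 solves this as a weighted matching problem; we enumerate for clarity.
from itertools import permutations

def best_diagonal_permutation(A):
    """Return the permutation tuple with the largest diagonal product."""
    n = len(A)
    def diag_product(perm):
        p = 1.0
        for j in range(n):
            p *= abs(A[perm[j]][j])
        return p
    return max(permutations(range(n)), key=diag_product)
```

Applying the permutation before scaling places large entries on the diagonal, which is the combined permute-then-scale approach the context describes.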

65 | A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
- Vastenhouw, Bisseling
Citation Context: ...mputations y ← Ax followed by x ← Aᵀy, when the partitions on x and y are equal to the partitions on D2 and D1, respectively. Having observed that, we can use hypergraph models, see for example [4, 20, 22], to partition the matrix A, and then follow the above development to partition D1 and D2 to achieve efficient parallelization. Moreover, due to the equivalence between the communication operations of...
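The pair of products y ← Ax, x ← Aᵀy referred to above can be sketched on coordinate (i, j, value) triplets. This serial illustration is ours (not the paper's C/MPI code); its parallel version, with x and y partitioned like D2 and D1, induces exactly the communication pattern the context discusses.

```python
# One y <- A x followed by x <- A^T y on an m-by-n matrix stored as
# (row, col, value) triplets (illustrative serial sketch).

def spmv_pair(triples, x, m, n):
    """Return (y, new_x) with y = A x and new_x = A^T y."""
    y = [0.0] * m
    for i, j, v in triples:
        y[i] += v * x[j]          # forward product: rows gather x entries
    new_x = [0.0] * n
    for i, j, v in triples:
        new_x[j] += v * y[i]      # transpose product: columns gather y entries
    return y, new_x
```

In a distributed setting, the gather of `x[j]` values is the same communication as distributing D2, and the scatter-add into `new_x` mirrors the reduction onto D1.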

61 | Hypergraph-partitioning based decomposition for parallel sparse-matrix vector multiplication
- Çatalyürek, Aykanat
- 1999
Citation Context: ...cj). It can be seen from Eq. (4) that the communication volume requirements of the proposed algorithm are closely related to those of repeated sparse matrix-vector multiply operations; see for example [4, 12]. In fact, the communication operations in an iteration of Algorithm 1 are the same as those in the computations y ← Ax followed by x ← Aᵀy, when the partitions on x and y are equal to the parti...

46 | PaToH: A multilevel hypergraph partitioning tool, version 3.0
- Çatalyürek, Aykanat
- 1999
Citation Context: ...e the average running time of an iteration, we ran the program for 1000 iterations, without testing convergence. We used the fine-grain hypergraph model [6] and the hypergraph partitioning tool PaToH [5] with default options to partition the matrices. In the fine-grain model, the nonzeros of a matrix are partitioned independently, i.e., nonzeros in a row or a column are not necessarily assigned to a...

39 | Concerning nonnegative matrices and doubly stochastic matrices
- Sinkhorn, Knopp
- 1967
Citation Context: ...not the case for most scaling algorithms, which alternately scale rows followed by columns or vice versa. In the case of unsymmetric matrices, one may consider the use of the Sinkhorn-Knopp iterations [18] with the ∞-norm in place of the 1-norm. This method simply normalizes all rows and then columns in A, and iterates on this process until convergence. In the ∞-norm, this is obtained after a single st...
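For reference, here is a minimal sketch of the classical Sinkhorn-Knopp iteration [18] in the 1-norm (our own code, kept deliberately simple): rows and then columns are divided by their sums, and for nonnegative matrices with total support the iterates converge to a doubly stochastic matrix.

```python
# Sinkhorn-Knopp in the 1-norm (illustrative sketch, nonnegative input):
# alternately normalize row sums and column sums to 1.

def sinkhorn_knopp(A, iters=50):
    """Return an approximately doubly stochastic scaling of A."""
    n = len(A)
    A = [row[:] for row in A]            # work on a copy
    for _ in range(iters):
        for i in range(n):               # row normalization
            s = sum(A[i])
            A[i] = [v / s for v in A[i]]
        for j in range(n):               # column normalization
            s = sum(A[i][j] for i in range(n))
            for i in range(n):
                A[i][j] /= s
    return A
```

This alternating row-then-column structure is exactly what the context contrasts with the simultaneous scaling of the proposed algorithm.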

35 | Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies
- Uçar, Aykanat
- 2004
Citation Context: ...mputations y ← Ax followed by x ← Aᵀy, when the partitions on x and y are equal to the partitions on D2 and D1, respectively. Having observed that, we can use hypergraph models, see for example [4, 20, 22], to partition the matrix A, and then follow the above development to partition D1 and D2 to achieve efficient parallelization. Moreover, due to the equivalence between the communication operations of...

30 | A comparative study of algorithms for matrix balancing
- Schneider, Zenios
- 1990
Citation Context: ...bject of several scientific publications, with many different developments depending on the properties required from the scaling. It has given rise to several well-known algorithms; see, for example, [10, 17]. If we denote by Â the scaled matrix Â = D1AD2, we then solve the equation Âx̂ = b̂, where x̂ = D2⁻¹x and b̂ = D1b. A standard and well-known approach to scaling is to do a row or column scaling...

28 | A fine-grain hypergraph model for 2D decomposition of sparse matrices
- Çatalyürek, Aykanat
- 2001
Citation Context: ...roach will partition D1 and D2 in such a way that the processor which holds aii will own d1(i) and d2(i). This is the common approach taken in standard matrix partitioning approaches; see for example [4, 6]. The MPI standard [14] defines an operation, minloc, which can be used to accomplish this task. In general, minloc can be used to compute a global minimum and the rank of the process whose data conta...
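To clarify what the MPI minloc operation computes, here is a serial sketch of its semantics (MPI's `MPI_MINLOC` reduction operates on value/index pairs; ties are resolved in favor of the lowest index, per the MPI standard). The function name is ours.

```python
# Serial illustration of the MPI_MINLOC reduction semantics: from one
# (value, rank) pair per process, obtain the global minimum value and
# the lowest rank owning it.

def minloc(pairs):
    """pairs: iterable of (value, rank); returns (min_value, owning_rank)."""
    return min(pairs, key=lambda vr: (vr[0], vr[1]))
```

In the parallel code the same result would come from a single `MPI_Allreduce` with the `MPI_MINLOC` operation, letting every process learn which rank owns the minimum.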

25 | On the automatic scaling of matrices for Gaussian elimination
- Curtis, Reid
- 1972
Citation Context: ...HSL [13] routine MC29, which aims to make the nonzeros of the scaled matrix close to one by minimizing the sum of the squares of the logarithms of the moduli of the nonzeros in the scaled matrix; see [7]. MC29 reduces this sum in a global sense and therefore should be useful on a wide range of sparse matrices. There is also the routine MC30 in HSL, which is a variant of the MC29 routine for symmetric m...

20 | On the scalability of hypergraph models for sparse matrix partitioning
- Uçar, Çatalyürek
- 2010
Citation Context: ...e to the equivalence between the communication operations of the proposed algorithm and those of sparse matrix-vector multiply operations, we can adopt the vector partitioning techniques discussed in [1, 21] to partition D1 and D2. We wanted to have a parallelization of the scaling algorithm independent of the matrix partitioning. This is because we imagine the use of the algorithm in a parallel linear s...

17 | Communication balancing in parallel sparse matrix-vector multiplication
- Bisseling, Meesen
Citation Context: ...e to the equivalence between the communication operations of the proposed algorithm and those of sparse matrix-vector multiply operations, we can adopt the vector partitioning techniques discussed in [1, 21] to partition D1 and D2. We wanted to have a parallelization of the scaling algorithm independent of the matrix partitioning. This is because we imagine the use of the algorithm in a parallel linear s...

16 | A scaling algorithm to equilibrate both rows and columns norms in matrices
- Ruiz
- 2001
Citation Context: ...d columns of the scaled matrix Â = D1AD2 have the same magnitude in some norm. Two common choices for the norm are the ∞-norm and the 1-norm. Recently, we proposed an iterative algorithm for this purpose [16]. In this paper, we present the algorithm briefly and discuss how we parallelize it. We report experimental results with the parallel code on three parallel systems that have different processors and...
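A serial sketch of the simultaneous scaling iteration described in [16] may help fix ideas (our own reading of the ∞-norm variant, with invented names; dense lists stand in for sparse storage): at each step, row i is divided by the square root of its ∞-norm and column j by the square root of its ∞-norm, computed on the same iterate, so both row and column norms converge to 1 together.

```python
# Illustrative sketch of the simultaneous row/column equilibration
# iteration of [16] in the inf-norm. Requires no zero rows or columns.
import math

def ruiz_scale(A, iters=25):
    """Return (A_scaled, d1, d2) with A_scaled = D1 * A * D2."""
    m, n = len(A), len(A[0])
    d1, d2 = [1.0] * m, [1.0] * n
    for _ in range(iters):
        r = [math.sqrt(max(abs(v) for v in A[i])) for i in range(m)]
        c = [math.sqrt(max(abs(A[i][j]) for i in range(m))) for j in range(n)]
        A = [[A[i][j] / (r[i] * c[j]) for j in range(n)] for i in range(m)]
        d1 = [d1[i] / r[i] for i in range(m)]
        d2 = [d2[j] / c[j] for j in range(n)]
    return A, d1, d2
```

Because rows and columns are treated in one sweep, a symmetric matrix stays symmetric under this iteration, which the surrounding contexts note is not true of alternating row/column methods.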

13 | Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing
- Hendrickson, Kolda
Citation Context: ...cj). It can be seen from Eq. (4) that the communication volume requirements of the proposed algorithm are closely related to those of repeated sparse matrix-vector multiply operations; see for example [4, 12]. In fact, the communication operations in an iteration of Algorithm 1 are the same as those in the computations y ← Ax followed by x ← Aᵀy, when the partitions on x and y are equal to the parti...

10 | Scaling matrices to prescribed row and column maxima
- Rothblum, Schneider, et al.
- 1994

7 | Equilibration of symmetric matrices in the max-norm
- Bunch
- 1971
Citation Context: ...Fig. 1. Performance profiles for the condition number estimates for 245 matrices. A marks the condition number estimate of the original matrix; B marks that of Bunch's algorithm [2]; inf, 1, and 2 mark that of the parallel scaling algorithm with the ∞-, 1-, and 2-norms (with at most 25 iterations). At, for example, τ = 3, the curves from top to bottom correspond to the labels given i...

1 | A fine-grain hypergraph model for 2D decomposition of sparse matrices
- BU-CE-9915
- 1999
Citation Context: ...roach will partition D1 and D2 in such a way that the processor which holds aii will own d1(i) and d2(i). This is the common approach taken in standard matrix partitioning approaches; see for example [4, 6]. Thirdly, we believe that the algorithm is likely to achieve a balance on the number of D1 and D2... Table 4.1. Matrices used in measuring the parallel performance, their size, number of...