Results 1-10 of 15
On two-dimensional sparse matrix partitioning: Models, methods, and a recipe
SIAM J. Sci. Comput.
, 2010
Cited by 21 (15 self)
Abstract. We consider two-dimensional partitioning of general sparse matrices for the parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public-domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.
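The communication-volume objective these hypergraph methods optimize can be illustrated with the standard column-net model for 1D rowwise partitioning: each matrix column is a net, and a net spanning lambda parts costs lambda - 1 words. The sketch below is illustrative only (function and variable names are my own, not from the paper):

```python
# Hypothetical sketch of the connectivity-1 metric that column-net hypergraph
# models minimize for 1D rowwise SpMV partitioning.
from collections import defaultdict

def spmv_volume(nonzeros, row_part):
    """Communication volume of y = A*x under a rowwise partition.

    nonzeros: iterable of (i, j) coordinates of A's nonzero entries.
    row_part: dict mapping row index -> part id; x_j is assumed to be
              owned by the part that owns row j (conformal partition).
    Each column j is a net; a net connecting lambda parts costs
    lambda - 1 words (x_j is sent to every part needing it except its owner).
    """
    parts_touching = defaultdict(set)
    for i, j in nonzeros:
        parts_touching[j].add(row_part[i])
    return sum(len(p) - 1 for p in parts_touching.values())

# 4x4 example: rows {0,1} on part 0, rows {2,3} on part 1.
A = [(0, 0), (0, 2), (1, 1), (2, 2), (3, 1), (3, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(spmv_volume(A, part))  # columns 1 and 2 each cross the cut
```

The two-dimensional methods in the paper generalize this by assigning individual nonzeros (or coarser blocks) rather than whole rows to parts.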
A Repartitioning Hypergraph Model for Dynamic Load Balancing
, 2008
Cited by 7 (2 self)
In parallel adaptive applications, the computational structure of the applications changes over time, leading to load imbalances even though the initial load distributions were balanced. To restore balance and to keep communication volume low in further iterations of the applications, dynamic load balancing (repartitioning) of the changed computational structure is required. Repartitioning differs from static load balancing (partitioning) due to the additional requirement of minimizing migration cost to move data from an existing partition to a new partition. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both communication volume in the application and migration cost to move data, in order to minimize the overall cost. Use of a hypergraph-based model allows us to accurately model communication costs rather than approximating them with graph-based models. We show that the new model can be realized using hypergraph partitioning with fixed vertices and describe our parallel multilevel implementation within the Zoltan load-balancing toolkit. To the best of our knowledge, this is the first implementation for dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to graph-based dynamic load balancing approaches, and multilevel approaches improve the repartitioning quality significantly.
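The trade-off described here, recurring communication cost versus one-time data migration cost, can be sketched as a simple additive objective. The weighting scheme and all names below are illustrative assumptions, not the paper's actual formulation:

```python
# Illustrative sketch of the cost a repartitioning model trades off:
# alpha * (per-iteration communication volume) + (one-time migration cost).
# The additive form and the names are assumptions for illustration.

def migration_cost(old_part, new_part, sizes):
    """Total size of data items that change parts between partitions."""
    return sum(sizes[v] for v in old_part if new_part[v] != old_part[v])

def total_cost(comm_volume, old_part, new_part, sizes, alpha):
    """alpha weighs how many iterations the new partition will be reused:
    a large alpha favors low communication even at high migration cost."""
    return alpha * comm_volume + migration_cost(old_part, new_part, sizes)

old = {"a": 0, "b": 0, "c": 1}
new = {"a": 0, "b": 1, "c": 1}   # only item "b" migrates
sizes = {"a": 10, "b": 4, "c": 7}
print(total_cost(3, old, new, sizes, alpha=5))  # 5*3 + 4
```

Fixed vertices in the hypergraph (one per part, pinned to that part) are what let a single partitioning pass account for the migration term.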
Partitioning, ordering, and load balancing in a hierarchically parallel hybrid linear solver
 International Journal of High Performance Computing Applications
, 2012
Cited by 2 (2 self)
PDSLin is a general-purpose algebraic parallel hybrid (direct/iterative) linear solver based on the Schur complement method. The most challenging step of the solver is the computation of a preconditioner based on an approximate global Schur complement. We investigate two combinatorial problems to enhance PDSLin's performance at this step. The first is a multi-constraint partitioning problem to balance the workload while computing the preconditioner in parallel. For this, we describe and evaluate a number of graph and hypergraph partitioning algorithms to satisfy our particular objective and constraints. The second problem is to reorder the sparse right-hand side vectors to improve the data access locality during the parallel solution of a sparse triangular system with multiple right-hand sides. This is to speed up the process of eliminating the unknowns associated with the interface. We study two reordering techniques: one based on a postordering of the elimination tree and the other based on hypergraph partitioning. To demonstrate the effect of these techniques on the performance of PDSLin, we present numerical results of solving large-scale linear systems arising from two applications of interest: numerical simulations of accelerator cavities and of fusion devices.
UMPa: A multi-objective, multi-level partitioner for communication minimization
Cited by 1 (0 self)
Abstract. We propose a directed hypergraph model and a refinement heuristic to distribute communicating tasks among the processing units in a distributed-memory setting. The aim is to achieve load balance and minimize the maximum data sent by a processing unit. We also take two other communication metrics into account with a tie-breaking scheme. With this approach, task distributions causing an excessive use of the network, or a bottleneck processor that participates in almost all of the communication, are avoided. We show on a large number of problem instances that our model improves the maximum data sent by a processor by up to 34% for parallel environments with 4, 16, 64, and 256 processing units, compared to the state of the art, which only minimizes the total communication volume.
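The metrics this abstract mentions, maximum send volume as the primary objective with secondary metrics as tie-breakers, can be computed from a message list as sketched below. The exact tie-breaking metrics are an assumption here; the abstract only says two additional communication metrics are used:

```python
# Hedged sketch: maximum volume sent by a single processing unit (primary
# objective), with total volume and maximum received volume as assumed
# tie-breakers. Tuples compare lexicographically, so a partitioner can
# simply prefer the smaller metric tuple.
from collections import defaultdict

def comm_metrics(messages):
    """messages: iterable of (src_part, dst_part, volume) triples."""
    sent = defaultdict(int)
    recv = defaultdict(int)
    total = 0
    for src, dst, vol in messages:
        if src != dst:            # intra-part traffic costs nothing
            sent[src] += vol
            recv[dst] += vol
            total += vol
    return (max(sent.values(), default=0), total,
            max(recv.values(), default=0))

msgs = [(0, 1, 5), (0, 2, 3), (1, 2, 4), (2, 2, 9)]
print(comm_metrics(msgs))  # part 0 is the heaviest sender
```

Minimizing only the total (second component) can leave one processor sending most of the data, which is exactly the bottleneck the maximum-send objective avoids.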
Hypergraph Partitioning for Parallel Iterative Solution of General Sparse Linear Systems
, 2007
The efficiency of parallel iterative methods for solving linear systems arising from real-life applications depends greatly on matrix characteristics and on the amount of parallel overhead. A major part of this overhead is often attributed to parallel matrix-vector multiplications. However, for difficult large linear systems, the preconditioning operations needed to accelerate convergence must also be performed in parallel and may incur substantial overhead. To obtain an efficient preconditioning, it is desirable to consider certain numerical matrix properties in the matrix partitioning process. In general, graph partitioners consider the nonzero structure of a matrix to balance the number of unknowns and to decrease communication volume among parts. The present work builds upon hypergraph partitioning techniques because of their ability to handle nonsymmetric and irregularly structured matrices and because they correctly minimize communication volume. First, several hyperedge weight schemes are proposed to account for the numerical matrix property called diagonal dominance of rows and columns. Then, an algorithm for the independent partitioning of certain submatrices, followed by the matching of the obtained parts, is presented in detail along with a proof that it correctly minimizes the total communication volume. For the proposed variants of the hypergraph partitioning models, numerical experiments compare the iterations to converge, investigate the diagonal dominance of the obtained parts, and report the values of the partitioning cost functions.
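The row diagonal dominance that the proposed hyperedge weight schemes build on has a simple quantitative form: the ratio of the diagonal magnitude to the sum of off-diagonal magnitudes in each row. The sketch below is illustrative (the function name is mine, and the paper's actual weight schemes may transform this ratio further):

```python
# Illustrative sketch: per-row diagonal dominance ratio |a_ii| / sum_{j!=i} |a_ij|.
# A ratio >= 1 means the row is diagonally dominant; such numerical structure
# is what the hyperedge weight schemes in the abstract aim to preserve per part.

def row_dominance(A):
    """Dominance ratio per row of a dense matrix given as a list of lists;
    returns inf for a row with no off-diagonal entries."""
    ratios = []
    for i, row in enumerate(A):
        off = sum(abs(a) for j, a in enumerate(row) if j != i)
        ratios.append(abs(row[i]) / off if off else float("inf"))
    return ratios

A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -2.0, 1.0]]
print(row_dominance(A))  # rows 0 and 1 are dominant, row 2 is not
```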
On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe
, 2008
Abstract. We consider two-dimensional partitioning of general sparse matrices for the parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public-domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.
Key words. sparse matrix partitioning; parallel matrix-vector multiplication; hypergraph partitioning; two-dimensional partitioning; combinatorial scientific computing
AMS subject classifications. 05C50, 05C65, 65F10, 65F50, 65Y05
A matrix partitioning interface to PaToH in MATLAB
, 2009
We present the PaToH MATLAB Matrix Partitioning Interface. The interface provides support for hypergraph-based sparse matrix partitioning methods, which are used for efficient parallelization of sparse matrix-vector multiplication operations. The interface also offers tools for visualizing and measuring the quality of a given matrix partition. We propose a novel, multilevel, 2D-coarsening-based 2D matrix partitioning method and implement it using the interface. We have performed an extensive comparison of the proposed method against our implementations of the orthogonal recursive bisection and fine-grain methods on a large set of publicly available test matrices. The conclusion of the experiments is that the new method can compete with the fine-grain method while also suggesting new research directions.
Hypergraph Partitioning for Computing Matrix Powers
, 2010
Motivation. Krylov Subspace Methods (KSMs) are a class of iterative algorithms commonly used in scientific applications for solving linear systems, eigenvalue problems, singular value problems, and least squares problems. Standard KSMs are communication-bound, due to a sparse matrix-vector multiplication (SpMV) in each iteration. This motivated the formulation of Communication-Avoiding KSMs, which remove the communication bottleneck to increase performance. A successful strategy for avoiding communication in KSMs uses a matrix powers kernel that exploits locality in the graph of the system matrix A. The matrix powers kernel computes k basis vectors for a Krylov subspace (i.e., K_k(A, v) = span{v, Av, ..., A^(k-1)v}) reading A only once. Since a standard KSM reads A once per iteration, this approach effectively reduces the communication cost by a factor of k [7, 8]. The current implementation of the matrix powers kernel [8] partitions the matrix A, given the computed dependencies, using graph partitioning of A + A^T. However, the graph model inaccurately represents the communication volume in SpMV and is difficult to extend to the case of nonsymmetric matrices. A hypergraph model remedies these two problems for SpMV [2, 5, 3]. The fundamental similarity between SpMV and the matrix powers kernel motivates our decision to pursue a hypergraph communication model.
Contribution. We construct a hypergraph that encodes the matrix powers communication, and prove that a partition of this hypergraph corresponds exactly to the communication required when using the given partition.
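What the matrix powers kernel computes is easy to state even though its communication-avoiding implementation is not: the k vectors v, Av, ..., A^(k-1)v. The naive dense sketch below shows only that output, not the partitioned, communication-avoiding kernel itself:

```python
# Minimal sketch of the k basis vectors the matrix powers kernel produces:
# K_k(A, v) = span{v, Av, ..., A^(k-1) v}. A real kernel partitions A and
# redundantly computes ghost-zone entries to read A only once; this naive
# version just shows what is computed.
import numpy as np

def matrix_powers(A, v, k):
    """Return the n x k matrix whose columns are v, Av, ..., A^(k-1) v."""
    basis = [v]
    for _ in range(k - 1):
        basis.append(A @ basis[-1])  # one SpMV per new basis vector
    return np.column_stack(basis)

A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
v = np.array([1.0, 0.0])
print(matrix_powers(A, v, 3))
```

In the naive loop each of the k - 1 products rereads A; the communication-avoiding kernel instead gathers, per part, the rows reachable within k - 1 hops in A's graph, which is exactly the dependency structure the proposed hypergraph encodes.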
Scaling transactional workloads on the cloud
In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in a cloud computing environment. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. Capturing and modeling the transactional workload over a period of time, and then exploiting that information for data placement and replication, has been shown to provide significant benefits in performance, both in terms of transaction latencies and overall throughput. However, such workload-aware data placement approaches can incur very high overheads and, further, may perform worse than naive approaches if the workload changes. In this work, we propose SWORD, a scalable workload-aware data partitioning and placement approach for OLTP workloads that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement and during query execution at runtime. We model the workload as a hypergraph over the data items and propose using a hypergraph compression technique to reduce the overheads of partitioning. We have built a workload-aware active replication mechanism in SWORD to increase availability and enable load balancing. We propose the use of fine-grained quorums defined at the level of groups of tuples to control the cost of distributed updates, improve throughput, and provide adaptability to different workloads. To our knowledge, SWORD is the first system that uses fine-grained quorums in this context.

The results of our experimental evaluation of SWORD deployed on an Amazon EC2 cluster show that our techniques result in orders-of-magnitude reductions in the partitioning and bookkeeping overheads, and improve tolerance to failures and workload changes; we also show that choosing quorums based on the query access patterns enables us to better handle query workloads with different read and write access patterns.
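The hypergraph model described here maps directly onto the objective of minimizing distributed transactions: each transaction is a hyperedge over the tuples it touches, and a cut hyperedge is a transaction spanning machines. A minimal sketch under that assumption (names are illustrative, not SWORD's API):

```python
# Hedged sketch of the objective a workload hypergraph captures: each
# transaction is a hyperedge over the tuples it accesses; a hyperedge
# spanning more than one machine corresponds to a distributed transaction.

def distributed_txns(transactions, placement):
    """Count transactions touching tuples on more than one machine.

    transactions: list of sets of tuple ids (one hyperedge each).
    placement: dict mapping tuple id -> machine id.
    """
    return sum(1 for txn in transactions
               if len({placement[t] for t in txn}) > 1)

txns = [{"t1", "t2"}, {"t2", "t3"}, {"t4"}]
place = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
print(distributed_txns(txns, place))  # only {t2, t3} crosses machines
```

Partitioning the hypergraph to minimize cut hyperedges (subject to per-machine load balance) is then the data placement step; the compression technique mentioned above would shrink this hypergraph before partitioning.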