On two-dimensional sparse matrix partitioning: Models, methods, and a recipe
SIAM J. Sci. Comput., 2010

Cited by 37 (21 self)
We consider two-dimensional partitioning of general sparse matrices for the parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions: one imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public-domain matrices. Furthermore, for users of these partitioning methods, we present a partitioning recipe that chooses one of the methods according to certain matrix characteristics.
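As a rough illustration of the fine-grain model (not the authors' code), the sketch below treats each nonzero as a vertex and each row and column as a net, then computes the communication volume of a given nonzero-to-processor assignment as the sum of (connectivity minus 1) over column nets (expand phase, sending x entries) and row nets (fold phase, summing partial y entries). It assumes every vector entry is owned by a processor that already holds a nonzero in the corresponding row or column; the name `fine_grain_volume` is hypothetical.

```python
from collections import defaultdict


def fine_grain_volume(nonzeros, part):
    """Communication volume of a fine-grain (nonzero-level) partition.

    nonzeros: list of (i, j) coordinates of the nonzeros of A.
    part: part[t] is the processor that owns nonzero t.
    Returns (expand_volume, fold_volume) under the connectivity-1 metric,
    assuming x_j and y_i are owned by a part already present in their net.
    """
    row_parts = defaultdict(set)  # row net i -> parts holding a nonzero in row i
    col_parts = defaultdict(set)  # column net j -> parts holding a nonzero in column j
    for (i, j), p in zip(nonzeros, part):
        row_parts[i].add(p)
        col_parts[j].add(p)
    expand = sum(len(s) - 1 for s in col_parts.values())
    fold = sum(len(s) - 1 for s in row_parts.values())
    return expand, fold
```

For a dense 2x2 matrix, a row-wise assignment `[0, 0, 1, 1]` needs only expand communication, whereas the checkerboard-like assignment `[0, 1, 1, 0]` needs both phases.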
A Repartitioning Hypergraph Model for Dynamic Load Balancing
2008

Cited by 14 (3 self)
In parallel adaptive applications, the computational structure of the application changes over time, leading to load imbalances even though the initial load distribution was balanced. To restore balance and to keep communication volume low in further iterations of the application, dynamic load balancing (repartitioning) of the changed computational structure is required. Repartitioning differs from static load balancing (partitioning) in the additional requirement of minimizing the migration cost of moving data from an existing partition to a new one. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both the communication volume in the application and the migration cost of moving data, in order to minimize the overall cost. Using a hypergraph-based model allows us to model communication costs accurately rather than approximating them with graph-based models. We show that the new model can be realized using hypergraph partitioning with fixed vertices, and we describe our parallel multilevel implementation within the Zoltan load-balancing toolkit. To the best of our knowledge, this is the first implementation of dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to graph-based dynamic load balancing approaches, and that multilevel approaches improve repartitioning quality significantly.
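A minimal sketch of the kind of objective such a repartitioning model targets, assuming the total cost is alpha times the application's communication volume on the new partition plus the volume of migrated data. Here alpha is an assumed weighting (for example, the number of iterations until the next rebalance) that trades recurring communication against one-time migration. The sketch only evaluates a candidate partition; it does not build the fixed-vertex hypergraph or call Zoltan, and `repartition_cost` is a hypothetical name.

```python
def repartition_cost(nets, old_part, new_part, sizes, alpha, net_cost=None):
    """Evaluate alpha * communication volume + migration volume.

    nets: list of hyperedges, each a list of vertex ids.
    old_part, new_part: part id of each vertex before and after repartitioning.
    sizes: amount of data moved if a vertex changes part.
    alpha: assumed weight on communication (e.g. iterations until next rebalance).
    net_cost: optional per-net communication cost (defaults to 1).
    """
    if net_cost is None:
        net_cost = [1] * len(nets)
    # Connectivity-1 metric evaluated on the new partition.
    comm = sum(c * (len({new_part[v] for v in net}) - 1)
               for net, c in zip(nets, net_cost))
    # Data moved for every vertex whose part changed.
    migration = sum(s for s, a, b in zip(sizes, old_part, new_part) if a != b)
    return alpha * comm + migration
```

Comparing the cost of keeping the old partition with that of a candidate new one shows the trade-off: a large alpha favors partitions with low communication even if they require migration.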
UMPa: A multi-objective, multi-level partitioner for communication minimization

Cited by 6 (2 self)
Abstract. We propose a directed hypergraph model and a refinement heuristic to distribute communicating tasks among the processing units in a distributed-memory setting. The aim is to achieve load balance and minimize the maximum data sent by a processing unit. We also take two other communication metrics into account with a tie-breaking scheme. With this approach, task distributions that cause excessive network use, or a bottleneck processor that participates in almost all of the communication, are avoided. We show on a large number of problem instances that our model reduces the maximum data sent by a processor by up to 34% for parallel environments with 4, 16, 64, and 256 processing units, compared to the state of the art, which minimizes only the total communication volume.
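To make the metrics concrete, here is a hedged sketch of a directed hypergraph in which each net has one source task that produces a unit of data and several sink tasks that consume it. It computes per-part send and receive volumes and a lexicographic key whose first entry is the maximum send volume. The two secondary metrics and their order (total volume, then maximum send-plus-receive volume) are assumptions for illustration; Python tuple comparison supplies the tie-breaking, and both function names are hypothetical.

```python
def send_recv_volumes(nets, part, num_parts):
    """Per-part send and receive volume for a directed hypergraph.

    nets: list of (source, sinks) pairs; the source's part sends one unit of
    data to every other part that holds at least one sink of the net.
    """
    send = [0] * num_parts
    recv = [0] * num_parts
    for source, sinks in nets:
        src = part[source]
        receivers = {part[v] for v in sinks} - {src}
        send[src] += len(receivers)
        for q in receivers:
            recv[q] += 1
    return send, recv


def umpa_like_key(nets, part, num_parts):
    """Lexicographic objective (smaller is better); secondary metrics assumed."""
    send, recv = send_recv_volumes(nets, part, num_parts)
    total = sum(send)
    max_send_recv = max(s + r for s, r in zip(send, recv))
    return (max(send), total, max_send_recv)
```

A refinement heuristic could then accept a task move only if it lowers this key, so total volume matters only when the maximum send volume ties.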
A Link-Based Storage Scheme for Efficient Aggregate Query Processing on Clustered Road Networks
Information Systems, 2009
Integrated data placement and task assignment for scientific workflows in clouds
Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing, ACM

Cited by 4 (0 self)
We consider the problem of optimizing the execution of data-intensive scientific workflows in the Cloud. We address the problem under the following scenario. The tasks of the workflows communicate through files: the output of one task is used by another task as an input file, and if these tasks are assigned to different execution sites, a file transfer is necessary. The output files are to be stored at a site. Each execution site is to be assigned a certain percentage of the files and tasks. These percentages, called target weights, are predetermined and reflect either user preferences or the storage capacity and computing power of the sites. The aim is to place the data files into, and assign the tasks to, the execution sites so as to reduce the cost associated with file transfers while complying with the target weights. To do this, we model the workflow as a hypergraph and, using a hypergraph-partitioning-based formulation, propose a heuristic that generates data placement and task assignment schemes simultaneously. We report simulation results on a number of real-life and synthetically generated scientific workflows. Our results show that the proposed heuristic is fast and can find mappings and assignments that reduce file transfers while respecting the target weights.
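The transfer cost being reduced can be sketched as follows, assuming each file is a net containing the file itself and every task that produces or consumes it. With the file stored at one site, the cost is the file size times the number of other sites that run one of those tasks, which is the (connectivity minus 1) metric on that net. The data layout and the name `transfer_cost` are hypothetical, and the target-weight constraint is not modeled.

```python
def transfer_cost(files, task_site, file_site):
    """Total file-transfer cost of a data placement and task assignment.

    files: dict name -> (size, producer task or None, list of consumer tasks)
    task_site: dict task -> execution site
    file_site: dict name -> site where the file is stored
    """
    cost = 0
    for name, (size, producer, consumers) in files.items():
        tasks = list(consumers)
        if producer is not None:
            tasks.append(producer)
        # Sites other than the storage site that must send or receive the file.
        remote = {task_site[t] for t in tasks} - {file_site[name]}
        cost += size * len(remote)
    return cost
```

Because the file's storage site is part of the net, choosing where to store a file and where to run its tasks is a single partitioning decision, which is why placement and assignment can be produced simultaneously.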
SWORD: Scalable Workload-Aware Data Placement for

Cited by 4 (0 self)
In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases to support database-as-a-service in cloud computing environments. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. Capturing and modeling the transactional workload over a period of time, and then exploiting that information for data placement and replication, has been shown to provide significant performance benefits, both in transaction latency and in overall throughput. However, such workload-aware data placement approaches can incur very high overheads and, further, may perform worse than naive approaches if the workload changes.
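One way to see why placement matters for OLTP workloads: if each transaction is treated as a hyperedge over the data items it accesses, a transaction is distributed exactly when the placement cuts its hyperedge. The sketch below counts such transactions for a single-copy placement; replication is ignored, and `count_distributed` is a hypothetical name rather than part of SWORD.

```python
def count_distributed(transactions, placement):
    """Count transactions whose accessed items span more than one partition.

    transactions: iterable of sets of data-item ids accessed together.
    placement: dict item -> partition id (single copy, no replication).
    """
    return sum(1 for items in transactions
               if len({placement[x] for x in items}) > 1)
```

For the workload `[{1, 2}, {2, 3}, {3, 4}]`, placing items 1 and 2 together and items 3 and 4 together leaves one distributed transaction, whereas a placement by item parity makes all three distributed.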
Hypergraph Partitioning for Computing Matrix Powers
2010

Cited by 3 (0 self)
Motivation. Krylov Subspace Methods (KSMs) are a class of iterative algorithms commonly used in scientific applications for solving linear systems, eigenvalue problems, singular value problems, and least-squares problems. Standard KSMs are communication-bound because of the sparse matrix-vector multiplication (SpMV) in each iteration. This motivated the formulation of Communication-Avoiding KSMs, which remove the communication bottleneck to increase performance. A successful strategy for avoiding communication in KSMs uses a matrix powers kernel that exploits locality in the graph of the system matrix $A$. The matrix powers kernel computes $k$ basis vectors for a Krylov subspace, $\mathcal{K}_k(A, v) = \operatorname{span}\{v, Av, \ldots, A^{k-1}v\}$, reading $A$ only once. Since a standard KSM reads $A$ once per iteration, this approach effectively reduces the communication cost by a factor of $k$ [7, 8]. The current implementation of the matrix powers kernel [8] partitions $A$, given the computed dependencies, using graph partitioning of $A + A^T$. However, the graph model inaccurately represents the communication volume in SpMV and is difficult to extend to nonsymmetric matrices. A hypergraph model remedies both problems for SpMV [2, 5, 3]. The fundamental similarity between SpMV and the matrix powers kernel motivates our decision to pursue a hypergraph communication model.
Contribution. We construct a hypergraph that encodes the matrix powers communication, and prove that a partition of this hypergraph corresponds exactly to the communication required when using the given partition
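The sketch below illustrates two ideas from this abstract under simplifying assumptions. The first is the monomial Krylov basis $v, Av, \ldots, A^{k-1}v$, computed here with ordinary SpMVs, so it does not itself avoid communication. The second is the dependency set a processor must fetch up front so it can compute its rows of $A^k x$ without further messages: the $k$-hop neighborhood of its owned rows in the graph of $A$. Function names and the row-pattern dictionary format are hypothetical.

```python
import numpy as np


def krylov_basis(A, v, k):
    """Return v, Av, ..., A^{k-1} v as the columns of an n x k array."""
    V = np.empty((len(v), k))
    V[:, 0] = v
    for j in range(1, k):
        V[:, j] = A @ V[:, j - 1]
    return V


def ghost_rows(pattern, owned, k):
    """Entries of x needed to compute (A^s x)_i for owned rows i and all s <= k.

    pattern: dict row -> iterable of column indices with nonzeros in that row.
    (A^s x)_i depends on (A^{s-1} x)_j for every j in pattern[i], so the
    dependency set is the k-hop neighborhood of the owned rows.
    """
    owned = set(owned)
    needed = set(owned)
    frontier = set(owned)
    for _ in range(k):
        frontier = {j for i in frontier for j in pattern[i]} - needed
        needed |= frontier
    return needed - owned  # rows to fetch (and recompute redundantly)
```

For a tridiagonal matrix, a processor owning rows 0 to 2 needs rows 3 and 4 to take two steps; the ghost region grows with $k$, which is the communication and redundant computation that a partitioning model for this kernel has to capture.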
Partitioning, ordering, and load balancing in a hierarchically parallel hybrid linear solver
International Journal of High Performance Computing Applications, 2012

Cited by 2 (2 self)
PDSLin is a general-purpose algebraic parallel hybrid (direct/iterative) linear solver based on the Schur complement method. The most challenging step of the solver is the computation of a preconditioner based on an approximate global Schur complement. We investigate two combinatorial problems to enhance PDSLin’s performance at this step. The first is a multi-constraint partitioning problem: balancing the workload while computing the preconditioner in parallel. For this, we describe and evaluate a number of graph and hypergraph partitioning algorithms that satisfy our particular objective and constraints. The second problem is to reorder the sparse right-hand-side vectors to improve data access locality during the parallel solution of a sparse triangular system with multiple right-hand sides. This is intended to speed up the elimination of the unknowns associated with the interface. We study two reordering techniques: one based on a post-ordering of the elimination tree and the other based on hypergraph partitioning. To demonstrate the effect of these techniques on the performance of PDSLin, we present numerical results from solving large-scale linear systems arising from two applications of interest: numerical simulations of accelerator cavities and of fusion devices.
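As a worked example of the Schur complement method in its simplest dense form, assume the unknowns are split into interior ($x_1$) and interface ($x_2$) blocks. Eliminating $x_1$ gives $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$, then $S x_2 = b_2 - A_{21}A_{11}^{-1}b_1$ and $x_1 = A_{11}^{-1}(b_1 - A_{12}x_2)$. This NumPy sketch forms $S$ exactly. According to the abstract, PDSLin instead builds an approximate global Schur complement as a preconditioner; in a typical domain-decomposition setting $A_{11}$ is block diagonal, so its blocks can be factored in parallel. The function name is hypothetical.

```python
import numpy as np


def schur_solve(A11, A12, A21, A22, b1, b2):
    """Solve [[A11, A12], [A21, A22]] [x1; x2] = [b1; b2] via the exact Schur complement."""
    Y = np.linalg.solve(A11, A12)      # A11^{-1} A12
    z = np.linalg.solve(A11, b1)       # A11^{-1} b1
    S = A22 - A21 @ Y                  # Schur complement of A11
    x2 = np.linalg.solve(S, b2 - A21 @ z)
    x1 = z - Y @ x2                    # back-substitute for the interior unknowns
    return x1, x2
```

The interface system $S x_2 = \ldots$ is usually much smaller than the full system but dense, which is why approximating $S$ and computing it in parallel is the expensive step the abstract focuses on.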
Communication Avoiding ILU0 Preconditioner

Cited by 1 (1 self)
Abstract. In this paper we present a communication-avoiding ILU0 preconditioner for solving large linear systems of equations with iterative Krylov subspace methods. Recent research has focused on communication-avoiding Krylov subspace methods based on so-called s-step methods. However, few communication-avoiding preconditioners exist yet, and this represents a serious limitation of these methods. Our preconditioner allows us to perform s iterations of the iterative method with no communication, by ghosting some of the input data and performing redundant computation. To avoid communication, an alternating reordering algorithm is introduced for structured and well-partitioned unstructured matrices; it requires the input matrix to be ordered by a graph partitioning technique such as k-way or nested dissection. We show that the reordering does not affect the convergence rate of the ILU0-preconditioned system compared to k-way or nested dissection ordering, while it reduces data movement and is expected to reduce the time needed to solve a linear system. In addition to communication-avoiding Krylov subspace methods, our preconditioner can be used with classical methods such as GMRES to reduce communication.
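For readers unfamiliar with ILU0, the sketch below is the textbook IKJ variant of ILU(0): Gaussian elimination that discards any update falling outside the sparsity pattern of $A$, so $L$ and $U$ have no fill. It uses dense NumPy arrays for clarity and includes none of the paper's reordering or communication-avoiding machinery. A useful check is that the product $LU$ matches $A$ at every position where $A$ is nonzero.

```python
import numpy as np


def ilu0(A):
    """ILU(0) factorization (IKJ variant) restricted to the nonzero pattern of A.

    Returns (L, U) with L unit lower triangular; assumes nonzero pivots.
    """
    n = A.shape[0]
    LU = A.astype(float).copy()
    nz = A != 0                      # allowed positions: the pattern of A
    for i in range(1, n):
        for k in range(i):
            if not nz[i, k]:
                continue
            LU[i, k] /= LU[k, k]     # multiplier l_ik
            for j in range(k + 1, n):
                if nz[i, j]:         # drop updates that would create fill
                    LU[i, j] -= LU[i, k] * LU[k, j]
    L = np.tril(LU, -1) + np.eye(n)
    U = np.triu(LU)
    return L, U
```

For a tridiagonal matrix, ILU(0) coincides with the exact LU factorization; for a 2D Laplacian, the discarded fill is what makes it an approximation.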
A matrix partitioning interface to PaToH in MATLAB
2009
We present the PaToH MATLAB Matrix Partitioning Interface. The interface provides support for hypergraph-based sparse matrix partitioning methods, which are used for the efficient parallelization of sparse matrix-vector multiplication operations. The interface also offers tools for visualizing and measuring the quality of a given matrix partition. We propose a novel multilevel, 2D coarsening-based 2D matrix partitioning method and implement it using the interface. We have performed an extensive comparison of the proposed method against our implementations of orthogonal recursive bisection and fine-grain methods on a large set of publicly available test matrices. The experiments show that the new method can compete with the fine-grain method, and they also suggest new research directions.
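A simplified sketch of orthogonal recursive bisection for a sparse matrix, assuming the common variant that alternately cuts along rows and columns so that each half receives roughly half of the nonzeros. It is not the implementation from the paper and does not use PaToH. The cut is nudged forward so that nonzeros sharing a row (or column) index stay on one side, which makes the balance approximate. `orb_partition` is a hypothetical name.

```python
def orb_partition(nonzeros, num_levels):
    """Assign each nonzero (i, j) to one of 2**num_levels parts by ORB.

    Even levels cut along rows, odd levels along columns; each cut aims to
    split the current set of nonzeros into two halves of similar size.
    """
    part = [0] * len(nonzeros)

    def split(idx, level, base):
        if level == num_levels:
            for t in idx:
                part[t] = base
            return
        axis = level % 2  # 0: cut between rows, 1: cut between columns
        idx = sorted(idx, key=lambda t: nonzeros[t][axis])
        half = len(idx) // 2
        # Keep nonzeros with the same row/column index on the same side.
        while 0 < half < len(idx) and nonzeros[idx[half]][axis] == nonzeros[idx[half - 1]][axis]:
            half += 1
        split(idx[:half], level + 1, 2 * base)
        split(idx[half:], level + 1, 2 * base + 1)

    split(list(range(len(nonzeros))), 0, 0)
    return part
```

On a dense 4x4 matrix with two levels, this yields the four 2x2 quadrants as parts 0 to 3. Feeding the result to a volume metric, such as the connectivity-minus-1 count used for fine-grain partitions, gives a baseline against which finer or multilevel methods can be compared.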