Results 1-10 of 29
On two-dimensional sparse matrix partitioning: Models, methods, and a recipe
 SIAM J. Sci. Comput
, 2010
Cited by 21 (15 self)
Abstract. We consider two-dimensional partitioning of general sparse matrices for parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.
Revisiting hypergraph models for sparse matrix partitioning
 SIAM Review
, 2007
Cited by 20 (13 self)
Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrix-vector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrix-vector multiply based on one-dimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrix-vector multiply, and the nets encode dependencies among the data. We then apply a recently proposed hypergraph transformation operation to devise models for 1D sparse matrix partitioning. The resulting 1D partitioning models are equivalent to the previously proposed computational hypergraph models and are not meant to be replacements for them. Nevertheless, the new models give us insights into the previous ones and help us explain a subtle requirement, known as the consistency condition, of hypergraph partitioning models. Later, we demonstrate the flexibility of the elementary model on a few 1D partitioning problems that are hard to solve using the previously proposed models. We also discuss extensions of the proposed elementary model to two-dimensional matrix partitioning. Key words. parallel computing, sparse matrix-vector multiply, hypergraph models
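As a rough illustration of how nets can encode dependencies, the sketch below uses the standard column-net model for row-wise 1D partitioning (rows as vertices, columns as nets), not the elementary model introduced in this paper. The sparsity pattern, the two-way partition, and the function names are hypothetical. Assuming each vector entry x_j is stored with row j and the diagonal is structurally nonzero, the connectivity-minus-one sum should equal the number of x entries that must be communicated.

```python
from collections import defaultdict

def column_nets(nonzeros):
    """Build column nets: net j holds every row i with a nonzero a_ij."""
    nets = defaultdict(set)
    for i, j in nonzeros:
        nets[j].add(i)
    return nets

def connectivity_minus_one(nets, row_part):
    """Sum over nets of (number of distinct parts the net touches - 1)."""
    return sum(len({row_part[i] for i in rows}) - 1 for rows in nets.values())

# Hypothetical 4x4 pattern with a full diagonal; rows are split over 2 parts.
nonzeros = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (0, 3)]
row_part = {0: 0, 1: 0, 2: 1, 3: 1}
volume = connectivity_minus_one(column_nets(nonzeros), row_part)
print(volume)
```

Here the nets for columns 1 and 3 each span both parts, so each contributes one unit of communication.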
Multilevel direct K-way hypergraph partitioning with multiple constraints and fixed vertices
, 2008
Partitioning sparse matrices for parallel preconditioned iterative methods
 SIAM Journal on Scientific Computing
, 2004
Cited by 15 (9 self)
Abstract. This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.
Hypergraph-Partitioning-Based Remapping Models for Image-Space-Parallel Direct Volume Rendering of Unstructured Grids
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 2005
Cited by 13 (5 self)
In this work, image-space-parallel direct volume rendering (DVR) of unstructured grids is investigated for distributed-memory architectures. A hypergraph-partitioning-based model is proposed for the adaptive screen partitioning problem in this context. The proposed model aims to balance the rendering loads of processors while trying to minimize the amount of data replication. In the parallel DVR framework we adopted, each data primitive is statically owned by its home processor, which is responsible for replicating its primitives on other processors. Two appropriate remapping models are proposed by enhancing the above model for use within this framework. These two remapping models aim to minimize the total volume of communication in data replication while balancing the rendering loads of processors. Based on the proposed models, a parallel DVR algorithm is developed. The experiments conducted on a PC cluster show that the proposed remapping models achieve better speedup values compared to the remapping models previously suggested for image-space-parallel DVR.
On computing inverse entries of a sparse matrix in an out-of-core environment
, 2010
Cited by 8 (5 self)
Abstract. The inverse of an irreducible sparse matrix is structurally full, so that it is impractical to think of computing or storing it. However, there are several applications where a subset of the entries of the inverse is required. Given a factorization of the sparse matrix held in out-of-core storage, we show how to compute such a subset efficiently, by accessing only parts of the factors. When there are many inverse entries to compute, we need to guarantee that the overall computation scheme has reasonable memory requirements, while minimizing the cost of loading the factors. This leads to a partitioning problem that we prove is NP-complete. We also show that we cannot get a close approximation to the optimal solution in polynomial time. We thus need to develop heuristic algorithms, and we propose: (i) a lower bound on the cost of an optimum solution; (ii) an exact algorithm for a particular case; (iii) two other heuristics for a more general case; and (iv) hypergraph partitioning models for the most general setting. We illustrate the performance of our algorithms in practice using the MUMPS software package on a set of real-life problems as well as some standard test matrices. We show that our techniques can improve the execution time by a factor of 50. Key words. Sparse matrices, direct methods for linear systems and matrix inversion, multifrontal method, graphs and hypergraphs. AMS subject classifications. 05C50, 05C65, 65F05, 65F50
Hypergraph partitioning for faster parallel PageRank computation
 LECTURE NOTES IN COMPUTER SCIENCE 3670
, 2005
Cited by 7 (1 self)
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation of PageRank has become a focus of considerable research effort. In this paper, we seek a scalable problem decomposition for parallel PageRank computation, through the use of state-of-the-art hypergraph-based partitioning schemes. These have not been previously applied in this context. We consider both one- and two-dimensional hypergraph decomposition models. Exploiting the recent availability of the Parkway 2.1 parallel hypergraph partitioner, we present empirical results on a gigabit PC cluster for three publicly available web graphs. Our results show that hypergraph-based partitioning substantially reduces communication volume over conventional partitioning schemes (by up to three orders of magnitude), while still maintaining computational load balance. They also show a halving of the per-iteration runtime cost when compared to the most effective alternative approach used to date.
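For readers unfamiliar with the iterative method mentioned in this abstract, a minimal serial sketch of PageRank power iteration follows. It is not the paper's parallel decomposition: the damping factor of 0.85, the uniform handling of dangling pages, the fixed iteration count, and the four-page link structure are all assumptions chosen for illustration. Each pass is in effect one sparse matrix-vector multiply, which is the operation a partitioner would distribute across processors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on a link list: links[p] holds the pages p points to."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Every page receives the uniform "random jump" share first.
        new = [(1.0 - damping) / n] * n
        for page, outs in enumerate(links):
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Assumed policy: a dangling page spreads its rank uniformly.
                share = damping * rank[page] / n
                for target in range(n):
                    new[target] += share
        rank = new
    return rank

# Hypothetical web graph: page 3 has no outgoing links and no incoming links.
links = [[1, 2], [2], [0], []]
ranks = pagerank(links)
print(ranks)
```

The ranks stay normalized to sum to one on every pass, since the jump share and the damped link shares together redistribute exactly the previous total.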
Iterative-improvement-based heuristics for adaptive scheduling of tasks sharing files on heterogeneous master-slave environments
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 2006
Cited by 7 (4 self)
The scheduling of independent but file-sharing tasks on heterogeneous master-slave platforms has recently found important applications in Grid environments. The scheduling heuristics recently proposed for this problem are all constructive in nature and based on a common greedy criterion which depends on the momentary completion time values of the tasks. We show that this greedy decision criterion has shortcomings in exploiting the file-sharing interaction among tasks since completion time values are inadequate to extract the global view of this interaction. We propose a three-phase scheduling approach which involves initial task assignment, refinement, and execution ordering phases. For the refinement phase, we model the target application as a hypergraph and, with an elegant hypergraph-partitioning-like formulation, we propose using iterative-improvement-based heuristics for refining the task assignments according to two novel objective functions. Unlike the turnaround time, which is the actual schedule cost, the smoothness of proposed objective functions enables the use of iterative-improvement-based heuristics successfully since their effectiveness and efficiency depend on the smoothness of the objective function. Experimental results on a wide range of synthetically generated heterogeneous master-slave frameworks show that the proposed three-phase scheduling approach performs much better than the greedy constructive approach.
Parallel Multilevel Algorithms for Hypergraph Partitioning
, 2007
Cited by 7 (0 self)
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.