Results 1 – 8 of 8
HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION
"... Abstract. In this paper we present HUND, a hypergraphbased unsymmetric nested dissection ordering algorithm for reducing the fillin incurred during Gaussian elimination. HUND has several important properties. It takes a global perspective of the entire matrix, as opposed to local heuristics. It ta ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
Abstract. In this paper we present HUND, a hypergraph-based unsymmetric nested dissection ordering algorithm for reducing the fill-in incurred during Gaussian elimination. HUND has several important properties. It takes a global perspective of the entire matrix, as opposed to local heuristics. It takes into account the asymmetry of the input matrix by using a hypergraph to represent its structure. It is suitable for performing Gaussian elimination in parallel, with partial pivoting. This is possible because the row permutations performed due to partial pivoting do not destroy the column separators identified by the nested dissection approach. Experimental results on 27 medium and large size highly unsymmetric matrices compare HUND to four other well-known reordering algorithms. The results show that HUND provides a robust reordering algorithm, in the sense that it is the best or close to the best (often within 10%) of all the other methods.
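The core idea of nested dissection — eliminate the two halves of the matrix first and the separator last, so fill-in cannot spread across the separator — can be illustrated on the simplest possible case. The sketch below is not HUND (which uses hypergraph partitioning on the full unsymmetric structure); it is a minimal recursive ordering of a path graph, where the middle vertex is a trivial separator, and the function name is my own.

```python
def nested_dissection_order(vertices):
    """Toy nested dissection on a path graph 0-1-2-...-(n-1).

    The middle vertex is a (trivial) separator: ordering both halves
    before the separator limits fill-in during Gaussian elimination.
    """
    if len(vertices) <= 2:
        return list(vertices)
    mid = len(vertices) // 2
    left = nested_dissection_order(vertices[:mid])
    right = nested_dissection_order(vertices[mid + 1:])
    return left + right + [vertices[mid]]  # separator eliminated last

order = nested_dissection_order(list(range(7)))
print(order)  # separator vertex 3 comes last
```

The point HUND's abstract makes about partial pivoting follows from this structure: row swaps inside either half cannot create entries across the column separator, so the separator computed before factorization remains valid.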
Parallel Community Detection for Massive Graphs
"... Abstract. Tackling the current volume of graphstructured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to graphs of over 122 million vertices and ne ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
Abstract. Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs that are more densely connected internally than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore’s sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output’s modularity compares well with that of the standard sequential algorithms.
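The agglomerative scheme the abstract refers to can be sketched sequentially: start with every vertex in its own community and repeatedly merge the connected pair with the largest modularity gain. This is a minimal CNM-style sketch of my own, not the paper's parallel Cray XMT implementation, and the function name is illustrative.

```python
from collections import defaultdict

def greedy_merge_communities(edges):
    """CNM-style agglomerative community detection sketch.

    Merge the connected community pair with the largest modularity
    gain until no merge improves modularity.  For unweighted edges,
    the gain of merging communities a, b with e_ab edges between
    them is e_ab/m - deg_a*deg_b/(2*m*m).
    """
    m = len(edges)
    deg = defaultdict(int)      # total degree per community
    between = defaultdict(int)  # edge count between community pairs
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        between[(min(u, v), max(u, v))] += 1
    communities = {v: {v} for v in deg}
    while True:
        best, best_gain = None, 0.0
        for (a, b), e_ab in between.items():
            if a == b:
                continue  # intra-community edges: no merge
            gain = e_ab / m - deg[a] * deg[b] / (2.0 * m * m)
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:
            break  # no strictly positive gain remains
        a, b = best
        communities[a] |= communities.pop(b)  # merge b into a
        deg[a] += deg.pop(b)
        relabeled = defaultdict(int)
        for (x, y), e in between.items():
            x = a if x == b else x
            y = a if y == b else y
            relabeled[(min(x, y), max(x, y))] += e
        between = relabeled
    return sorted(sorted(c) for c in communities.values())

# two triangles joined by a bridge: expect the two triangles back
print(greedy_merge_communities(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]))
```

The parallel algorithm's key difference from this sequential loop is merging many independent community pairs in one step rather than only the single best pair.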
Combinatorial problems in solving linear systems
, 2009
"... Numerical linear algebra and combinatorial optimization are vast subjects; as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
Numerical linear algebra and combinatorial optimization are vast subjects, as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today’s numerical linear algebra computations consists of the solution of sparse linear systems by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; and task assignment and scheduling for parallel multifrontal solvers. On the iterative methods side, we discuss preconditioning techniques, including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices.
A GPU algorithm for greedy graph matching
 Proc. FMC II, LNCS
, 2012
"... Abstract. Greedy graph matching provides us with a fast way to coarsen a graph during graph partitioning. Direct algorithms on the CPU which perform such greedy matchings are simple and fast, but offer few handholds for parallelisation. To remedy this, we introduce a finegrained sharedmemory paral ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
Abstract. Greedy graph matching provides a fast way to coarsen a graph during graph partitioning. Direct algorithms on the CPU which perform such greedy matchings are simple and fast, but offer few handholds for parallelisation. To remedy this, we introduce a fine-grained shared-memory parallel algorithm for maximal greedy matching, together with an implementation on the GPU, which is faster (speedups of up to 6.8 for random matching and 5.6 for weighted matching) than the serial CPU algorithms and produces matchings of similar (random matching) or better (weighted matching) quality.
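The serial CPU baseline the abstract compares against can be stated in a few lines: visit edges in order of decreasing weight and keep every edge whose endpoints are both still free. This is a generic sketch of that baseline (a 1/2-approximation of maximum weight matching), not the paper's GPU kernel, and the function name is illustrative.

```python
def greedy_weighted_matching(edges):
    """Serial greedy weighted matching on (u, v, weight) triples.

    Sorting by descending weight and taking every edge whose
    endpoints are both unmatched yields a maximal matching with at
    least half the maximum total weight.
    """
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

# path 0-1-2-3: the heavy middle edge wins, blocking both others
print(greedy_weighted_matching([(0, 1, 1), (1, 2, 3), (2, 3, 1)]))
```

The difficulty the paper addresses is that this loop is inherently sequential: each decision depends on all earlier ones, which is what makes a fine-grained parallel formulation nontrivial.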
Parallel Greedy Graph Matching using an Edge Partitioning Approach
"... We present a parallel version of the KarpSipser graph matching heuristic for the maximum cardinality problem. It is bulksynchronous, separating computation and communication, and uses an edgebased partitioning of the graph, translated from a twodimensional partitioning of the corresponding adjacen ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
We present a parallel version of the Karp–Sipser graph matching heuristic for the maximum cardinality problem. It is bulk-synchronous, separating computation and communication, and uses an edge-based partitioning of the graph, translated from a two-dimensional partitioning of the corresponding adjacency matrix. It is shown that the communication volume of Karp–Sipser graph matching is proportional to that of parallel sparse matrix–vector multiplication (SpMV), so that efficient partitioners developed for SpMV can be used. The algorithm is presented using a small basic set of 7 message types, which are discussed in detail. Experimental results show that for most matrices, edge-based partitioning is superior to vertex-based partitioning, in terms of both parallel speedup and matching quality. Good speedups are obtained on up to 64 processors.
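The sequential heuristic being parallelised rests on one observation: matching a degree-1 vertex with its sole neighbour is always safe for maximum cardinality, so such matches are preferred, and an arbitrary edge is matched only when no degree-1 vertex exists. A minimal sequential sketch (my own function name, not the paper's bulk-synchronous formulation):

```python
from collections import defaultdict

def karp_sipser(edges):
    """Sequential Karp-Sipser heuristic for maximum cardinality matching.

    Prefer matching a degree-1 vertex with its only neighbour; fall
    back to an arbitrary edge when no degree-1 vertex remains.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    matching = []

    def remove(x):
        # delete vertex x and all edges incident to it
        for y in adj.pop(x, set()):
            adj[y].discard(x)

    while any(adj.values()):
        u = next((x for x in adj if len(adj[x]) == 1), None)
        if u is None:
            u = next(x for x in adj if adj[x])  # arbitrary fallback
        v = next(iter(adj[u]))
        matching.append((min(u, v), max(u, v)))
        remove(u)
        remove(v)
    return matching

# path 0-1-2-3: degree-1 rule finds the optimal 2-edge matching,
# whereas matching the middle edge first would strand both endpoints
print(karp_sipser([(0, 1), (1, 2), (2, 3)]))
```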
Maximum Weighted Matching Using the Partitioned Global Address Space Model
"... Efficient parallel algorithms for problems such as maximum weighted matching are central to many areas of combinatorial scientific computing. Manne and Bisseling [13] presented a parallel approximation algorithm which is well suited to distributed memory computers. This algorithm is based on a distr ..."
Abstract
 Add to MetaCart
Efficient parallel algorithms for problems such as maximum weighted matching are central to many areas of combinatorial scientific computing. Manne and Bisseling [13] presented a parallel approximation algorithm which is well suited to distributed memory computers. This algorithm is based on a distributed protocol due to Hoepman [9]. In the current paper, a partitioned global address space (PGAS) implementation is presented. PGAS programmers have the conveniences of a shared memory model, which provides implicit communication between processes using normal loads and stores. Since the shared memory is partitioned according to the affinity of a process, one is also able to exploit data locality. This paper addresses the main differences between the PGAS and MPI implementations of the Manne–Bisseling algorithm. It highlights some advantages of using the PGAS model, such as shorter, simpler code, similarity to the sequential algorithm, and options for fine-grained and coarse-grained communication.
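The idea underlying Hoepman's protocol and the Manne–Bisseling algorithm is that an edge which is the heaviest at both of its endpoints (a locally dominant edge) can join the matching with no coordination beyond its two endpoints. A sequential sketch of that rule, assuming distinct edge weights so ties cannot occur (the function name is illustrative, not from the paper):

```python
def dominant_edge_matching(edges):
    """Locally-dominant-edge weighted matching on (u, v, weight) triples.

    Repeatedly add every edge that is the heaviest remaining edge at
    both endpoints, then drop all edges touching matched vertices.
    With distinct weights the globally heaviest remaining edge is
    always locally dominant, so each round makes progress; the result
    is a 1/2-approximation of the maximum weight matching.
    """
    edges = list(edges)
    matching = []
    while edges:
        best = {}  # heaviest remaining edge incident to each vertex
        for e in edges:
            u, v, w = e
            for x in (u, v):
                if x not in best or w > best[x][2]:
                    best[x] = e
        dominant = [e for e in edges
                    if best[e[0]] is e and best[e[1]] is e]
        matched = set()
        for u, v, w in dominant:
            matching.append((u, v))
            matched.update((u, v))
        edges = [e for e in edges
                 if e[0] not in matched and e[1] not in matched]
    return matching

# edges (0,1) and (2,3) are each heaviest at both endpoints,
# so both rounds of dominance resolve in a single pass
print(dominant_edge_matching([(0, 1, 5), (1, 2, 3), (0, 2, 2), (2, 3, 4)]))
```

In the distributed setting each process applies this test to its local edges and exchanges only endpoint information, which is what makes the protocol attractive for both the MPI and the PGAS implementations discussed in the paper.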