Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication
 IEEE Trans. on Parallel and Distributed Computing
Cited by 72 (35 self)
In this work, we show that the standard graph-partitioning-based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool, PaToH, for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63% less communication volume (30% to 38% less on the average) than the graph model using MeTiS, while PaToH is only 1.3 to 2.3 times slower than MeTiS on the average. ...
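The gap between the graph model's edge cut and the true communication volume can be illustrated with a small sketch. The abstract does not spell out the two hypergraph models, so the code below assumes a column-net style model for row-wise decomposition (each column j becomes a net containing the rows that need x_j, plus row j as the owner of x_j) and the connectivity-minus-one cost; the function names and the toy matrix are hypothetical.

```python
def comm_volume(pattern, part):
    """Words of x communicated in y = Ax under a row-wise decomposition.

    pattern: dict mapping row index -> set of column indices with nonzeros
             (square matrix, rows/columns numbered 0..n-1).
    part:    list mapping index i -> part id; row i, x_i and y_i share a part
             (assumption made for this sketch).
    """
    n = len(part)
    # Net j holds every row that needs x_j, plus row j, the owner of x_j.
    nets = {j: {j} for j in range(n)}
    for i, cols in pattern.items():
        for j in cols:
            nets[j].add(i)
    volume = 0
    for pins in nets.values():
        parts_touched = {part[i] for i in pins}
        # x_j is sent once to each part other than its owner's part.
        volume += len(parts_touched) - 1
    return volume


def edge_cut(pattern, part):
    """Graph-model estimate: count every off-diagonal nonzero crossing parts."""
    return sum(
        1
        for i, cols in pattern.items()
        for j in cols
        if i != j and part[i] != part[j]
    )


# Toy symmetric pattern: row 0 is coupled to rows 2 and 3.
pattern = {0: {0, 2, 3}, 1: {1}, 2: {0, 2}, 3: {0, 3}}
part = [0, 0, 1, 1]

print(comm_volume(pattern, part), edge_cut(pattern, part))
```

In this toy case x_0 is needed by two rows that live in the same remote part, so it is sent only once, while the edge cut counts it twice; this is the kind of overestimate the abstract attributes to the graph model.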
Mapping Algorithms and Software Environment for Data Parallel PDE . . .
 JOURNAL OF DISTRIBUTED AND PARALLEL COMPUTING
, 1994
Cited by 39 (21 self)
We consider computations associated with data parallel iterative solvers used for the numerical solution of Partial Differential Equations (PDEs). The mapping of such computations into load balanced tasks requiring minimum synchronization and communication is a difficult combinatorial optimization problem. Its optimal solution is essential for the efficient parallel processing of PDE computations. Determining data mappings that optimize a number of criteria, like workload balance, synchronization and local communication, often involves the solution of an NP-complete problem. Although data mapping algorithms have been known for a few years, there is a lack of qualitative and quantitative comparisons based on the actual performance of the parallel computation. In this paper we present two new data mapping algorithms and evaluate them together with a large number of existing ones using the actual performance of data parallel iterative PDE solvers on the nCUBE II. Comparisons on the performance of data parallel iterative PDE solvers on medium and large scale problems demonstrate that some computationally inexpensive data block partitioning algorithms are as effective as the computationally expensive deterministic optimization algorithms. Also, these comparisons demonstrate that the existing approach in solving the data partitioning problem is inefficient for large scale problems. Finally, a software environment for the solution of the partitioning problem of data parallel iterative solvers is presented.
Architecture and implementation of memory channel 2
 Digital Technical Journal
, 1997
Cited by 22 (0 self)
The MEMORY CHANNEL network is a dedicated cluster interconnect that provides virtual shared memory among nodes by means of internodal address space mapping. The interconnect implements direct user-level messaging and guarantees strict message ordering under all conditions, including transmission errors. These characteristics allow industry-standard communication interfaces and parallel programming paradigms to achieve much higher efficiency than on conventional networks. This paper presents an overview of the MEMORY CHANNEL network architecture and describes DIGITAL's crossbar-based implementation of the second-generation MEMORY CHANNEL network, MEMORY CHANNEL 2. This network provides bisection bandwidths of 1,000 to 2,000 megabytes per second and a sustained process-to-process bandwidth of 88 megabytes per second. One-way, process-to-process message latency is less than 2.2 microseconds.
Object-Oriented Design of Preconditioned Iterative Methods in Diffpack
, 1996
Cited by 21 (6 self)
As modern programming methodologies migrate from computer science to scientific computing...
Parallel Heuristics for Improved, Balanced Graph Colorings
 Journal of Parallel and Distributed Computing
, 1996
Cited by 18 (3 self)
The computation of good, balanced graph colorings is an essential part of many algorithms required in scientific and engineering applications. Motivated by an effective sequential heuristic, we introduce a new parallel heuristic, PLF, and show that this heuristic has the same expected runtime under the PRAM computational model as the scalable coloring heuristic introduced by Jones and Plassmann (JP). We present experimental results performed on the Intel DELTA that demonstrate that this new heuristic consistently generates better colorings and requires only slightly more time than the JP heuristic. In the second part of the paper we introduce two new parallel color-balancing heuristics, PDR(k) and PLF(k). We show that these heuristics have the desirable property that they do not increase the number of colors used by an initial coloring during the balancing process. We present experimental results that show that these heuristics are very effective in obtaining balanced colorings and, ...
Hypergraph Model for Mapping Repeated Sparse Matrix-Vector Product Computations onto Multicomputers
Cited by 7 (3 self)
The graph model of computation has deficiencies in the mapping of repeated sparse matrix-vector computations to multicomputers. We propose a hypergraph model of computation to avoid these deficiencies. We also propose one-phase Kernighan-Lin-based mapping heuristics for the graph and hypergraph models. The proposed mapping heuristics are used for the experimental evaluation of the validity of the proposed hypergraph model on sparse matrices selected from the Harwell-Boeing collection and the NETLIB suite. The proposed heuristic using the hypergraph model finds drastically better mappings than the one using the graph model on test matrices with unstructured sparsity patterns.
Developments and Trends in the Parallel Solution of Linear Systems
, 1999
Cited by 6 (0 self)
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.
A Comparison of Optimization Heuristics for the Data Mapping Problem
, 1997
Cited by 5 (2 self)
In this paper we compare the performance of six heuristics with suboptimal solutions for the data distribution of two-dimensional meshes that are used for the numerical solution of Partial Differential Equations (PDEs) on multicomputers. The data mapping heuristics are evaluated with respect to seven criteria covering load balancing, interprocessor communication, flexibility and ease of use for a class of single-phase iterative PDE solvers. Our evaluation suggests that the simple and fast block distribution heuristic can be as effective as the other five complex and computationally expensive algorithms.
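As a rough illustration of what a block distribution of a two-dimensional mesh looks like, the sketch below assigns each mesh point to one processor of a logical processor grid. The abstract does not describe the exact scheme the paper evaluates, so the function name, the row-major processor numbering, and the mesh and grid sizes are all assumptions.

```python
from collections import Counter


def block_owner(i, j, nx, ny, px, py):
    """Processor owning mesh point (i, j) of an nx-by-ny mesh.

    The mesh is cut into px-by-py contiguous rectangular blocks; processors
    are numbered row-major over the processor grid (an assumption of this
    sketch, not taken from the paper).
    """
    block_row = i * px // nx  # which horizontal strip the point falls in
    block_col = j * py // ny  # which vertical strip the point falls in
    return block_row * py + block_col


# Hypothetical sizes: an 8x8 mesh on a 2x2 processor grid.
nx, ny, px, py = 8, 8, 2, 2
load = Counter(
    block_owner(i, j, nx, ny, px, py) for i in range(nx) for j in range(ny)
)
print(sorted(load.items()))
```

The mapping costs one integer division per coordinate, which is why such a heuristic is inexpensive compared with optimization-based mappings; whether it balances communication as well depends on the mesh and solver.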
Automatic Partitioning Techniques for Solving Partial Differential Equations on Irregular Adaptive Meshes
, 1995
Cited by 4 (1 self)
We present some original automatic partitioning techniques for irregular sparse matrices arising from Finite-Element discretizations of PDEs. We discuss their efficiency in terms of parallel computation, especially from the point of view of adaptive applications that need rebalancing after small changes on the grid. Some parallel simulations are presented, along with practical experiments on a KSR and an SGI-Challenger.