Results 1-10 of 273
Spectral Partitioning Works: Planar graphs and finite element meshes
In IEEE Symposium on Foundations of Computer Science, 1996
Cited by 201 (10 self)
Spectral partitioning methods use the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian matrix) to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is O(√n) for bounded-degree planar graphs and two-dimensional meshes and O(n^(1/d)) for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs. 1. Introduction Spectral partitioning has become one of the mos...
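To make the role of the Fiedler vector concrete, here is a minimal sketch of the naive median-split bisection that the abstract warns does not necessarily work in general. It is not the method analysed in the paper. The function names, the adjacency-list input format, and the use of shifted power iteration are all assumptions made for illustration, and the graph is assumed to be connected with at least two vertices.

```python
import math


def fiedler_vector(adj, iters=2000):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj maps each vertex 0..n-1 to a list of its neighbours (assumed format).
    Power iteration is run on (shift*I - L) after removing the constant
    component, so it converges towards the eigenvector of the
    second-smallest eigenvalue of the Laplacian L.
    """
    n = len(adj)
    deg = [len(adj[v]) for v in range(n)]
    shift = 2 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift*I - L) x, with (L x)[v] = deg[v]*x[v] - sum over neighbours.
        y = [(shift - deg[v]) * x[v] + sum(x[u] for u in adj[v]) for v in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bisection(adj):
    """Split the vertices at the median of their Fiedler-vector entries."""
    x = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda v: x[v])
    half = len(adj) // 2
    return set(order[:half]), set(order[half:])


def cut_size(adj, part):
    """Count edges with exactly one endpoint in part."""
    return sum(1 for v in part for u in adj[v] if u not in part)


# Example: a path on 8 vertices is split in the middle.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
side_a, side_b = spectral_bisection(path)
```

On the path graph the two sides are the two halves of the path, so a single edge is cut. For larger graphs a sparse eigensolver would normally replace the plain power iteration used here.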
NAMD: Biomolecular Simulation on Thousands of Processors
In Proceedings of SC 2002, 2002
Cited by 113 (34 self)
NAMD is a fully featured, production molecular dynamics program for high performance simulation of large biomolecular systems. We have previously, at SC2000, presented scaling results for simulations with cutoff electrostatics on up to 2048 processors of the ASCI Red machine, achieved with an object-based hybrid force and spatial decomposition scheme and an aggressive measurement-based predictive load balancing framework. We extend this work by demonstrating similar scaling on the much faster processors of the PSC Lemieux Alpha cluster, and for simulations employing efficient (order N log N) particle mesh Ewald full electrostatics.
How Good is Recursive Bisection?
SIAM J. Sci. Comput., 1995
Cited by 109 (5 self)
The most commonly used p-way partitioning method is recursive bisection (RB). It first divides a graph or a mesh into two equal-sized pieces, by a "good" bisection algorithm, and then recursively divides the two pieces. Ideally, we would like to use an optimal bisection algorithm. Because the optimal bisection problem, which partitions a graph into two equal-sized subgraphs to minimize the number of edges cut, is NP-complete, practical RB algorithms use more efficient heuristics in place of an optimal bisection algorithm. Most such heuristics are designed to find the best possible bisection within the allowed time. We show that the recursive bisection method, even when an optimal bisection algorithm is assumed, may produce a p-way partition that is very far away from the optimal one. Our negative result is complemented by two positive ones: First we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite element and finite difference...
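The recursive bisection scheme described in this abstract can be sketched as follows. This is only an illustration of the recursion. The `coordinate_bisect` helper is a hypothetical stand-in heuristic (a median split along the wider coordinate axis), not the bisection algorithm studied in the paper, and the sketch assumes that p is a power of two and that there are at least p points.

```python
def recursive_bisection(items, p, bisect):
    """Split items into p parts (p a power of two) by repeated bisection.

    bisect is any function that returns two halves of its input; the
    quality of the final p-way partition depends on that choice.
    """
    if p == 1:
        return [list(items)]
    left, right = bisect(items)
    return (recursive_bisection(left, p // 2, bisect)
            + recursive_bisection(right, p // 2, bisect))


def coordinate_bisect(points):
    """Stand-in heuristic: median split along the wider coordinate axis."""
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda pt: pt[axis])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


# Example: a 4 x 4 grid of points divided into four 2 x 2 blocks.
grid = [(x, y) for x in range(4) for y in range(4)]
parts = recursive_bisection(grid, 4, coordinate_bisect)
```

Any other bisection routine with the same interface, for example a spectral or graph-based one, could be passed in place of `coordinate_bisect`.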
Graph partitioning for high performance scientific simulations. Computing Reviews 45(2
, 2004
A Two-Dimensional Data Distribution Method For Parallel Sparse Matrix-Vector Multiplication
SIAM REVIEW
Cited by 86 (9 self)
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work.
Models of Machines and Computation for Mapping in Multicomputers
1993
Cited by 86 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work less relevant to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Combining Simulated Annealing with Local Search Heuristics
1993
Cited by 80 (7 self)
We introduce a meta-heuristic to combine simulated annealing with local search methods for CO problems. This new class of Markov chains leads to significantly more powerful optimization methods than either simulated annealing or local search. The main idea is to embed deterministic local search techniques into simulated annealing so that the chain explores only local optima. It makes large, global changes, even at low temperatures, thus overcoming large barriers in configuration space. We have tested this meta-heuristic for the traveling salesman and graph partitioning problems. Tests on instances from public libraries and random ensembles quantify the power of the method. Our algorithm is able to solve large instances to optimality, improving upon state-of-the-art local search methods very significantly. For the traveling salesman problem with randomly distributed cities in a square, the procedure improves on 3-opt by 1.6%, and on Lin-Kernighan local search by 1.3%. For the partitioni...
Adaptive Local Refinement with Octree Load-Balancing for the Parallel Solution of Three-Dimensional Conservation Laws
J. Parallel Distrib. Comput., 1997
Cited by 68 (17 self)
Conservation laws are solved by a local Galerkin finite element procedure with adaptive space-time mesh refinement and explicit time integration. The Courant stability condition is used to select smaller time steps on smaller elements of the mesh, thereby greatly increasing efficiency relative to methods having a single global time step. Processor load imbalances, introduced at adaptive enrichment steps, are corrected by using traversals of an octree representing a spatial decomposition of the domain. To accommodate the variable time steps, octree partitioning is extended to use weights derived from element size. Partition boundary smoothing reduces the communications volume of partitioning procedures for a modest cost. Computational results comparing parallel octree and inertial partitioning procedures are presented for the three-dimensional Euler equations of compressible flow solved on an IBM SP2 computer.
Runtime and compiletime support for adaptive irregular problems
SUPERCOMPUTING '94, 1994
Cited by 63 (9 self)
In adaptive irregular problems the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This research presents efficient runtime primitives for such problems. This new set of primitives is part of the CHAOS library. It subsumes the previous PARTI library, which targeted only static irregular problems. To demonstrate the efficacy of the runtime support, two real adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a particle-in-cell code (DSMC). The paper also proposes extensions to Fortran D which can allow compilers to generate more efficient code for adaptive problems. These language extensions have been implemented in the Syracuse Fortran 90D/HPF prototype compiler. The performance of the compiler-parallelized codes is compared with the hand-parallelized versions.
Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 62 (2 self)
We discuss Inverse Spacefilling Partitioning (ISP), a partitioning strategy for non-uniform scientific computations running on distributed memory MIMD parallel computers. We consider the case of a dynamic workload distributed on a uniform mesh, and compare ISP against Orthogonal Recursive Bisection (ORB) and a Median of Medians variant of ORB, ORB-MM. We present two results. First, ISP and ORB-MM are superior to ORB in rendering balanced workloads, because they are more fine-grained, and incur communication overheads that are comparable to ORB. Second, ISP is more attractive than ORB-MM from a software engineering standpoint because it avoids elaborate bookkeeping. Whereas ISP partitionings can be described succinctly as logically contiguous segments of the line, ORB-MM's partitionings are inherently unstructured. We describe the general d-dimensional ISP algorithm and report empirical results with two- and three-dimensional, non-hierarchical particle methods. Scott B. Bad...
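The idea of describing a partition as contiguous segments of a line can be illustrated with a small sketch. Cells of a two-dimensional mesh are ordered along a space-filling curve and the resulting sequence is cut into p segments of roughly equal workload. The choice of a Morton (Z-order) curve, the function names, and the midpoint assignment rule are all assumptions made for this example and may differ from the ISP algorithm in the paper. The total workload is assumed to be positive.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y to obtain a Z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def sfc_partition(work, p):
    """Cut the curve-ordered cells into p contiguous, load-balanced segments.

    work maps (x, y) mesh cells to non-negative workloads (assumed format).
    Returns a list of p lists of cells, each contiguous along the curve.
    """
    cells = sorted(work, key=lambda c: morton_key(c[0], c[1]))
    total = sum(work.values())
    parts = [[] for _ in range(p)]
    running = 0.0
    for cell in cells:
        # Place each cell by where the midpoint of its workload falls
        # along the cumulative workload of the whole curve.
        mid = running + work[cell] / 2
        idx = min(int(mid * p / total), p - 1)
        parts[idx].append(cell)
        running += work[cell]
    return parts


# Example: uniform workload on a 4 x 4 mesh split among 4 processors.
uniform = {(x, y): 1 for x in range(4) for y in range(4)}
segments = sfc_partition(uniform, 4)
```

Because each processor only needs the start and end of its segment along the curve, repartitioning after the workload changes amounts to recomputing the cut points, which is the kind of bookkeeping simplicity the abstract attributes to ISP.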