Spectral Partitioning Works: Planar graphs and finite element meshes
- In IEEE Symposium on Foundations of Computer Science
, 1996
Cited by 124 (6 self)
Spectral partitioning methods use the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian matrix) to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is $O(\sqrt{n})$ for bounded-degree planar graphs and two-dimensional meshes and $O(n^{1/d})$ for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs. 1. Introduction Spectral partitioning has become one of the mos...
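As a rough illustration of the idea in this abstract, the sketch below approximates the Fiedler vector and splits the vertices at its median. This is the naive bisection that the abstract warns does not always work, not the method analyzed in the paper. The function names, the adjacency-list input, and the use of shifted power iteration are all assumptions made for this example; it also assumes a connected graph with at least two vertices.

```python
import math

def fiedler_vector(adj, iters=2000):
    """Approximate the Fiedler vector by power iteration on (c*I - L).

    adj is a list of neighbor lists for a connected, undirected graph.
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    # Eigenvalues of L lie in [0, 2*max_degree], so c*I - L is positive definite.
    c = 2 * max(deg) + 1
    # Deterministic start vector that is orthogonal to the constant vector.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # (L x)_i = deg_i * x_i - sum of x_j over neighbors j of i.
        y = [(c - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        # Remove the constant component (eigenvalue 0 of L), then normalize.
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def spectral_bisection(adj):
    """Split vertices into two halves at the median Fiedler value."""
    x = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: x[i])
    half = len(adj) // 2
    return set(order[:half]), set(order[half:])

def cut_size(adj, part):
    """Count edges with exactly one endpoint inside part."""
    return sum(1 for u in part for v in adj[u] if v not in part)
```

On a path graph, for example, the Fiedler vector is monotone along the path, so the median split separates the two ends and cuts a single edge. Production codes typically use a Lanczos-type eigensolver rather than plain power iteration.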
Models of Machines and Computation for Mapping in Multicomputers
, 1993
Cited by 76 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Combining Simulated Annealing with Local Search Heuristics
, 1993
Cited by 74 (7 self)
We introduce a meta-heuristic to combine simulated annealing with local search methods for combinatorial optimization (CO) problems. This new class of Markov chains leads to significantly more powerful optimization methods than either simulated annealing or local search. The main idea is to embed deterministic local search techniques into simulated annealing so that the chain explores only local optima. It makes large, global changes, even at low temperatures, thus overcoming large barriers in configuration space. We have tested this meta-heuristic for the traveling salesman and graph partitioning problems. Tests on instances from public libraries and random ensembles quantify the power of the method. Our algorithm is able to solve large instances to optimality, improving upon state-of-the-art local search methods very significantly. For the traveling salesman problem with randomly distributed cities in a square, the procedure improves on 3-opt by 1.6%, and on Lin-Kernighan local search by 1.3%. For the partitioni...
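The control loop described in this abstract can be sketched generically: perturb the current local optimum with a large "kick", run a deterministic local search from there, and accept or reject the resulting local optimum with a Metropolis rule. Everything below is an assumption for illustration, including the function names, the geometric cooling schedule, and the toy integer landscape; the paper's actual kicks and local searches for TSP and graph partitioning are not reproduced here.

```python
import math
import random

def annealed_local_search(start, cost, local_search, kick,
                          temp=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing over local optima (hypothetical sketch)."""
    rng = random.Random(seed)
    current = local_search(start)
    best = current
    for _ in range(steps):
        # Large perturbation followed by descent to a new local optimum.
        candidate = local_search(kick(current, rng))
        delta = cost(candidate) - cost(current)
        # Metropolis acceptance: always take improvements, sometimes take
        # worse local optima while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

# Toy landscape on the integers with many local minima (illustrative only).
def toy_cost(x):
    return (x - 40) ** 2 / 100 + 3 * math.cos(x)

def descend(x):
    """Deterministic local search: step to the better neighbor until stuck."""
    while True:
        nxt = min((x - 1, x + 1), key=toy_cost)
        if toy_cost(nxt) >= toy_cost(x):
            return x
        x = nxt

def jump(x, rng):
    """Kick: a random jump large enough to leave the current basin."""
    return x + rng.randint(-10, 10)
```

A call such as `annealed_local_search(0, toy_cost, descend, jump)` returns the best local optimum visited; because every state in the chain is the output of `descend`, the result is never worse than plain local search from the same start.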
How Good is Recursive Bisection?
- SIAM J. Sci. Comput
, 1995
Cited by 62 (4 self)
The most commonly used p-way partitioning method is recursive bisection (RB). It first divides a graph or a mesh into two equal-sized pieces, by a "good" bisection algorithm, and then recursively divides the two pieces. Ideally, we would like to use an optimal bisection algorithm. Because the optimal bisection problem, that partitions a graph into two equal-sized subgraphs to minimize the number of edges cut, is NP-complete, practical RB algorithms use more efficient heuristics in place of an optimal bisection algorithm. Most such heuristics are designed to find the best possible bisection within allowed time. We show that the recursive bisection method, even when an optimal bisection algorithm is assumed, may produce a p-way partition that is very far away from the optimal one. Our negative result is complemented by two positive ones: First we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite element and finite difference...
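The recursive structure of RB can be sketched in a few lines. The bisector used here, a median split along the longer coordinate axis of a 2D point set, is a simple stand-in chosen for this example and not an optimal bisection in the sense of the abstract; the function names are hypothetical, and the sketch assumes p is a power of two and that there are at least p points.

```python
def coordinate_bisect(points):
    """Stand-in bisector: median split along the axis with larger extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return pts[:mid], pts[mid:]

def recursive_bisection(points, p):
    """Divide points into p parts by bisecting, then recursing on each half."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    if p == 1:
        return [list(points)]
    left, right = coordinate_bisect(points)
    return recursive_bisection(left, p // 2) + recursive_bisection(right, p // 2)
```

For a 4 x 4 grid of points and p = 4, this yields four 2 x 2 blocks. Swapping `coordinate_bisect` for a graph-based bisector changes the quality of each cut but not the recursion.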
Runtime support and compilation methods for user-specified irregular data distributions
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1995
Cited by 55 (11 self)
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a Fortran 90D compiler implementation.
Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves
- IEEE Transactions on Parallel and Distributed Systems
, 1995
Cited by 51 (2 self)
We discuss Inverse Spacefilling Partitioning (ISP), a partitioning strategy for nonuniform scientific computations running on distributed memory MIMD parallel computers. We consider the case of a dynamic workload distributed on a uniform mesh, and compare ISP against Orthogonal Recursive Bisection (ORB) and a Median of Medians variant of ORB, ORB-MM. We present two results. First, ISP and ORB-MM are superior to ORB in rendering balanced workloads (because they are more fine-grained) and incur communication overheads that are comparable to ORB. Second, ISP is more attractive than ORB-MM from a software engineering standpoint because it avoids elaborate bookkeeping. Whereas ISP partitionings can be described succinctly as logically contiguous segments of the line, ORB-MM's partitionings are inherently unstructured. We describe the general d-dimensional ISP algorithm and report empirical results with two- and three-dimensional, non-hierarchical particle methods. Scott B. Bad...
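The "contiguous segments of the line" description can be made concrete with a small sketch: order the mesh cells along a space-filling curve, then cut that one-dimensional sequence into segments of roughly equal work. The sketch below uses a Morton (Z-order) key because it is short to write; the choice of curve, the function names, and the midpoint assignment rule are assumptions for this example and may differ from what the paper uses. It assumes non-negative integer cell coordinates and a positive total workload.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y to get a position on a Z-order curve."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(weights, nparts):
    """Cut curve-ordered cells into nparts contiguous, work-balanced segments.

    weights maps each (x, y) cell to its estimated work.
    """
    cells = sorted(weights, key=lambda c: morton_key(*c))
    total = sum(weights.values())
    parts = [[] for _ in range(nparts)]
    acc = 0.0
    for c in cells:
        # Assign the cell to the segment that contains the midpoint of the
        # interval its work occupies on the line [0, total).
        k = min(nparts - 1, int((acc + weights[c] / 2) * nparts / total))
        parts[k].append(c)
        acc += weights[c]
    return parts
```

Because each partition is just a range along the curve, repartitioning after the workload changes only requires recomputing the cut points, which is the bookkeeping advantage the abstract attributes to ISP.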
Run-time and compile-time support for adaptive irregular problems
- SUPERCOMPUTING’94
, 1994
Cited by 49 (9 self)
In adaptive irregular problems the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This research presents efficient runtime primitives for such problems. This new set of primitives is part of the CHAOS library. It subsumes the previous PARTI library, which targeted only static irregular problems. To demonstrate the efficacy of the runtime support, two real adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a particle-in-cell code (DSMC). The paper also proposes extensions to Fortran D which can allow compilers to generate more efficient code for adaptive problems. These language extensions have been implemented in the Syracuse Fortran 90D/HPF prototype compiler. The performance of the compiler-parallelized codes is compared with the hand-parallelized versions.
Graph Partitioning for High Performance Scientific Simulations
, 2000
Cited by 48 (5 self)
Contents
0.1 Introduction
0.2 Modeling Mesh-based Computations as Graphs
0.3 Static Graph Partitioning Techniques
0.3.1 Geometric Techniques
0.3.2 Combinatorial Techniques
0.3.3 Spectral Methods
0.3.4 Multilevel Schemes
0.3.5 Combined Schemes
0.3.6 Qualitative Comparison of Graph Partitioning Schemes
0.4 Load Balancing of Adaptive Computations
NAMD: Biomolecular Simulation on Thousands of Processors
- In Proceedings of SC 2002
, 2002
Cited by 43 (6 self)
NAMD is a fully featured, production molecular dynamics program for high performance simulation of large biomolecular systems. We have previously, at SC2000, presented scaling results for simulations with cutoff electrostatics on up to 2048 processors of the ASCI Red machine, achieved with an object-based hybrid force and spatial decomposition scheme and an aggressive measurement-based predictive load balancing framework. We extend this work by demonstrating similar scaling on the much faster processors of the PSC Lemieux Alpha cluster, and for simulations employing efficient (order N log N) particle mesh Ewald full electrostatics.
A Robust Parallel Programming Model for Dynamic Non-Uniform Scientific Computations
- IN PROCEEDINGS OF THE 1994 SCALABLE HIGH PERFORMANCE COMPUTING CONFERENCE
, 1994
Cited by 42 (7 self)
LPARX provides efficient run-time support for dynamic, non-uniform scientific calculations running on MIMD distributed memory architectures. It extends HPF's data decomposition model to provide support for dynamic, block irregular data structures. LPARX represents data decompositions as first-class objects and expresses data dependencies in a manner which is logically independent of data decomposition and problem dimension. LPARX applications are portable across a diversity of MIMD machines. We have implemented a number of applications in LPARX (including a 3D particle calculation and 2D and 3D adaptive multigrid solvers) which could not have been efficiently implemented in HPF.

