A Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems
- Experience
, 1994
Cited by 254 (7 self)
Unstructured meshes are used in many large-scale scientific and engineering problems, including finite-volume methods for computational fluid dynamics and finite-element methods for structural analysis. If unstructured problems such as these are to be solved on distributed-memory parallel computers, their data structures must be partitioned and distributed across processors; if they are to be solved efficiently, the partitioning must maximize load balance and minimize interprocessor communication. Recently the recursive spectral bisection method (RSB) has been shown to be very effective for such partitioning problems compared to alternative methods. Unfortunately, RSB in its simplest form is rather expensive. In this report we shall describe a multilevel implementation of RSB that can attain about an order-of-magnitude improvement in run time on typical examples. Keywords: graph partitioning, domain decomposition, MIMD machines, multilevel algorithm, spectral bisection, sp...
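As a rough illustration of the bisection step that RSB applies recursively, the sketch below splits a small graph at the median of an approximate Fiedler vector (the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian). It is plain single-level spectral bisection in pure Python, not the multilevel scheme the report describes. The names `fiedler_vector`, `spectral_bisect`, and `edge_cut`, the shifted power iteration, and the fixed iteration count are all assumptions made for this example.

```python
import math

def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of a connected, undirected graph.

    adj[v] is the list of neighbours of vertex v (vertices are 0..n-1).
    Runs power iteration on (shift*I - L) while projecting out the
    constant vector, so the iterate converges towards the eigenvector
    of the second-smallest Laplacian eigenvalue.
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    shift = 2 * max(deg) + 1  # exceeds the largest Laplacian eigenvalue
    x = [math.sin(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # stay orthogonal to the constant vector
        y = [(shift - deg[v]) * x[v] + sum(x[u] for u in adj[v])
             for v in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

def spectral_bisect(adj):
    """Split the vertices into two halves at the median Fiedler value."""
    f = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda v: f[v])
    half = len(adj) // 2
    return set(order[:half]), set(order[half:])

def edge_cut(adj, part_a):
    """Count edges with exactly one endpoint in part_a."""
    return sum(1 for v in part_a for u in adj[v] if u not in part_a)

# Two triangles joined by the single edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
part_a, part_b = spectral_bisect(graph)
print(sorted(part_a), sorted(part_b), edge_cut(graph, part_a))
```

On this toy graph the split should separate the two triangles. A recursive partitioner would call `spectral_bisect` again on each half, and a production code would use a sparse eigensolver rather than this dense power iteration.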
A Massively Parallel Adaptive Finite Element Method with Dynamic Load Balancing
- Appl. Numer. Math
, 1993
Cited by 63 (11 self)
We construct massively parallel adaptive finite element methods for the solution of hyperbolic conservation laws. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. We present results using adaptive refinement to reduce the computational cost of the method, and tiling, a dynamic, element-based data migration system that maintains global load balance of the adaptive method by overlapping neighborhoods of processors that each perform local balancing. 1. Introduction We are studying massively parallel adaptive finite element methods for solving systems of-dimensional hyper...
Parallel Algorithms for Dynamically Partitioning Unstructured Grids
, 1995
Cited by 41 (0 self)
Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of the algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed. 1 Introduction Considerable effort has been expended to develop fast, effective algorithms that split unstructured grids into equal-sized partitions so as to minimize communication overhead on parallel ma...
Parallel Decomposition of Unstructured FEM-Meshes
- Concurrency: Practice & Experience
, 1995
Cited by 38 (14 self)
We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balancing. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and other measures which are important for the numerical solution method (for example, the aspect ratio of the resulting domains). The parallel parts of the method are implemented in C under Parix to run on the Parsytec GCel systems. R...
Using Helpful Sets to Improve Graph Bisections
- Univ. of Paderborn
, 1995
Cited by 33 (20 self)
We describe a new, linear time heuristic for the improvement of graph bisections. The method is a variant of local search with sophisticated neighborhood relations. It is based on graph-theoretic observations that were used to find upper bounds for the bisection width of regular graphs. Efficiently implemented, the new method can serve as an alternative to the commonly used local heuristics, not only in terms of the quality of attained solutions, but also in terms of space and time requirements. We compare our heuristic with a number of well known bisection algorithms. Extensive measurements show that the new method is a real improvement for graphs of certain types. Keywords: Graph Partitioning, Graph Bisection, Recursive Bisection, Edge Separators, Mapping, Local Search, Parallel Processing. This work was partly supported by the German Research Foundation (DFG Forschergruppe "Effiziente Nutzung massiv paralleler Systeme") and by the ESPRIT Basic Research Action No. 7141 (ALCOM II)....
Graph Partitioning Algorithms With Applications To Scientific Computing
- Parallel Numerical Algorithms
, 1997
Cited by 32 (0 self)
Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of subsets such that few edges join two vertices in different subsets. Several new graph partitioning algorithms have been developed in the past few years, and we survey some of this activity. We describe the terminology associated with graph partitioning, the complexity of computing good separators, and graphs that have good separators. We then discuss early algorithms for graph partitioning, followed by three new algorithms based on geometric, algebraic, and multilevel ideas. The algebraic algorithm relies on an eigenvector of a Laplacian matrix associated with the graph to compute the partition. The algebraic algorithm is justified by formulating graph partitioning as a quadratic assignment p...
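The link between an eigenvector of the Laplacian and the size of a cut can be written out in a few lines. The survey's own formulation is cut off in this excerpt, so what follows is the standard textbook version rather than the authors' derivation; the symbols $x$, $L$, and $\lambda_2$ are chosen here for illustration.

```latex
% Graph G = (V, E) with |V| = n (n even) and Laplacian L = D - A.
% Encode a balanced bisection by x \in \{-1,+1\}^n with \sum_i x_i = 0.
\[
  x^{\mathsf T} L x \;=\; \sum_{(i,j)\in E} (x_i - x_j)^2
  \;=\; 4\,\lvert \mathrm{cut}(x) \rvert ,
\]
% since (x_i - x_j)^2 is 4 for a cut edge and 0 otherwise.
% Relaxing x to real vectors with \|x\|^2 = n and x \perp \mathbf{1} gives
\[
  \min_{\substack{\|x\|^2 = n \\ x \perp \mathbf{1}}}
  \tfrac14\, x^{\mathsf T} L x \;=\; \tfrac{n}{4}\,\lambda_2 ,
\]
% attained at an eigenvector for the second-smallest eigenvalue \lambda_2,
% so every balanced bisection satisfies |cut| \ge n \lambda_2 / 4.
```

Rounding the relaxed minimizer back to $\pm 1$, for instance by splitting at its median entry, yields the partition used in spectral bisection.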
Mapping Algorithms and Software Environment for Data Parallel PDE . . .
- JOURNAL OF DISTRIBUTED AND PARALLEL COMPUTING
, 1994
Cited by 31 (19 self)
We consider computations associated with data parallel iterative solvers used for the numerical solution of Partial Differential Equations (PDEs). The mapping of such computations into load balanced tasks requiring minimum synchronization and communication is a difficult combinatorial optimization problem. Its optimal solution is essential for the efficient parallel processing of PDE computations. Determining data mappings that optimize a number of criteria, like workload balance, synchronization and local communication, often involves the solution of an NP-Complete problem. Although data mapping algorithms have been known for a few years, there is a lack of qualitative and quantitative comparisons based on the actual performance of the parallel computation. In this paper we present two new data mapping algorithms and evaluate them together with a large number of existing ones using the actual performance of data parallel iterative PDE solvers on the nCUBE II. Comparisons on the performance of data parallel iterative PDE solvers on medium and large scale problems demonstrate that some computationally inexpensive data block partitioning algorithms are as effective as the computationally expensive deterministic optimization algorithms. Also, these comparisons demonstrate that the existing approach in solving the data partitioning problem is inefficient for large scale problems. Finally, a software environment for the solution of the partitioning problem of data parallel iterative solvers is presented.
Dynamic Load Balancing in Computational Mechanics
- Computer Methods in Applied Mechanics and Engineering
Cited by 31 (2 self)
In many important computational mechanics applications, the computation adapts dynamically during the simulation. Examples include adaptive mesh refinement, particle simulations and transient dynamics calculations. When running these kinds of simulations on a parallel computer, the work must be assigned to processors in a dynamic fashion to keep the computational load balanced. A number of approaches have been proposed for this dynamic load balancing problem. This paper reviews the major classes of algorithms, and discusses their relative merits on problems from computational mechanics. Shortcomings in the state-of-the-art are identified and suggestions are made for future research directions. Key words. dynamic load balancing, parallel computer, adaptive mesh refinement 1. Introduction. The efficient use of a parallel computer requires two, often competing, objectives to be achieved. First, the processors must be kept busy doing useful work. And second, the amount of interprocess...
Parallel Algorithms for the Adaptive Refinement and Partitioning of Unstructured Meshes
- In Proceedings of the Scalable High-Performance Computing Conference
, 1997
Cited by 26 (1 self)
The efficient solution of many large-scale scientific calculations depends on adaptive mesh strategies. In this paper we present new parallel algorithms to solve two significant problems that arise in this context: the generation of the adaptive mesh and the mesh partitioning. The crux of our refinement algorithm is the identification of independent sets of elements that can be refined in parallel. The objective of our partitioning heuristic is to construct partitions with good aspect ratios. We present run-time bounds and computational results obtained on the Intel DELTA for these algorithms. These results demonstrate that the algorithms exhibit scalable performance and have run-times small in comparison with other aspects of the computation. 1 Introduction Adaptive mesh refinement techniques have been shown to be very successful in reducing the computation and storage requirements for determining approximate solutions to many partial differential equations (PDEs) [9]. Rather than us...
Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning
- IEEE Transactions on Computers
, 1998
Cited by 23 (5 self)
New heuristic algorithms are proposed for the Graph Partitioning problem. A greedy construction scheme with an appropriate tie-breaking rule (MIN-MAX-GREEDY) produces initial assignments very quickly. For some classes of graphs, independent repetitions of MIN-MAX-GREEDY are sufficient to reproduce solutions found by more complex techniques. When the method is not competitive, the initial assignments are used as starting points for a prohibition-based scheme, where the prohibition is chosen in a randomized and reactive way, with a bias towards more successful choices in the previous part of the run. The relationship between prohibition-based diversification (Tabu Search) and the variable-depth Kernighan-Lin algorithm is discussed. Detailed experimental results are presented on benchmark suites used in the previous literature, consisting of graphs derived from parametric models (random graphs, geometric graphs, etc.) and of "real-world" graphs of large size. On the first series ...
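To make the idea of local improvement of a bisection concrete, here is a minimal sketch of a greedy pairwise-swap refinement. It is neither MIN-MAX-GREEDY nor the reactive prohibition scheme from the paper, whose details are not given in this abstract. It is a simplified, fixed-depth relative of Kernighan-Lin that only accepts swaps that lower the cut, and the names `swap_refine` and `edge_cut` are assumptions made for this example.

```python
def edge_cut(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for v in range(len(adj)) for u in adj[v]
               if v < u and side[v] != side[u])

def swap_refine(adj, side):
    """Repeatedly apply the best cut-reducing swap of one vertex per side.

    adj[v] lists the neighbours of v; side[v] is 0 or 1.
    Swapping pairs keeps both sides the same size.
    """
    side = list(side)
    n = len(adj)
    while True:
        # d[v] = (edges to the other side) - (edges to v's own side)
        d = [sum(1 if side[u] != side[v] else -1 for u in adj[v])
             for v in range(n)]
        best_gain, best_pair = 0, None
        for a in (v for v in range(n) if side[v] == 0):
            for b in (v for v in range(n) if side[v] == 1):
                # Kernighan-Lin gain of exchanging a and b
                gain = d[a] + d[b] - 2 * (1 if b in adj[a] else 0)
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return side  # local optimum: no single swap lowers the cut
        a, b = best_pair
        side[a], side[b] = 1, 0

# Two triangles joined by edge 2-3, with vertices 2 and 3 on the wrong sides.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
start = [0, 0, 1, 0, 1, 1]
refined = swap_refine(graph, start)
print(edge_cut(graph, start), edge_cut(graph, refined))
```

Because every accepted swap strictly reduces the cut, the loop terminates, but it can stall in a local optimum. Variable-depth moves (Kernighan-Lin) and prohibition-based diversification (Tabu Search) are two ways of escaping such optima.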

