Parallel Decomposition of Unstructured FEM-Meshes
- Concurrency: Practice & Experience
, 1995
Cited by 42 (14 self)
We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balance. In a second step, the nodes to be migrated are chosen according to cost functions that optimize the amount of necessary communication and other measures that are important for the numerical solution method (for example, the aspect ratio of the resulting domains). The parallel parts of the method are implemented in C under Parix to run on the Parsytec GCel systems. R...
Multigrain parallel Delaunay mesh generation: Challenges and opportunities for multithreaded architectures
- In Proceedings of the 19th annual international conference on Supercomputing
, 2005
Cited by 19 (9 self)
Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to obtain insight into the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental evaluation shows that current SMTs are not capable of executing fine-grain parallelism in PCDM. However, experiments on a simulated SMT indicate that with modest hardware support it is possible to exploit fine-grain parallelism opportunities. The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on a single physical processor. Our findings extend to other adaptive and irregular multigrain parallel algorithms.
Multithreaded model for dynamic load balancing parallel adaptive PDE computations
, 1995
Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem
- INT. J. PARALLEL ALGORITHMS AND APPLICATIONS
, 1996
Cited by 14 (4 self)
In this paper we present a new algorithm for the k-partitioning problem which achieves improved solution quality compared to known heuristics. We apply the principle of so-called "helpful sets", which has been shown to be very efficient for graph bisection, to the direct k-partitioning problem. The principle is extended in several ways. We introduce a new abstraction technique which shrinks the graph dynamically at runtime, leading to shorter computation times and improved solution quality. The use of stochastic methods provides further improvements in solution quality. Additionally, we present a parallel implementation of the new heuristic. The parallel algorithm delivers the same solution quality as the sequential one while providing reasonable parallel efficiency on MIMD systems of moderate size. All results are verified by experiments for various graphs and processor numbers.
Parallel Refinement of Unstructured Meshes
, 1999
Cited by 8 (1 self)
In this paper we describe a parallel h-refinement algorithm for unstructured finite element meshes based on the longest-edge bisection of triangles and tetrahedra. This algorithm is implemented in PARED, a system that supports the parallel adaptive solution of PDEs. We discuss the design of such an algorithm for distributed memory machines, including the problem of propagating refinement across processor boundaries to obtain meshes that are conforming and non-degenerate. We also demonstrate that the meshes obtained by this algorithm are equivalent to the ones obtained using the serial longest-edge refinement method. We finally report on the performance of this refinement algorithm on a network of workstations. Keywords: mesh refinement, unstructured meshes, finite element methods, adaptation. 1. Introduction. The finite element method (FEM) is a powerful and successful technique for the numerical solution of partial differential equations. When applied to problems that exhibit highl...
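The basic step named in this abstract, longest-edge bisection of a triangle, can be illustrated with a minimal sketch. This is not the PARED implementation: the function names and the tuple-based triangle representation are assumptions made for illustration, and the sketch deliberately omits the propagation of refinement to neighbouring elements that the paper uses to keep the mesh conforming.

```python
import math

def triangle_area(tri):
    """Unsigned area of a 2D triangle given as three (x, y) tuples."""
    (ax, ay), (bx, by), (cx, cy) = tri
    return abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / 2.0

def bisect_longest_edge(tri):
    """Split a triangle into two children through the midpoint of its longest edge.

    Hypothetical helper: a real refinement code would also bisect the
    neighbour sharing that edge so the mesh stays conforming.
    """
    a, b, c = tri
    # Each edge is stored with the vertex opposite to it.
    edges = [((a, b), c), ((b, c), a), ((c, a), b)]
    (p, q), opposite = max(edges, key=lambda e: math.dist(e[0][0], e[0][1]))
    midpoint = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    return [(p, midpoint, opposite), (midpoint, q, opposite)]

# Usage: a 3-4-5 right triangle is split across its hypotenuse.
children = bisect_longest_edge(((0.0, 0.0), (4.0, 0.0), (0.0, 3.0)))
```

Bisecting the longest edge, rather than an arbitrary one, is what bounds the smallest angle of the children and so helps avoid degenerate elements under repeated refinement.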
Dynamic Mesh Partitioning & Load-Balancing for Parallel Computational Mechanics Codes
- Parallel & Distributed Processing for Computational Mechanics. Saxe-Coburg Publications
, 1999
Cited by 7 (1 self)
We discuss the load-balancing issues arising in parallel mesh-based computational mechanics codes for which the processor loading changes during the run. We briefly touch on geometric repartitioning ideas and then focus on different ways of using a graph to solve both the load-balancing problem and the optimisation problem, locally and globally. We also briefly discuss whether repartitioning is always valid. Sample illustrative results are presented, and we conclude that repartitioning is an attractive option if the load changes are not too dramatic and that there is a certain trade-off between partition quality and the volume of data that the underlying application needs to migrate.
Decentralized Remapping of Data Parallel Applications in Distributed Memory Multiprocessors
- Concurrency: Practice and Experience
, 1997
Cited by 7 (3 self)
In this paper we present a decentralized remapping method for data parallel applications on distributed memory multiprocessors. The method uses a generalized dimension exchange (GDE) algorithm periodically during the execution of an application to balance (remap) the system's workload. We implemented this remapping method in parallel WaTor simulations and parallel image thinning applications, and found it to be effective in reducing the computation time. The average performance gain is about 20% in the WaTor simulation of a 256 × 256 ocean grid on 16 processors, and up to 8% in the thinning of a typical image of size 128 × 128 on 8 processors. The performance gains due to remapping in the image thinning case are reasonably substantial given the fact that the application by its very nature does not necessarily favor remapping. We also implemented this remapping method, using up to 32 processors, for partitioning and re-partitioning of grids in computational fluid dynamics. It w...
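To give a feel for the dimension exchange idea behind GDE, here is a minimal sequential sketch. It makes several assumptions that are not in the abstract: the processors form a hypercube, loads are real-valued, and each pair of neighbours exchanges a fraction `lam` of their load difference. The function name `gde_sweep` and the parameter `lam` are hypothetical, and a real implementation would run the pairwise exchanges concurrently with message passing.

```python
def gde_sweep(loads, lam=0.5):
    """One sweep of dimension exchange over all hypercube dimensions.

    loads: list of per-processor workloads; its length must be a power of two.
    lam:   exchange parameter; lam = 0.5 means paired neighbours average.
    """
    n = len(loads)
    dims = n.bit_length() - 1
    if n < 1 or (1 << dims) != n:
        raise ValueError("number of processors must be a power of two")
    loads = list(loads)
    for d in range(dims):
        for i in range(n):
            j = i ^ (1 << d)  # neighbour of i across dimension d
            if i < j:         # handle each pair once
                wi, wj = loads[i], loads[j]
                loads[i] = (1 - lam) * wi + lam * wj
                loads[j] = (1 - lam) * wj + lam * wi
    return loads

# Usage: all work starts on processor 0 of an 8-processor hypercube.
balanced = gde_sweep([8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Under these assumptions, a single sweep with `lam = 0.5` equalises the loads on a hypercube while preserving the total. Other topologies generally need several sweeps and a different exchange parameter.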
An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes
Cited by 4 (0 self)
A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data request communication template for unpredictable incoming messages. Our scalable implementation is contrasted to a non-scalable implementation which is a straightforward parallelization of a sequential partitioner. The algorithms adopted in the partitioner scale logarithmically, as confirmed by actual timing measurements on the Intel Delta on up to 512 processors for scaled size problems.
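The recursive inertial bisection step mentioned above can be sketched sequentially as follows. This is an illustrative sketch under stated assumptions, not the paper's concurrent implementation: it works on 2D centroids, splits at the median along the principal axis, and uses the hypothetical names `inertial_bisect` and `recursive_inertial_bisection`.

```python
import math

def inertial_bisect(points):
    """Split 2D points into two halves along their principal (inertial) axis.

    Returns two lists of indices into `points`.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Direction of largest variance of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    order = sorted(
        range(n),
        key=lambda i: (points[i][0] - cx) * ux + (points[i][1] - cy) * uy,
    )
    half = n // 2  # median split keeps the two parts balanced
    return order[:half], order[half:]

def recursive_inertial_bisection(points, levels):
    """Partition point indices into up to 2**levels parts."""
    parts = [list(range(len(points)))]
    for _ in range(levels):
        next_parts = []
        for part in parts:
            if len(part) < 2:
                next_parts.append(part)
                continue
            lo, hi = inertial_bisect([points[i] for i in part])
            next_parts.append([part[i] for i in lo])
            next_parts.append([part[i] for i in hi])
        parts = next_parts
    return parts

# Usage: eight centroids strung out along the x-axis, split into four parts.
centroids = [(float(x), 0.1 * (x % 2)) for x in range(8)]
parts = recursive_inertial_bisection(centroids, levels=2)
```

Cutting perpendicular to the axis of largest spread tends to produce short cut boundaries for elongated point sets, which is why the method is popular for geometric partitioning when only coordinates are available.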
Parallel mesh generation
- in Numerical Solution of Partial Differential Equations on Parallel Computers
, 2005
Cited by 3 (1 self)
Parallel mesh generation is a relatively new research area at the boundary between two scientific computing disciplines: computational geometry and parallel computing. In this chapter we present a survey of parallel unstructured mesh generation methods. Parallel mesh generation methods decompose the original mesh generation problem into smaller subproblems which are meshed in parallel. We organize the parallel mesh generation methods in terms of two basic attributes: (1) the sequential technique used for meshing the individual subproblems and (2) the degree of coupling between the subproblems. This survey shows that, without compromising the stability of parallel mesh generation methods, it is possible to develop parallel meshing software using off-the-shelf sequential meshing codes. However, more research is required for the efficient use of state-of-the-art codes which can scale from emerging chip multiprocessors (CMPs) to clusters built from CMPs.
Tree-Based Parallel Load-Balancing Methods for Solution-Adaptive Finite Element Graphs on Distributed Memory Multicomputers
- IEEE Transactions on Parallel and Distributed Systems
, 1999
Cited by 3 (1 self)
To solve the load imbalance problem of a solution-adaptive finite element application program on a distributed memory multicomputer, the nodes of a refined finite element graph can be remapped to processors, or the load of a refined finite element graph can be redistributed based on the current load of each processor. For the former case, remapping can be performed by fast mapping algorithms. For the latter case, a load-balancing algorithm can be applied to balance the computational load of each processor. In this paper, three tree-based parallel load-balancing methods, the MCSTLB method, the BTLB method, and the CBTLB method, are proposed to deal with the load imbalance problems of solution-adaptive finite element application programs. To evaluate the performance of the proposed methods, we have implemented them along with three mapping methods, the AE/ORB method, the AE/MC method, and the MLkP method, on an SP2 parallel machine. Three criteria are used for the performance evaluation: the execution time of the mapping/load-balancing methods, the execution time of a solution-adaptive finite element application program under different mapping/load-balancing methods, and the speedups achieved by the mapping/load-balancing methods for a solution-adaptive finite element application program. The experimental results show that 1) if the initial mapping is performed by a mapping method and the same mapping method and load-balancing methods are used in each refinement to balance the load of processors, the execution time of an application program under a load-balancing method is always shorter than that under the mapping method, and 2) the execution time of an application program under the CBTLB method is shorter than that under the BTLB method and the MCSTLB method. Index Terms: distributed memory multicomputers, partitioning, mapping, load balancing, solution-adaptive finite element graphs.