Results 1-10 of 29
Parallel Decomposition of Unstructured FEM-Meshes
 Concurrency: Practice & Experience
, 1995
Cited by 42 (14 self)
We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balance. In a second step, the nodes to be migrated are chosen according to cost functions that optimize the amount of necessary communication and other measures that are important for the numerical solution method (for example, the aspect ratio of the resulting domains). The parallel parts of the method are implemented in C under Parix to run on the Parsytec GCel systems. R...
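The first step of the parallel phase described above, working out how much load must move between pairs of neighbouring clusters, can be illustrated with a first-order diffusion iteration. This is only a sketch under assumptions: the abstract does not say which flow computation the authors use, and the names `diffusion_flow`, `alpha`, and `sweeps` are hypothetical. The step size `alpha` is assumed to be smaller than one over the maximum cluster degree so that the iteration converges.

```python
def diffusion_flow(loads, edges, alpha=0.25, sweeps=200):
    """Balance cluster loads by first-order diffusion (illustrative only).

    loads -- list of node counts, one per cluster
    edges -- list of (i, j) pairs of neighbouring clusters
    Returns (balanced_loads, flow), where flow[(i, j)] is the net
    amount of load to migrate from cluster i to cluster j.
    """
    load = [float(x) for x in loads]
    flow = {e: 0.0 for e in edges}
    for _ in range(sweeps):
        # Compute all transfers from the loads at the start of the sweep.
        delta = {(i, j): alpha * (load[i] - load[j]) for (i, j) in edges}
        for (i, j), d in delta.items():
            load[i] -= d
            load[j] += d
            flow[(i, j)] += d  # a negative value means load moves j -> i
    return load, flow


if __name__ == "__main__":
    # Four clusters in a chain; all nodes start on cluster 0.
    balanced, flow = diffusion_flow([40, 0, 0, 0], [(0, 1), (1, 2), (2, 3)])
    print(balanced)
    print(flow)
```

The sketch yields only the amounts to migrate. Choosing which nodes to move, the second step in the abstract, would need the cost functions the paper mentions and is not shown here.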
Multigrain parallel Delaunay mesh generation: Challenges and opportunities for multithreaded architectures
 In Proceedings of the 19th annual international conference on Supercomputing
, 2005
Cited by 19 (9 self)
Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to obtain insight on the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental evaluation shows that current SMTs are not capable of executing fine-grain parallelism in PCDM. However, experiments on a simulated SMT indicate that with modest hardware support it is possible to exploit fine-grain parallelism opportunities. The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on a single physical processor. Our findings extend to other adaptive and irregular multigrain parallel algorithms.
Multithreaded model for dynamic load balancing parallel adaptive PDE computations
, 1995
Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem
 INT. J. PARALLEL ALGORITHMS AND APPLICATIONS
, 1996
Cited by 14 (4 self)
In this paper we present a new algorithm for the k-partitioning problem which achieves an improved solution quality compared to known heuristics. We apply the principle of so-called "helpful sets", which has been shown to be very efficient for graph bisection, to the direct k-partitioning problem. The principle is extended in several ways. We introduce a new abstraction technique which shrinks the graph dynamically during runtime, leading to shorter computation times and improved solution quality. The use of stochastic methods provides further improvements in terms of solution quality. Additionally, we present a parallel implementation of the new heuristic. The parallel algorithm delivers the same solution quality as the sequential one while providing reasonable parallel efficiency on MIMD-systems of moderate size. All results are verified by experiments for various graphs and processor numbers.
Parallel Refinement of Unstructured Meshes
, 1999
Cited by 8 (1 self)
In this paper we describe a parallel h-refinement algorithm for unstructured finite element meshes based on the longest-edge bisection of triangles and tetrahedra. This algorithm is implemented in PARED, a system that supports the parallel adaptive solution of PDEs. We discuss the design of such an algorithm for distributed memory machines, including the problem of propagating refinement across processor boundaries to obtain meshes that are conforming and non-degenerate. We also demonstrate that the meshes obtained by this algorithm are equivalent to the ones obtained using the serial longest-edge refinement method. We finally report on the performance of this refinement algorithm on a network of workstations. Keywords: mesh refinement, unstructured meshes, finite element methods, adaptation. 1. Introduction The finite element method (FEM) is a powerful and successful technique for the numerical solution of partial differential equations. When applied to problems that exhibit highl...
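The basic operation named in the abstract above, longest-edge bisection of a single triangle, can be sketched as follows. This is an illustrative sketch under assumptions, not the PARED implementation: the function name `longest_edge_bisect` and the tuple-based triangle representation are hypothetical, and the sketch omits the propagation to neighbouring elements (and across processor boundaries) that the paper relies on to keep the mesh conforming.

```python
import math


def longest_edge_bisect(tri):
    """Split one triangle at the midpoint of its longest edge.

    tri -- tuple of three (x, y) vertices
    Returns two child triangles with the same orientation as the parent.
    """
    a, b, c = tri
    # Each candidate edge is paired with the vertex opposite to it.
    edges = [((a, b), c), ((b, c), a), ((c, a), b)]
    (p, q), opposite = max(edges, key=lambda e: math.dist(e[0][0], e[0][1]))
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    return [(p, mid, opposite), (mid, q, opposite)]


if __name__ == "__main__":
    # A 3-4-5 right triangle: the hypotenuse is the longest edge.
    for child in longest_edge_bisect(((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))):
        print(child)
```

In a full mesh, bisecting an edge leaves a hanging node on the neighbouring triangle that shares it, so that neighbour must be refined as well. That propagation step is the part the abstract describes as difficult in a distributed setting.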
Dynamic Mesh Partitioning & Load-Balancing for Parallel Computational Mechanics Codes
 Parallel & Distributed Processing for Computational Mechanics. Saxe-Coburg Publications
, 1999
Cited by 7 (1 self)
We discuss the load-balancing issues arising in parallel mesh-based computational mechanics codes for which the processor loading changes during the run. We briefly touch on geometric repartitioning ideas and then focus on different ways of using a graph both to solve the load-balancing problem and the optimisation problem, both locally and globally. We also briefly discuss whether repartitioning is always valid. Sample illustrative results are presented and we conclude that repartitioning is an attractive option if the load changes are not too dramatic and that there is a certain trade-off between partition quality and volume of data that the underlying application needs to migrate.
Decentralized Remapping of Data Parallel Applications in Distributed Memory Multiprocessors
 Concurrency: Practice and Experience
, 1997
Cited by 7 (3 self)
In this paper we present a decentralized remapping method for data parallel applications on distributed memory multiprocessors. The method uses a generalized dimension-exchange (GDE) algorithm periodically during the execution of an application to balance (remap) the system's workload. We implemented this remapping method in parallel Wa-Tor simulations and parallel image thinning applications, and found it to be effective in reducing the computation time. The average performance gain is about 20% in the Wa-Tor simulation of a 256 × 256 ocean grid on 16 processors, and up to 8% in the thinning of a typical image of size 128 × 128 on 8 processors. The performance gains due to remapping in the image thinning case are reasonably substantial given that the application by its very nature does not necessarily favor remapping. We also implemented this remapping method, using up to 32 processors, for partitioning and repartitioning of grids in computational fluid dynamics. It w...
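The dimension-exchange idea behind the remapping method above can be sketched on a hypercube: in each step every processor pairs with its neighbour along one dimension and the two move their loads toward each other by an exchange parameter. The sketch below is an assumption-laden illustration, not the authors' implementation: the function name `dimension_exchange_sweep` and the parameter name `lam` are hypothetical, the processors are simulated sequentially in one list, and the topology is assumed to be a hypercube with a power-of-two processor count.

```python
def dimension_exchange_sweep(loads, lam=0.5):
    """One sweep of dimension exchange over all hypercube dimensions.

    loads -- workload per processor; len(loads) must be a power of two
    lam   -- exchange parameter; 0.5 averages each pair exactly
    Returns the new list of workloads (total workload is preserved).
    """
    n = len(loads)
    dims = n.bit_length() - 1
    if n < 1 or (1 << dims) != n:
        raise ValueError("processor count must be a power of two")
    w = [float(x) for x in loads]
    for d in range(dims):
        for i in range(n):
            j = i ^ (1 << d)  # neighbour of i along dimension d
            if i < j:         # handle each pair once
                wi, wj = w[i], w[j]
                w[i] = (1.0 - lam) * wi + lam * wj
                w[j] = (1.0 - lam) * wj + lam * wi
    return w


if __name__ == "__main__":
    # All work starts on processor 0 of an 8-processor hypercube.
    print(dimension_exchange_sweep([8, 0, 0, 0, 0, 0, 0, 0]))
```

With `lam=0.5` a single sweep balances a hypercube exactly. On other topologies a different exchange parameter and several sweeps are generally needed, which is the setting the generalized variant addresses.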
An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes
Cited by 4 (0 self)
Abstract. A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data request communication template for unpredictable incoming messages. Our scalable implementation is contrasted to a non-scalable implementation which is a straightforward parallelization of a sequential partitioner. The algorithms adopted in the partitioner scale logarithmically, as confirmed by actual timing measurements on the Intel Delta on up to 512 processors for scaled size problems.
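Recursive inertial bisection of element centroids, the first stage described above, can be sketched sequentially in two dimensions. This is a minimal sketch under assumptions rather than the paper's concurrent partitioner: the function name `inertial_bisect` and its arguments are hypothetical, the centroids are assumed to be 2D points, and every split is a plain median cut along the principal axis.

```python
import math


def inertial_bisect(points, ids, levels):
    """Recursively split centroids along their principal (inertial) axis.

    points -- list of (x, y) element centroids
    ids    -- indices into points that belong to the current subdomain
    levels -- number of bisection levels; yields up to 2**levels parts
    Returns a list of index lists, one per part.
    """
    n = len(ids)
    if levels == 0 or n < 2:
        return [list(ids)]
    cx = sum(points[i][0] for i in ids) / n
    cy = sum(points[i][1] for i in ids) / n
    sxx = sum((points[i][0] - cx) ** 2 for i in ids)
    syy = sum((points[i][1] - cy) ** 2 for i in ids)
    sxy = sum((points[i][0] - cx) * (points[i][1] - cy) for i in ids)
    # Direction of largest spread of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project onto the axis and cut at the median.
    order = sorted(
        ids, key=lambda i: (points[i][0] - cx) * ux + (points[i][1] - cy) * uy
    )
    half = n // 2
    return (inertial_bisect(points, order[:half], levels - 1)
            + inertial_bisect(points, order[half:], levels - 1))


if __name__ == "__main__":
    # Centroids on an 8 x 2 strip, split into four parts.
    pts = [(float(x), float(y)) for x in range(8) for y in range(2)]
    for part in inertial_bisect(pts, list(range(len(pts))), 2):
        print(sorted(part))
```

Cutting perpendicular to the axis of largest spread tends to give short interfaces between the two halves, which is why the method suits elongated domains such as the strip in the usage example.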
Parallel mesh generation
 in Numerical Solution of Partial Differential Equations on Parallel Computers
, 2005
Cited by 3 (1 self)
Parallel mesh generation is a relatively new research area between the boundaries of two scientific computing disciplines: computational geometry and parallel computing. In this chapter we present a survey of parallel unstructured mesh generation methods. Parallel mesh generation methods decompose the original mesh generation problem into smaller subproblems which are meshed in parallel. We organize the parallel mesh generation methods in terms of two basic attributes: (1) the sequential technique used for meshing the individual subproblems and (2) the degree of coupling between the subproblems. This survey shows that, without compromising the stability of parallel mesh generation methods, it is possible to develop parallel meshing software using off-the-shelf sequential meshing codes. However, more research is required for the efficient use of the state-of-the-art codes which can scale from emerging chip multiprocessors (CMPs) to clusters built from CMPs.
Tree-Based Parallel Load-Balancing Methods for Solution-Adaptive Finite Element Graphs on Distributed Memory Multicomputers
 IEEE Transactions on Parallel and Distributed Systems
, 1999
Cited by 3 (1 self)
Abstract: To solve the load imbalance problem of a solution-adaptive finite element application program on a distributed memory multicomputer, nodes of a refined finite element graph can be remapped to processors, or the load of a refined finite element graph can be redistributed based on the current load of each processor. In the former case, remapping can be performed by fast mapping algorithms. In the latter case, a load-balancing algorithm can be applied to balance the computational load of each processor. In this paper, three tree-based parallel load-balancing methods, the MCSTLB method, the BTLB method, and the CBTLB method, are proposed to deal with the load imbalance problems of solution-adaptive finite element application programs. To evaluate the performance of the proposed methods, we have implemented them along with three mapping methods, the AE/ORB method, the AE/MC method, and the MLkP method, on an SP2 parallel machine. Three criteria are used for the performance evaluation: the execution time of the mapping/load-balancing methods, the execution time of a solution-adaptive finite element application program under different mapping/load-balancing methods, and the speedups achieved by the mapping/load-balancing methods for a solution-adaptive finite element application program. The experimental results show that 1) if the initial mapping is performed by a mapping method and the same mapping method and the load-balancing methods are used in each refinement to balance the load of processors, the execution time of an application program under a load-balancing method is always shorter than that under the mapping method, and 2) the execution time of an application program under the CBTLB method is shorter than that under the BTLB method and the MCSTLB method. Index Terms: Distributed memory multicomputers, partitioning, mapping, load balancing, solution-adaptive finite element graphs.