Results 1 - 10 of 65
PLTMG: A Software Package for Solving Elliptic Partial Differential Equations. Users' Guide 9.0
, 2004
Parallel Dynamic Graph Partitioning for Adaptive Unstructured Meshes
, 1997
Cited by 96 (18 self)
In this paper we describe such a parallel optimization technique ...
A Practical Approach to Dynamic Load Balancing
 IEEE Transactions on Parallel and Distributed Systems
, 1998
Cited by 82 (8 self)
This paper presents a cohesive, practical load balancing framework that improves upon existing strategies. These techniques are portable to a broad range of prevalent architectures, including massively parallel machines, such as the Cray T3D/E and Intel Paragon, shared memory systems, such as the Silicon Graphics PowerChallenge, and networks of workstations. As part of the work, an adaptive heat diffusion scheme is presented, as well as a task selection mechanism that can preserve or improve communication locality. Unlike many previous efforts in this arena, the techniques have been applied to two large-scale industrial applications on a variety of multicomputers. In the process, this work exposes a serious deficiency in current load balancing strategies, motivating further work in this area. Index Terms: dynamic load balancing, diffusion, massively parallel computing, irregular problems.
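To illustrate the general idea behind diffusion-based load balancing, here is a minimal sketch of a plain first-order diffusion sweep. It is not the adaptive scheme from the paper: the fixed coefficient `alpha`, the function names, and the four-processor ring are all assumptions chosen for illustration.

```python
def diffusion_step(loads, neighbors, alpha):
    """One synchronous first-order diffusion sweep over the processor graph."""
    new_loads = list(loads)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            # Work flows from the heavier processor to the lighter one;
            # the symmetric neighbor lists keep the total load constant.
            new_loads[i] -= alpha * (loads[i] - loads[j])
    return new_loads


def diffuse(loads, neighbors, alpha=0.25, tol=1e-6, max_steps=10_000):
    """Repeat diffusion sweeps until the load spread falls below tol."""
    for step in range(max_steps):
        if max(loads) - min(loads) < tol:
            return loads, step
        loads = diffusion_step(loads, neighbors, alpha)
    return loads, max_steps


# Hypothetical 4-processor ring with all work initially on processor 0.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
balanced, steps = diffuse([100.0, 0.0, 0.0, 0.0], ring)
```

Each processor only needs the loads of its direct neighbors, which is why schemes of this kind suit distributed-memory machines; the price is that several sweeps may be needed before the imbalance dies out.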
Parallel Multigrid in an Adaptive PDE Solver Based on Hashing and Space-Filling Curves
, 1997
Cited by 51 (3 self)
This paper is organized as follows: In section 2 we discuss data structures for adaptive PDE solvers. Here, we suggest using hash tables instead of the usually employed tree-type data structures. Then, in section 3 we discuss the main features of the sequential adaptive multilevel solver. Section 4 deals with the partitioning and distribution of adaptive grids with space-filling curves, and section 5 gives the main features of our new parallelized adaptive multilevel solver. In section 6 we present the results of numerical experiments on a parallel cluster computer with up to 64 processors. It is shown that our approach also works well for problems with severe singularities, which result in locally refined meshes. Here, the work overhead for load balancing and data distribution remains only a small fraction of the overall work load.
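The combination of hashing and space-filling curves can be sketched roughly as follows. The sketch assumes a Z-order (Morton) curve on integer cell coordinates and a Python dict as the hash table; the abstract does not say which curve or hash function the authors use, and the names `morton_key` and `partition_by_curve` are hypothetical.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of integer cell coordinates (ix, iy) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_by_curve(cells, num_parts):
    """Sort cells along the curve and cut the sequence into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    owner = {}  # hash table: cell -> processor id
    for pos, cell in enumerate(ordered):
        owner[cell] = pos * num_parts // len(ordered)
    return owner


# Hypothetical 4 x 4 grid of cells split among 4 processors.
grid = [(ix, iy) for ix in range(4) for iy in range(4)]
owner = partition_by_curve(grid, 4)
```

Because cells that are close along the curve tend to be close in space, cutting the sorted sequence into equal pieces gives compact subdomains without building or traversing a tree.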
Graph Partitioning Algorithms With Applications To Scientific Computing
 Parallel Numerical Algorithms
, 1997
Cited by 50 (0 self)
Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of subsets such that few edges join two vertices in different subsets. Several new graph partitioning algorithms have been developed in the past few years, and we survey some of this activity. We describe the terminology associated with graph partitioning, the complexity of computing good separators, and graphs that have good separators. We then discuss early algorithms for graph partitioning, followed by three new algorithms based on geometric, algebraic, and multilevel ideas. The algebraic algorithm relies on an eigenvector of a Laplacian matrix associated with the graph to compute the partition. The algebraic algorithm is justified by formulating graph partitioning as a quadratic assignment p...
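The algebraic (spectral) approach mentioned above can be illustrated with a small sketch. It assumes an undirected graph given as adjacency lists and approximates the Laplacian eigenvector with a shifted power iteration in pure Python; a production code would typically use a Lanczos-type eigensolver instead. The function names and the test graph are assumptions for illustration only.

```python
import math


def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector for the second-smallest eigenvalue.

    Power iteration on (c*I - L), kept orthogonal to the constant vector,
    converges to that eigenvector when c bounds the largest eigenvalue of L.
    """
    n = len(adj)
    c = 2.0 * max(len(nbrs) for nbrs in adj)  # Gershgorin bound on eigenvalues of L
    x = [float(i) for i in range(n)]          # arbitrary deterministic start
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]             # project out the constant vector
        # y = (c*I - L) x, where (L x)_i = deg(i)*x_i - sum of neighbor values
        y = [c * x[i] - (len(adj[i]) * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_bisection(adj):
    """Split vertices into two halves at the median eigenvector entry."""
    x = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: x[i])
    half = len(adj) // 2
    return set(order[:half]), set(order[half:])


# Hypothetical test graph: two triangles joined by the single edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
part_a, part_b = spectral_bisection(adj)
```

On this graph the two halves are the two triangles, so only the bridging edge is cut. Splitting at the median keeps the halves equal in size, which is one common convention rather than the only possible one.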
A New Paradigm for Parallel Adaptive Meshing Algorithms
 SIAM J. Sci. Comput
, 2003
Cited by 46 (9 self)
We present a new approach to the use of parallel computers with adaptive finite element methods. This approach addresses the load balancing problem in a new way, requiring far less communication than current approaches. It also allows existing sequential adaptive PDE codes such as PLTMG and MC to run in a parallel environment without a large investment in recoding. In this new approach, the load balancing problem is reduced to the numerical solution of a small elliptic problem on a single processor, using a sequential adaptive solver, without requiring any modifications to the sequential solver. The small elliptic problem is used to produce a posteriori error estimates to predict future element densities in the mesh, which are then used in a weighted recursive spectral bisection of the initial mesh. The bulk of the calculation then takes place independently on each processor, with no communication, using possibly the same sequential adaptive solver. Each processor adapts its region of the mesh independently, and a nearly load-balanced mesh distribution is usually obtained as a result of the initial weighted spectral bisection. Only the initial fanout of the mesh decomposition to the processors requires communication. Two additional steps requiring boundary exchange communication may be employed after the individual processors reach an adapted solution, namely, the construction of a global conforming mesh from the independent subproblems, followed by a final smoothing phase using the subdomain solutions as an initial guess. We present a series of convincing numerical experiments that illustrate the effectiveness of this approach. The justification of the initial refinement prediction step, as well as the justification of skipping the two communication-intensive steps, ...
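The role of the predicted element densities can be sketched as a weighted recursive bisection. This is a simplification, not the paper's procedure: it assumes the coarse elements already come in some one-dimensional ordering (for instance, sorted by a spectral coordinate) and only shows how the weights move the cut points. All names and the sample weights are hypothetical.

```python
def weighted_split(order, weight):
    """Cut an ordered element list where the running weight is nearest to half the total."""
    total = sum(weight[e] for e in order)
    running, best_cut, best_gap = 0.0, 1, float("inf")
    for k in range(1, len(order)):
        running += weight[order[k - 1]]
        gap = abs(running - total / 2)
        if gap < best_gap:
            best_cut, best_gap = k, gap
    return order[:best_cut], order[best_cut:]


def recursive_bisection(order, weight, levels):
    """Apply weighted_split recursively to obtain up to 2**levels parts."""
    if levels == 0 or len(order) < 2:
        return [order]
    left, right = weighted_split(order, weight)
    return (recursive_bisection(left, weight, levels - 1)
            + recursive_bisection(right, weight, levels - 1))


# Hypothetical predicted fine-mesh element counts for 8 coarse elements,
# with heavy refinement expected in the first two.
predicted = {0: 30, 1: 30, 2: 10, 3: 10, 4: 10, 5: 10, 6: 10, 7: 10}
parts = recursive_bisection(list(range(8)), predicted, levels=2)
```

The parts differ in the number of coarse elements they contain but carry the same predicted weight, which is the property that lets each processor refine its own region independently and still end up with a similar amount of work.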
Parallel Decomposition of Unstructured FEM-Meshes
 Concurrency: Practice & Experience
, 1995
Cited by 42 (14 self)
We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balancing. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and other measures which are important for the numerical solution method (for example, the aspect ratio of the resulting domains). The parallel parts of the method are implemented in C under Parix to run on the Parsytec GCel systems. R...
Parallel Structures and Dynamic Load Balancing for Adaptive Finite Element Computation
 Applied Numerical Mathematics
, 1996
Cited by 41 (12 self)
In this paper, we have focused on describing and comparing several load balancing schemes. Comparisons by timing are difficult, since times vary between runs having the same parameters. The high-speed switch of the IBM SP2 computer is a shared resource that affects run times. More subtle effects can result from differences in the order in which messages used for migration are processed. Changes in the order in which those messages are received and integrated into the local MDB result in different traversal orders of the mesh entities. These differences cause small changes in load balancings and coarsenings. While such differences in meshes and partitionings do not affect the solution accuracy, they can cause sufficient changes in efficiency to make precise timings difficult. Qualitatively, PSIRB produced the best partitions (measured as a function of total analysis time). Octree-generated partitions were comparable but resulted in slightly longer solution times. In both cases, one or two iterations of partition boundary smoothing led to a quality improvement. ITB by itself resulted in poorer partition quality, but is useful when mesh changes are small between computational stages. Predictive enrichment provided superior performance to our current enrichment process with transient problems where there are frequent enrichment and balancing steps. Enhancements to the existing load balancing procedures and the implementation of new ones are under investigation. Improvements in the slice-by-slice technique used by ITB for migration are necessary. Experiments with geometrical methods that use the spatial location of elements relative to the centroids of sending and receiving processors showed promise at reducing the number of processor interconnections. Vidwans et al. [39] pr...
Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning
 IEEE Transactions on Computers
, 1998
Cited by 38 (6 self)
New heuristic algorithms are proposed for the Graph Partitioning problem. A greedy construction scheme with an appropriate tie-breaking rule (MIN-MAX-GREEDY) produces initial assignments very quickly. For some classes of graphs, independent repetitions of MIN-MAX-GREEDY are sufficient to reproduce solutions found by more complex techniques. When the method is not competitive, the initial assignments are used as starting points for a prohibition-based scheme, where the prohibition is chosen in a randomized and reactive way, with a bias towards more successful choices in the previous part of the run. The relationship between prohibition-based diversification (Tabu Search) and the variable-depth Kernighan-Lin algorithm is discussed. Detailed experimental results are presented on benchmark suites used in the previous literature, consisting of graphs derived from parametric models (random graphs, geometric graphs, etc.) and of "real-world" graphs of large size. On the first series ...
Load Balancing Strategies For Distributed Memory Machines
 MultiScale Phenomena and Their Simulation
, 1997
Cited by 32 (1 self)
Load balancing in large parallel systems with distributed memory is a difficult task that often substantially influences the overall efficiency of applications. A number of efficient distributed load balancing strategies have been developed in recent years. Although they are currently not generally available as part of parallel operating systems, it is often not difficult to integrate them into applications. This paper gives a classification of different load balancing problems based on application characteristics. For applications from the field of scientific computing, useful methods are described in more detail.