Spectral Partitioning Works: Planar graphs and finite element meshes
 In IEEE Symposium on Foundations of Computer Science
, 1996
Abstract

Cited by 153 (8 self)
Spectral partitioning methods use the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian matrix) to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is O(√n) for bounded-degree planar graphs and two-dimensional meshes and O(n^{1/d}) for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs.
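The core step described above, bisecting a graph by the sign pattern of the Fiedler vector, can be sketched in a few lines. This is an illustrative sketch using a dense eigensolver and a toy path graph, not code from the paper:

```python
import numpy as np

def fiedler_bisection(adj):
    """Split a graph in two using the Fiedler vector.

    adj: symmetric 0/1 adjacency matrix (dense, for illustration only).
    Returns two index arrays forming the bisection.
    """
    degrees = adj.sum(axis=1)
    laplacian = np.diag(degrees) - adj
    # eigh returns eigenvalues in ascending order, so column 1 is the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(laplacian)
    fiedler = vecs[:, 1]
    left = np.where(fiedler < np.median(fiedler))[0]
    right = np.where(fiedler >= np.median(fiedler))[0]
    return left, right

# A path graph on 6 vertices: the natural split is {0,1,2} vs {3,4,5}.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
left, right = fiedler_bisection(adj)
print(sorted(left), sorted(right))
```

For a path graph the Fiedler vector is monotone along the path, so thresholding at its median recovers the natural bisection; which side is labeled "left" depends on the eigenvector's arbitrary sign.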
The fast multipole method: numerical implementation
 J. Comput. Phys
, 2000
Abstract

Cited by 50 (0 self)
We study integral methods applied to the resolution of the Maxwell equations, where the linear system is solved using an iterative method that requires only matrix–vector products. The fast multipole method (FMM) is one of the most efficient methods used to perform matrix–vector products and accelerate the resolution of the linear system. A problem involving N degrees of freedom may be solved in C · N_iter · N log N floating-point operations, where C is a constant depending on the implementation of the method. In this article several techniques allowing one to reduce the constant C are analyzed. This reduction implies a lower total CPU time and a larger range of application of the FMM. In particular, new interpolation and anterpolation schemes are proposed which greatly improve on previous algorithms. Several numerical tests are also described. These confirm the efficiency and the theoretical ...
A New Parallel Kernel-Independent Fast Multipole Method
 in SC2003
Abstract

Cited by 22 (10 self)
We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit computation and communication overlapping. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns and we have achieved 1.6 Tflops/s peak performance and 1.13 Tflops/s sustained performance.
Fast Multipole Methods on Graphical Processors
 Journal of Computational Physics
Abstract

Cited by 22 (5 self)
The Fast Multipole Method allows the rapid evaluation of sums of radial basis functions centered at points distributed inside a computational domain at a large number of evaluation points to a specified accuracy ɛ. The method scales as O(N), compared to the direct method with complexity O(N^2), which allows one to solve larger-scale problems. Graphical processing units (GPUs) are now increasingly viewed as data-parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data-parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on GPU architectures, and determine optimal settings for the FMM on the GPU, which are different from those on usual CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtained accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU in the range 30–60 compared to a serial CPU implementation for benchmark cases of up to a million points. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem at a 24–43 Teraflop rate if we use straightforward summation.
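For scale, the O(N^2) direct method that the FMM accelerates is just a double loop over sources and targets. A minimal sketch for a 2-D Laplace (log-distance) kernel; the kernel choice, function names, and point counts are illustrative, not the paper's implementation:

```python
import numpy as np

def direct_sum(sources, charges, targets):
    """O(N*M) direct evaluation of sum_j q_j * log|x_i - y_j|,
    the kind of kernel sum an FMM evaluates in roughly O(N) work."""
    result = np.zeros(len(targets))
    for i, x in enumerate(targets):
        for j, y in enumerate(sources):
            r = np.linalg.norm(x - y)
            if r > 0:  # skip the singular self-interaction term
                result[i] += charges[j] * np.log(r)
    return result

rng = np.random.default_rng(0)
pts = rng.random((100, 2))   # 100 random points in the unit square
q = rng.random(100)          # random source strengths
phi = direct_sum(pts, q, pts)
print(phi.shape)
```

Every doubling of N quadruples the cost of this loop, which is why the hierarchical O(N) algorithm pays off for large problems.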
Graph partitioning and continuous quadratic programming
 SIAM J. Discrete Math
, 1999
Abstract

Cited by 19 (8 self)
A continuous quadratic programming formulation is given for min-cut graph partitioning problems. In these problems, we partition the vertices of a graph into a collection of disjoint sets satisfying specified size constraints, while minimizing the sum of weights of edges connecting vertices in different sets. An optimal solution is related to an eigenvector (Fiedler vector) corresponding to the second smallest eigenvalue of the graph's Laplacian. Necessary and sufficient conditions characterizing local minima of the quadratic program are given. The effect of diagonal perturbations on the number of local minimizers is investigated using a test problem from the literature.
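The link between the cut weight and the Laplacian quadratic form is easy to check numerically: for a ±1 partition indicator x, x^T L x = Σ_{(i,j)∈E} w_ij (x_i − x_j)^2, which equals four times the weight of the cut edges. A small sketch with a made-up 4-vertex weighted graph:

```python
import numpy as np

# Weighted adjacency of a 4-vertex graph (made-up example weights).
W = np.array([[0, 2, 0, 1],
              [2, 0, 3, 0],
              [0, 3, 0, 4],
              [1, 0, 4, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W  # graph Laplacian

# Partition {0, 1} vs {2, 3}, encoded as a +/-1 indicator vector.
x = np.array([1, 1, -1, -1], dtype=float)

# Edges cut: (1,2) with weight 3 and (0,3) with weight 1 -> cut weight 4.
cut_weight = x @ L @ x / 4
print(cut_weight)
```

Relaxing x from {−1, +1} to the continuous sphere is what turns this combinatorial objective into the quadratic program whose stationary points involve the Fiedler vector.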
On the Quality of Partitions based on Space-Filling Curves
, 2002
Abstract

Cited by 7 (1 self)
This paper presents bounds on the quality of partitions induced by space-filling curves. We compare the surface that surrounds an arbitrary index range with the optimal partition in the grid, i.e. the square. It is shown that partitions induced by Lebesgue and Hilbert curves behave about 1.85 times worse with respect to the length of the surface. The Lebesgue indexing gives better results than the Hilbert indexing in worst-case analysis. Furthermore, the surfaces of partitions based on the Lebesgue indexing are at most 3 times larger than the optimal in the average case.
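The Lebesgue (Z-order) index discussed above is obtained by interleaving the bits of the cell coordinates, so that consecutive indices trace a Z-shaped recursive path through the grid. A minimal 2-D sketch (the function name and bit width are illustrative):

```python
def morton_index(x, y, bits=16):
    """Lebesgue (Z-order) index of grid cell (x, y): interleave the bits
    of the two coordinates, giving the cell's position along the curve."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)      # x bits -> even positions
        index |= ((y >> b) & 1) << (2 * b + 1)  # y bits -> odd positions
    return index

# Cells of a 2x2 grid visited in Z-shaped order: (0,0), (1,0), (0,1), (1,1).
order = sorted([(0, 0), (1, 0), (0, 1), (1, 1)], key=lambda c: morton_index(*c))
print(order)
```

Partitioning then amounts to cutting the sorted index sequence into contiguous ranges; the paper's bounds concern the boundary length of the grid regions such ranges induce.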
Dynamic compressed hyperoctrees with application to the N-body problem
 In Proc. 19th Conf
, 1999
Abstract

Cited by 7 (1 self)
The hyperoctree is a popular data structure for organizing multidimensional point data. The main drawback of this data structure is that its size and the runtime of operations supported by it are dependent upon the distribution of the points. Clarkson rectified the distribution dependency in the size of hyperoctrees by introducing compressed hyperoctrees. He presents an O(n log n) expected-time randomized algorithm to construct a compressed hyperoctree. In this paper, we give three deterministic algorithms to construct a compressed hyperoctree in O(n log n) time, for any fixed dimension d. We present O(log n) algorithms for point and cubic region searches, point insertions and deletions. We propose a solution to the N-body problem in O(n) time, given the tree. Our algorithms also reduce the runtime dependency on the number of dimensions.
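A plain (uncompressed) point quadtree, the 2-D case of the hyperoctree, can be sketched as follows. This omits the path compression that makes Clarkson's variant distribution-independent; it is only meant to show the underlying recursive quadrant structure:

```python
class QuadNode:
    """Node of a point quadtree over a square region (illustrative sketch,
    not the paper's compressed hyperoctree: no path compression)."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # center and half-width
        self.point = None
        self.children = None  # four sub-quadrants once the node splits

    def insert(self, x, y):
        if self.children is None:
            if self.point is None:
                self.point = (x, y)
                return
            # Leaf already holds a point: split into quadrants and push
            # the old point down, then fall through to insert the new one.
            self.children = [
                QuadNode(self.cx + dx * self.half / 2,
                         self.cy + dy * self.half / 2,
                         self.half / 2)
                for dy in (-1, 1) for dx in (-1, 1)
            ]
            px, py = self.point
            self.point = None
            self._child(px, py).insert(px, py)
        self._child(x, y).insert(x, y)

    def _child(self, x, y):
        # Quadrant index: bit 1 = upper half, bit 0 = right half.
        i = (2 if y >= self.cy else 0) + (1 if x >= self.cx else 0)
        return self.children[i]

root = QuadNode(0.5, 0.5, 0.5)  # unit square
for p in [(0.1, 0.1), (0.9, 0.9), (0.6, 0.2)]:
    root.insert(*p)
```

In a clustered distribution this tree can grow long chains of single-child nodes; compression replaces each such chain with one node, which is what removes the distribution dependence in the size bound.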
Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures
Abstract

Cited by 6 (3 self)
We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these respectively to the GPUs and CPUs. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single-node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA. New parallel algorithms for creating FMM data structures are presented together with load balancing strategies for the single-node and distributed multiple-node versions. Our implementation can perform the N-body sum for 128M particles on 16 nodes in 4.23 seconds, a performance not achieved by others in the literature on such clusters. ACM computing classification: C.1.2 [Multiple Data Stream Architectures]: Parallel processors; C.1.m [Miscellaneous]
Min-Max Boundary Domain Decomposition
 Theor. Comput. Sci
, 1998
Abstract

Cited by 5 (1 self)
Domain decomposition is one of the most effective and popular parallel computing techniques for solving large-scale numerical systems. In the special case when the amount of computation in a subdomain is proportional to the volume of the subdomain, domain decomposition amounts to minimizing the surface area of each subdomain while dividing the volume evenly. Motivated by this fact, we study the following min-max boundary multiway partitioning problem: Given a graph G and an integer k > 1, we would like to divide G into k subgraphs G_1, ..., G_k (by removing edges) such that (i) |G_i| = Θ(|G|/k) for all i ∈ {1, ..., k}; and (ii) the maximum boundary size of any subgraph (the set of edges connecting it with other subgraphs) is minimized. We provide an algorithm that, given G, a well-shaped mesh in d dimensions, finds a partition of G into k subgraphs G_1, ..., G_k, such that for all i, G_i has Θ(|G|/k) vertices and the number of edges connecting G_i with the ot...
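The min-max objective described above is straightforward to evaluate for a given partition: count, for each block, the edges leaving it, and take the maximum. A small sketch (the edge-list representation and the 6-cycle example are illustrative):

```python
def max_boundary(edges, parts):
    """Objective of the min-max boundary problem: the largest number of
    edges connecting any single block to the rest of the graph.

    edges: list of (u, v) pairs; parts: dict mapping vertex -> block id.
    """
    boundary = {}
    for u, v in edges:
        if parts[u] != parts[v]:  # edge crosses between two blocks
            boundary[parts[u]] = boundary.get(parts[u], 0) + 1
            boundary[parts[v]] = boundary.get(parts[v], 0) + 1
    return max(boundary.values(), default=0)

# A 6-cycle split into three blocks of two consecutive vertices each:
# every block has exactly two boundary edges.
edges = [(i, (i + 1) % 6) for i in range(6)]
parts = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
mb = max_boundary(edges, parts)
print(mb)
```

Note the contrast with min-cut objectives: here the worst block's boundary is minimized rather than the total number of cut edges, which models the most-loaded processor in a parallel solve.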