Spectral Partitioning Works: Planar graphs and finite element meshes
- In IEEE Symposium on Foundations of Computer Science
, 1996
Cited by 124 (6 self)
Spectral partitioning methods use the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian matrix) to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is $O(\sqrt{n})$ for bounded-degree planar graphs and two-dimensional meshes and $O(n^{1/d})$ for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs.
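A minimal sketch of the basic Fiedler-vector split mentioned in the abstract, assuming a small connected undirected graph stored as a dictionary of neighbor sets. It is the naive sign-based bisection, which the abstract notes does not necessarily work on its own, and it is not the separator construction analyzed in the paper. The function names, the shifted power-iteration solver, and the example graph are illustrative assumptions.

```python
import random


def fiedler_vector(adj, iters=500, seed=0):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    Uses power iteration on (shift*I - L), projecting out the all-ones vector.
    """
    nodes = sorted(adj)
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # 2 * (maximum degree) is an upper bound on the largest Laplacian eigenvalue.
    shift = 2 * max(len(adj[v]) for v in nodes)
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along the all-ones eigenvector (eigenvalue 0).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift*I - L) x, with (L x)_v = deg(v)*x_v - sum of neighbor values.
        y = []
        for v in nodes:
            i = index[v]
            lx = len(adj[v]) * x[i] - sum(x[index[u]] for u in adj[v])
            y.append(shift * x[i] - lx)
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return dict(zip(nodes, x))


def spectral_bisect(adj):
    """Split the nodes by the sign of their Fiedler-vector entry."""
    f = fiedler_vector(adj)
    left = {v for v, value in f.items() if value < 0}
    return left, set(adj) - left


# Two triangles joined by a single bridge edge (2, 3).
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part_a, part_b = spectral_bisect(example)
```

On this example the split separates the two triangles, so only the bridge edge is cut. A dense eigensolver from a numerical library would normally replace the hand-written power iteration.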
A New Parallel Kernel-Independent Fast Multipole Method
- in SC2003
Cited by 15 (7 self)
We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit computation and communication overlapping. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 Alphaserver on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns and we have achieved 1.6 Tflops/s peak performance and 1.13 Tflops/s sustained performance.
Graph Partitioning and Continuous Quadratic Programming
, 1999
Cited by 14 (4 self)
A continuous quadratic programming formulation is given for min-cut graph partitioning problems. In these problems, we partition the vertices of a graph into a collection of disjoint sets satisfying specified size constraints, while minimizing the sum of weights of edges connecting vertices in different sets. An optimal solution is related to an eigenvector (Fiedler vector) corresponding to the second smallest eigenvalue of the graph's Laplacian. Necessary and sufficient conditions characterizing local minima of the quadratic program are given. The effect of diagonal perturbations on the number of local minimizers is investigated using a test problem from the literature.
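As a rough illustration of the discrete problem this abstract describes, the sketch below evaluates the min-cut objective (the total weight of edges whose endpoints lie in different sets) and checks size constraints for a given partition. It does not reproduce the paper's continuous quadratic programming formulation; the function names and the input formats are assumptions made for the example.

```python
def cut_weight(weighted_edges, part_of):
    """Sum the weights of edges whose endpoints are in different sets.

    weighted_edges: iterable of (u, v, weight) tuples (hypothetical format).
    part_of: dict mapping each vertex to the label of its set.
    """
    return sum(w for u, v, w in weighted_edges if part_of[u] != part_of[v])


def satisfies_sizes(part_of, bounds):
    """Check that every set size lies within its (lower, upper) bounds."""
    sizes = {}
    for label in part_of.values():
        sizes[label] = sizes.get(label, 0) + 1
    return all(lo <= sizes.get(label, 0) <= hi
               for label, (lo, hi) in bounds.items())


edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 3.0)]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
objective = cut_weight(edges, partition)        # only edge (b, c) is cut
feasible = satisfies_sizes(partition, {0: (2, 2), 1: (2, 2)})
```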
Fast Multipole Methods on Graphical Processors
- Journal of Computational Physics
Cited by 9 (3 self)
The Fast Multipole Method (FMM) allows the rapid evaluation of sums of radial basis functions centered at points distributed inside a computational domain at a large number of evaluation points to a specified accuracy ε. The method scales as $O(N)$, compared to the direct method with complexity $O(N^2)$, which allows one to solve larger-scale problems. Graphical processing units (GPUs) are now increasingly viewed as data-parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data-parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on GPU architectures, and determine optimal settings for the FMM on the GPU, which differ from those on usual CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtained accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU by factors in the range 30-60 compared to a serial CPU implementation for benchmark cases of size up to one million. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem at a rate of 24-43 teraflops using straightforward summation.
Dynamic compressed hyperoctrees with application to the N-body problem
- In Proc. 19th Conf
, 1999
Cited by 7 (1 self)
The hyperoctree is a popular data structure for organizing multidimensional point data. The main drawback of this data structure is that its size and the run-time of the operations it supports depend on the distribution of the points. Clarkson rectified the distribution dependency in the size of hyperoctrees by introducing compressed hyperoctrees. He presents an O(n log n) expected-time randomized algorithm to construct a compressed hyperoctree. In this paper, we give three deterministic algorithms to construct a compressed hyperoctree in O(n log n) time, for any fixed dimension d. We present O(log n) algorithms for point and cubic region searches, point insertions and deletions. We propose a solution to the N-body problem in O(n) time, given the tree. Our algorithms also reduce the run-time dependency on the number of dimensions.
Min-Max-Boundary Domain Decomposition
- Theor. Comput. Sci
, 1998
Cited by 4 (1 self)
Domain decomposition is one of the most effective and popular parallel computing techniques for solving large scale numerical systems. In the special case when the amount of computation in a subdomain is proportional to the volume of the subdomain, domain decomposition amounts to minimizing the surface area of each subdomain while dividing the volume evenly. Motivated by this fact, we study the following min-max boundary multi-way partitioning problem: Given a graph $G$ and an integer $k > 1$, we would like to divide $G$ into $k$ subgraphs $G_1, \ldots, G_k$ (by removing edges) such that (i) $|G_i| = \Theta(|G|/k)$ for all $i \in \{1, \ldots, k\}$; and (ii) the maximum boundary size of any subgraph (the set of edges connecting it with other subgraphs) is minimized. We provide an algorithm that given $G$, a well-shaped mesh in $d$ dimensions, finds a partition of $G$ into $k$ subgraphs $G_1, \ldots, G_k$, such that for all $i$, $G_i$ has $\Theta(|G|/k)$ vertices and the number of edges connecting $G_i$ with the ot...
On the Quality of Partitions based on Space-Filling Curves
, 2002
Cited by 3 (0 self)
This paper presents bounds on the quality of partitions induced by space-filling curves. We compare the surface that surrounds an arbitrary index range with the optimal partition in the grid, i.e. the square. It is shown that partitions induced by Lebesgue and Hilbert curves behave about 1.85 times worse with respect to the length of the surface. The Lebesgue indexing gives better results than the Hilbert indexing in worst-case analysis. Furthermore, the surface of partitions based on the Lebesgue indexing is at most 3 times larger than the optimal in the average case.
Average Case Quality of Partitions Induced by the Lebesgue Indexing
, 2001
Cited by 2 (2 self)
This paper presents the quality of partitions induced by the Lebesgue curve in the average case. The surface that surrounds an arbitrary index range is compared with the optimal partition in the grid, i.e. the square. The upper bound on the surface is asymptotically 3 times the optimal size.
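The comparison described in the two abstracts above can be sketched on a 2D grid as follows. The Lebesgue curve is also known as the Z-order or Morton order, obtained by interleaving the bits of the cell coordinates. The sketch assumes that "surface" means the number of unit cell edges on the boundary of the cells covered by an index range; that definition, the function names, and the `bits` parameter are assumptions for illustration and may differ from the papers' exact measure.

```python
def lebesgue_index(x, y, bits=16):
    """Interleave the bits of (x, y) to get the Lebesgue (Z-order) index."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)        # x bits go to even positions
        idx |= ((y >> b) & 1) << (2 * b + 1)    # y bits go to odd positions
    return idx


def lebesgue_cell(idx, bits=16):
    """Invert lebesgue_index: recover the grid cell (x, y) from its index."""
    x = y = 0
    for b in range(bits):
        x |= ((idx >> (2 * b)) & 1) << b
        y |= ((idx >> (2 * b + 1)) & 1) << b
    return x, y


def surface(cells):
    """Count unit edges between a cell in the set and a cell outside it."""
    cells = set(cells)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return sum(1 for (x, y) in cells for (dx, dy) in steps
               if (x + dx, y + dy) not in cells)


def range_surface(start, stop, bits=16):
    """Surface of the partition made of the cells with index in [start, stop)."""
    return surface(lebesgue_cell(i, bits) for i in range(start, stop))


aligned = range_surface(0, 4)   # indices 0..3 form a 2x2 square
shifted = range_surface(2, 6)   # same size, but split into two 1x2 strips
```

Both ranges contain four cells, for which the optimal square has surface 8; the aligned range reaches that value while the shifted one does not, which is the kind of gap the bounds quantify.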
Direct N-body Kernels for Multicore Platforms
Cited by 2 (2 self)
We present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including those based on the Intel Nehalem and AMD Barcelona systems, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDIA Tesla C870 and C1060 GPU systems. We compare our implementations across platforms on a variety of proxy measures, including performance, coding complexity, and energy efficiency.
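For orientation, a direct n-body kernel evaluates every pairwise interaction, which costs $O(N^2)$ work for N bodies. The sketch below is a plain, unoptimized reference version, assuming a gravitational kernel with an optional softening length; the kernel choice, the function name, and the parameters `softening` and `G` are assumptions for illustration and are not taken from the paper's implementations.

```python
def direct_accelerations(positions, masses, softening=0.0, G=1.0):
    """Compute accelerations by summing all pairwise gravitational terms.

    positions: list of (x, y, z) tuples; masses: list of floats.
    Bodies must be at distinct positions when softening is 0.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            scale = G * masses[j] * r2 ** -1.5   # G * m_j / r^3
            acc[i][0] += scale * dx
            acc[i][1] += scale * dy
            acc[i][2] += scale * dz
    return acc


# Two unit masses one unit apart attract each other with unit acceleration.
pair = direct_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
```

The double loop is the part that optimized implementations restructure for vector units, multiple cores, or GPUs; the arithmetic per pair stays the same.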
Multiset Graph Partitioning
- Math. Methods Oper. Res
, 2001
Cited by 1 (1 self)
Optimality conditions are given for a quadratic programming formulation of the multiset graph partitioning problem. These conditions are related to the structure of the graph and properties of the weights. Key words: graph partitioning, min-cut, max-cut, quadratic programming, optimality conditions. AMS(MOS) subject classifications: 90C35, 90C27, 90C20. This work was supported by the National Science Foundation.

