A high-order 3D boundary integral equation solver for elliptic pdes in smooth domains
- Journal of Computational Physics
, 2005
Cited by 13 (5 self)
We present a high-order boundary integral equation solver for 3D elliptic boundary value problems on domains with smooth boundaries. We use Nyström’s method for discretization and we combine it with special quadrature rules for the singular kernels that appear in the boundary integrals. The overall asymptotic complexity of our method is O(N^{3/2}), where N is the number of discretization points on the boundary of the domain, and corresponds to linear complexity in the number of uniformly sampled evaluation points. A kernel-independent fast summation algorithm is used to accelerate the evaluation of the discretized integral operators. We describe a high-order accurate method for evaluating the solution at arbitrary points inside the domain, including points close to the domain boundary. We demonstrate how our solver, combined with a regular-grid spectral solver, can be applied to problems with distributed sources. We present numerical results for the Stokes, Navier, and Poisson problems.
Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures
Cited by 9 (6 self)
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single and double precision, with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel’s quad-core Nehalem, 9.4× on AMD’s quad-core Barcelona, and 37.6× on Sun’s Victoria Falls (dual-socket on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA’s most advanced GPU architecture.
Fast multipole method for the biharmonic equation in three dimensions
- J. Comput. Phys
, 2006
Cited by 7 (5 self)
The evaluation of sums (matrix-vector products) of the solutions of the three-dimensional biharmonic equation can be accelerated using the fast multipole method, while memory requirements can also be significantly reduced. We develop a complete translation theory for these equations. It is shown that translations of elementary solutions of the biharmonic equation can be achieved by considering the translation of a pair of elementary solutions of the Laplace equation. The extension of the theory to the case of polyharmonic equations in R^3 is also discussed. An efficient way of performing the FMM for biharmonic equations using the solution of a complex-valued FMM for the Laplace equation is presented. Compared to previous methods presented for the biharmonic equation, our method appears more efficient. The theory is implemented, and numerical tests are presented that demonstrate the performance of the method for varying problem sizes and accuracy requirements. In our implementation, the FMM for the biharmonic equation is faster than the direct matrix-vector product for a matrix size of 550 at a relative L2 accuracy ε_2 = 10^{-4}, and for N = 3550 at ε_2 = 10^{-12}.
Hybrid MPI-thread parallelization of the Fast Multipole Method
- in "6th International Symposium on Parallel and Distributed Computing (ISPDC
Cited by 4 (0 self)
We present in this paper multi-thread and multi-process parallelizations of the Fast Multipole Method (FMM) for the Laplace equation, for uniform and non-uniform distributions. These parallelizations apply to the original FMM formulation and to our new matrix formulation with BLAS (Basic Linear Algebra Subprograms) routines. Differences between the multi-thread and the multi-process versions are detailed, and a hybrid MPI-thread approach improves parallel efficiency and memory scalability over the pure MPI one on clusters of SMP nodes. On 128 processors, we obtain 85% (respectively 75%) parallel efficiency for uniform (respectively non-uniform) distributions with up to 100 million particles.
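The efficiency figures quoted in this abstract can be read as implied speedups using the standard definition of parallel efficiency, E = S / p, where S is the speedup on p processors. The sketch below is illustrative only: the function name is hypothetical, and the abstract does not state the baseline against which efficiency was measured.

```python
def implied_speedup(efficiency: float, processors: int) -> float:
    """Return the speedup S implied by parallel efficiency E = S / p."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return efficiency * processors


# Figures taken from the abstract: 85% (uniform) and 75% (non-uniform)
# parallel efficiency on 128 processors.
uniform = implied_speedup(0.85, 128)
non_uniform = implied_speedup(0.75, 128)
print(f"uniform: {uniform:.1f}x, non-uniform: {non_uniform:.1f}x")
```

Under this definition, the two reported efficiencies correspond to speedups of roughly 109 and 96 relative to whatever baseline the authors used.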
On the Limits of GPU Acceleration
Cited by 2 (0 self)
This paper throws a small “wet blanket” on the hot topic of GPGPU acceleration, based on experience analyzing and tuning both multithreaded CPU and GPU implementations of three computations in scientific computing. These computations, namely (a) iterative sparse linear solvers, (b) sparse Cholesky factorization, and (c) the fast multipole method, exhibit complex behavior and vary in computational intensity and memory reference irregularity. In each case, algorithmic analysis and prior work might lead us to conclude that an idealized GPU can deliver better performance, but we find that for at least equal-effort CPU tuning and consideration of realistic workloads and calling-contexts, we can with two modern quad-core CPU sockets roughly match one or two
Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures
Scalable Parallel Octree Meshing For Terascale Applications
- in SC2005
, 2005
We present a new methodology for generating and adapting octree meshes for terascale applications. Our approach combines existing methods, such as parallel octree decomposition and space-filling curves, with a set of new methods that address the special needs of parallel octree meshing. We have implemented these techniques in a parallel meshing tool called Octor. Performance evaluations on up to 2000 processors show that Octor has good isogranular scalability, fixed-size scalability, and absolute running time. Octor also provides a novel data access interface to parallel PDE solvers and parallel visualization pipelines, making it possible to develop tightly coupled end-to-end finite element simulations on terascale systems.
Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel
- SIAM J. Sci. Comput., Vol. 30, No. 5, pp. 2675–2708
, 2008
In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analysis of the algorithms using an MPI-based parallel implementation were performed on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center’s TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have O(N log N) work and O(N) storage complexity. Under reasonable assumptions on the distribution of octants and the work per octant, the parallel time complexity is O((N/n_p) log(N/n_p) + n_p log n_p), where N is the size of the final linear octree and n_p is the number of processors.
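To get a feel for the parallel time bound (N/n_p) log(N/n_p) + n_p log n_p stated in this abstract, the following sketch evaluates its two terms for a problem size close to the one reported (about a billion octants on 1024 processors). It is a rough illustration, not the authors' cost model: the function name is hypothetical, constant factors are omitted, and the base-2 logarithm is an assumption, since the asymptotic bound does not fix the base.

```python
import math


def parallel_time_terms(n_octants: int, n_procs: int) -> tuple[float, float]:
    """Return the two terms (N/np) * log2(N/np) and np * log2(np).

    Constant factors are omitted and log base 2 is assumed.
    """
    if n_procs < 1 or n_octants < n_procs:
        raise ValueError("need 1 <= n_procs <= n_octants")
    per_proc = n_octants / n_procs
    local_term = per_proc * math.log2(per_proc)   # per-processor sort-like work
    proc_term = n_procs * math.log2(n_procs)      # term growing with processor count
    return local_term, proc_term


# 2**30 is slightly over a billion octants; 1024 processors as in the abstract.
local_term, proc_term = parallel_time_terms(2**30, 1024)
print(f"local term: {local_term:.0f}, processor term: {proc_term:.0f}")
```

At this scale the first term is about three orders of magnitude larger than the second, so under these assumptions the bound is dominated by the per-processor work.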
A Free-Space Adaptive FMM-Based PDE Solver in Three Dimensions
, 2008
We present a kernel-independent, adaptive fast multipole method (FMM) of arbitrary order accuracy for solving elliptic PDEs in three dimensions with radiation boundary conditions. The algorithm requires only a Green’s function evaluation routine for the governing equation and a representation of the source distribution (the right-hand side) that can be evaluated at arbitrary points. The performance of the FMM is accelerated in two ways. First, we construct a piecewise polynomial approximation of the right-hand side and compute far-field expansions in the FMM from the coefficients of this approximation. Second, we precompute tables of quadratures to handle the near-field interactions on adaptive octree data structures, keeping the total storage requirements in check through the exploitation of symmetries. We present numerical examples for the Laplace, modified Helmholtz, and Stokes equations.

