## A New Parallel Kernel-Independent Fast Multipole Method

### Download Links

- [cat.nyu.edu]
- [www.mrl.nyu.edu]
- [mrl.nyu.edu]
- [www.seas.upenn.edu]
- [www.sc-conference.org]
- [www.cc.gatech.edu]
- [www.harperlangston.com]
- [www.supercomp.org]
- DBLP

### Other Repositories/Bibliography

Venue: SC2003

Citations: 19 (9 self)

### BibTeX

@INPROCEEDINGS{Ying_anew,
  author = {Lexing Ying and George Biros and Denis Zorin and Harper Langston},
  title = {A New Parallel Kernel-Independent Fast Multipole Method},
  booktitle = {SC2003},
  year = {2003},
  pages = {14},
  publisher = {IEEE Computer Society}
}


### Abstract

We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit computation and communication overlapping. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 Alphaserver on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns and we have achieved 1.6 Tflops/s peak performance and 1.13 Tflops/s sustained performance.
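The kernel-independence claimed in the abstract means the only kernel-specific ingredient is a pointwise evaluation routine. A minimal sketch (all names hypothetical, not the paper's implementation) of the direct O(NM) summation that the FMM approximates, with interchangeable Laplace and screened-Coulomb (Yukawa) kernels:

```python
import math

def laplace_kernel(x, y):
    # 3D Laplace single-layer kernel: G(x, y) = 1 / (4*pi*|x - y|)
    r = math.dist(x, y)
    return 1.0 / (4.0 * math.pi * r)

def yukawa_kernel(lam):
    # Screened Coulomb (Yukawa) kernel: exp(-lam*|x - y|) / (4*pi*|x - y|)
    def k(x, y):
        r = math.dist(x, y)
        return math.exp(-lam * r) / (4.0 * math.pi * r)
    return k

def direct_sum(targets, sources, densities, kernel):
    # O(N*M) reference evaluation; a kernel-independent FMM reproduces
    # this to a prescribed accuracy using only calls to `kernel`.
    return [sum(kernel(t, s) * q for s, q in zip(sources, densities))
            for t in targets]
```

Swapping `laplace_kernel` for `yukawa_kernel(lam)` (or a Stokes kernel) changes nothing else, which is the point: no analytic expansions need to be re-derived per kernel.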

### Citations

338 | The Rapid Evaluation of Potential Fields in Particle Systems
- Greengard
- 1988

Citation Context: ...In an adaptive FMM algorithm, in order to calculate the interaction at a box B, we need the information from the boxes in the following four lists ([4], [7]): (1) the U list L_B^U, which contains B itself and the leaf boxes adjacent to B if B is a leaf, and is empty when B is non-leaf; (2) the V list L_B^V, which contains the children of the neighbors of B's parent which are not adjacent to B; (3)...

146 | A parallel hashed oct-tree n-body algorithm
- Warren, Salmon
- 1993
Citation Context: ...were the local essential trees (LETs), which provide a framework for parallelization of the Barnes-Hut algorithm and can be extended to the FMM. The hashed octree data structures were first introduced in [24] along with space-filling curves used for partitioning and load balancing, and further increased efficiency and scalability of tree-codes. A similar approach for shared memory machines, and one of the...
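The space-filling-curve partitioning mentioned in this context can be illustrated with Morton (Z-order) keys. The sketch below is a generic illustration, not Warren and Salmon's actual hashed-octree code: interleaving the bits of integer coordinates gives keys whose sorted order traces a locality-preserving curve, so cutting the sorted point list into contiguous chunks yields a simple partition.

```python
def interleave3(x, y, z, bits=10):
    # Morton (Z-order) key: interleave the bits of three integer
    # coordinates. Nearby keys tend to be nearby in space, which is
    # what makes the curve useful for partitioning and load balancing.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def partition_by_morton(points, nprocs, bits=10):
    # Sort points in [0,1]^3 by Morton key, then cut the sorted list
    # into nprocs contiguous chunks -- a basic curve-based partition
    # (real codes weight chunks by estimated work for load balance).
    scale = (1 << bits) - 1
    keyed = sorted(points, key=lambda p: interleave3(
        int(p[0] * scale), int(p[1] * scale), int(p[2] * scale), bits))
    chunk = -(-len(keyed) // nprocs)  # ceiling division
    return [keyed[i:i + chunk] for i in range(0, len(keyed), chunk)]
```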

100 | A new version of the fast multipole method for the Laplace equation in three dimensions
- Greengard, Rokhlin

Citation Context: ...the difficulties in devising work-efficient multipole-to-local translation schemes in three dimensions. For example, it took more than ten years to obtain such a scheme for the Laplace-based FMM scheme [9]. To our knowledge a general-purpose and efficient FMM method appears to be an open problem. In this paper, we present a new kernel-independent FMM-like algorithm, which requires only kernel evaluatio...

86 | Astrophysical n-body simulations using hierarchical tree data structures
- Warren, Salmon
- 1992
Citation Context: ...Related work on parallel tree-codes. The first successful distributed-memory parallel implementations for non-uniform particle distributions were obtained for the Barnes-Hut algorithm by Warren and Salmon [23]. Key ideas in this paper were the local essential trees (LETs), which provide a framework for parallelization of the Barnes-Hut algorithm and can be extended to the FMM. The hashed octree data structures...

80 |
An implementation of the fast multipole method without multipoles
- Anderson
- 1992
Citation Context: ...cian kernel (screened interactions) and the Stokes kernel (incompressible fluids and solids). The idea of using a set of equivalent sources to represent the far field was first introduced by Anderson [1]: the far field is represented as the solution of an exterior Dirichlet problem on a ball surrounding the particles by means of the exact Green's function (Poisson formula) for the Laplacian. The meth...

67 | NAMD: Biomolecular simulation on thousands of processors
- Phillips, Zheng, et al.
- 2002
Citation Context: ...data-structures and discussions on the theory of partitioning and complexity can be found in [20] and [22]. Other approaches for particle interactions include particle-mesh algorithms like those used in NAMD-2 [18], which employs FFTs for Ewald summation on regular grids. Such approaches could be extended to more general kernels, but they are restricted to approximately uniform particle distributions. Parallel S...

60 | PETSc home page. http://www.mcs.anl.gov/petsc
- Balay, Buschelman, et al.
- 2001

Citation Context: ...experiments is 10^-5. Our algorithm has been implemented in C++. We used the fast exponential, square root and reciprocal libraries in the CXML routines, FFTW [5] for the M2L translations, and PETSc [2] for profiling and for its Krylov iterative solvers. All our tests were performed on the Pittsburgh Supercomputing Center's TCS-1 terascale computing HP Alphaserver cluster comprising 750 SMP ES45...

53 | A Fast Adaptive Multipole Algorithm in Three Dimensions
- Cheng, Greengard, et al.
- 1999

Citation Context: ...resulting in performances that are on par with the fastest known adaptive FMM implementations [4]. Our algorithm has exactly the same structure as the original FMM, and thus is highly parallelizable. Indeed, in our implementation we use standard methods in the parallel tree-code literature; and,...

36 | A parallel adaptive fast multipole method
- Singh, Holt, et al.
- 1993
Citation Context: ...partitioning and load balancing, and further increased efficiency and scalability of tree-codes. A similar approach for shared memory machines, and one of the first scalable FMM implementations, is found in [21], in which a cost-zones partitioning is used with orthogonal recursive bisection. A comparison between FMM algorithms, hybrids, and the Barnes-Hut method can be found in [3]. The main conclusion is th...

27 | Provably good partitioning and load balancing algorithms for parallel adaptive N-body simulation
- Teng
- 1998

Citation Context: ...implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-structures and discussions on the theory of partitioning and complexity can be found in [20] and [22]. Other approaches for particle interactions include particle-mesh algorithms like those used in NAMD-2 [18], which employs FFTs for Ewald summation on regular grids. Such approaches could be extended...

25 | A practical comparison of N-body algorithms
- Blelloch, Narlikar
- 1997
Citation Context: ...implementations, is found in [21], in which a cost-zones partitioning is used with orthogonal recursive bisection. A comparison between FMM algorithms, hybrids, and the Barnes-Hut method can be found in [3]. The main conclusion is that for higher accuracies, FMM is the fastest method. Another nice comparison between different platforms and algorithms can be found in Hu and Johnsson [11], in which the au...

17 | Application of fast multipole Galerkin boundary integral equation method to crack problems
- Yoshida, Nishimura, et al.
- 2001

Citation Context: ...of the kernel, i.e. such expansions need to be carried out differently for different kernels. This makes the implementation of efficient and accurate FMM accelerators somewhat tedious [6], [8], [19], [26]. Another problem could be related to the difficulties in devising work-efficient multipole-to-local translation schemes in three dimensions. For example, it took more than ten years to obtain such a...

15 | Linear Integral Equations, Applied Mathematical Sciences vol. 82, 2nd edn.
- Kress
- 1999

Citation Context: ...locations y^{B,u} in N^B (Figure 2.1). We call φ^{B,u} the upward equivalent density and y^{B,u} the upward equivalent surface. Results from potential theory put two restrictions on the positions of y^{B,u} (see [12], chapter 6). Firstly, to guarantee the smoothness of the potential produced by φ^{B,u}, its support y^{B,u} should not overlap with F^B. Secondly, to guarantee that φ^{B,u} is "rich" enough to represent the...

13 | A new version of the fast multipole method for screened Coulomb interactions in three dimensions
- Greengard, Huang

Citation Context: ...expansions of the kernel, i.e. such expansions need to be carried out differently for different kernels. This makes the implementation of efficient and accurate FMM accelerators somewhat tedious [6], [8], [19], [26]. Another problem could be related to the difficulties in devising work-efficient multipole-to-local translation schemes in three dimensions. For example, it took more than ten years to ob...

10 | Parallel multilevel preconditioned conjugate-gradient approach to variable-charge molecular dynamics
- Nakano
- 1997

Citation Context: ...FMM for electromagnetics [10]; Helmholtz-type problems using optimal M2L translations [16]; and molecular dynamics FMM implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-structures and discussions on the theory of partitioning and complexity can be found in...

9 | A data-parallel implementation of O(N) hierarchical N-body methods
- Hu, Johnsson
- 1996

Citation Context: ...can be found in [3]. The main conclusion is that for higher accuracies, FMM is the fastest method. Another nice comparison between different platforms and algorithms can be found in Hu and Johnsson [11], in which the authors report results on up to 100 million particles on uniform particle distributions on a CM-5. Recent papers on distributed-memory implementations include FMM for electromagnetics...

7 | Large scale simulation of suspensions with PVM
- Phan-Thien, Lee, et al.
- 1997

Citation Context: ...summation on regular grids. Such approaches could be extended to more general kernels, but they are restricted to approximately uniform particle distributions. Parallel Stokes solvers were presented in [17], but without FMM or Barnes-Hut acceleration. Organization of the paper. In Section 2 we briefly describe our kernel-independent method. Section 3 explains the parallel algorithm. Section 4 presents t...

6 | An O(N) Taylor series multipole boundary element method for three-dimensional elasticity problems, Engineering Analysis with Boundary Elements
- Popov, Power
- 2001

Citation Context: ...expansions of the kernel, i.e. such expansions need to be carried out differently for different kernels. This makes the implementation of efficient and accurate FMM accelerators somewhat tedious [6], [8], [19], [26]. Another problem could be related to the difficulties in devising work-efficient multipole-to-local translation schemes in three dimensions. For example, it took more than ten years to obtain s...

5 | A scalable parallel fast multipole method for analysis of scattering from perfect electrically conducting surfaces
- Hariharan, Aluru, et al.
- 2002
Citation Context: ...in which the authors report results on up to 100 million particles on uniform particle distributions on a CM-5. Recent papers on distributed-memory implementations include FMM for electromagnetics [10]; Helmholtz-type problems using optimal M2L translations [16]; and molecular dynamics FMM implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-str...

5 | A kernel independent fast multipole algorithm for radial basis functions
- Ying
Citation Context: ...detailed review of other kernel-independent methods, along with a convergence proof, error analysis, and numerical results on the accuracy and complexity of the sequential algorithm, can be found in [25]. Related work on parallel tree-codes. The first successful distributed-memory parallel implementations for non-uniform particle distributions were obtained for the Barnes-Hut algorithm by Warren and...

4 | Preconditioned, adaptive, multipole-accelerated iterative methods for three-dimensional first-kind integral equations of potential theory
- 1994

Citation Context: ...ctions and problems with algorithmic scalability. In our case, however, we expect good algorithmic scalability since FMM is an O(N) algorithm under reasonable assumptions on the particle distribution [13]. Before we describe our numerical experiments, we cite two main conclusions from our work on the sequential performance of our method [25]: First, the most expensive parts of the FMM algorithm are th...

4 | Scalable atomistic simulation algorithms for materials research
- Nakano, et al.
- 2001

Citation Context: ...FMM for electromagnetics [10]; Helmholtz-type problems using optimal M2L translations [16]; and molecular dynamics FMM implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-structures and discussions on the theory of partitioning and complexity can be found in [20] and [22]. Other approaches for particle interactions include particle-mesh algorithms like...

4 | Scalable electromagnetic scattering calculations on the SGI Origin 2000
- Ottusch, Stalzer, et al.
- 1999

Citation Context: ...particles on uniform particle distributions on a CM-5. Recent papers on distributed-memory implementations include FMM for electromagnetics [10]; Helmholtz-type problems using optimal M2L translations [16]; and molecular dynamics FMM implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-structures and discussions on the theory of partitioning and com...

3 | A fast solution for three-dimensional many-particle problems of linear elasticity
- Fu, et al.
- 1998

Citation Context: ...analytic expansions of the kernel, i.e. such expansions need to be carried out differently for different kernels. This makes the implementation of efficient and accurate FMM accelerators somewhat tedious [6], [8], [19], [26]. Another problem could be related to the difficulties in devising work-efficient multipole-to-local translation schemes in three dimensions. For example, it took more than ten years...

2 | A unifying data structure for hierarchical methods
- Sevilgen, Aluru
- 1999

Citation Context: ...FMM implementations that scale to 24 million particles on thousands of processors [14, 15]. Efficient data-structures and discussions on the theory of partitioning and complexity can be found in [20] and [22]. Other approaches for particle interactions include particle-mesh algorithms like those used in NAMD-2 [18], which employs FFTs for Ewald summation on regular grids. Such approaches could be...