## Fast Multipole Methods on Graphical Processors

Venue: Journal of Computational Physics

Citations: 17 (5 self)

### BibTeX

@ARTICLE{Gumerov_fastmultipole,

author = {Nail A. Gumerov and Ramani Duraiswami},

title = {Fast Multipole Methods on Graphical Processors},

journal = {Journal of Computational Physics},

year = {},

pages = {8290--8313}

}

### Abstract

The Fast Multipole Method (FMM) allows the rapid evaluation, to a specified accuracy ε, of sums of radial basis functions centered at points distributed inside a computational domain, at a large number of evaluation points. The method scales as O(N), compared to the direct method with complexity O(N²), which allows one to solve larger-scale problems. Graphical processing units (GPUs) are now increasingly viewed as data-parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data-parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on the GPU architecture, and determine optimal settings for the FMM on the GPU, which are different from those on usual CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtain accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU in the range 30-60 compared to a serial CPU implementation, for benchmark cases of size up to one million. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem at a 24-43 Teraflop rate if we use straightforward summation.
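For orientation, the direct method that the abstract uses as its baseline can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `direct_laplace_sum`, the unnormalized kernel 1/r (no 1/(4π) factor), and the choice to skip coincident source and target points are all assumptions made for the example.

```python
import math

def direct_laplace_sum(sources, charges, targets):
    """Brute-force evaluation of phi(y) = sum_i q_i / |y - x_i|.

    Hypothetical helper for illustration. The cost is O(N * M) for
    N sources and M targets, i.e. O(N^2) when both sets have size N.
    """
    potentials = []
    for y in targets:
        total = 0.0
        for x, q in zip(sources, charges):
            r = math.dist(y, x)  # Euclidean distance (Python 3.8+)
            if r > 0.0:          # assumption: skip the singular self term
                total += q / r
        potentials.append(total)
    return potentials

# Tiny example: two sources, two evaluation points in 3D.
sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
charges = [1.0, 2.0]
targets = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
phi = direct_laplace_sum(sources, charges, targets)
```

The nested loop makes the quadratic cost explicit: every evaluation point visits every source. The FMM replaces this with hierarchical expansions and translations to reach O(N) cost at a prescribed accuracy ε.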