## Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity (1995)

Venue: | Journal Of Parallel and Distributed Computing |

Citations: | 67 - 2 self |

### BibTeX

@ARTICLE{Singh95loadbalancing,

author = {Jaswinder Pal Singh and Chris Holt and Takashi Totsuka and Anoop Gupta and John L. Hennessy},

title = {Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity},

journal = {Journal Of Parallel and Distributed Computing},

year = {1995},

volume = {27},

pages = {118--141}

}

### Abstract

Hierarchical N-body methods, which are based on a fundamental insight into the nature of many physical processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their nonuniform, dynamically changing characteristics and their need for long-range communication.

### Citations

782 | A Fast Algorithm for Particle Simulations - Greengard, Rokhlin - 1987 |

385 | A rapid hierarchical radiosity algorithm - Hanrahan, Salzman, et al. - 1991 |

Citation Context: ...diosity application, as we shall see. [Footnote 4: We sometimes use the term patch to include leaf-level elements in our description of the algorithm.] [Section 4.2.1, The Sequential Algorithm] The hierarchical radiosity [16] algorithm proceeds as follows. The input polygons that comprise the scene are first inserted into a binary space partitioning (BSP) tree [11] to facilitate efficient visibility computation between a...

358 | The Rapid Evaluation of Potential Fields in Particle Systems - Greengard - 1988 |

Citation Context: ...ty in various problem domains. To demonstrate the wide-ranging applicability of these methods and their consequent importance for high-performance computing, we list some of the problem domains below [14]: 1. Astrophysics: The bodies in the system are stars or planets in a galaxy, and the governing interaction law is gravitational. 2. Plasma Physics: The bodies are ions or electrons, and the governing...

352 | Algorithms for Minimization without Derivatives - Brent - 1973 |

Citation Context: ... finds the bisector in a given direction. We experimented with different root finding algorithms for discrete functions, such as the bisection and Van Wijngaarden-Dekker-Brent algorithms described in [9, 4]. The best performance we obtained was from a decisection algorithm (similar to bisection, except that the currently guessed domain is split into 10 equal subdomains rather than 2 at every step in the...

350 | The directory-based cache coherence protocol for the DASH multiprocessor - Lenoski, Laudon, et al. - 1990 |

Citation Context: ... high-performance research machine--and a simulated multiprocessor. [Figure 10: The simulated multiprocessor architecture.] The Stanford DASH Multiprocessor: The DASH machine [20] has 48 processors organized in 12 clusters. A cluster comprises 4 MIPS R3000 processors connected by a shared bus, and clusters are connected together in a mesh network. Every processor has a 64KB ...

287 | The Hemicube: A Radiosity Solution for Complex Environments, Computer Graphics 19(3 - Cohen, Greenberg - 1985 |

Citation Context: ...ination in a scene is a critical problem in computer graphics. The two dominant, and very different, approaches to solving this problem are the ray-tracing and radiosity methods. The radiosity method [8], which is view-independent and based on the physics of light transport, has been most successful in producing realistic computer-generated images of complex scenes. Since this method accounts for bot...

212 | Rapid solution of integral equations of classical potential theory - Rokhlin - 1983 |

Citation Context: ...ic. 5. Boundary Value Problems: Integral equations resulting from boundary value problems can be solved rapidly by N-body methods, where N is the number of nodes in the discretization of the boundary [21]. 6. Numerical Complex Analysis: Many problems in this field can be reduced to computing a Cauchy integral, which can itself be viewed as equivalent to an electrostatic problem. 7. Computer Graphics: ...

158 | Numerical Recipes - Press, et al. - 1992 |

Citation Context: ... finds the bisector in a given direction. We experimented with different root finding algorithms for discrete functions, such as the bisection and Van Wijngaarden-Dekker-Brent algorithms described in [9, 4]. The best performance we obtained was from a decisection algorithm (similar to bisection, except that the currently guessed domain is split into 10 equal subdomains rather than 2 at every step in the...

154 | A Hierarchical O(N log N) Force Calculation Algorithm. Nature - Barnes, Hut - 1986 |

Citation Context: ...a locality. In this paper, we study partitioning and scheduling issues in representative applications that use three important hierarchical N-body methods. Two of these methods--the Barnes-Hut method [3] and Greengard and Rokhlin's Fast Multipole Method (FMM) [15]--are the best methods known for classical N-body problems, such as those in astrophysics, electrostatics and molecular dynamics. In additi...

135 | Numerical study of slightly viscous flow - Chorin - 1973 |

Citation Context: ... interaction term as well, so that long-range interactions must be considered to study some important properties (such as dielectric properties, for example). 4. Fluid Dynamics: The vortex blob method [6] for solving the Navier-Stokes equations requires the interactions among N vortex blobs, where the long-range interaction law is Coulombic. 5. Boundary Value Problems: Integral equations resulting fro...

94 | Parallel hierarchical N-body methods - Salmon - 1991 |

Citation Context: ...iques are implemented by the programmer. The only other parallel version of a nonuniform hierarchical N-body application is a message-passing implementation of an astrophysical Barnes-Hut application [23]. It uses an orthogonal recursive bisection (ORB) partitioning technique to obtain both load balancing and data locality. (It is also substantially complicated by the lack of a shared-address-space pr...

75 | Working sets, cache sizes and node granularity issues for large-scale multiprocessors - Rothberg, Singh, et al. - 1993 |

Citation Context: ...he working sets of schemes with poor locality. Infinite caches do not capture this effect. However, this is not a very significant issue in our applications since the important working sets are small [22]. Besides, infinite caches are better at measuring inherent communication, which is what we want to compare using the simulator. [Section 7.2, Organization of Experiments] For each application, we first exami...

65 | Parallel Hierarchical N-Body Methods and their Implication for Multiprocessors - Singh - 1993 |

Citation Context: ... significant when larger machines are used, however, as indicated by Figure 17. This is particularly true since the number of particles is not expected to scale linearly with the number of processors [27, 24]. Besides performance benefits on large machines, physically contiguous partitions have other important advantages as well. They allow us to use a more efficient tree-building algorithm, as we shall s...

58 | Scaling parallel programs for multiprocessors: Methodology and examples - Singh, Hennessy, et al. - 1993 |

Citation Context: ... significant when larger machines are used, however, as indicated by Figure 17. This is particularly true since the number of particles is not expected to scale linearly with the number of processors [27, 24]. Besides performance benefits on large machines, physically contiguous partitions have other important advantages as well. They allow us to use a more efficient tree-building algorithm, as we shall s...

52 | Near real-time shade display of rigid objects - Fuchs, Abram, et al. - 1983 |

Citation Context: ...[Section 4.2.1, The Sequential Algorithm] The hierarchical radiosity [16] algorithm proceeds as follows. The input polygons that comprise the scene are first inserted into a binary space partitioning (BSP) tree [11] to facilitate efficient visibility computation between a pair of patches. (The BSP tree and its use in visibility computation are described in Appendix A.) Every input polygon is initially given a li...

42 | An O(n) algorithm for three-dimensional N-body simulations - Zhao - 1987 |

Citation Context: ...methods have been used most widely in so far, and we use it as being representative of nonuniform classical domains. Several hierarchical methods have been proposed to solve classical N-body problems [2, 18, 3, 15, 29, 7]. The most widely used and promising among these are the Barnes-Hut method [3] and the Fast Multipole Method [15]. Between them, these two methods also capture all the important characteristics of hie...

34 | Computational structure of the N-body problem - Katzenelson - 1989 |

Citation Context: ...ut cell C is not well separated from cell D. In fact, both the FMM and the Barnes-Hut method have been shown to satisfy the same recursive set of equations, only using different elementary functions [19]. The primary differences between the two methods are: While the Barnes-Hut method directly computes only particle-particle or particle-cell interactions, the FMM also computes interactions between in...

33 | Performance Debugging Shared Memory Multiprocessor Programs with MTOOL - Goldberg, Hennessy |

Citation Context: ...t results that separately compare the load balancing and communication behavior of different schemes. These results are obtained on the simulator, and we have corroborated their trends with the MTOOL [12] performance debugger on DASH. We compare load balancing behavior by measuring the time that processes spend waiting at synchronization points. In comparing communication behavior, we focus on inheren...

28 | Tango introduction and tutorial - Goldschmidt, Davis - 1990 |

Citation Context: ...to us (inherent communication in the program, for example). To overcome these limitations, we also perform experiments on an event-driven simulator of an idealized shared-address-space multiprocessor [13]. The simulated multiprocessor looks exactly like that described in Figure 10, with the simple, three-level, nonuniform memory hierarchy. The timing of a simulated processor's instruction set is desig...

19 | Implications of hierarchical N-body techniques for multiprocessor architecture - Singh, Hennessy, et al. - 1992 |

Citation Context: ...hogonal recursive bisection (ORB) partitioning technique to obtain both load balancing and data locality. (It is also substantially complicated by the lack of a shared-address-space programming model [28].) We propose a new partitioning technique called costzones for the classical applications. Costzones is much simpler to implement than ORB, and performs better on shared address space machines, parti...

17 | An efficient program for many body simulation - Appel - 1985 |

Citation Context: ...mputing interactions is $O(N^2)$, which is prohibitive for large systems. Hierarchical tree-based methods have therefore been developed that reduce the complexity to $O(N \log N)$ for general distributions [2, 3], or even $O(N)$ for uniform distributions [15], without losing much accuracy in long-range interactions. [Figure 1: Approximation of ...]

13 | A hierarchical O(N log N) force calculation algorithm - Barnes, Hut - 1986 |

Citation Context: ...ta locality. In this paper, we study partitioning and scheduling issues in representative applications that use three important hierarchical N-body methods. Two of these methods--the Barnes-Hut method [3] and Greengard and Rokhlin's Fast Multipole Method (FMM) [15]--are the best methods known for classical N-body problems, such as those in astrophysics, electrostatics and molecular dynamics. In additio...

13 | Hierarchical N-body methods - Hernquist - 1988 |

Citation Context: ...tree which represent space that is physically close to it, and groups particles at a hierarchy of length scales. The complexity of the force-computation phase scales as $\frac{1}{\theta^2} N \log N$ for realistic values of $\theta$ [17]. The Available Parallelism: Each of the phases in a time-step can be executed internally in parallel. We do not exploit parallelism across phases or time-steps explicitly, except to avoid synchronizin...

8 | Hierarchical algorithms and architectures for parallel scientific computing - Chan - 1990 |

Citation Context: ...main. Prominent among these algorithms are N-body methods, multigrid methods, domain decomposition methods, multi-level preconditioners, adaptive mesh-refinement algorithms, and wavelet basis methods [5]. Our focus in this paper is on hierarchical N-body methods. The classical N-body problem models a physical domain as a system of N discrete bodies, and studies the evolution of this system under the ...

8 | A tree code with logarithmic reduction of force terms, hierarchical regularization of all variables and explicit accuracy controls - Jernigan, Porter - 1989 |

Citation Context: ...methods have been used most widely in so far, and we use it as being representative of nonuniform classical domains. Several hierarchical methods have been proposed to solve classical N-body problems [2, 18, 3, 15, 29, 7]. The most widely used and promising among these are the Barnes-Hut method [3] and the Fast Multipole Method [15]. Between them, these two methods also capture all the important characteristics of hie...

5 | Numerical Algorithms for Modern Parallel Computer Architectures - Fox - 1988 |

Citation Context: ... Partitioning Space: Orthogonal Recursive Bisection. Orthogonal Recursive Bisection (ORB) is a technique for providing physical locality in a problem domain by explicitly partitioning the domain space [10]. It was first used for hierarchical N-body problems in Salmon's message-passing Barnes-Hut implementation [23]. The idea in ORB partitioning is to recursively divide the computational domain space in...

2 | Robust vortex methods for three-dimensional incompressible flows - Chua, Leonard, et al. - 1988 |

Citation Context: ...methods have been used most widely in so far, and we use it as being representative of nonuniform classical domains. Several hierarchical methods have been proposed to solve classical N-body problems [2, 18, 3, 15, 29, 7]. The most widely used and promising among these are the Barnes-Hut method [3] and the Fast Multipole Method [15]. Between them, these two methods also capture all the important characteristics of hie...

2 | Hierarchical N-body methods - Hernquist - 1988 |

Citation Context: ...ree which represent space that is physically close to it, and groups particles at a hierarchy of length scales. The complexity of the force-computation phase scales as $\frac{1}{\theta^2} N \log N$ for realistic values of $\theta$ [17]. The Available Parallelism: Each of the phases in a time-step can be executed internally in parallel. We do not exploit parallelism across phases or time-steps explicitly, except to avoid synchronizin...

2 | High Performance Computing II, chapter Data Locality and Memory - Singh, Hennessy - 1991 |

Citation Context: ...riate for many scientific applications, in which the work per iteration in parallel loops is uniform and locality in iteration space translates naturally to the desired locality in the problem domain [25, 26]. In our hierarchical N-body applications, however, the static scheme described above neither guarantees load balancing nor provides data locality. The load imbalance results from the fact that an ...

2 | Parallelism, locality and scaling in a molecular dynamics simulation - Singh, Hennessy - 1992 |

Citation Context: ...riate for many scientific applications, in which the work per iteration in parallel loops is uniform and locality in iteration space translates naturally to the desired locality in the problem domain [25, 26]. In our hierarchical N-body applications, however, the static scheme described above neither guarantees load balancing nor provides data locality. The load imbalance results from the fact that an ...
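
### Illustrative sketch: decisection search

The citation contexts for Brent's book and Numerical Recipes describe a decisection search: like bisection, but the current bracket is split into 10 equal subdomains at every step. The Python sketch below is one possible reading of that description, not the authors' implementation. The function names `decisection` and `balanced_split`, the tolerance, the iteration cap, and the cost-imbalance function used to place a bisector are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def decisection(f: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-9, parts: int = 10,
                max_iter: int = 200) -> float:
    """Locate a sign change of f in [lo, hi].

    Each step splits the bracket into `parts` equal subintervals and keeps
    the first one whose endpoints have opposite signs (parts=2 is bisection).
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo > 0) == (f_hi > 0):
        raise ValueError("f(lo) and f(hi) must have opposite signs")

    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        step = (hi - lo) / parts
        left, f_left = lo, f_lo
        for k in range(1, parts + 1):
            # Reuse the known right endpoint on the last subinterval.
            right = hi if k == parts else lo + k * step
            f_right = f_hi if k == parts else f(right)
            if f_right == 0:
                return right
            if (f_left > 0) != (f_right > 0):
                lo, f_lo, hi, f_hi = left, f_left, right, f_right
                break
            left, f_left = right, f_right
    return 0.5 * (lo + hi)


def balanced_split(positions: Sequence[float],
                   costs: Sequence[float]) -> float:
    """Hypothetical helper: a coordinate that splits total cost about in half.

    The imbalance function is a discrete step function of x, which is the
    kind of function the quoted passage says the search was applied to.
    """
    half = sum(costs) / 2.0

    def imbalance(x: float) -> float:
        return sum(c for p, c in zip(positions, costs) if p <= x) - half

    return decisection(imbalance, min(positions) - 1.0, max(positions))
```

With `parts=10`, each step costs up to nine new function evaluations but shrinks the bracket by a factor of 10 instead of 2; whether that trade pays off depends on how expensive each evaluation is, which this sketch does not model.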