
## Scalable Data-Privatization Threading for Hybrid MPI/OpenMP Parallelization of Molecular Dynamics

Citations: 1 (1 self)

### Citations

694 | Computer Simulation Using Particles - Hockney, Eastwood - 1988 |

383 | GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. - Hess - 2008 |

241 | Particle mesh Ewald: An N log(N) method for Ewald sums in large systems - Darden, York, et al. - 1993 |

54 | Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes - Rabenseifner, Hager, et al. - 2009 |

38 | Hypergraph-based dynamic load balancing for adaptive scientific computations
- Catalyurek, Boman, et al.
- 2007
Citation Context ...more irregular applications, it is conceivable to combine the light-overhead thread-level load balancing in this paper with a high quality node-level load balancer such as a hypergraph-based approach [24]. B. Memory Footprint To test the memory efficiency of the proposed method, we perform simulations on a four quad-core AMD Opteron 2.3 GHz machine with a fixed number of particles n = 8,192, 16,000, a... |

14 | Millisecond-scale molecular dynamics simulations on Anton - Shaw, Dror, et al. |

12 | The fast Fourier Poisson method for calculating Ewald sums
- York, W
- 1994
Citation Context ...e distance), requiring O(N^2) operations to evaluate. Many methods exist to reduce this computational complexity [13-15]. We focus on the highly efficient particle-particle/particle-mesh (PPPM) method [13]. In PPPM the Coulomb potential is decomposed into two parts: A short-range part that converges quickly in real space and a long-range part that converges quickly in reciprocal space. The split of wor... |

7 | Simulating solidification in metals at high pressure: The drive to petascale computing. - Streitz, Glosli, et al. - 2006 |

7 | An Events Based Algorithm for Distributing Concurrent Tasks on Multi-Core Architectures
- Holmes, Williams, et al.
- 2010
Citation Context ...utation. Usually used in GPGPU threading [18, 19]. • Spatial decomposition coloring [20]—scalable without increasing computation, but can cause load imbalance. • Mutually exclusive dynamic scheduling [21, 22]—robust and suited for dynamic load balancing, but can incur considerable overhead for context switching. • Data privatization—no penalty on computation, but with excessive O(np) memory requirement pe... |

6 | Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems
- Richards, Glosli, et al.
- 2009
Citation Context ...ular Dynamics I. INTRODUCTION Molecular dynamics (MD) simulation is widely used to study material properties at the atomistic level. Large-scale MD simulations are beginning to address broad problems [16], but increasingly large computing power is needed to encompass even larger spatiotemporal scales. For example, Glosli et al. performed a massively parallel MD simulation involving 62 billion particle... |

5 | NAMD: Biomolecular simulations on thousands of processors - Phillips, et al. - 2002 |

5 | A scalable hierarchical parallelization framework for molecular dynamics simulation on multicore clusters - Peng, Kunaseth, et al. |

2 | Zonal methods for the parallel execution of range-limited N-body simulations - Bowers, et al. |

2 | Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms. ICPPW 2009
- Hu, Liu, et al.
Citation Context ...ave been proposed to solve these problems: • Duplicated pair-force computation—simple and scalable, but doubles computation. Usually used in GPGPU threading [18, 19]. • Spatial decomposition coloring [20]—scalable without increasing computation, but can cause load imbalance. • Mutually exclusive dynamic scheduling [21, 22]—robust and suited for dynamic load balancing, but can incur considerable overhe... |

1 | A metascalable computing framework for large spatiotemporal-scale atomistic simulations
- Nomura, et al.
- 2009
Citation Context ...al parallelization frameworks, which integrate several parallel methods to provide different levels of parallelism, have been proposed as a solution to this scalability problem on multicore platforms [6, 9-11]. Hybrid parallelization based on MPI/threading schemes will likely replace MPI-only parallel MD. However, efficiently integrating a multi-threading framework into an existing MPI-only code is difficul... |

1 | Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability
- Glosli, et al.
- 2007
Citation Context ...mporal scales. For example, Glosli et al. performed a massively parallel MD simulation involving 62 billion particles using the MD code ddcMD, which demonstrated excellent performance and scalability [7]. Due to shifting trends in computer architecture, improvements in computing power are now gained using multicore architectures instead of increased clock speed. Furthermore, the number of cores per c... |

1 | Impact of multicores on large-scale molecular dynamics simulations
- Alam, et al.
- 2008
Citation Context ...rformance of traditional parallel applications, which are solely based on the message passing interface (MPI), is expected to degrade substantially [8]. (PDPTA’11, July 18-21, 2011, Las Vegas, Nevada, USA.) Hierarchical parallelization frameworks, which integrate several parallel methods to provide different levels of parallelism, have been proposed as a solution to this scalability problem on multicor... |

1 | Hybrid message-passing and shared-memory programming in a molecular dynamics application on multicore clusters - Chorley, et al. - 2009 |

1 | Dynamic load balancing on single- and multi-GPU systems
- Long, et al.
Citation Context ... significant overhead, and limit the threading speedup for a large number of threads; and 3) dynamic nature of MD requires low-overhead dynamic load balancing for threads to maintain good performance [12]. To address these issues, we have designed a load balancing spanning forest (LBSF) partitioning algorithm, which combines: 1) fine-grain dynamic load balancing; and 2) minimal memory-footprint data p... |

1 | GPU-accelerated molecular dynamics simulation for study of liquid crystalline flows
- Sunarso, et al.
- 2010
Citation Context ...particle concurrently. Several techniques have been proposed to solve these problems: • Duplicated pair-force computation—simple and scalable, but doubles computation. Usually used in GPGPU threading [18, 19]. • Spatial decomposition coloring [20]—scalable without increasing computation, but can cause load imbalance. • Mutually exclusive dynamic scheduling [21, 22]—robust and suited for dynamic load balan... |

1 | GPU accelerated molecular dynamics simulation of thermal conductivities
- Yang, et al.
- 2007
Citation Context ...particle concurrently. Several techniques have been proposed to solve these problems: • Duplicated pair-force computation—simple and scalable, but doubles computation. Usually used in GPGPU threading [18, 19]. • Spatial decomposition coloring [20]—scalable without increasing computation, but can cause load imbalance. • Mutually exclusive dynamic scheduling [21, 22]—robust and suited for dynamic load balan... |

1 | Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors
- Madduri, et al.
- 2009
Citation Context ...utation. Usually used in GPGPU threading [18, 19]. • Spatial decomposition coloring [20]—scalable without increasing computation, but can cause load imbalance. • Mutually exclusive dynamic scheduling [21, 22]—robust and suited for dynamic load balancing, but can incur considerable overhead for context switching. • Data privatization—no penalty on computation, but with excessive O(np) memory requirement pe... |