## Fast Parallel Algorithms for Short-Range Molecular Dynamics (1995)

Venue: Journal of Computational Physics

Citations: 236 (6 self)

### BibTeX

@ARTICLE{Plimpton95fastparallel,
  author  = {Steve Plimpton},
  title   = {Fast Parallel Algorithms for Short-Range Molecular Dynamics},
  journal = {Journal of Computational Physics},
  year    = {1995},
  volume  = {117},
  pages   = {1--19}
}

### Abstract

Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently -- those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers -- the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional...

### Citations

983 |
Computer Simulation of Liquids
- Allen, Tildesley
- 1987
Citation Context ...at http://www.cs.sandia.gov/~sjplimp/main.html. 1 Introduction: Classical molecular dynamics (MD) is a commonly used computational tool for simulating the properties of liquids, solids, and molecules [1, 2]. Each of the N atoms or molecules in the simulation is treated as a point mass and Newton’s equations are integrated to compute their motion. From the motion of the ensemble of atoms a variety of use... |

794 | A fast algorithm for particle simulations
- Greengard, Rokhlin
- 1987
Citation Context ...ome this difficulty. They include particle–mesh algorithms [31] which scale as f(M)N where M is the number of mesh points, hierarchical methods [6] which scale as N log(N), and fast–multipole methods [23] which scale as N. Recent parallel implementations of these algorithms [19, 56] have improved their range of applicability for many–body simulations, but because of their expense, long–range force mo... |

426 |
Computer Simulation Using Particles
- Hockney, Eastwood
- 1988
Citation Context ...tom interacts with all others. Directly computing these forces scales as N^2 and is too costly for large N. Various approximate methods overcome this difficulty. They include particle–mesh algorithms [31] which scale as f(M)N where M is the number of mesh points, hierarchical methods [6] which scale as N log(N), and fast–multipole methods [23] which scale as N. Recent parallel implementations of these... |

221 |
A hierarchical O(N log N) force-calculation algorithm
- Barnes, Hut
- 1986
Citation Context ...oo costly for large N. Various approximate methods overcome this difficulty. They include particle–mesh algorithms [31] which scale as f(M)N where M is the number of mesh points, hierarchical methods [6] which scale as N log(N), and fast–multipole methods [23] which scale as N. Recent parallel implementations of these algorithms [19, 56] have improved their range of applicability for many–body simul... |

188 | An improved spectral graph partitioning algorithm for mapping parallel computations - Hendrickson, Leland - 1995 |

175 |
Computer experiments on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules
- Verlet
- 1967
Citation Context ...ish this on serial and vector machines; we discuss them briefly here since our parallel algorithms incorporate similar ideas. The first idea, that of neighbor lists, was originally proposed by Verlet [55]. For each atom, a list is maintained of nearby atoms. Typically, when the list is formed, all neighboring atoms within an extended cutoff distance rs = rc + δ are stored. The list is used for a few t... |
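The Verlet neighbor-list idea summarized in this context can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the cubic periodic box, the brute-force O(N^2) pair search, and the names `build_neighbor_list` and `skin` are my assumptions; for large N the paper combines neighbor lists with link-cell binning instead.

```python
import numpy as np

def build_neighbor_list(pos, box, r_c, skin=0.3):
    """Verlet neighbor lists with extended cutoff r_s = r_c + skin.

    pos is an (N, 3) array of positions in a cubic periodic box of side
    `box`.  For each atom i we store the indices j > i within r_s, so
    every pair appears exactly once.  The extra `skin` shell lets the
    list be reused for several timesteps before it must be rebuilt.
    """
    r_s = r_c + skin
    n = len(pos)
    neighbors = [[] for _ in range(n)]
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)                # minimum-image convention
        close = np.nonzero((d * d).sum(axis=1) < r_s * r_s)[0]
        neighbors[i] = (close + i + 1).tolist()
    return neighbors
```

Between rebuilds only listed pairs need distance checks each timestep; the list must be refreshed before any atom has moved farther than half the skin distance.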

66 | The torus-wrap mapping for dense matrix calculations on massively parallel computers
- Hendrickson, Womble
- 1994
Citation Context ...f the workload. As we shall see, this improves the O(N) scaling of the communication cost to O(N/√P). Block–decompositions of matrices are common in linear algebra algorithms for parallel machines [10, 28, 33] which sparked our interest in the idea, but to our knowledge we are the first to apply this idea to short–range MD simulations [29, 43, 42]. The assignment of sub–blocks of the force matrix to proces... |

61 |
Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals
- Daw, Baskes
- 1984
Citation Context ...ractions are included in the force model, one must ensure some processor knows sufficient information to compute any given interaction. An implementation for the embedded atom method (EAM) potentials [18] used in modeling metals and metal alloys is discussed in [43] and a FD implementation of the many–body forces (angular, torsional) encountered in molecular simulations is presented in [42]. We know o... |

53 | Interprocessor collective communication library (InterCom)
- Barnett, Gupta, et al.
- 1994
Citation Context ...the other processors, an operation called all–to–all communication. Various algorithms have been developed for performing this operation efficiently on different parallel machines and architectures [7, 22, 54]. We use an idea outlined in Fox, et al. [22] that is simple, portable, and works well on a variety of machines. We describe it briefly because it is the chief communication component of both the AD a... |
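The all-to-all "expand" communication mentioned above can be illustrated with a recursive-doubling simulation in plain Python. This is a sketch under my own naming, not the paper's implementation: real processors exchange messages with a hypercube partner at each of the log2(P) stages, whereas here the exchange is simulated with list copies, and P is assumed to be a power of two.

```python
def allgather_recursive_doubling(chunks):
    """Simulate a log2(P)-stage all-to-all expand.

    chunks[r] is the data initially owned by processor r (P a power of
    two).  At each stage every processor combines its accumulated data
    with that of the partner whose rank differs in one bit, doubling
    what it holds, so after log2(P) stages every processor holds all
    P chunks.
    """
    p = len(chunks)
    held = [list(c) for c in chunks]
    step = 1
    while step < p:
        # rank ^ step is the partner whose rank differs in bit log2(step)
        held = [held[rank] + held[rank ^ step] for rank in range(p)]
        step *= 2
    return held
```

Each processor communicates log2(P) times while the total volume it receives stays O(N), which is the communication scaling the surrounding discussion attributes to this pattern.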

43 |
SUNMOS for the Intel Paragon: A brief user's guide, presented at Intel Supercomputer Users
- Maccabe, McCurley, et al.
- 1994
Citation Context ...st enhancement takes advantage of the fact that each “node” of the Paragon actually has two i860 processors, one for computation and one for communication. An option under the SUNMOS operating system [35] run on Sandia’s Paragon is to use the second processor for computation. This requires minor coding changes to stride the loops in the force and neighbor routines so that each processor can perform in... |

36 | A new parallel method for molecular dynamics simulation of macromolecular systems. Sandia Technical Report
- Plimpton, Hendrickson
- 1994
Citation Context ...potentials [18] used in modeling metals and metal alloys is discussed in [43] and a FD implementation of the many–body forces (angular, torsional) encountered in molecular simulations is presented in [42]. We know of no simple way to use the FD idea for the more general case of simulations with dynamically changing connectivities, such as for silicon three–body potentials. Long–range pairwise forces c... |

28 |
Atomic level simulations on a million particles: The cell multipole method for Coulomb and London nonbond interactions
- Ding, Karasawa, et al.
- 1992
Citation Context ... as f(M)N where M is the number of mesh points, hierarchical methods [6] which scale as N log(N), and fast–multipole methods [23] which scale as N. Recent parallel implementations of these algorithms [19, 56] have improved their range of applicability for many–body simulations, but because of their expense, long–range force models are not commonly used in classical MD simulations. By contrast, short–rang... |

27 |
Multiple time step methods in molecular dynamics
- Streett, Tildesley, et al.
- 1978
Citation Context ...ork of any of the 3 algorithms: on–the–fly computation of thermodynamic quantities and transport coefficients, triggering of neighbor list construction by atom movement, multiple–timescale methods [37, 50], more sophisticated time integrators, and other statistical ensembles besides the constant NVE ensemble of the benchmark, e.g. constant NPT simulations. Virtually any form of short–range interatomi... |

26 |
Molecular dynamics on hypercube parallel computers
- Smith
- 1991
Citation Context ...oss the processors so as to extract maximum parallelism. To our knowledge, all algorithms that have been proposed or implemented (including ours) have been variations on these two methods. References [21, 25, 49] include good overviews of various techniques. In the first class of methods a pre–determined set of force computations is assigned to each processor. The assignment remains fixed for the duration of ... |

25 |
Distributed memory matrix-vector multiplication and conjugate gradient algorithms
- Lewis, van de Geijn
- 1993
Citation Context ...f the workload. As we shall see, this improves the O(N) scaling of the communication cost to O(N/√P). Block–decompositions of matrices are common in linear algebra algorithms for parallel machines [10, 28, 33] which sparked our interest in the idea, but to our knowledge we are the first to apply this idea to short–range MD simulations [29, 43, 42]. The assignment of sub–blocks of the force matrix to proces... |

22 |
Parallel Molecular Dynamics
- Clark, A, et al.
- 1991
Citation Context ...orward computation of additional three– and four–body force terms. Parallel implementations of state–of–the–art biological MD programs such as CHARMM and GROMOS using this technique are discussed in [13, 17]. Force–decomposition methods which systolically cycle atom data around a ring or through a grid of processors have been used on MIMD [26, 49] and SIMD machines [16, 57]. Other force–decomposition met... |

20 |
Parallelization of CHARMm for MIMD machines
- Brooks, Hodošček
- 1992
Citation Context ...orward computation of additional three– and four–body force terms. Parallel implementations of state–of–the–art biological MD programs such as CHARMM and GROMOS using this technique are discussed in [13, 17]. Force–decomposition methods which systolically cycle atom data around a ring or through a grid of processors have been used on MIMD [26, 49] and SIMD machines [16, 57]. Other force–decomposition met... |

18 |
Parallel approaches to short-range molecular dynamics simulations
- Tamayo, Mesirov, et al.
- 1991
Citation Context ...stantial reworking of data structures and code. 6 Benchmark Problem: The test case used to benchmark our three parallel algorithms is a MD problem that has been used extensively by various researchers [9, 14, 20, 24, 30, 41, 47, 51, 52]. It models atom interactions with a Lennard–Jones potential energy between pairs of atoms separated by a distance r as Φ(r) = 4ε[(σ/r)^12 − (σ/r)^6], where ε and σ are constants. The derivative... |
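The benchmark's Lennard-Jones potential Φ(r) = 4ε[(σ/r)^12 − (σ/r)^6], together with the force magnitude that follows from its derivative, can be written directly. A minimal sketch in reduced units; the function name `lj_pair` is mine.

```python
def lj_pair(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy and force magnitude at separation r.

    Phi(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)
    F(r)   = -dPhi/dr = (24*eps/r)*(2*(sigma/r)**12 - (sigma/r)**6)
    """
    sr6 = (sigma / r) ** 6
    phi = 4.0 * eps * (sr6 * sr6 - sr6)
    f = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r
    return phi, f
```

At the potential minimum r = 2^(1/6) σ the energy is −ε and the force vanishes; inside that radius the force is repulsive (F > 0).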

15 |
Efficient parallel implementation of molecular dynamics on a toroidal network: I. Parallelizing strategy
- Esselink, Smith, et al.
- 1993
Citation Context ...IMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of programming for MIMD machines is the only one that provides enough flexibility to implement all the data structure and computational enhancements t... |

15 | Parallel Many-Body Simulations Without All-to-All Communication
- Hendrickson, Plimpton
- 1994
Citation Context ...e common in linear algebra algorithms for parallel machines [10, 28, 33] which sparked our interest in the idea, but to our knowledge we are the first to apply this idea to short–range MD simulations [29, 43, 42]. The assignment of sub–blocks of the force matrix to processors with a row–wise (calendar) ordering of the processors is depicted in Figure 5. We assume for ease of exposition that P is an even power... |

14 |
Quiet high-resolution computer models of a plasma
- Hockney, Goel, et al.
Citation Context ...built, examining it for possible interactions is much faster than checking all atoms in the system. The second technique commonly used for speeding up MD calculations is known as the link-cell method [32]. At every timestep, all the atoms are binned into 3–D cells of side length d where d = rc or slightly larger. This reduces the task of finding neighbors of a given atom to checking in 27 bins — the b... |
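The link-cell binning step described above is a simple spatial hash. A sketch assuming a cubic periodic box; the name `bin_atoms` and the dict-of-lists cell storage are my choices, not the paper's data structure.

```python
def bin_atoms(pos, box, r_c):
    """Bin atoms into 3-D cells of side d >= r_c.

    With cells at least r_c wide, every neighbor of an atom within the
    cutoff lies in the atom's own bin or one of the 26 surrounding bins,
    so the neighbor search checks 27 bins instead of all N atoms.
    Returns a dict mapping cell-index triples to atom-index lists, plus
    the number of cells per dimension.
    """
    m = max(1, int(box // r_c))      # cells per dimension; side d = box/m >= r_c
    d = box / m
    cells = {}
    for idx, (x, y, z) in enumerate(pos):
        key = (int(x / d) % m, int(y / d) % m, int(z / d) % m)
        cells.setdefault(key, []).append(idx)
    return cells, m
```

Binning is O(N) per timestep, which is why the link-cell method scales well when it is combined with neighbor lists as the surrounding text describes.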

14 |
Large-scale molecular dynamics simulation using vector and parallel computers
- Rapaport
- 1988
Citation Context ...cessary to simulate even picoseconds of “real” time. Because of these computational demands, considerable effort has been expended by researchers to optimize MD calculations for vector supercomputers [24, 30, 36, 45, 47] and even to build special–purpose hardware for performing MD simulations [4, 5]. The current state–of–the–art is such that simulating ten– to hundred–thousand atom systems for picoseconds takes hours... |

14 |
Efficient global combine operations
- van de Geijn
- 1991
Citation Context ...the other processors, an operation called all–to–all communication. Various algorithms have been developed for performing this operation efficiently on different parallel machines and architectures [7, 22, 54]. We use an idea outlined in Fox, et al. [22] that is simple, portable, and works well on a variety of machines. We describe it briefly because it is the chief communication component of both the AD a... |

13 |
Vectorized link cell fortran code for molecular dynamics simulations for a large number of particles
- Grest, Duenweg, et al.
- 1989
Citation Context ...cessary to simulate even picoseconds of “real” time. Because of these computational demands, considerable effort has been expended by researchers to optimize MD calculations for vector supercomputers [24, 30, 36, 45, 47] and even to build special–purpose hardware for performing MD simulations [4, 5]. The current state–of–the–art is such that simulating ten– to hundred–thousand atom systems for picoseconds takes hours... |

12 | Parallel LU Decomposition on a Transputer Network - Bisseling, van der Vorst - 1989 |

12 |
Multi-million particle molecular dynamics: II. design considerations for distributed processing
- Rapaport
- 1991
Citation Context ...ms has been for single–instruction/multiple–data (SIMD) parallel machines such as the CM–2 [12, 52], or for multiple–instruction/multiple–data (MIMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of... |

9 |
Molecular dynamics simulation on a parallel computer. Molec
- Heller, Grubmuller, et al.
- 1990
Citation Context ...ms has been for single–instruction/multiple–data (SIMD) parallel machines such as the CM–2 [12, 52], or for multiple–instruction/multiple–data (MIMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of... |

8 |
An optimal hypercube direct n-body solver on the connection machine
- Brunet, Edelman, et al.
- 1990
Citation Context ...his technique are discussed in [13, 17]. Force–decomposition methods which systolically cycle atom data around a ring or through a grid of processors have been used on MIMD [26, 49] and SIMD machines [16, 57]. Other force–decomposition methods that use the force–matrix formalism we discuss in Sections 3 and 4 have been presented in [12] and [15]. Boyer and Pawley [12] decompose the force matrix by sub–blo... |

7 |
Scalable parallel molecular dynamics on MIMD supercomputers
- Plimpton, Heffelfinger
- 1992
Citation Context ...IMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of programming for MIMD machines is the only one that provides enough flexibility to implement all the data structure and computational enhancements t... |


6 |
A domain decomposition parallelization strategy for molecular dynamics simulations on distributed memory machines
- Brown, Clarke, et al.
- 1993
Citation Context ...IMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of programming for MIMD machines is the only one that provides enough flexibility to implement all the data structure and computational enhancements t... |

6 | Parallel computers and molecular simulation. Molec - Fincham - 1987 |

6 |
Parallel multiple-time-step molecular dynamics with three-body interaction
- Nakano, Kalia
- 1993
Citation Context ...ms has been for single–instruction/multiple–data (SIMD) parallel machines such as the CM–2 [12, 52], or for multiple–instruction/multiple–data (MIMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on hundred– to thousand–processor MIMD machines [9, 14, 20, 41, 51]. We are convinced that the message–passing model of... |

6 |
Molecular dynamics simulations of short–range force systems on 1024–node hypercubes
- Plimpton
- 1990
Citation Context ...the number of processors) so that as parallel machines become more powerful in the next few years, algorithms similar to it will enable larger problems to be studied. Our earlier efforts in this area [40] produced algorithms which were fast for systems with up to tens of thousands of atoms but did not scale optimally with N for larger systems. We improved on this effort to create a scalable large–syst... |

6 |
Parallel molecular dynamics with the Embedded Atom method
- Plimpton, Hendrickson
- 1993
Citation Context ...e common in linear algebra algorithms for parallel machines [10, 28, 33] which sparked our interest in the idea, but to our knowledge we are the first to apply this idea to short–range MD simulations [29, 43, 42]. The assignment of sub–blocks of the force matrix to processors with a row–wise (calendar) ordering of the processors is depicted in Figure 5. We assume for ease of exposition that P is an even power... |

5 | Computing aspects of molecular dynamics simulations - Gupta - 1992 |

5 |
Parallel molecular dynamics of biomolecules
- Schreiber, Steinhauser, et al.
- 1992
Citation Context ...an be avoided at the cost of more communication by using a modified force matrix G which references each pairwise interaction only once. There are several ways to do this by striping the force matrix [48]; we choose instead to form G as follows. Let Gij = Fij, except that Gij = 0 when i > j and i + j is even, and likewise Gij = 0 when i < j and i + j is odd. Conceptually, G is colored like a checkerbo... |
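The checkerboard construction of G quoted above amounts to a small parity rule. A sketch with 0-based indices (the parity of i + j, and hence the rule, is unchanged by the index shift); the function name is mine.

```python
def references_pair(i, j):
    """True if entry G_ij of the modified force matrix is kept.

    Following the rule quoted above: G_ij = F_ij, except G_ij = 0 when
    i > j and i + j is even, and G_ij = 0 when i < j and i + j is odd.
    The effect is that each pairwise interaction (i, j) is referenced
    by exactly one of G_ij and G_ji.
    """
    if i == j:
        return False                    # no self-interaction on the diagonal
    if i > j:
        return (i + j) % 2 == 1         # lower triangle kept only for odd i+j
    return (i + j) % 2 == 0             # upper triangle kept only for even i+j
```

Each pair is computed once, and the checkerboard coloring leaves every row of G with roughly (N−1)/2 nonzeros, so the per-processor work in a row-based decomposition stays balanced.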

5 | A parallel scalable approach to short–range molecular dynamics on the CM–5
- Tamayo, Giles
- 1992

4 |
A special purpose computer for molecular dynamics calculations
- Bakker, Gilmer, et al.
- 1990
Citation Context ...onsiderable effort has been expended by researchers to optimize MD calculations for vector supercomputers [24, 30, 36, 45, 47] and even to build special–purpose hardware for performing MD simulations [4, 5]. The current state–of–the–art is such that simulating ten– to hundred–thousand atom systems for picoseconds takes hours of CPU time on machines such as the Cray Y–MP. The fact that MD computations ar... |

4 | A high performance communication and memory caching scheme for molecular dynamics on the CM–5
- Beazley, Lomdahl, et al.
- 1994

3 |
Computational statistical mechanics: methodology, applications and supercomputing
- Abraham
- 1986
Citation Context ...at http://www.cs.sandia.gov/~sjplimp/main.html. 1 Introduction: Classical molecular dynamics (MD) is a commonly used computational tool for simulating the properties of liquids, solids, and molecules [1, 2]. Each of the N atoms or molecules in the simulation is treated as a point mass and Newton’s equations are integrated to compute their motion. From the motion of the ensemble of atoms a variety of use... |

3 |
A special purpose parallel computer for molecular dynamics: Motivation, design, implementation, and application
- Auerbach, Paul, et al.
- 1987
Citation Context ...onsiderable effort has been expended by researchers to optimize MD calculations for vector supercomputers [24, 30, 36, 45, 47] and even to build special–purpose hardware for performing MD simulations [4, 5]. The current state–of–the–art is such that simulating ten– to hundred–thousand atom systems for picoseconds takes hours of CPU time on machines such as the Cray Y–MP. The fact that MD computations ar... |

3 |
Adhesion between atomically flat metallic surfaces, Phys
- Taylor, Nelson, et al.
- 1991
Citation Context ...ty of MD simulations are performed on systems of a few hundred to several thousand atoms where N is chosen to be as small as possible while still accurate enough to model the desired physical effects [8, 44, 38, 53]. The computational goal in these calculations is to perform each timestep as quickly as possible. This is particularly true in non–equilibrium MD where macroscopic changes in the system may take sign... |

3 |
A parallel treecode for gravitational N–body simulations with up to 20 million particles
- Warren, Salmon
- 1991
Citation Context ... as f(M)N where M is the number of mesh points, hierarchical methods [6] which scale as N log(N), and fast–multipole methods [23] which scale as N. Recent parallel implementations of these algorithms [19, 56] have improved their range of applicability for many–body simulations, but because of their expense, long–range force models are not commonly used in classical MD simulations. By contrast, short–rang... |

2 |
Molecular dynamics simulation of a cyclic siloxane based liquid crystalline material
- Patnaik, Pachter, et al.
- 1993
Citation Context ...ty of MD simulations are performed on systems of a few hundred to several thousand atoms where N is chosen to be as small as possible while still accurate enough to model the desired physical effects [8, 44, 38, 53]. The computational goal in these calculations is to perform each timestep as quickly as possible. This is particularly true in non–equilibrium MD where macroscopic changes in the system may take sign... |

1 | Atomic-scale simulation in materials science - Baskes, Daw, et al. - 1988 |

1 |
Molecular dynamics of clusters of particles interacting with pairwise forces using a massively parallel computer
- Boyer, Pawley
- 1988
Citation Context ...allelism on various machines. The majority of the work that has included implementations of proposed algorithms has been for single–instruction/multiple–data (SIMD) parallel machines such as the CM–2 [12, 52], or for multiple–instruction/multiple–data (MIMD) parallel machines with a few dozens of processors [26, 37, 39, 46]. Recently there have been efforts to create scalable algorithms that work well on ... |

1 | Hypercube algorithms for direct N–body solvers for different granularities - Brunet, Edelman, et al. - 1993 |

1 |
at Los Alamos National Labs, personal communication
- Lomdahl
- 1994
Citation Context ...for a N = 1,024,000 atom system and 16.55 sec/timestep for a N = 65,536,000 atom system (both at a higher density of ρ* = 1.0) run on a 1024–node CM–5. (Their current timings are about 15% faster [34]). The latter run is at a rate of 28 Gflops, but a large fraction of these flops are computed on atoms outside the force cutoff and they count 35 flops/interaction. Their algorithm does not use neighb... |

1 |
Comparison of link–cell and neighbourhood tables on a range of computers
- Morales, Nuevo
- 1992
Citation Context ...cessary to simulate even picoseconds of “real” time. Because of these computational demands, considerable effort has been expended by researchers to optimize MD calculations for vector supercomputers [24, 30, 36, 45, 47] and even to build special–purpose hardware for performing MD simulations [4, 5]. The current state–of–the–art is such that simulating ten– to hundred–thousand atom systems for picoseconds takes hours... |