Login / Signup

A GPU-Accelerated Fast Multipole Method for GROMACS: Performance and Accuracy.

Bartosz KohnkeCarsten KutznerHelmut Grubmüller
Published in: Journal of chemical theory and computation (2020)
An important and computationally demanding part of molecular dynamics simulations is the calculation of long-range electrostatic interactions. Today, the prevalent method to compute these interactions is particle mesh Ewald (PME). The PME implementation in the GROMACS molecular dynamics package is extremely fast on individual GPU nodes. However, for large scale multinode parallel simulations, PME becomes the main scaling bottleneck as it requires all-to-all communication between the nodes; as a consequence, the number of exchanged messages scales quadratically with the number of involved nodes in that communication step. To enable efficient and scalable biomolecular simulations on future exascale supercomputers, clearly a method with a better scaling property is required. The fast multipole method (FMM) is such a method. As a first step on the path to exascale, we have implemented a performance-optimized, highly efficient GPU FMM and integrated it into GROMACS as an alternative to PME. For a fair performance comparison between FMM and PME, we first assessed the accuracies of the methods for various sets of input parameters. With parameters yielding similar accuracies for both methods, we determined the performance of GROMACS with FMM and compared it to PME for exemplary benchmark systems. We found that FMM with a multipole order of 8 yields electrostatic forces that are as accurate as PME with standard parameters. Further, for typical mixed-precision simulation settings, FMM does not lead to an increased energy drift with multipole orders of 8 or larger. Whereas an ≈50 000 atom simulation system with our FMM reaches only about a third of the performance with PME, for systems with large dimensions and inhomogeneous particle distribution, e.g., aerosol systems with water droplets floating in a vacuum, FMM substantially outperforms PME already on a single node.
Keyphrases