Molecular Dynamics Fingerprints (MDFP): Machine Learning from MD Data To Predict Free-Energy Differences.

Sereina Riniker
Published in: Journal of Chemical Information and Modeling (2017)
While the use of machine-learning (ML) techniques is well established in cheminformatics for the prediction of physicochemical properties and binding affinities, the training of ML models based on data from molecular dynamics (MD) simulations remains largely unexplored. Here, we present a fingerprint termed MDFP, which is constructed from the distributions of properties such as potential-energy components, radius of gyration, and solvent-accessible surface area extracted from MD simulations. The corresponding fingerprint elements are the first two statistical moments of the distributions and the median. By considering not only the average but also the spread of the distribution in the fingerprint, some degree of entropic information is encoded. Short MD simulations of the molecules in water (and in vacuum) are used to generate MDFP. These are further combined with simple counts based on the 2D structure of the molecules into MDFP+. The resulting information-rich MDFP+ is used to train ML models for the prediction of solvation free energies in five different solvents (water, octanol, chloroform, hexadecane, and cyclohexane) as well as partition coefficients in octanol/water, hexadecane/water, and cyclohexane/water. The approach is easy to implement and relatively inexpensive computationally. Yet it performs comparably to more rigorous MD-based free-energy methods such as free-energy perturbation (FEP) as well as end-state methods such as linear interaction energy (LIE), the conductor-like screening model for realistic solvation (COSMO-RS), and the SMx family of solvation models.
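The fingerprint construction described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names `mdfp_from_series` and `mdfp_plus`, the property labels, and the input layout (one list of per-frame values per property) are assumptions made for illustration. The second moment is represented here by the population standard deviation, which is also an assumption; the abstract only states that the first two moments and the median are used.

```python
import statistics

def mdfp_from_series(properties):
    """Build fingerprint elements from per-frame property values.

    `properties` maps a property label (hypothetical, e.g. "rgyr") to a
    sequence of values, one per MD frame. For each property the mean,
    the population standard deviation, and the median are emitted.
    """
    names, values = [], []
    for label in sorted(properties):  # sorted for a stable element order
        series = [float(x) for x in properties[label]]
        stats = (
            ("mean", statistics.fmean(series)),
            ("std", statistics.pstdev(series)),  # assumed form of the 2nd moment
            ("median", statistics.median(series)),
        )
        for stat_name, stat_value in stats:
            names.append(f"{label}_{stat_name}")
            values.append(stat_value)
    return names, values

def mdfp_plus(mdfp_values, counts_2d):
    """Append simple 2D-structure counts to the MD-derived elements."""
    return list(mdfp_values) + [float(c) for c in counts_2d]

if __name__ == "__main__":
    # Toy per-frame values; real input would come from an MD trajectory.
    toy = {"rgyr": [1.0, 2.0, 3.0, 6.0], "sasa": [10.0, 12.0, 11.0]}
    names, values = mdfp_from_series(toy)
    # The two trailing counts are placeholders (e.g. heavy atoms, rotatable bonds).
    print(names)
    print(mdfp_plus(values, [5, 2]))
```

Each property contributes three elements, so the MD-derived part of the vector has a fixed length regardless of simulation length. The combined vector could then be passed to any standard regression model; the abstract does not name a specific one.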
Keyphrases
  • molecular dynamics
  • density functional theory
  • machine learning
  • big data
  • artificial intelligence
  • health information
  • deep learning
  • climate change
  • social media
  • monte carlo
  • data analysis
  • perovskite solar cells