Surrogate Based Genetic Algorithm Method for Efficient Identification of Low-Energy Peptide Structures.
Justin Villard, Murat Kılıç, Ursula Röthlisberger. Published in: Journal of Chemical Theory and Computation (2023)
Identifying the most stable structure(s) of a system is a prerequisite for calculating any of its properties from first principles. However, even for relatively small molecules, exhaustive exploration of the potential energy surface (PES) is severely hampered by the dimensionality bottleneck. In this work, we address the challenging task of efficiently sampling realistic low-lying peptide coordinates with a surrogate-based genetic algorithm (GA)/density functional theory (DFT) approach (sGADFT), in which promising candidates provided by the GA are ultimately optimized with DFT. We benchmark several computational methods (GAFF, AMOEBApro13, PM6, PM7, DFTB3-D3(BJ)) as possible prescanning surrogates and apply sGADFT to two test systems: (i) two isomer families of the protonated Gly-Pro-Gly-Gly tetrapeptide (Masson, A.; J. Am. Soc. Mass Spectrom. 2015, 26, 1444-1454) and (ii) the doubly protonated cyclic decapeptide gramicidin S (Nagornova, N. S.; J. Am. Chem. Soc. 2010, 132, 4040-4041). We show that our GA procedure can correctly identify low-energy minima in as little as a few hours. Subsequent DFT relaxation of the surrogate low-energy structures lying within a given energy threshold (≤10 kcal/mol for (i), ≤5 kcal/mol for (ii)) invariably led to the most stable structures as determined from high-resolution infrared (IR) spectroscopy at low temperature. The sGADFT method therefore constitutes a highly efficient route for screening realistic low-lying peptide structures in the gas phase, as needed, for instance, for the interpretation and assignment of experimental IR spectra.
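The two-stage idea (a cheap surrogate drives the GA search, then only candidates within an energy window above the surrogate minimum are passed on for DFT relaxation) can be illustrated with a toy sketch. Everything here is an assumption for illustration and not the authors' implementation: the genome is a hypothetical list of backbone dihedral angles, `toy_surrogate_energy` is a made-up analytic stand-in for a real surrogate such as GAFF or PM7, the GA operators and parameters are generic choices, and the DFT relaxation step itself is not shown.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Genome = List[float]  # hypothetical encoding: backbone dihedral angles in degrees
Scored = Tuple[float, Genome]


def toy_surrogate_energy(angles: Sequence[float]) -> float:
    """Hypothetical cheap surrogate energy (kcal/mol); NOT a real force field."""
    return sum(1.0 + math.cos(math.radians(3 * a)) + 0.5 * math.cos(math.radians(a - 60.0))
               for a in angles)


def mutate(genome: Genome, rng: random.Random, sigma: float = 30.0, rate: float = 0.2) -> Genome:
    # Perturb each angle with probability `rate`, then wrap back into [-180, 180).
    return [((a + rng.gauss(0.0, sigma) + 180.0) % 360.0) - 180.0 if rng.random() < rate else a
            for a in genome]


def crossover(p1: Genome, p2: Genome, rng: random.Random) -> Genome:
    # One-point crossover; assumes genomes have at least two genes.
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def run_ga(energy: Callable[[Sequence[float]], float], n_genes: int,
           pop_size: int = 40, generations: int = 60, seed: int = 0) -> List[Scored]:
    """Elitist GA ranked by the surrogate; returns every (energy, genome) it evaluated."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)] for _ in range(pop_size)]
    seen: List[Scored] = []
    for _ in range(generations):
        scored = sorted(((energy(g), g) for g in population), key=lambda t: t[0])
        seen.extend(scored)
        elite = [g for _, g in scored[: pop_size // 4]]  # keep the best quarter unchanged
        children = list(elite)
        while len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
    return seen


def select_for_refinement(candidates: List[Scored], window_kcal: float) -> List[Scored]:
    """Deduplicate near-identical genomes, keep those within `window_kcal` of the minimum."""
    best: Dict[Tuple[float, ...], Scored] = {}
    for e, g in candidates:
        key = tuple(round(a, 1) for a in g)
        if key not in best or e < best[key][0]:
            best[key] = (e, g)
    e_min = min(e for e, _ in best.values())
    return sorted((c for c in best.values() if c[0] - e_min <= window_kcal), key=lambda t: t[0])
```

For example, `select_for_refinement(run_ga(toy_surrogate_energy, n_genes=4), window_kcal=5.0)` returns the surrogate candidates that would then be handed to a DFT relaxation; the window plays the role of the ≤10 and ≤5 kcal/mol thresholds quoted above. A wider window trades more DFT work for a lower risk that surrogate errors push the true minimum out of the selected set.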