An inductive transfer learning force field (ITLFF) protocol builds protein force fields in seconds.
Yanqiang Han, Zhilong Wang, An Chen, Imran Ali, Junfei Cai, Simin Ye, Jin-Jin Li
Published in: Briefings in Bioinformatics (2022)
Accurate simulation of protein folding remains a distinct challenge in understanding the physical folding process, with important implications for protein design and drug discovery. Molecular dynamics simulations need highly accurate force fields to reproduce correct folding. However, current force fields are limited in accuracy, applicability and efficiency. We propose a machine learning protocol, the inductive transfer learning force field (ITLFF), to construct protein force fields in seconds, at any target level of accuracy, from a small dataset. This is achieved by incorporating an inductive transfer learning algorithm into deep neural networks, so that the networks learn any high-level method by building on a large dataset computed with a low-level method. Here, we use a double-hybrid density functional theory (DFT) functional as the example, but ITLFF is suitable for any high-precision functional. Results for 18 selected proteins indicate that, compared with the fragment-based double-hybrid DFT algorithm, the force field constructed by ITLFF achieves considerable accuracy, with a mean absolute error of 0.0039 kcal/mol/atom for energy and a root mean square error of 2.57 kcal/mol/Å for force. It is also more than 30 000 times faster, and the efficiency gain grows with system size. This performance offers promising prospects for accurate and efficient protein dynamics simulations and marks an important step toward protein folding simulation. Because ITLFF can use knowledge acquired in one task to solve related problems, it is also applicable to various problems in biology, chemistry and materials science.
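The transfer step described in the abstract (pretrain on plentiful low-level data, then adapt with a small high-level dataset) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1D functions `low_level` and `high_level` stand in for the two levels of theory, the network is a single tanh hidden layer written in NumPy, and adaptation is done by freezing the hidden layer and refitting only the output layer by least squares, which is one common form of transfer learning and not necessarily the scheme ITLFF uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two levels of theory (not from the paper).
def low_level(x):
    return np.sin(x)                 # cheap method, many labels available

def high_level(x):
    return 1.2 * np.sin(x) + 0.3     # expensive method, few labels available

def hidden(x, W1, b1):
    """Hidden-layer features, shape (n_samples, n_hidden)."""
    return np.tanh(x[:, None] * W1 + b1)

def predict(x, W1, b1, w2, b2):
    return hidden(x, W1, b1) @ w2 + b2

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Step 1: pretrain every weight on a large low-level dataset
# using full-batch gradient descent on the mean squared error.
n_hidden = 8
W1 = rng.normal(size=n_hidden)
b1 = rng.normal(size=n_hidden)
w2 = 0.1 * rng.normal(size=n_hidden)
b2 = 0.0

x_low = rng.uniform(-3.0, 3.0, size=2000)
y_low = low_level(x_low)
lr = 0.02
for _ in range(3000):
    h = hidden(x_low, W1, b1)
    err = h @ w2 + b2 - y_low
    # Gradient of the loss with respect to the hidden pre-activations.
    grad_pre = (err[:, None] * w2) * (1.0 - h ** 2)
    w2 -= lr * (h.T @ err) / len(x_low)
    b2 -= lr * err.mean()
    W1 -= lr * (grad_pre * x_low[:, None]).mean(axis=0)
    b1 -= lr * grad_pre.mean(axis=0)

# Step 2: transfer. Keep the hidden layer frozen and refit only the
# output layer on a small high-level dataset (linear least squares).
x_high = rng.uniform(-3.0, 3.0, size=20)
y_high = high_level(x_high)
A = np.column_stack([hidden(x_high, W1, b1), np.ones_like(x_high)])
coef, *_ = np.linalg.lstsq(A, y_high, rcond=None)
w2_ft, b2_ft = coef[:-1], coef[-1]

rmse_before = rmse(predict(x_high, W1, b1, w2, b2), y_high)
rmse_after = rmse(predict(x_high, W1, b1, w2_ft, b2_ft), y_high)
print("RMSE on high-level samples before transfer:", rmse_before)
print("RMSE on high-level samples after transfer: ", rmse_after)
```

Only 9 parameters (8 output weights and a bias) are fitted to the 20 high-level samples, while the hidden features come from the 2000 low-level samples. A real force field would replace the scalar input with atomic-environment descriptors and predict energies and forces, but the division of labor between the large and small datasets follows the same idea.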
Keyphrases
- single molecule
- molecular dynamics simulations
- density functional theory
- machine learning
- protein-protein
- mental health
- amino acid
- binding protein
- molecular docking
- healthcare
- neural network
- drug discovery
- randomized controlled trial
- public health
- physical activity
- small molecule
- artificial intelligence
- mass spectrometry
- virtual reality
- crystal structure