PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity.
Ammar Ammar, Rachel Cavill, Chris T. A. Evelo, Egon L. Willighagen. Published in: Journal of Cheminformatics (2023)
Protein mutations, especially those that occur in the binding site, play an important role in inter-individual drug response: they may alter binding affinity and thus affect a drug's efficacy and side effects. Unfortunately, large-scale experimental screening of ligand binding against protein variants remains time-consuming and expensive. In silico approaches can instead help guide those experiments. Methods ranging from computationally cheap machine learning (ML) to more expensive molecular dynamics have been applied to predict mutation effects accurately. However, these effects have mostly been studied on small, limited datasets, whereas ideally a large dataset of binding affinity changes caused by binding site mutations is needed. In this work, we used the PSnpBind database, which contains six hundred thousand docking experiments, to train an ML model that predicts protein-ligand binding affinity for both wild-type proteins and their variants carrying a single-point mutation in the binding site. Protein, binding site, mutation, and ligand information was encoded numerically using 256 features, half of which were manually selected based on domain knowledge. We propose an ML approach composed of two regression models: the first predicts wild-type protein-ligand binding affinity, and the second predicts mutated protein-ligand binding affinity. The best-performing models achieved an RMSE of 0.5 to 0.6 kcal/mol on an independent test set, with an R² of 0.87 to 0.90. We report an improvement in prediction performance compared with several previously reported models for protein-ligand binding affinity prediction. The obtained models can be used as a complementary method in early-stage drug discovery.
They can be applied to rapidly obtain an overview of ligand binding affinity changes across the protein variants carried by people in the population, and to narrow down the search space in which more time-demanding methods can then be used to identify potential leads that achieve better affinity across all protein variants.
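The abstract does not list the 256 features. As a purely illustrative sketch of how mutation information alone could be turned into numbers, the snippet below one-hot encodes the wild-type and substituted residues of a single-point mutation written in the common `V82A` style (wild-type residue, position, mutant residue). The function names, the `V82A` example, and the 40-feature layout are hypothetical assumptions, not the authors' feature set; in particular, this encoding ignores the residue position and any binding-site context.

```python
from typing import List

# The 20 standard amino acids, one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """One-hot vector of length 20 for a single residue letter."""
    if len(residue) != 1 or residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def encode_point_mutation(mutation: str) -> List[float]:
    """Hypothetical 40-feature encoding: one-hot(wild type) + one-hot(mutant).

    Expects notation such as 'V82A' (wild-type residue, position, mutant residue).
    The position is parsed for validation only and is NOT encoded.
    """
    wt, position, mut = mutation[:1], mutation[1:-1], mutation[-1:]
    if not position.isdigit():
        raise ValueError(f"malformed mutation: {mutation!r}")
    return one_hot(wt) + one_hot(mut)


features = encode_point_mutation("V82A")
print(len(features), features.index(1.0), features.index(1.0, 20))  # 40 17 20
```

In a real model, a block like this would be only one slice of the full feature vector, concatenated with protein, binding-site, and ligand descriptors.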
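The two-regressor scheme can be sketched as follows, under clearly stated assumptions. Ordinary least squares stands in for whatever regressors the authors actually trained, the features are toy numbers rather than the 256 described above, and the mutant model is assumed to receive the wild-type features plus extra mutation features. Deriving an affinity change as ΔΔG = ΔG_mut − ΔG_wt from the two predictions is likewise illustrative, and assumes docking-style scores in kcal/mol where more negative means tighter binding. All names (`LeastSquaresRegressor`, `predict_affinities`, etc.) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LeastSquaresRegressor:
    """Ordinary least squares with intercept (a stand-in, not the paper's model)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "LeastSquaresRegressor":
        rows = [[1.0, *x] for x in X]  # prepend a bias column
        k = len(rows[0])
        # Normal equations: (X^T X) w = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
        xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
        self.weights = solve_linear_system(xtx, xty)
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.weights[0] + sum(w * v for w, v in zip(self.weights[1:], x))


def predict_affinities(
    wt_model: LeastSquaresRegressor,
    mut_model: LeastSquaresRegressor,
    complex_x: Sequence[float],
    mutation_x: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (wild-type dG, mutant dG, ddG = mutant - wild-type), in kcal/mol."""
    dg_wt = wt_model.predict(complex_x)
    dg_mut = mut_model.predict([*complex_x, *mutation_x])
    return dg_wt, dg_mut, dg_mut - dg_wt


# Toy training data (hypothetical features, generated from exact linear rules).
X_wt = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y_wt = [-6.0, -5.5, -6.5, -7.5, -7.5]         # dG = -5 - f1 - 0.5*f2
X_mut = [[1, 0, 0], [0, 1, 1], [1, 1, 1], [2, 1, 0], [1, 3, 1], [0, 0, 1]]
y_mut = [-6.0, -4.7, -5.7, -7.5, -6.7, -4.2]  # same rule + 0.8*m1

wt_model = LeastSquaresRegressor().fit(X_wt, y_wt)
mut_model = LeastSquaresRegressor().fit(X_mut, y_mut)

dg_wt, dg_mut, ddg = predict_affinities(wt_model, mut_model, [1.5, 2.0], [1.0])
print(round(dg_wt, 2), round(dg_mut, 2), round(ddg, 2))  # -7.5 -6.7 0.8
```

Because the toy targets were generated from exact linear rules, both fits recover them, so the predicted change of +0.8 kcal/mol would indicate weaker binding to the variant under the stated sign convention.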
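For reference, the two metrics quoted in the abstract can be computed as below. RMSE is expressed in the same units as the affinities (kcal/mol), while R² is dimensionless. The function names and the example values are made up for illustration.

```python
import math
from typing import Sequence


def _validate(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (e.g. kcal/mol)."""
    _validate(y_true, y_pred)
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (dimensionless)."""
    _validate(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up affinities (kcal/mol) for illustration only.
observed = [-7.0, -8.0, -9.0, -6.0]
predicted = [-7.5, -8.0, -8.5, -6.0]
print(round(rmse(observed, predicted), 3))       # 0.354
print(round(r_squared(observed, predicted), 3))  # 0.9
```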
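One hypothetical way to use such predictions to narrow the search space is a worst-case (minimax) ranking: score each ligand by its weakest predicted binding across all variants, then keep the ligands whose worst case is strongest. This is not a procedure described in the source. The ligand and variant labels and the scores are invented, and the sort direction assumes docking-style scores in kcal/mol where more negative means tighter binding.

```python
from typing import Dict, List, Tuple


def rank_by_worst_case(
    predictions: Dict[str, Dict[str, float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank ligands by their weakest predicted binding across all variants.

    Assumes docking-style scores (kcal/mol) where more negative means tighter
    binding, so each ligand's worst case is its maximum (least negative) score.
    """
    worst_case = {ligand: max(scores.values()) for ligand, scores in predictions.items()}
    # Lower worst-case score = ligand still binds well to its least favorable variant.
    return sorted(worst_case.items(), key=lambda item: item[1])[:top_k]


# Hypothetical predicted affinities per ligand and protein variant.
predictions = {
    "ligand_A": {"wild_type": -9.1, "V82A": -7.0, "I84V": -8.8},
    "ligand_B": {"wild_type": -8.5, "V82A": -8.2, "I84V": -8.0},
    "ligand_C": {"wild_type": -7.0, "V82A": -6.5, "I84V": -6.9},
}
print(rank_by_worst_case(predictions, top_k=2))
# [('ligand_B', -8.0), ('ligand_A', -7.0)]
```

Here `ligand_A` binds the wild type most strongly but loses affinity on one variant, so the worst-case ranking prefers `ligand_B`, which binds all three variants comparably well.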