RealVS: Toward Enhancing the Precision of Top Hits in Ligand-Based Virtual Screening of Drug Leads from Large Compound Databases.
Yueming Yin, Haifeng Hu, Zhen Yang, Huajian Xu, Jiansheng Wu
Published in: Journal of Chemical Information and Modeling (2021)
Accurate modeling of compound bioactivities is essential for the virtual screening of drug leads. In real-world scenarios, pharmacists tend to rank the compounds in a large database by predicted bioactivity and select, from the top-k hits, the candidates to carry forward into wet-lab experiments. Significantly improving the precision of these top hits in ligand-based virtual screening is therefore more valuable than the conventional goal of accurately predicting the bioactivities of all compounds in a large database. Here, we propose a new method, RealVS, to significantly improve the precision of the top hits and to learn interpretable key substructures associated with compound bioactivities. RealVS has four main features. (1) Abundant transferable information from the source domain is introduced to alleviate the shortage of inactive ligands associated with drug targets. (2) Adversarial domain alignment is adopted to match the distribution of features generated for compounds in the training data set to that of compounds in the screening database, improving model generalization. (3) A novel objective function simultaneously optimizes the classification loss, regression loss, and adversarial loss, so that most inactive ligands tend to be screened out before activity regression. (4) Graph attention networks are adopted to learn key substructures associated with ligand bioactivities, improving model interpretability. Results on a large number of benchmark data sets show that our method significantly improves the precision of top hits under various values of k in ligand-based virtual screening of drug leads from large compound databases, which is of great value in real-world scenarios. The RealVS web server is freely available for academic purposes at noveldelta.com/RealVS, where virtual screening of hits from large compound databases is accessible.
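
The central quantity in the abstract, the precision of the top-k hits, can be made concrete with a small sketch. The Python code below is not taken from RealVS; it only illustrates the standard precision-at-k calculation, in which compounds are ranked by predicted bioactivity and the fraction of true actives among the first k is reported. The function name `precision_at_k`, the toy scores, and the activity labels are all hypothetical.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Return the fraction of true actives among the k highest-scoring compounds.

    `scores` are predicted bioactivities (higher means more active) and
    `is_active` are the experimentally known labels. Both are illustrative
    inputs, not the RealVS data format.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of compounds")

    # Rank compound indices by predicted bioactivity, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count how many of the top-k ranked compounds are truly active.
    hits = sum(1 for i in ranked[:k] if is_active[i])
    return hits / k


# Toy screening database of six compounds (made-up values).
scores = [0.91, 0.15, 0.78, 0.62, 0.40, 0.88]
labels = [True, False, False, True, False, True]

for k in (2, 3, 4):
    print(f"precision@{k} = {precision_at_k(scores, labels, k):.3f}")
```

A model can score well on this metric while predicting the low-ranked compounds poorly, which is why the abstract treats top-hit precision as a different goal from accurate prediction over the whole database.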