Identification of active molecules against Mycobacterium tuberculosis through machine learning.
Qing Ye, Xin Chai, Dejun Jiang, Liu Yang, Chao Shen, Xujun Zhang, Dan Li, Dong-Sheng Cao, Ting-Jun Hou
Published in: Briefings in bioinformatics (2021)
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) and is one of the top 10 causes of death globally. Extensively drug-resistant tuberculosis (XDR-TB), which is resistant to the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, novel drug candidates for TB treatment are needed. In this study, four machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were combined with different types of molecular representations to develop classification models that distinguish Mtb inhibitors from noninhibitors. The results demonstrate that the XGBoost model exhibits the best prediction performance among the individual models. Two consensus strategies were then employed to integrate the predictions from multiple models. The evaluation results show that the consensus model built by stacking the RF, XGBoost and DNN predictions performs best, with an area under the receiver operating characteristic curve (AUC) of 0.842 for the 10-fold cross-validated training set and 0.942 for the external test set. In addition, the association between the important descriptors and the bioactivities of molecules was interpreted using the Shapley additive explanations (SHAP) method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed as a freely available computational tool to detect potential Mtb inhibitors.
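To make the evaluation idea concrete, the following is a minimal sketch of how predictions from several classifiers might be combined and scored by AUC. It is not the authors' pipeline: the abstract reports a stacking consensus, whereas this sketch uses plain score averaging as a simpler stand-in, and the function names (`roc_auc`, `average_consensus`) and all scores below are made up for illustration.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (inhibitor) is scored
    higher than a random negative (noninhibitor); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_consensus(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-molecule scores across models (an assumed, simple
    consensus rule; the paper's best model uses stacking instead)."""
    n_models = len(model_scores)
    return [sum(per_molecule) / n_models for per_molecule in zip(*model_scores)]


# Hypothetical toy data: 1 = Mtb inhibitor, 0 = noninhibitor.
labels = [1, 1, 1, 0, 0, 0]
rf_scores = [0.9, 0.3, 0.7, 0.5, 0.2, 0.1]
xgb_scores = [0.8, 0.6, 0.3, 0.4, 0.5, 0.2]
dnn_scores = [0.6, 0.8, 0.7, 0.2, 0.3, 0.65]

consensus_scores = average_consensus([rf_scores, xgb_scores, dnn_scores])

for name, scores in [("RF", rf_scores), ("XGBoost", xgb_scores),
                     ("DNN", dnn_scores), ("consensus", consensus_scores)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

On this toy data each model misranks a different molecule, so the averaged scores separate the two classes better than any single model does. That outcome is a property of the invented numbers, not evidence about the reported results.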
Keyphrases
- mycobacterium tuberculosis
- machine learning
- drug resistant
- pulmonary tuberculosis
- multidrug resistant
- deep learning
- neural network
- artificial intelligence
- acinetobacter baumannii
- clinical practice
- infectious diseases
- big data
- pseudomonas aeruginosa
- emergency department
- squamous cell carcinoma
- human immunodeficiency virus
- drug induced
- hiv infected
- bioinformatics analysis