In silico prediction of ocular toxicity of compounds using explainable machine learning and deep learning approaches.
Yiqing Zhou, Ze Wang, Zejun Huang, Wei-Hua Li, Yuanting Chen, Xinxin Yu, Yun Tang, Guixia Liu
Published in: Journal of applied toxicology : JAT (2024)
Accurate identification of chemicals with ocular toxicity is of paramount importance in health hazard assessment. Contemporary chemical toxicology places growing emphasis on refining, reducing, and replacing animal testing in safety evaluations, so robust computational tools are crucial for regulatory applications. The performance of predictive models depends heavily on the quality and quantity of the data. In this study, we compiled the most extensive dataset to date (4901 compounds) from governmental GHS-compliant databases and the literature to develop binary classification models of chemical ocular toxicity. We combined 12 molecular representations with six machine learning algorithms and two deep learning algorithms to build a series of binary classification models. The deep learning method GCN (graph convolutional network) outperformed the machine learning models in cross-validation, achieving an AUC of 0.915. However, the top-performing machine learning model, RF-Descriptor (a random forest trained on molecular descriptors), achieved an AUC of 0.869 on the test set and was therefore selected as the best model. To enhance model interpretability, we applied the SHAP (SHapley Additive exPlanations) method and attention weight analysis. The two approaches provided visual depictions of the relevance of key descriptors and substructures in predicting the ocular toxicity of chemicals. We thus struck a balance between data quality and model interpretability, making our model valuable for predicting and understanding potentially ocular-toxic compounds in the early stages of drug discovery.
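The AUC values quoted above summarize how well a binary classifier ranks toxic compounds above non-toxic ones. As a minimal sketch of that metric (not the authors' evaluation code), the function below computes AUC in its pairwise-ranking form: the fraction of toxic/non-toxic pairs in which the toxic compound receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) form.

    labels: 1 = toxic (positive class), 0 = non-toxic (negative class).
    scores: predicted score or probability for the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy predictions for illustration only (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In the toy case, eight of the nine toxic/non-toxic pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of compounds, which is fine for illustration; for a dataset of several thousand compounds, a rank-based implementation from an established library would be the more practical choice.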
Keyphrases
- machine learning
- deep learning
- big data
- artificial intelligence
- convolutional neural network
- oxidative stress
- healthcare
- drug discovery
- systematic review
- public health
- working memory
- high resolution
- transcription factor
- mental health
- optic nerve
- health information
- data analysis
- risk assessment
- molecular dynamics simulations