Accurate Prediction of Rat Acute Oral Toxicity and Reference Dose for Thousands of Polycyclic Aromatic Hydrocarbon Derivatives Based on Chemometric QSAR and Machine Learning.
Shuang Wu, Shi-Xin Li, Jing Qiu, Hai Ming Zhao, Yan-Wen Li, Nai-Xian Feng, Bai-Lin Liu, Quan-Ying Cai, Lei Xiang, Ce-Hui Mo, Qing X Li
Published in: Environmental Science & Technology (2024)
Acute oral toxicity data are currently unavailable for most polycyclic aromatic hydrocarbons (PAHs), and especially for their derivatives, because it is cost-prohibitive to determine them all experimentally. Here, quantitative structure-activity relationship (QSAR) models using machine learning (ML) were developed to predict the toxicity of PAH derivatives, based on rat oral toxicity data for 788 individual substances. Both the individual ML algorithm gradient boosting regression trees (GBRT) and the stacking ML algorithm (extreme gradient boosting + GBRT + random forest regression) provided the best prediction results, with satisfactory determination coefficients for both cross-validation and the test set. PAH derivatives with fewer polar hydrogens, more large-sized atoms, more branches, and lower polarizability were found to have higher toxicity. Software based on the optimal ML-QSAR model was developed to broaden the model's applicability, and it gave reliable predictions of pLD50 values and reference doses for 6893 external PAH derivatives. Among these chemicals, 472 were identified as moderately or highly toxic; 10 of them had clear records of environmental detection or use. The findings provide valuable insights into the toxicity of PAHs and their derivatives, offering a standard platform for effectively evaluating chemical toxicity using ML-QSAR models.
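The abstract reports predictions as pLD50 values and groups chemicals as moderately or highly toxic, but it does not define either term. The following is a minimal sketch under two assumptions that are not confirmed by the abstract: that pLD50 is the negative base-10 logarithm of LD50 expressed in mol/kg, a convention often used in QSAR work, and that the toxicity classes follow the U.S. EPA acute oral toxicity categories (I: up to 50 mg/kg, II: above 50 to 500 mg/kg, III: above 500 to 5000 mg/kg, IV: above 5000 mg/kg). The function names and the example values are hypothetical.

```python
import math


def ld50_to_pld50(ld50_mg_per_kg: float, molar_mass_g_per_mol: float) -> float:
    """Convert LD50 in mg/kg body weight to pLD50.

    Assumption: pLD50 = -log10(LD50 in mol/kg). The abstract does not
    state the definition used by the authors.
    """
    if ld50_mg_per_kg <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("LD50 and molar mass must be positive")
    # mg/kg -> g/kg -> mol/kg
    ld50_mol_per_kg = ld50_mg_per_kg / 1000.0 / molar_mass_g_per_mol
    return -math.log10(ld50_mol_per_kg)


def pld50_to_ld50(pld50: float, molar_mass_g_per_mol: float) -> float:
    """Invert the conversion above: pLD50 back to LD50 in mg/kg."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    # mol/kg -> g/kg -> mg/kg
    return 10.0 ** (-pld50) * molar_mass_g_per_mol * 1000.0


def epa_oral_category(ld50_mg_per_kg: float) -> str:
    """Map an oral LD50 (mg/kg) to a U.S. EPA acute oral toxicity category.

    Assumption: the abstract's "moderately or highly toxic" corresponds
    to categories II and I; the authors' actual scheme may differ.
    """
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    if ld50_mg_per_kg <= 50:
        return "I (highly toxic)"
    if ld50_mg_per_kg <= 500:
        return "II (moderately toxic)"
    if ld50_mg_per_kg <= 5000:
        return "III (slightly toxic)"
    return "IV (practically non-toxic)"


if __name__ == "__main__":
    # Hypothetical compound: predicted pLD50 of 3.0, molar mass 250 g/mol.
    ld50 = pld50_to_ld50(3.0, 250.0)
    print(f"LD50 = {ld50:.1f} mg/kg, category {epa_oral_category(ld50)}")
```

Because pLD50 is defined on a molar basis in this sketch, two compounds with the same pLD50 but different molar masses map to different mg/kg values, and so may fall into different categories.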
Keyphrases
- structure activity relationship
- polycyclic aromatic hydrocarbons
- oxidative stress
- machine learning
- molecular docking
- molecular dynamics
- liver failure
- climate change
- deep learning
- high resolution
- heavy metals
- electronic health record
- human health
- respiratory failure
- drinking water
- molecular dynamics simulations
- simultaneous determination
- solid phase extraction