CrossFuse-XGBoost: accurate prediction of the maximum recommended daily dose through multi-feature fusion, cross-validation screening and extreme gradient boosting.
Qiang Li, Yu He, Jianbo Pan
Published in: Briefings in Bioinformatics (2024)
In drug development, approximately 30% of failures are attributed to drug safety issues. The first-in-human (FIH) trial of a new drug carries one of the highest safety risks, so initial dose selection is crucial for ensuring safety in clinical trials. Traditional dose estimation methods extrapolate data from animals to humans, and catastrophic events have occurred during Phase I clinical trials because of interspecies differences in compound sensitivity and unknown molecular mechanisms. To address this issue, this study proposes CrossFuse-XGBoost, a method based on extreme gradient boosting (XGBoost) that directly predicts the maximum recommended daily dose of a compound from existing human research data, providing a reference for FIH dose selection. The method integrates multiple features, including molecular representations, physicochemical properties and compound-protein interactions, and improves feature selection through cross-validation. The results show that CrossFuse-XGBoost not only improves prediction accuracy over existing locally weighted methods [k-nearest neighbor (k-NN) and variable k-NN (v-NN)] but also solves the low prediction coverage of v-NN, achieving full coverage of the external validation set and enabling more reliable predictions. The study also offers interpretability by identifying the importance of different features in model construction: the 241 features with the most significant impact on the maximum recommended daily dose were selected, providing references for optimizing the structure of new compounds and guiding experimental research. The datasets and source code are freely available at https://github.com/cqmu-lq/CrossFuse-XGBoost.
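The general idea of fusing feature blocks and screening them by cross-validation can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that): a plain-Python gradient-boosted stump regressor stands in for XGBoost, the greedy forward screening rule and its 5% improvement threshold are assumptions, the data are synthetic, and every function name (`fuse`, `fit_boost`, `cv_rmse`, `screen_features`) is hypothetical.

```python
import random

def avg(values):
    return sum(values) / len(values)

def fuse(*blocks):
    """Feature fusion by concatenation: join each compound's rows across blocks."""
    return [[v for part in rows for v in part] for rows in zip(*blocks)]

def fit_stump(X, r, features):
    """Find the single split (over `features`) that best fits residuals r."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = avg(left), avg(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:] if best else None

def fit_boost(X, y, features, rounds=20, lr=0.3):
    """Gradient boosting for squared loss with stumps (a stand-in for XGBoost)."""
    base = avg(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals, features)
        if stump is None:  # no usable feature: the model stays at the mean
            break
        j, t, ml, mr = stump
        stumps.append(stump)
        pred = [p + lr * (ml if row[j] <= t else mr) for p, row in zip(pred, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)

def cv_rmse(X, y, features, k=5, seed=0):
    """k-fold cross-validated RMSE of the boosted model on a feature subset."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_errors = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_boost([X[i] for i in train], [y[i] for i in train], features)
        sq_errors += [(predict(model, X[i]) - y[i]) ** 2 for i in fold]
    return avg(sq_errors) ** 0.5

def screen_features(X, y, min_gain=0.05):
    """Greedy forward screening: keep a feature only if CV RMSE drops by min_gain."""
    selected, best = [], cv_rmse(X, y, [])
    for j in range(len(X[0])):
        score = cv_rmse(X, y, selected + [j])
        if score < best * (1 - min_gain):
            selected, best = selected + [j], score
    return selected, best

# Synthetic stand-ins for two feature blocks (e.g. descriptors and interaction scores).
rng = random.Random(1)
block_a = [[rng.random(), rng.random()] for _ in range(40)]
block_b = [[rng.random(), rng.random()] for _ in range(40)]
X = fuse(block_a, block_b)  # fused columns: 0, 1 from block_a; 2, 3 from block_b
y = [3 * a[0] - 2 * b[1] + rng.gauss(0, 0.05) for a, b in zip(block_a, block_b)]

baseline_rmse = cv_rmse(X, y, [])
selected, final_rmse = screen_features(X, y)
print("selected feature indices:", selected)
print("CV RMSE: baseline %.3f, screened model %.3f" % (baseline_rmse, final_rmse))
```

In this toy setup only fused columns 0 and 3 drive the target, so the screening step should retain them and lower the cross-validated error relative to the mean-only baseline. A real pipeline would replace `fit_boost` with an actual gradient boosting library and use measured dose data.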
Keyphrases
- clinical trial
- endothelial cells
- physical activity
- machine learning
- climate change
- emergency department
- electronic health record
- magnetic resonance
- healthcare
- deep learning
- randomized controlled trial
- magnetic resonance imaging
- mass spectrometry
- phase ii
- working memory
- affordable care act
- high resolution
- data analysis
- pluripotent stem cells
- adverse drug
- health insurance
- double blind
- placebo controlled
- high density
- drug induced