Exploring the Chemical Space of CYP17A1 Inhibitors Using Cheminformatics and Machine Learning.
Tianshi Yu, Tianyang Huang, Leiye Yu, Chanin Nantasenamat, Nuttapat Anuwongcharoen, Theeraphon Piacham, Ruobing Ren, Ying-Chih Chiang
Published in: Molecules (Basel, Switzerland) (2023)
Cytochrome P450 17A1 (CYP17A1) is one of the key enzymes in steroidogenesis that produces dehydroepiandrosterone (DHEA) from cholesterol. Abnormal DHEA production may lead to the progression of severe diseases, such as prostate and breast cancers. Thus, CYP17A1 is a druggable target for anti-cancer molecule development. In this study, cheminformatic analyses and quantitative structure-activity relationship (QSAR) modeling were applied to a set of 962 CYP17A1 inhibitors (279 steroidal and 683 nonsteroidal) compiled from the ChEMBL database. For steroidal inhibitors, a QSAR classification model built using the PubChem fingerprint along with the extra trees algorithm achieved the best performance, with accuracy values of 0.933, 0.818, and 0.833 for the training, cross-validation, and test sets, respectively. For nonsteroidal inhibitors, a systematic cheminformatic analysis was applied to explore the chemical space, Murcko scaffolds, and structure-activity relationships (SARs), visualizing distributions, patterns, and representative scaffolds for drug discovery. Furthermore, a total of seven QSAR classification models were established based on the nonsteroidal scaffolds, and two activity cliff (AC) generators were identified. The best-performing of these seven was model VIII, which was built using the PubChem fingerprint along with the random forest algorithm. It achieved robust accuracy across the training, cross-validation, and test sets, i.e., 0.96, 0.92, and 0.913, respectively. It is anticipated that the results presented herein will be instrumental for further CYP17A1 inhibitor drug discovery efforts.
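The abstract mentions activity cliff (AC) generators without stating the criterion used to detect them. As a minimal sketch of the general idea only, the Python below flags compound pairs that are structurally similar yet differ strongly in potency, then counts how many cliffs each compound takes part in. Everything specific here is an assumption for illustration and not the authors' protocol: the Tanimoto cutoff of 0.8, the potency gap of 2 log units, the function names, and the toy compounds with made-up fingerprints and pIC50 values.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, sim_cutoff=0.8, potency_gap=2.0):
    """Return name pairs that are similar in structure but far apart in potency.

    compounds maps a name to (on_bits, pIC50). Both thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        if tanimoto(fp1, fp2) >= sim_cutoff and abs(p1 - p2) >= potency_gap:
            cliffs.append((n1, n2))
    return cliffs


def cliff_counts(cliffs):
    """Count how many cliff pairs each compound participates in."""
    counts = {}
    for a, b in cliffs:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Hypothetical toy data: (fingerprint on-bits, pIC50). Not real CYP17A1 inhibitors.
toy = {
    "cpd_A": (frozenset({1, 2, 3, 4, 5}), 8.5),
    "cpd_B": (frozenset({1, 2, 3, 4, 5, 6}), 5.9),
    "cpd_C": (frozenset({1, 2, 3, 4, 6}), 8.1),
    "cpd_D": (frozenset({10, 11, 12}), 5.0),
}

cliffs = find_activity_cliffs(toy)
counts = cliff_counts(cliffs)
print(cliffs)
print(counts)
```

In this toy set, cpd_B forms cliffs with both cpd_A and cpd_C, so it would be the candidate AC generator under the assumed thresholds. With real data, the on-bit sets would come from a fingerprint such as the PubChem fingerprint used in the study.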
Keyphrases
- machine learning
- structure activity relationship
- deep learning
- molecular docking
- drug discovery
- molecular dynamics
- artificial intelligence
- tissue engineering
- emergency department
- big data
- mass spectrometry
- prostate cancer
- early onset
- virtual reality
- molecular dynamics simulations
- cross sectional
- quality improvement
- single molecule
- benign prostatic hyperplasia
- low density lipoprotein