Machine Learning-Assisted Prediction of the Biological Activity of Aromatase Inhibitors and Data Mining to Explore Similar Compounds.

Muhammad Ishfaq, Muhammad Aamir, Farooq Ahmad, Abdelazim M Mebed, Sayed Elshahat

Published in: ACS Omega (2022)

Designing drug molecules has been an active research topic for many decades. However, finding a new molecule is difficult and expensive, which in turn raises the cost of the final drug. Machine learning offers a fast way to predict the biological activity of druglike molecules. In the present work, machine learning models are trained to predict the biological activity of aromatase inhibitors. Data were collected from the literature, and molecular descriptors were calculated and used as independent features for model training. The R² values for linear regression, random forest regression, gradient boosting regression, and bagging regression are 0.58, 0.84, 0.77, and 0.80, respectively. With these models, the activity of new molecules can be predicted quickly and at a reasonable cost. Furthermore, Tanimoto similarity is used for similarity analysis, and a chemical database is mined to search for similar molecules. Overall, this study provides a framework for repurposing other effective drug molecules to prevent cancer.
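The similarity search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which fingerprint or toolkit was used, so the example below assumes each molecule is already represented as a set of on-bit indices; the names `tanimoto`, `rank_similar`, the 0.5 threshold, and the toy fingerprints are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_similar(
    query: FrozenSet[int],
    database: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return database entries at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy fingerprints (sets of on-bit indices), invented for this example.
query = frozenset({1, 4, 7, 9})
database = {
    "mol_A": frozenset({1, 4, 7, 9}),   # identical to the query
    "mol_B": frozenset({1, 4, 7, 12}),  # shares 3 of 5 distinct bits
    "mol_C": frozenset({2, 3}),         # shares no bits
}

hits = rank_similar(query, database)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the threshold would be tuned to the database being mined.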
