PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer.
Harbil Bediaga, Sonia Arrasate, Humberto González-Díaz
Published in: ACS Combinatorial Science (2018)
Determining the target proteins of new anticancer compounds is an important task in medicinal chemistry. To this end, chemists carry out preclinical assays under a large number of combinations of experimental conditions (c_j). The ChEMBL database contains the outcomes of 65 534 different preclinical anticancer activity assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of c_j formed from >70 different biological activity parameters (c_0), >300 different drug targets (c_1), >230 cell lines (c_2), and 5 organisms of assay (c_3) or organisms of the target (c_4). The data set includes a total of 45 833 assays in leukemia, 6227 assays in breast cancer, 2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, and 601 in melanoma, among others. This is a very complex data set with multiple Big Data features, and it is hard for researchers to rationalize these data in order to extract useful relationships and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. In the training series, the model presented an area under the receiver operating characteristic curve (AUROC) of 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7. In the external validation series, the model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8. The model uses PT operators based on multicondition moving averages to capture all the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, obtaining similar results.
This confirms the hypothesis of a linear relationship between the PT operators and the classification as anticancer compounds in different combinations of assay conditions. Finally, we compared the model with other PTML models reported in the literature and concluded that it is the only PTML model able to predict activity against multiple types of cancer. The model is a simple but versatile tool for predicting the targets of anticancer compounds while taking into consideration multiple combinations of experimental conditions in preclinical assays.
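The abstract does not give the form of the PT operators, so the following is only a minimal sketch of one common reading of a "multicondition moving average": for each assay, a descriptor value is compared with the average of that descriptor over all assays sharing the same condition c_j. The record fields, the toy values, and the function names are hypothetical and are not taken from the paper or from ChEMBL. The second function computes Sn(%), Sp(%), and Ac(%) from their standard confusion-matrix definitions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: one descriptor value per assay plus two of its
# conditions. Field names and numbers are illustrative only, not ChEMBL data.
assays = [
    {"descriptor": 2.0, "c1_target": "T1", "c2_cell_line": "A"},
    {"descriptor": 4.0, "c1_target": "T1", "c2_cell_line": "B"},
    {"descriptor": 6.0, "c1_target": "T2", "c2_cell_line": "A"},
    {"descriptor": 8.0, "c1_target": "T2", "c2_cell_line": "B"},
]


def moving_average_deviations(records, value_key, condition_key):
    """Return, for each record, its value minus the mean value of all
    records that share the same level of the given condition."""
    groups = defaultdict(list)
    for record in records:
        groups[record[condition_key]].append(record[value_key])
    averages = {level: mean(values) for level, values in groups.items()}
    return [r[value_key] - averages[r[condition_key]] for r in records]


def classification_metrics(y_true, y_pred):
    """Return (Sn, Sp, Ac) in percent for binary labels (1 = active).
    Assumes both classes are present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = 100.0 * tp / (tp + fn)        # sensitivity: actives recovered
    sp = 100.0 * tn / (tn + fp)        # specificity: inactives recovered
    ac = 100.0 * (tp + tn) / len(pairs)  # overall accuracy
    return sn, sp, ac


# One deviation column per condition; a linear classifier could take
# such columns as input variables.
dev_target = moving_average_deviations(assays, "descriptor", "c1_target")
dev_cell = moving_average_deviations(assays, "descriptor", "c2_cell_line")
sn, sp, ac = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In this reading, each condition contributes its own deviation term, so a compound's descriptor is always judged relative to the typical value seen under the same target, cell line, or organism rather than in absolute terms.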
Keyphrases
- high throughput
- big data
- prostate cancer
- machine learning
- systematic review
- artificial intelligence
- stem cells
- type diabetes
- emergency department
- neural network
- adipose tissue
- deep learning
- oxidative stress
- cell therapy
- insulin resistance
- bone marrow
- radical prostatectomy
- single cell
- young adults
- glycemic control
- gram negative
- drug discovery
- virtual reality