Exploring Alternative Strategies for the Identification of Potent Compounds Using Support Vector Machine and Regression Modeling
Tomoyuki Miyao, Kimito Funatsu, Jürgen Bajorath. Published in: Journal of Chemical Information and Modeling (2018)
Support vector regression (SVR) is a premier approach for predicting compound potency. Given the conceptual link between support vector machine (SVM) and SVR modeling, SVR can account for both continuous and discontinuous structure-activity relationships (SARs) in potency prediction, which further extends the classical quantitative SAR (QSAR) paradigm. In virtual compound screening, potency prediction can be applied to identify the most potent compounds available or to enrich database selection sets with potent compounds. To these ends, we have evaluated new potency prediction strategies. Conventional (direct) potency prediction using SVR was compared with two-stage SVM-SVR modeling and with potency prediction using SVR models trained on sets containing both active and inactive compounds, a previously unconsidered approach. The latter models were found to maximize the recall of potent compounds but were the least accurate in predicting high potency values; for accurate prediction of high potency values, direct SVR predictions were preferred. However, the best balance between accurate potency prediction and enrichment of potent compounds in database selection sets was achieved by combined SVM-SVR modeling. Taken together, our findings further extend current approaches for compound potency prediction in virtual compound screening.
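The abstract contrasts ranking a database directly by SVR-predicted potency with a two-stage scheme in which an SVM first separates actives from inactives before SVR potency predictions are used. The following is a hypothetical sketch of how such a selection step and a recall measure might look, not the authors' protocol. It assumes model outputs have already been computed (svm_scores as classifier decision values, svr_potencies as predicted potencies), treats a decision value above 0 as "predicted active", and uses an arbitrary potency cutoff to define "potent". The function names, threshold, cutoff, and toy numbers are all invented for illustration. An SVR trained on both actives and inactives differs only in its training data, so its predictions would be ranked with direct_selection.

```python
from typing import List, Sequence


def direct_selection(svr_potencies: Sequence[float], k: int) -> List[int]:
    """Direct strategy: rank all compounds by predicted potency, keep the top k."""
    ranked = sorted(range(len(svr_potencies)),
                    key=lambda i: svr_potencies[i], reverse=True)
    return ranked[:k]


def two_stage_selection(svm_scores: Sequence[float],
                        svr_potencies: Sequence[float],
                        k: int,
                        threshold: float = 0.0) -> List[int]:
    """Two-stage strategy (assumed form): keep compounds whose SVM decision
    value exceeds `threshold`, then rank the survivors by predicted potency."""
    predicted_active = [i for i, s in enumerate(svm_scores) if s > threshold]
    predicted_active.sort(key=lambda i: svr_potencies[i], reverse=True)
    return predicted_active[:k]


def recall_of_potent(selected: Sequence[int],
                     true_potencies: Sequence[float],
                     potency_cutoff: float) -> float:
    """Fraction of truly potent compounds (true potency >= cutoff) in `selected`."""
    potent = {i for i, p in enumerate(true_potencies) if p >= potency_cutoff}
    if not potent:
        return 0.0
    return len(potent & set(selected)) / len(potent)


# Invented toy values for six database compounds (not data from the study).
true_pot = [8.5, 7.9, 5.0, 4.0, 4.0, 6.2]     # "measured" potencies
svm_scores = [1.2, 0.8, 0.3, -0.9, -1.1, 0.5]  # hypothetical SVM decision values
svr_pred = [7.8, 7.1, 7.5, 7.6, 6.0, 6.4]      # hypothetical SVR predictions

direct = direct_selection(svr_pred, k=3)
two_stage = two_stage_selection(svm_scores, svr_pred, k=3)
print(direct, recall_of_potent(direct, true_pot, 7.0))        # [0, 3, 2] 0.5
print(two_stage, recall_of_potent(two_stage, true_pot, 7.0))  # [0, 2, 1] 1.0
```

On these invented numbers, the classifier filter removes compound 3, an inactive whose potency the regressor overestimates, so the two-stage selection recovers both potent compounds. This only illustrates the mechanism and says nothing about the relative performance reported in the study.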