Predicting Solubility of Newly-Approved Drugs (2016-2020) with a Simple ABSOLV and GSE(Flexible-Acceptor) Consensus Model Outperforming Random Forest Regression.

Alex Avdeef, Manfred Kansy
Published in: Journal of Solution Chemistry (2022)
This study applies the 'Flexible-Acceptor' variant of the General Solubility Equation, GSE(Φ,B), to the prediction of the aqueous intrinsic solubility, log₁₀ S₀, of 'small-molecule' new molecular entities (NMEs) recently approved by the FDA (2016-2020). The novel equation had previously been shown to predict the solubility of drugs beyond Lipinski's 'Rule of 5' chemical space (bRo5) with a precision nearly matching that of the Random Forest Regression (RFR) machine learning method. Since then, GSE(Φ,B) has been found to work well not only for bRo5 NMEs but also for Ro5 drugs. To put GSE(Φ,B) in context, Yalkowsky's GSE(classic), Abraham's ABSOLV, and Breiman's RFR models were also applied to predict log₁₀ S₀ of 72 newly approved NMEs for which usable reported solubility values could be accessed (nearly 60% from published FDA New Drug Application reports). Except for GSE(classic), the prediction models were retrained with an enlarged version of the Wiki-pS₀ database (nearly 400 log₁₀ S₀ entries added since our previous study). The four models were thus further validated against the additional independent solubility measurements that the newly approved drugs introduced. In performance, the prediction methods ranked RFR ~ GSE(Φ,B) > ABSOLV > GSE(classic). It was further demonstrated that the biases generated in the four separate models could be nearly eliminated in a consensus model based on the average of just two of the methods: GSE(Φ,B) and ABSOLV. The resulting consensus prediction equation is simple in form and can be easily incorporated into spreadsheet calculations. More significantly, it slightly outperformed the RFR method.
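The consensus step described above is a plain average of two predictions, so it can be sketched in a few lines. The following Python is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fitted coefficients of GSE(Φ,B) and ABSOLV are not given in this abstract and are therefore taken as precomputed inputs, `gse_classic` uses the commonly cited form of Yalkowsky's equation (not quoted from this abstract), and all numeric values are made up for illustration.

```python
from statistics import mean


def gse_classic(log_p: float, mp_celsius: float) -> float:
    """Commonly cited form of Yalkowsky's classic GSE (assumption, not
    taken from the abstract): log10 S0 = 0.5 - 0.01*(mp - 25) - log P."""
    # Compounds melting below 25 °C are conventionally treated as liquids,
    # so the melting-point term is clamped at zero.
    mp_term = max(mp_celsius - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def consensus(log_s_gse_flex: float, log_s_absolv: float) -> float:
    """Consensus log10 S0 as the unweighted mean of two model predictions."""
    return 0.5 * (log_s_gse_flex + log_s_absolv)


def mean_bias(predicted: list[float], observed: list[float]) -> float:
    """Mean signed error (predicted - observed), a simple bias measure."""
    return mean(p - o for p, o in zip(predicted, observed))


# Made-up observed log10 S0 values and two sets of model predictions
# that are biased in opposite directions (illustrative only).
observed = [-3.0, -4.5, -6.0]
pred_gse_flex = [o + 0.3 for o in observed]  # hypothetical over-prediction
pred_absolv = [o - 0.3 for o in observed]    # hypothetical under-prediction

pred_consensus = [consensus(a, b) for a, b in zip(pred_gse_flex, pred_absolv)]

print(f"bias GSE(flex): {mean_bias(pred_gse_flex, observed):+.2f}")
print(f"bias ABSOLV:    {mean_bias(pred_absolv, observed):+.2f}")
print(f"bias consensus: {mean_bias(pred_consensus, observed):+.2f}")
```

In this toy case the two opposite biases cancel exactly; with real predictions the cancellation would only be partial, which matches the abstract's wording that the biases were "nearly eliminated".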