Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis.

Cailum M. K. Stienstra, Christian Ieritano, Alexander Haack, W. Scott Hopkins
Published in: Analytical chemistry (2023)
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., # of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
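To make the reported metrics concrete, the following is a minimal sketch of how scores of the form "RMSE = mean ± spread" and R² can be obtained from 5-fold random cross-validation. It is not the authors' pipeline: the data are synthetic, the model is a one-feature least-squares line rather than stacked ML regressors, no SHAP analysis is performed, and reading the "±" as the standard deviation across folds is an assumption. All function names (`fit_line`, `k_fold_cv`, and so on) are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_cv(xs, ys, k=5, seed=0):
    """Shuffle the indices once, split into k folds, score each held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    rmses, r2s = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_true = [ys[i] for i in test_idx]
        y_pred = [a * xs[i] + b for i in test_idx]
        rmses.append(rmse(y_true, y_pred))
        r2s.append(r2(y_true, y_pred))
    return rmses, r2s


# Synthetic stand-in data (assumption): 333 instances, one descriptor,
# a linear trend plus Gaussian noise. These are NOT DMS measurements.
rng = random.Random(42)
xs = [rng.uniform(-3.0, 3.0) for _ in range(333)]
ys = [1.5 * x + rng.gauss(0.0, 1.0) for x in xs]

rmses, r2s = k_fold_cv(xs, ys, k=5, seed=0)
print(f"RMSE = {statistics.fmean(rmses):.2f} ± {statistics.stdev(rmses):.2f}")
print(f"R2   = {statistics.fmean(r2s):.2f}")
```

In a realistic setting the line fit would be replaced by the regressors of interest, and the per-fold scores would be aggregated in the same way.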
Keyphrases
  • machine learning
  • big data
  • high throughput
  • ionic liquid
  • high resolution
  • artificial intelligence
  • contrast enhanced
  • diffusion weighted imaging