Regularized regression analysis of digitized molecular structures in organic reactions for quantification of steric effects.
Shigeru Yamaguchi, Takahiro Nishimura, Yuta Hibe, Masaki Nagai, Hirofumi Sato, Ian Johnston. Published in: Journal of Computational Chemistry (2017)
In organic chemistry, Comparative Molecular Field Analysis (CoMFA) can be defined as a regression analysis between reaction outcomes and molecular fields, in which important structural information is extracted and visualized from the coefficients of the fitted regression models. In CoMFA, partial least-squares (PLS) regression, which assigns a value to every coefficient in the model, is used to fit the regression models. However, in organic reactions, steric effects are observed only near the reactive site, indicating that a large number of regression coefficients in the CoMFA of organic reactions should be 0. The regularized regression methods LASSO and Elastic Net fit the regression model while setting unimportant coefficients to exactly 0. Although LASSO/Elastic Net should therefore be suitable for CoMFA, there is no reported example of its use for organic reaction analysis. Herein, we examine the performance of LASSO/Elastic Net for the quantification of steric effects in CoMFA. We employ digitized molecular structures (the indicator field) as molecular fields that represent steric effects. LASSO/Elastic Net regressions provide highly interpretable models that include less noise than those from PLS regression. © 2017 Wiley Periodicals, Inc.
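
The sparsity argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and uses no data from the paper: the "indicator field" here is a synthetic random 0/1 matrix, the grid-point indices 3 and 7, the coefficient values, the noise level, and the penalty `lam` are all assumptions chosen for illustration, and only LASSO is shown (Elastic Net would add an L2 penalty term). The fit uses a plain cyclic coordinate-descent solver written with the Python standard library.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward 0 by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    # Center columns and response so the intercept can be recovered afterwards.
    col_mean = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[X[i][j] - col_mean[j] for j in range(p)] for i in range(n)]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = [y[i] - y_mean for i in range(n)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes beta[j].
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(col_mean[j] * beta[j] for j in range(p))
    return intercept, beta


# Synthetic stand-in for an indicator field (assumption, not data from the paper):
# 40 "molecules", 30 grid points, entry 1 if the grid point is occupied.
rng = random.Random(0)
n_mol, n_grid = 40, 30
X = [[1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_grid)] for _ in range(n_mol)]

# Hypothetical ground truth: only two grid points "near the reactive site" matter.
true_coef = {3: -2.0, 7: 1.5}
y = [1.0 + sum(c * row[j] for j, c in true_coef.items()) + rng.gauss(0.0, 0.1) for row in X]

intercept, beta = lasso_cd(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("grid points with nonzero coefficients:", selected)
```

In this toy setting most coefficients come out as exactly 0, which is the property the abstract contrasts with PLS. The retained coefficients are shrunk toward 0 by the penalty, so their magnitudes understate the values used to generate the data.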