A unified machine-learning protocol for asymmetric catalysis as a proof of concept demonstration using asymmetric hydrogenation.

Sukriti Singh, Monika Pareek, Avtar Changotra, Sayan Banerjee, Bangaru Bhaskararao, P Balamurugan, Raghavan B Sunoj

Published in: Proceedings of the National Academy of Sciences of the United States of America (2020)

Design of asymmetric catalysts generally involves time- and resource-intensive heuristic endeavors. In view of the steady increase in interest toward efficient catalytic asymmetric reactions and the rapid growth in the field of machine learning (ML) in recent years, we envisaged dovetailing these two important domains. We selected a set of quantum chemically derived molecular descriptors from five different asymmetric binaphthyl-derived catalyst families with the propensity to impact the enantioselectivity of asymmetric hydrogenation of alkenes and imines. The predictive power of the random forest (RF) built using the molecular parameters of a set of 368 substrate-catalyst combinations is found to be impressive, with a root-mean-square error (rmse) in the predicted enantiomeric excess (%ee) of about 8.4 ± 1.8 compared to the experimentally known values. The accuracy of RF is found to be superior to other ML methods such as convolutional neural network, decision tree, and eXtreme gradient boosting as well as stepwise linear regression. The proposed method is expected to provide a leap forward in the design of catalysts for asymmetric transformations.
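As an illustration of the error metric quoted above, the following is a minimal sketch of how a root-mean-square error in %ee and a "mean ± spread" summary could be computed. The abstract does not state how the ± 1.8 was obtained; the sketch assumes it is the standard deviation of the rmse over repeated runs (for example, different train/test splits). The function names and all numerical values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of %ee values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(rmse_per_run):
    """Mean and sample standard deviation of rmse values across runs.

    Assumption: the '±' in the abstract is a spread over repeated runs.
    """
    return statistics.mean(rmse_per_run), statistics.stdev(rmse_per_run)


# Hypothetical %ee values, invented for illustration only (not from the paper).
observed = [90.0, 85.0, 70.0, 95.0]
predicted = [84.0, 93.0, 70.0, 95.0]
single_run = rmse(predicted, observed)

# Hypothetical rmse values from three repeated runs.
mean_rmse, spread = summarize([5.0, 7.0, 9.0])
print(f"rmse = {mean_rmse:.1f} ± {spread:.1f} %ee")
```

In practice the predicted values would come from the trained model (a random forest in the study) evaluated on held-out substrate-catalyst combinations, with one rmse recorded per run.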
