Chemistrees: Data-Driven Identification of Reaction Pathways via Machine Learning.
Sander Roet, Christopher David Daub, Enrico Riccardi. Published in: Journal of Chemical Theory and Computation (2021)
We propose analyzing molecular dynamics (MD) output with a supervised machine learning (ML) algorithm, the decision tree. The approach aims to identify the predominant geometric features that correlate with trajectories transitioning between two arbitrarily defined states, and to do so from the data alone, without the bias of human "chemical intuition". We demonstrate the method by analyzing the proton exchange reactions in formic acid solvated in small water clusters. The simulations were performed with ab initio MD combined with path sampling, a method for efficiently sampling rare events. Our ML analysis identified relevant geometric variables involved in the proton transfer reaction and how they may change as the number of solvating water molecules changes.
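To make the idea concrete, the following is a minimal sketch of the core step in decision-tree learning: searching for the single geometric feature and threshold that best separate reactive from non-reactive trajectories, scored by Gini impurity. It is not the authors' implementation. The data are synthetic, the feature names `d_OH` and `angle_OHO` are hypothetical, and the rule that labels a trajectory as reactive (`d_OH < 1.3`) is an assumption made purely for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best axis-aligned split."""
    best = (None, None, gini(y))
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = 0.5 * (lo + hi)
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Synthetic stand-in for per-trajectory geometric descriptors (hypothetical names).
FEATURES = ["d_OH", "angle_OHO"]
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    d = rng.uniform(0.9, 1.8)      # assumed distance-like descriptor
    a = rng.uniform(120.0, 180.0)  # assumed angle-like descriptor, irrelevant by construction
    X.append([d, a])
    y.append(1 if d < 1.3 else 0)  # assumed labeling rule: 1 = reactive trajectory

feature, threshold, impurity = best_split(X, y)
print(FEATURES[feature], round(threshold, 2), impurity)
```

A full decision tree applies this search recursively to each resulting subset. On real path-sampling output, the features chosen near the root would be the candidate geometric variables most strongly correlated with the transition.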