Improving Accuracy and Transferability of Machine Learning Chemical Activation Energies by Adding Electronic Structure Information

Esteban A Marques, Stefan de Gendt, Geoffrey Pourtois, Michiel J van Setten
Published in: Journal of Chemical Information and Modeling (2023)
Predicting chemical activation energies is one of the longstanding and important challenges in computational chemistry. Recent advances have shown that machine learning can be used to build tools that predict them. Such tools can significantly reduce the computational cost of these predictions compared with traditional methods, which require an optimal path search along a high-dimensional potential energy surface. To enable this new route, we need both large, accurate datasets and a compact yet complete description of the reactions. Although data for chemical reactions are becoming increasingly available, the key step of encoding a reaction as an efficient descriptor remains a major challenge. In this paper, we demonstrate that including electronic energy levels in the description of the reaction significantly improves prediction accuracy and transferability. Feature importance analysis further shows that electronic energy levels are more important than some of the structural information and typically require less space in the reaction encoding vector. In general, the results of the feature importance analysis agree well with domain knowledge of fundamental chemical principles. This work can help build better chemical reaction encodings for machine learning and thus improve machine learning predictions of reaction activation energies. Such models could ultimately be used to identify reaction-limiting steps in large reaction systems, allowing bottlenecks to be accounted for at the design stage.
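The abstract describes two computational steps: encoding a reaction as a fixed-length descriptor that includes electronic energy levels, and ranking the entries of that descriptor by feature importance. The sketch below illustrates both steps under stated assumptions; it is not the authors' method. Everything here is hypothetical and chosen for illustration: the encoding (structural descriptors concatenated with reactant and product HOMO/LUMO levels), the linear least-squares model, permutation importance as the importance measure, the helper names (`encode_reaction`, `fit_linear`, `permutation_importance`, `make_synthetic_dataset`), and the synthetic dataset. Because the toy target is built from the reactant HOMO-LUMO gap, the electronic features rank high by construction; the output says nothing about real reactions.

```python
import random


def encode_reaction(structural, reactant_levels, product_levels):
    """Hypothetical reaction encoding: structural descriptors followed by the
    electronic energy levels (e.g. HOMO, LUMO in eV) of reactants and products."""
    return list(structural) + list(reactant_levels) + list(product_levels)


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y, ridge=1e-6):
    """Least-squares linear model with intercept via the normal equations
    (A^T A + ridge * I) w = A^T y; the tiny ridge term only guards against singularity."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(w, X):
    # w[0] is the intercept; w[1:] are the per-feature weights.
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(w, X, y, seed=0):
    """Increase in MAE when a single feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    base = mae(y, predict(w, X))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mae(y, predict(w, X_perm)) - base)
    return scores


def make_synthetic_dataset(n=200, seed=1):
    """Toy data, NOT real reactions: the target depends on the reactant HOMO-LUMO gap
    and on one structural feature; every other feature is irrelevant by construction."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        structural = [rng.uniform(0, 3), rng.uniform(0, 3)]   # s0 relevant, s1 irrelevant
        reactant = [rng.uniform(-7, -5), rng.uniform(-2, 0)]  # HOMO, LUMO (eV)
        product = [rng.uniform(-7, -5), rng.uniform(-2, 0)]   # unused by the toy target
        gap = reactant[1] - reactant[0]
        y.append(0.2 * structural[0] + 0.3 * gap + rng.gauss(0, 0.05))
        X.append(encode_reaction(structural, reactant, product))
    return X, y


FEATURES = ["s0", "s1", "HOMO_r", "LUMO_r", "HOMO_p", "LUMO_p"]

if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    w = fit_linear(X, y)
    print(f"training MAE: {mae(y, predict(w, X)):.3f}")
    # Rank features from most to least important on the toy data.
    ranked = sorted(zip(FEATURES, permutation_importance(w, X, y)), key=lambda t: -t[1])
    for name, score in ranked:
        print(f"{name:>7}: {score:.3f}")
```

A linear model keeps the sketch dependency-free; in practice a nonlinear regressor and importances computed on a held-out set rather than on the training data would be closer to common practice.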