Classification of biomass reactions and predictions of reaction energies through machine learning.

Chaoyi Chang, Andrew James Medford
Published in: The Journal of Chemical Physics (2021)
Elementary steps and intermediate species of linearly structured biomass compounds are studied. Specifically, possible intermediates and elementary reactions of 15 key biomass compounds and 33 small molecules are obtained from a recursive bond-breaking algorithm. These are used as inputs to the unsupervised Mol2Vec algorithm to generate vector representations of all intermediates and elementary reactions. The vector descriptors are used to identify sub-classes of elementary steps, and linear discriminant analysis is used to accurately identify the reaction type and reduce the dimension of the vectors. The resulting descriptors are applied to predict gas-phase reaction energies using linear regression with accuracies that exceed the commonly employed group additivity approach. They are also applied to quantitatively assess model compound similarity, and the results are consistent with chemical intuition. This workflow for creating vector representations of complex molecular systems requires no input from electronic structure calculations, and it is expected to be applicable to other similar systems where vector representations are needed.
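As a rough illustration of what a recursive bond-breaking enumeration might look like, the sketch below treats a linear compound as a chain of heavy-atom labels and splits it at every backbone bond until only single atoms remain. This is a simplified toy and not the authors' implementation: the `Chain` representation, the `canonical` and `enumerate_steps` helpers, and the choice to ignore hydrogens, bond orders, and branching are all assumptions made for the example.

```python
from typing import Set, Tuple

# Hypothetical representation: a linear molecule is a tuple of heavy-atom
# labels, e.g. ("C", "C", "O"). Hydrogens and bond orders are ignored.
Chain = Tuple[str, ...]
Reaction = Tuple[Chain, Tuple[Chain, Chain]]


def canonical(chain: Chain) -> Chain:
    """A linear chain read backwards is the same species; keep one form."""
    return min(chain, chain[::-1])


def enumerate_steps(start: Chain) -> Tuple[Set[Chain], Set[Reaction]]:
    """Recursively break every backbone bond, collecting species and steps."""
    species: Set[Chain] = set()
    reactions: Set[Reaction] = set()

    def visit(chain: Chain) -> None:
        chain = canonical(chain)
        if chain in species:  # already expanded, avoid repeated work
            return
        species.add(chain)
        # Breaking the bond before position i yields two shorter chains.
        for i in range(1, len(chain)):
            left, right = canonical(chain[:i]), canonical(chain[i:])
            left, right = sorted((left, right))  # order-independent products
            reactions.add((chain, (left, right)))
            visit(left)
            visit(right)

    visit(start)
    return species, reactions


species, reactions = enumerate_steps(("C", "C", "O"))
for reactant, products in sorted(reactions):
    print("".join(reactant), "->", " + ".join("".join(p) for p in products))
```

In a workflow like the one described, the enumerated species and steps would then be passed to an embedding model such as Mol2Vec; that stage is not shown here because it depends on a trained model and a cheminformatics toolkit.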
Keyphrases
  • machine learning
  • working memory
  • deep learning
  • density functional theory
  • wastewater treatment
  • artificial intelligence
  • big data
  • anaerobic digestion
  • molecular dynamics
  • electron transfer
  • single molecule