
An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties.

Qi Yang, Yidi Liu, Junjie Cheng, Yao Li, Siyuan Liu, Yingdong Duan, Long Zhang, Sanzhong Luo
Published in: ChemPhysChem: a European journal of chemical physics and physical chemistry (2022)
Feature representations, or descriptors, are the chemical language of machines, and they largely shape the prediction capability, generalizability and interpretability of machine learning models. A generally applicable descriptor is highly desirable for chemists who face conventional prediction tasks on small, sparsely distributed datasets. Inspired by the chemist's view of molecules, we present herein an ensemble descriptor, SPOC, curated on the principles of physical organic chemistry, that integrates the Structure and Physicochemical property (SPOC) of a molecule. SPOC can be readily constructed by combining molecular fingerprints, which represent the structure of a given molecule, with molecular physicochemical properties extracted from RDKit or Mordred molecular descriptors. The applicability of SPOC was surveyed across a range of well-structured chemical databases, with machine learning tasks ranging from regression to classification.
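The construction described in the abstract, a structural fingerprint joined with physicochemical descriptors, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `spoc_vector` and `rdkit_spoc` are hypothetical, and the choice of a Morgan fingerprint (radius 2, 2048 bits) together with the full RDKit `Descriptors.descList` set is an assumption, since the abstract does not say which fingerprints or descriptor subsets were used.

```python
from typing import List, Sequence


def spoc_vector(structure_bits: Sequence[int],
                physchem_values: Sequence[float]) -> List[float]:
    """Concatenate a structural fingerprint with physicochemical values.

    Hypothetical helper: the ensemble idea is simply to place both
    feature blocks side by side in one vector.
    """
    return [float(b) for b in structure_bits] + [float(v) for v in physchem_values]


def rdkit_spoc(smiles: str, radius: int = 2, n_bits: int = 2048) -> List[float]:
    """Build a SPOC-style vector for one molecule with RDKit.

    Assumptions (not from the source): Morgan fingerprint as the
    structural part, all RDKit 2D descriptors as the physicochemical part.
    RDKit is imported lazily so spoc_vector works without it installed.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem, Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Structural part: fixed-length Morgan bit vector.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    bits = [int(fp.GetBit(i)) for i in range(n_bits)]

    # Physicochemical part: every (name, function) pair RDKit exposes.
    props = [fn(mol) for _name, fn in Descriptors.descList]

    return spoc_vector(bits, props)
```

Calling `rdkit_spoc("CCO")` would give one feature row for ethanol; stacking such rows for a dataset gives a matrix that can be passed to any regression or classification model. In practice the descriptor columns would likely need scaling and removal of non-finite values before training.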
Keyphrases
  • machine learning
  • working memory
  • big data
  • artificial intelligence
  • single molecule
  • neural network
  • physical activity
  • mental health
  • single cell
  • upper limb