Apples to apples comparison of standardized to unstandardized principal component analysis of methods that assign partial atomic charges in molecules.

Thomas A Manz
Published in: RSC Advances (2022)
Articles by Cho et al. (ChemPhysChem, 2020, 21, 688-696) and Manz (RSC Adv., 2020, 10, 44121-44148) performed unstandardized and standardized principal component analysis (PCA), respectively, to study atomic charge assignment methods for molecular systems. Both articles used subsets of atomic charges computed by Cho et al.; however, the data subsets employed were not strictly identical. Herein, an element-by-element analysis of this dataset is first performed to compare the spread of charge values across individual chemical elements and charge assignment methods. This reveals an underlying problem with the reported Becke partial atomic charges in this dataset. Due to their unphysical values, these Becke charges were not included in the subsequent PCA. Standardized and unstandardized PCA are performed across two datasets: (i) 19 charge assignment methods having a complete basis set limit and (ii) all 25 charge assignment methods (excluding Becke) for which Cho et al. computed atomic charges. The dataset contained ∼2000 molecules having a total of 29 907 atoms in materials. The following five methods (listed here in alphabetical order) showed the greatest correlation to the first principal component in standardized and unstandardized PCA: DDEC6, Hirshfeld-I, ISA, MBIS, and MBSBickelhaupt (note: MBSBickelhaupt does not appear in the 19-method dataset). For standardized PCA, the DDEC6 method ranked first, followed closely by MBIS. For unstandardized PCA, Hirshfeld-I (19 methods) or MBSBickelhaupt (25 methods) ranked first, followed by DDEC6 in second place (for both the 19-method and 25-method datasets).
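The difference between the two PCA variants can be illustrated with a minimal sketch. It is not the article's code or data: the charge matrix is synthetic, and the function name `first_pc_correlations`, the four hypothetical "methods", and their scale and noise values are assumptions chosen only for illustration. Rows stand for atoms and columns for charge assignment methods. Unstandardized PCA diagonalizes the covariance matrix of the centered columns, while standardized PCA first rescales each column to unit variance, which is equivalent to diagonalizing the correlation matrix.

```python
import numpy as np

def first_pc_correlations(charges, standardize):
    """Absolute correlation of each column (method) with the first principal component."""
    X = charges - charges.mean(axis=0)        # center each method's charges
    if standardize:
        X = X / X.std(axis=0, ddof=1)         # rescale each method to unit variance
    cov = np.cov(X, rowvar=False)             # covariance matrix (correlation matrix if standardized)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                      # eigenvector belonging to the largest eigenvalue
    scores = X @ pc1                          # projection of every atom onto PC1
    return np.array([abs(np.corrcoef(X[:, j], scores)[0, 1])
                     for j in range(X.shape[1])])

# Synthetic stand-in data (assumption): 500 "atoms", 4 hypothetical methods that
# share one underlying signal but differ in charge magnitude and noise level.
rng = np.random.default_rng(0)
n_atoms = 500
base = rng.normal(size=n_atoms)
scales = np.array([1.0, 0.5, 2.0, 0.1])
noise = np.array([0.1, 0.1, 0.8, 0.02])
charges = base[:, None] * scales + rng.normal(size=(n_atoms, 4)) * noise

corr_std = first_pc_correlations(charges, standardize=True)
corr_unstd = first_pc_correlations(charges, standardize=False)
print("standardized:  ", np.round(corr_std, 3))
print("unstandardized:", np.round(corr_unstd, 3))
```

In the unstandardized case, methods that produce charges of larger magnitude contribute more variance and therefore pull the first principal component toward themselves. Standardization removes that scale dependence, which is one plausible reason the two variants can rank methods differently.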