
Beyond principal components: a critical comparison of factor analysis methods for subspace modelling in chemistry.

Peter D Wentzell, Cannon Giglio, Mohsen Kompany-Zareh
Published in: Analytical Methods: Advancing Methods and Applications (2021)
Multivariate data analysis tools have become an integral part of modern analytical chemistry, and principal component analysis (PCA) is perhaps foremost among these. PCA is central in approaching many problems in data exploration, classification, calibration, modelling, and curve resolution. However, PCA is only one form of a broader group of factor analysis (FA) methods that are rarely employed by chemists. The dominance of PCA in chemistry is primarily a consequence of history and convenience, but this has obscured the potential advantages of other FA tools that are widely used in other fields. The purpose of this article, which is intended for those who are already familiar with the mathematical foundations and applications of PCA, is to develop a framework to relate PCA to other commonly used FA methods from the perspective of chemical applications. Specifically, PCA is compared to maximum likelihood factor analysis (MLFA), principal axis factorization (PAF) and maximum likelihood PCA (MLPCA). Similarities and differences are highlighted with regard to the assumptions and constraints of the models, algorithms employed, and calculation of scores and loadings. Practical aspects such as data dimensionality, preprocessing, rank estimation, improper solutions (Heywood cases), and software implementation are considered. The performance of the four methods is compared using both simulated and experimental data sets. While PCA provides the most reliable estimates when measurement error variance is uniform (homoscedastic noise) and MLPCA works best when the error covariance matrix is explicitly known, MLFA and PAF have the distinct advantage of providing information about measurement uncertainty and adapting to situations of unknown heteroscedastic errors, eliminating the need for scaling. Moreover, MLFA in particular is shown to be tolerant to deviations from model linearity. These results make a strong case for increased application of other FA methods in chemistry.
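The contrast drawn in the abstract between PCA and principal axis factorization (PAF) under heteroscedastic noise can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes a rank-2 linear model, applies PAF to the covariance matrix rather than the correlation matrix, and uses hypothetical names (`principal_axis_factoring`, `largest_principal_angle`) and arbitrary simulation settings (sample size, noise levels, random seed) chosen only for illustration.

```python
import numpy as np


def principal_axis_factoring(S, k, n_iter=500, tol=1e-8):
    """Iterated principal axis factoring of covariance matrix S with k factors.

    Returns (loadings, uniquenesses). Illustrative sketch only.
    """
    p = S.shape[0]
    psi = np.zeros(p)  # zero uniquenesses: the first pass is ordinary PCA
    for _ in range(n_iter):
        # Remove the current noise-variance estimates from the diagonal.
        reduced = S - np.diag(psi)
        evals, evecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        top = np.argsort(evals)[::-1][:k]       # indices of the k largest
        lam = np.clip(evals[top], 0.0, None)    # guard against negative values
        loadings = evecs[:, top] * np.sqrt(lam)
        # Uniqueness = total variance minus communality for each variable.
        new_psi = np.diag(S) - np.sum(loadings**2, axis=1)
        new_psi = np.clip(new_psi, 1e-6, None)  # crude guard against Heywood cases
        converged = np.max(np.abs(new_psi - psi)) < tol
        psi = new_psi
        if converged:
            break
    return loadings, psi


def largest_principal_angle(A, B):
    """Largest principal angle (degrees) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s.min(), 0.0, 1.0)))


# Simulated data: rank-2 signal plus noise whose standard deviation
# differs from variable to variable (heteroscedastic, unknown to the models).
rng = np.random.default_rng(0)
n, p, k = 5000, 10, 2
true_loadings = rng.normal(size=(p, k))
scores = rng.normal(size=(n, k))
noise_sd = np.linspace(0.2, 3.0, p)
X = scores @ true_loadings.T + rng.normal(size=(n, p)) * noise_sd

S = np.cov(X, rowvar=False)

# PCA subspace: eigenvectors of S belonging to the k largest eigenvalues.
_, evecs = np.linalg.eigh(S)
pca_basis = evecs[:, -k:]
angle_pca = largest_principal_angle(true_loadings, pca_basis)

# PAF subspace and per-variable noise-variance (uniqueness) estimates.
paf_loadings, psi_hat = principal_axis_factoring(S, k)
angle_paf = largest_principal_angle(true_loadings, paf_loadings)

print("Subspace error, PCA (degrees):", round(float(angle_pca), 2))
print("Subspace error, PAF (degrees):", round(float(angle_paf), 2))
print("Estimated noise variances:", np.round(psi_hat, 2))
print("True noise variances:     ", np.round(noise_sd**2, 2))
```

In this sketch the uniquenesses returned by PAF play the role of the measurement uncertainty information mentioned in the abstract, and no prior scaling of the variables is applied. MLFA and MLPCA are not covered here; they require a likelihood-based fit and, for MLPCA, a known error covariance matrix.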
Keyphrases
  • data analysis
  • machine learning
  • healthcare
  • big data
  • primary care
  • mental health
  • deep learning
  • emergency department
  • drug discovery
  • social media
  • artificial intelligence
  • low cost
  • human health