Phantom oscillations in principal component analysis.

Maxwell Shinn
Published in: Proceedings of the National Academy of Sciences of the United States of America (2023)
Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
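The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the author's code: it assumes that "smooth, nonoscillatory data" can be modeled as Gaussian-smoothed white noise, and the helper names (`smooth_noise`, `principal_components`, `count_sign_changes`) are hypothetical. Counting sign changes is only a crude proxy for how oscillatory a component looks.

```python
import numpy as np

def smooth_noise(n_series, n_time, width, rng):
    """Generate white noise smoothed by a Gaussian kernel.

    The series are smooth but contain no oscillation by construction
    (an assumed toy model, not data from the paper).
    """
    t = np.arange(-3 * width, 3 * width + 1)
    kernel = np.exp(-0.5 * (t / width) ** 2)
    kernel /= kernel.sum()
    # Pad the noise so that "valid" convolution returns exactly n_time samples.
    noise = rng.standard_normal((n_series, n_time + len(kernel) - 1))
    return np.array([np.convolve(row, kernel, mode="valid") for row in noise])

def principal_components(data, n_components):
    """Return the top principal components (rows) via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def count_sign_changes(v):
    """Count sign changes along a vector, ignoring exact zeros."""
    signs = np.sign(v)
    signs = signs[signs != 0]
    return int(np.sum(signs[1:] != signs[:-1]))

rng = np.random.default_rng(0)
data = smooth_noise(n_series=2000, n_time=100, width=5, rng=rng)
pcs = principal_components(data, n_components=4)
crossings = [count_sign_changes(pc) for pc in pcs]
print("sign changes per component:", crossings)
```

Plotting the rows of `pcs` against time should show smooth, wave-like components with only a few sign changes each, even though no oscillation was put into the data. The exact ordering and shape of the components depend on the random seed and the smoothing width.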
Keyphrases
  • data analysis
  • electronic health record
  • big data
  • working memory
  • high frequency
  • magnetic resonance imaging
  • computed tomography
  • image quality
  • deep learning