Adjusting for principal components can induce spurious associations in genome-wide association studies in admixed populations.
Kelsey E Grinde, Brian L Browning, Alexander P Reiner, Timothy A Thornton, Sharon R Browning
Published in: bioRxiv: the preprint server for biology (2024)
Principal component analysis (PCA) is a widely used technique in human genetics research. One of its most frequent applications is in genetic association studies, where researchers use PCA to infer, and then adjust for, the genetic ancestry of study participants. Although PCA is a powerful approach, prior work has shown that it sometimes captures features other than ancestry, or data quality issues, and pre-processing steps have been suggested to address these concerns. However, the utility and downstream implications of this recommended pre-processing are not fully understood, nor are these steps universally implemented. Moreover, the vast majority of prior work in this area was conducted in studies that exclusively included individuals of European ancestry. Here, we revisit this work in the context of admixed populations: populations with diverse, mixed ancestry that have been largely underrepresented in genetics research to date. We demonstrate the unique concerns that can arise in this context and illustrate the detrimental effects that including principal components in genetic association study models can have when not implemented carefully. Altogether, we hope our work serves as a reminder of the care that must be taken (including careful pre-processing, diagnostics, and modeling choices) when implementing PCA in admixed populations and beyond.