
A geometric relationship of F2, F3 and F4-statistics with principal component analysis.

Benjamin Marco Peter
Published in: Philosophical transactions of the Royal Society of London. Series B, Biological sciences (2022)
Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an F3-statistic will lie inside a circle on a PCA plot. Furthermore, the F4-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
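The geometric statements in the abstract can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes the standard definitions of F2, F3 and F4 as averages over SNPs of products of allele-frequency differences, ignores the finite-sample bias corrections used in practice, and runs on simulated frequencies. All function and variable names (f2, f3, f4, inside_circle, x_admixed and so on) are hypothetical and chosen for illustration only.

```python
import numpy as np

def f2(a, b):
    """F2(A, B): mean squared allele-frequency difference (no bias correction)."""
    return np.mean((a - b) ** 2)

def f3(x, a, b):
    """F3(X; A, B): mean of (x - a) * (x - b); negative values suggest admixture."""
    return np.mean((x - a) * (x - b))

def f4(a, b, c, d):
    """F4(A, B; C, D): mean of (a - b) * (c - d), a scaled inner product."""
    return np.mean((a - b) * (c - d))

def inside_circle(x, a, b):
    """True if x lies strictly inside the (hyper)sphere whose diameter is the segment a-b."""
    mid = (a + b) / 2
    return np.mean((x - mid) ** 2) < f2(a, b) / 4

# Simulated allele frequencies (an assumption; no real data are used here).
rng = np.random.default_rng(0)
n_snps = 1000
a = rng.uniform(0.05, 0.95, n_snps)
b = rng.uniform(0.05, 0.95, n_snps)

# Idealised admixed population: 40% A and 60% B, with no subsequent drift.
x_admixed = 0.4 * a + 0.6 * b
f3_admixed = f3(x_admixed, a, b)          # equals -0.4 * 0.6 * F2(A, B)
admixed_in_circle = inside_circle(x_admixed, a, b)

# F4 vanishes when (c - d) is orthogonal to (a - b).
# Frequencies here are illustrative and not clipped to [0, 1].
u = a - b
v = rng.normal(0.0, 0.01, n_snps)
v_perp = v - (v @ u) / (u @ u) * u        # remove the component along a - b
c = np.full(n_snps, 0.5)
d = c + v_perp
f4_orthogonal = f4(a, b, c, d)

# PCA via SVD of the centred population-by-SNP matrix: with all PCs kept,
# squared distances between PC coordinates reproduce F2 exactly (up to 1/n_snps).
pops = np.vstack([a, b, x_admixed, c, d])
centred = pops - pops.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
pcs = U * S                               # one row of PC coordinates per population
f2_from_pcs = np.sum((pcs[0] - pcs[1]) ** 2) / n_snps
```

The circle criterion follows from the identity F3(X; A, B) = ||X - M||^2 - ||A - B||^2 / 4 (both norms divided by the number of SNPs), where M is the midpoint of A and B, so F3 is negative exactly when X is closer to M than half the distance between A and B. Truncating `pcs` to its first few columns gives the low-dimensional approximation the abstract refers to; how well it works depends on the data.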
Keyphrases
  • endothelial cells
  • induced pluripotent stem cells
  • pluripotent stem cells
  • high resolution
  • genetic diversity
  • south africa
  • genome wide
  • heavy metals
  • air pollution
  • health risk
  • copy number
  • health risk assessment