Fast and accurate joint inference of coancestry parameters for populations and/or individuals.
Tristan Mary-HuardDavid BaldingPublished in: PLoS genetics (2023)
We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.