Purging putative siblings from population genetic data sets: a cautionary view.
Robin S Waples, Eric C Anderson
Published in: Molecular Ecology (2017)
(i) unless simulated samples included large family groups together with a component of unrelated individuals, removing siblings generally reduced the precision of $\hat{P}$ and $\hat{F}_{ST}$; (ii) $\hat{N}_e$ based on the linkage disequilibrium method was largely unbiased using full random samples but became increasingly upwardly biased under aggressive purging of siblings. Under nonrandom sampling (some families over-represented), $\hat{N}_e$ using full samples was downwardly biased; removing just the right 'Goldilocks' fraction of siblings could produce an unbiased estimate, but this sweet spot varied widely among scenarios; (iii) weighting individuals based on the inferred pedigree (to produce a best linear unbiased estimator, BLUE) maximized the precision of $\hat{P}$ when the inferred pedigree was correct but performed poorly when the pedigree was wrong; (iv) a variant of sibling removal that leaves small sibling groups intact appears to be more robust to errors in inferences about family structure. Our results illustrate the complex challenges posed by the presence of family structure, suggest that no single optimal solution exists and argue for caution in adjusting population genetic data sets for the presence of putative siblings without fully understanding the consequences.
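The pedigree-based weighting in point (iii) can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not describe. It assumes the standard generalized least squares form of a BLUE for a mean, with weights proportional to R^{-1}1, where R is a relationship matrix with 1 on the diagonal, 0.5 between inferred full siblings and 0 otherwise. It also assumes the estimated quantity is a population allele frequency at a single biallelic locus in non-inbred diploids. The function names (`solve`, `relationship_matrix`, `blue_weights`, `blue_allele_frequency`) and the family labels are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("relationship matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def relationship_matrix(family_ids, r_sib=0.5):
    """Build R from inferred full-sib family labels (assumption: None = unrelated)."""
    n = len(family_ids)
    rel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rel[i][j] = 1.0
            elif family_ids[i] is not None and family_ids[i] == family_ids[j]:
                rel[i][j] = r_sib
    return rel


def blue_weights(rel):
    """Weights proportional to R^{-1} 1, normalized to sum to one."""
    raw = solve(rel, [1.0] * len(rel))
    total = sum(raw)
    return [w / total for w in raw]


def blue_allele_frequency(genotypes, rel):
    """genotypes: copies of the focal allele (0, 1 or 2) per diploid individual."""
    weights = blue_weights(rel)
    return sum(w * g / 2 for w, g in zip(weights, genotypes))


if __name__ == "__main__":
    # Hypothetical sample: three inferred full siblings plus two unrelated individuals.
    families = ["A", "A", "A", None, None]
    genotypes = [2, 2, 2, 0, 0]
    rel = relationship_matrix(families)
    naive = sum(genotypes) / (2 * len(genotypes))
    print("naive estimate:", naive)
    print("BLUE weights:  ", blue_weights(rel))
    print("BLUE estimate: ", blue_allele_frequency(genotypes, rel))
```

In this toy sample each sibling receives half the weight of an unrelated individual, so the over-represented family pulls the estimate less than it does in the unweighted mean. The weights depend entirely on the family labels passed in, which is consistent with the abstract's observation that the approach performs poorly when the inferred pedigree is wrong.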