
CORRECTING MODEL MISSPECIFICATION IN RELATIONSHIP ESTIMATES.

Ethan M Jewett
Published in: bioRxiv : the preprint server for biology (2024)
The datasets of large genotyping biobanks and direct-to-consumer genetic testing companies contain many related individuals. Until now, it has been widely accepted that the most distant relationships that can be detected are around fifteen degrees (approximately 8th cousins) and that practical relationship estimates have a ceiling around ten degrees (approximately 5th cousins). However, we show that these assumptions are incorrect and that they are due to a misapplication of relationship estimators. In particular, relationship estimators are applied almost exclusively to putative relatives who have been identified because they share detectable tracts of DNA identically by descent (IBD). However, no existing relationship estimator conditions on the event that two individuals share at least one detectable segment of IBD anywhere in the genome. As a result, the relationship estimates obtained using existing estimators are dramatically biased for distant relationships, inferring all sufficiently distant relationships to be around ten degrees regardless of the depth of the true relationship. Moreover, existing relationship estimators are derived under a model that assumes that each pair of related individuals shares a single common ancestor (or mating pair of ancestors). This model breaks down for relationships beyond 10 generations in the past because individuals share many thousands of cryptic common ancestors due to pedigree collapse. We first derive a corrected likelihood that conditions on the event that at least one segment is observed between a pair of putative relatives and we demonstrate that the corrected likelihood largely eliminates the bias in estimates of pairwise relationships and provides a more accurate characterization of the uncertainty in these estimates. We then reformulate the relationship inference problem to account for the fact that individuals share many common ancestors, not just one.
We demonstrate that the most distant relationship that can be inferred may be forty degrees or more, rather than ten, extending the time-to-common ancestor from approximately 200 years in the past to approximately 600 years in the past or more. This dramatic increase in the range of relationship estimators makes it possible to infer relationships whose common ancestors lived before historical events such as European settlement of the Americas and the Transatlantic Slave Trade, and possibly much earlier.
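The idea of conditioning on at least one observed segment can be illustrated with a toy model. The sketch below is not the paper's estimator. It assumes, purely for illustration, that the number of detectable IBD segments is Poisson with a hypothetical mean `expected_segments(degree)`. The function names, the mean formula, and the parameter values (`genome_morgans`, `n_chrom`, `min_len_morgans`) are all assumptions introduced here. Conditioning on "at least one segment" then amounts to dividing the Poisson probability by 1 - exp(-mean), giving a zero-truncated Poisson likelihood.

```python
import math


def expected_segments(degree, genome_morgans=35.0, n_chrom=22, min_len_morgans=0.05):
    """Hypothetical mean number of detectable IBD segments (toy model, not from the source).

    `degree` is the number of meioses separating the pair. The exponential
    factor crudely discounts segments shorter than the detection threshold.
    """
    total = (genome_morgans * degree + n_chrom) / 2 ** (degree - 1)
    return total * math.exp(-degree * min_len_morgans)


def log_lik(n, degree):
    """Unconditional Poisson log-likelihood of observing n segments."""
    mu = expected_segments(degree)
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def log_lik_conditional(n, degree):
    """Log-likelihood conditioned on at least one segment (zero-truncated Poisson)."""
    if n < 1:
        raise ValueError("conditioning on detection requires n >= 1")
    mu = expected_segments(degree)
    # log P(N >= 1) = log(1 - exp(-mu)); expm1 keeps precision when mu is tiny.
    return log_lik(n, degree) - math.log(-math.expm1(-mu))


def mle_degree(n, loglik, degrees=range(1, 41)):
    """Grid-search maximum-likelihood degree over an assumed range of 1..40."""
    return max(degrees, key=lambda d: loglik(n, d))


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, mle_degree(n, log_lik), mle_degree(n, log_lik_conditional))
```

In this toy setup, the unconditional estimate for a pair sharing a single segment settles at a moderate degree where the expected segment count is close to one, whereas the conditional likelihood keeps increasing with depth across the grid. That is the qualitative behaviour the abstract describes: ignoring the detection event pulls distant relationships toward a fixed, shallower estimate.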