Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits.

Brian C Zhang, Arjun Biddanda, Árni Freyr Gunnarsson, Fergus Cooper, Pier Francesco Palamara
Published in: Nature Genetics (2023)
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Keyphrases
  • genome wide
  • copy number
  • dna methylation
  • single cell
  • electronic health record
  • big data
  • high resolution
  • gene expression
  • cross sectional
  • oxidative stress
  • machine learning
  • data analysis