
Efficient identification of trait-associated loss-of-function variants in the UK Biobank cohort by exome-sequencing based genotype imputation.

Wen-Yuan Yu, Shan-Shan Yan, Shu-Han Zhang, Jing-Jing Ni, Bin-Li, Yu-Fang Pei
Published in: Genetic epidemiology (2022)
The large-scale open-access whole-exome sequencing (WES) data of approximately 200,000 UK Biobank (UKB) participants is accelerating a new wave of genetic association studies aiming to identify rare and functional loss-of-function (LoF) variants associated with complex traits and diseases. We proposed to merge the WES genotypes and the genome-wide genotyping (GWAS) genotypes of 167,000 UKB homogeneous European participants into a combined reference panel, and then to impute 241,911 UKB homogeneous European participants who had the GWAS genotypes only. We then used the imputed data to replicate associations identified in the discovery WES sample. The average imputation accuracy measure r² is modest to high for LoF variants at all minor allele frequency (MAF) intervals: 0.942 at MAF interval (0.01, 0.5), 0.807 at (1.0 × 10⁻³, 0.01), 0.805 at (1.0 × 10⁻⁴, 1.0 × 10⁻³), 0.664 at (1.0 × 10⁻⁵, 1.0 × 10⁻⁴) and 0.410 at (0, 1.0 × 10⁻⁵). As applications, we studied associations of LoF variants with estimated heel bone mineral density (BMD) and four lipid traits. In addition to replicating dozens of previously reported genes, we also identified three novel associations: two genes, PLIN1 and ANGPTL3, for high-density-lipoprotein cholesterol and one gene, PDE3B, for triglycerides. Our results highlighted the strength of WES-based genotype imputation and provided useful imputed data within the UKB cohort.
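The per-interval accuracy summary above can be illustrated with a minimal sketch, assuming each imputed variant comes with a MAF and an r² value already computed by the imputation software. The function names (`maf_bin`, `mean_r2_by_bin`), the input layout, and the choice to treat each interval as closed on the right are assumptions made for illustration; the abstract does not describe the authors' actual procedure or boundary handling.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds of the MAF intervals quoted in the abstract.
# Assumption: each interval is treated as (lower, upper].
BIN_EDGES = [1e-5, 1e-4, 1e-3, 1e-2, 0.5]
BIN_LABELS = [
    "(0, 1e-5]",
    "(1e-5, 1e-4]",
    "(1e-4, 1e-3]",
    "(1e-3, 0.01]",
    "(0.01, 0.5]",
]


def maf_bin(maf):
    """Return the label of the MAF interval that contains `maf`."""
    if not 0 < maf <= 0.5:
        raise ValueError(f"MAF must be in (0, 0.5], got {maf}")
    # bisect_left finds the first edge that is >= maf.
    return BIN_LABELS[bisect_left(BIN_EDGES, maf)]


def mean_r2_by_bin(records):
    """Average r^2 per MAF interval.

    `records` is an iterable of (maf, r2) pairs, one per variant
    (hypothetical input layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for maf, r2 in records:
        label = maf_bin(maf)
        sums[label] += r2
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    toy = [(0.2, 0.9), (0.3, 1.0), (5e-4, 0.8), (5e-6, 0.4)]
    for label, mean_r2 in mean_r2_by_bin(toy).items():
        print(label, round(mean_r2, 3))
```

Intervals with no variants are simply absent from the returned dictionary, so a caller that needs all five rows would have to fill in the missing labels itself.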
Keyphrases
  • genome wide
  • copy number
  • dna methylation
  • electronic health record
  • big data
  • small molecule
  • cross sectional
  • high throughput
  • single cell
  • fatty acid
  • machine learning
  • genome wide association study