A scalable variational approach to characterize pleiotropic components across thousands of human diseases and complex traits using GWAS summary statistics.
Zixuan Zhang, Junghyun Jung, Artem Kim, Noah Suboc, Steven Gazal, Nicholas Mancuso
Published in: medRxiv: the preprint server for health sciences (2023)
Genome-wide association studies (GWAS) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art model-free approach, truncated singular value decomposition (tSVD), in capturing latent pleiotropic factors across phenotypes, while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N=420,531). We find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (P=2.58E-10), and we validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between the reproductive system and muscular-skeletal growth. Finally, our analyses suggest novel shared etiologies between rheumatoid arthritis and periodontal condition, and point to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWAS.
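To make the tSVD baseline mentioned in the abstract concrete, the following is a minimal sketch of a rank-k truncated SVD applied to a traits-by-variants matrix of GWAS Z-scores. It is not the FactorGo model and not the authors' code: the matrix sizes, the simulated data, the noise level, and the helper name `truncated_svd` are all assumptions chosen for illustration.

```python
import numpy as np


def truncated_svd(Z, k):
    """Rank-k truncated SVD of Z (traits x variants).

    Hypothetical helper for illustration; returns trait loadings U,
    singular values s, and variant-level factors Vt.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


# Toy dimensions (assumed); a real biobank analysis is far larger.
rng = np.random.default_rng(0)
n_traits, n_variants, k = 50, 200, 3

# Simulate Z-scores as low-rank signal plus noise (an assumption,
# not the simulation design used in the paper).
loadings = rng.normal(size=(n_traits, k))
factors = rng.normal(size=(k, n_variants))
Z = loadings @ factors + 0.1 * rng.normal(size=(n_traits, n_variants))

U, s, Vt = truncated_svd(Z, k)

# Reconstruct the rank-k approximation and measure the relative error.
Z_hat = (U * s) @ Vt
rel_err = np.linalg.norm(Z - Z_hat) / np.linalg.norm(Z)
```

As the abstract's "model-free" label indicates, tSVD fits no explicit probabilistic model; it simply returns the best rank-k approximation in the least-squares sense, which is the baseline the abstract contrasts with a variational factor analysis model.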
Keyphrases
- prostate cancer
- rheumatoid arthritis
- genome wide association study
- genome wide association
- body mass index
- genome wide
- electronic health record
- endothelial cells
- big data
- body composition
- mass spectrometry
- single cell
- gene expression
- multiple sclerosis
- machine learning
- induced pluripotent stem cells
- physical activity
- brain injury
- weight gain
- data analysis
- weight loss