mixWAS: An efficient distributed algorithm for mixed-outcomes genome-wide association studies.
Ruowang Li, Luke Benz, Rui Duan, Joshua C Denny, Hakon Hakonarson, Jonathan D Mosley, Jordan W Smoller, Wei-Qi Wei, Marylyn D Ritchie, Jason H Moore, Yong Chen
Published in: medRxiv : the preprint server for health sciences (2024)
Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of the genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being used to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHRs to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in an independent EHR dataset from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.
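For intuition about what "lossless" integration through summary statistics can mean, here is a minimal sketch for a single continuous phenotype under ordinary least squares. It is not the mixWAS algorithm, whose details are not given in this abstract, and it handles neither mixed phenotypes nor between-site heterogeneity. The function names (`site_summary`, `pooled_ols`) and the simulated data are hypothetical. Each site shares only aggregate matrices, yet the combined estimate matches the one obtained by pooling individual-level data.

```python
import numpy as np

def site_summary(X, y):
    """Aggregate statistics a site could share instead of raw records (assumed design)."""
    return X.T @ X, X.T @ y

def pooled_ols(summaries):
    """Sum the per-site aggregates and solve the normal equations."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)

# Simulated data for three hypothetical sites (illustrative values only).
rng = np.random.default_rng(0)
sites = []
for n in (200, 350, 150):
    g = rng.binomial(2, 0.3, size=n)          # genotype dosage: 0, 1, or 2
    X = np.column_stack([np.ones(n), g])      # intercept + genotype
    y = 0.5 + 0.2 * g + rng.normal(size=n)    # continuous phenotype
    sites.append((X, y))

# Estimate from shared summaries only.
beta_distributed = pooled_ols([site_summary(X, y) for X, y in sites])

# Reference estimate from stacked individual-level data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(np.allclose(beta_distributed, beta_pooled))
```

The equivalence holds here because the least-squares solution depends on the data only through the sums of X'X and X'y. Binary outcomes and site-specific effects need more than this simple construction.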
Keyphrases
- electronic health record
- genome wide
- genome wide association
- clinical decision support
- dna methylation
- copy number
- adverse drug
- loop mediated isothermal amplification
- machine learning
- deep learning
- real time pcr
- body mass index
- climate change
- type diabetes
- artificial intelligence
- cross sectional
- case control
- adipose tissue
- fatty acid
- insulin resistance
- skeletal muscle
- extracorporeal membrane oxygenation
- bioinformatics analysis