Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics.
Zihuai He, Benjamin Chu, James Yang, Jiaqi Gu, Zhaomeng Chen, Linxi Liu, Tim Morrison, Michael E Belloy, Xinran Qi, Nima Hejazi, Maya Mathur, Yann Le Guen, Hua Tang, Trevor J Hastie, Iuliana Ionita-Laza, Chiara Sabatti, Emmanuel Candès
Published in: bioRxiv : the preprint server for biology (2024)
Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Here, we present a novel framework for genome-wide detection of sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. Crucially, our framework requires only summary statistics obtained from standard genome-wide marginal association testing. The described approach, implemented in open-source software, is also computationally efficient, requiring less than 15 minutes on a single CPU to perform a genome-wide analysis. Through extensive genome-wide simulation studies, we show that the method can substantially outperform the usual two-stage procedure of marginal association testing followed by fine-mapping in both precision and recall. In an application to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of 67 sets of large-scale GWAS summary statistics since 2013 for a variety of phenotypes. The results reveal the method's capacity to robustly discover additional loci for polygenic traits and to pinpoint potential causal variants underpinning each locus beyond the conventional GWAS pipeline, contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses.