Improved polygenic prediction by Bayesian multiple regression on summary statistics.
Luke R Lloyd-Jones, Jian Zeng, Julia Sidorenko, Loïc Yengo, Gerhard Moser, Kathryn E Kemper, Huanwei Wang, Zhili Zheng, Reedik Magi, Tõnu Esko, Andres Metspalu, Naomi R Wray, Michael E Goddard, Jian Yang, Peter M Visscher
Published in: Nature Communications (2019)
Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to a model, SBayesR, that utilises summary statistics from genome-wide association studies (GWAS). In simulation and in cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p-value thresholding.
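To illustrate why a multiple regression model can be fitted from summary statistics alone, here is a minimal sketch, not the SBayesR algorithm itself. It assumes standardised genotypes and phenotype, so that X'X = n·R (with R the LD correlation matrix) and X'y = n·b (with b the marginal GWAS effects). A simple ridge penalty stands in for the BayesR mixture prior, and the names `joint_effects_from_summary`, `simulate_standardised` and `lam` are hypothetical, introduced only for this example.

```python
import numpy as np


def joint_effects_from_summary(b_marginal, ld, n, lam):
    """Ridge-type joint effects from marginal effects and an LD matrix.

    Assumes standardised genotypes and phenotype, so that
    X'X = n * ld and X'y = n * b_marginal. The ridge penalty `lam`
    is a stand-in for a Bayesian prior (an assumption of this sketch).
    """
    m = len(b_marginal)
    # (X'X + lam*I) beta = X'y, divided through by n
    return np.linalg.solve(ld + (lam / n) * np.eye(m), b_marginal)


def simulate_standardised(n=500, m=20, seed=0):
    """Simulate correlated, standardised genotypes and a standardised phenotype."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, m))
    X[:, 1:] += 0.5 * X[:, :-1]              # correlate neighbouring variants
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # column mean 0, variance 1
    beta = rng.normal(scale=0.1, size=m)
    y = X @ beta + rng.normal(size=n)
    y = (y - y.mean()) / y.std()
    return X, y


if __name__ == "__main__":
    X, y = simulate_standardised()
    n, m = X.shape
    b = X.T @ y / n    # marginal (one-variant-at-a-time) effects
    ld = X.T @ X / n   # LD correlation matrix
    print(joint_effects_from_summary(b, ld, n, lam=10.0)[:5])
```

In this idealised setting the summary-statistics solution matches the individual-level ridge solution exactly. In practice the LD matrix usually comes from a separate reference sample, so the match is only approximate.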