Calibrated prediction intervals for polygenic scores across diverse contexts.

Kangcheng Hou, Ziqi Xu, Yi Ding, Arbel Harpak, Bogdan Pasaniuc
Published in: medRxiv : the preprint server for health sciences (2023)
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields, from agriculture to personalized medicine. We analyze data from two large biobanks, in the US (All of Us) and the UK (UK Biobank), and find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, affect PGS accuracy to a degree similar to genetic ancestry. PGSs trained in single-ancestry and multi-ancestry cohorts show similar context-specificity in their accuracy. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (they contain the trait value with 90% probability in all contexts), whereas methods that ignore context are miscalibrated. We show that prediction intervals need to be adjusted for all traits considered, by amounts ranging from 10% for diastolic blood pressure to 80% for waist circumference. The adjustment depends on the dataset; for example, prediction intervals for years of education need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path toward using PGS as a prediction tool for all individuals regardless of their context, while highlighting the importance of comprehensively profiling context information in study design and data collection.
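To make the calibration claim concrete, here is a minimal toy sketch of why a single, context-agnostic prediction interval can be miscalibrated within contexts. It is not the authors' method, which models all contexts jointly; it simply stratifies by one binary context. All data are simulated, and the context variable, noise levels, and function names (`simulate`, `fit_sds`, `coverage_by_context`) are hypothetical. The PGS is assumed to be a calibrated predictor of the trait mean, so only the interval width is estimated.

```python
import math
import random
from statistics import NormalDist

# Two-sided 90% interval: prediction +/- z * sd, with z = Phi^{-1}(0.95) ~ 1.645.
Z90 = NormalDist().inv_cdf(0.95)


def simulate(n, seed):
    """Hypothetical toy data: trait = PGS + noise, where the noise SD depends on a binary context."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ctx = rng.random() < 0.5      # hypothetical binary context (e.g. an age group)
        pgs = rng.gauss(0.0, 1.0)     # standardized polygenic score
        sd = 0.5 if ctx else 1.5      # assumption: PGS is more accurate when ctx is True
        rows.append((ctx, pgs, pgs + rng.gauss(0.0, sd)))
    return rows


def residual_sd(rows):
    """Root-mean-square residual of the trait around the PGS prediction."""
    return math.sqrt(sum((y - pgs) ** 2 for _, pgs, y in rows) / len(rows))


def fit_sds(train):
    """Return one pooled SD (ignores context) and one SD per context value."""
    pooled = residual_sd(train)
    per_ctx = {c: residual_sd([r for r in train if r[0] == c]) for c in (False, True)}
    return pooled, per_ctx


def coverage_by_context(test, sd_for):
    """Fraction of test traits inside PGS +/- Z90 * sd_for(ctx), reported per context."""
    out = {}
    for c in (False, True):
        group = [r for r in test if r[0] == c]
        hits = sum(abs(y - pgs) <= Z90 * sd_for(c) for _, pgs, y in group)
        out[c] = hits / len(group)
    return out


# Fit interval widths on one simulated sample, evaluate coverage on another.
train, test = simulate(20_000, seed=1), simulate(20_000, seed=2)
pooled, per_ctx = fit_sds(train)
naive = coverage_by_context(test, lambda c: pooled)       # same width everywhere
aware = coverage_by_context(test, lambda c: per_ctx[c])   # width varies with context
print("context-agnostic coverage:", naive)
print("context-specific coverage:", aware)
```

Under these assumed settings, the pooled interval over-covers in the low-noise context (close to 100%) and under-covers in the high-noise context (roughly 78%), while the per-context intervals land near the nominal 90% in both. Stratifying like this becomes impractical with many overlapping or continuous contexts such as age, sex, income, and ancestry; the abstract instead describes modeling all contexts jointly.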