A multi-locus predictiveness curve and its summary assessment for genetic risk prediction.

Changshuai Wei, Ming Li, Yalu Wen, Chengyin Ye, Qing Lu
Published in: Statistical methods in medical research (2019)
Genetic association studies using high-throughput genotyping and sequencing technologies have identified a large number of genetic variants associated with complex human diseases. These findings have provided an unprecedented opportunity to identify individuals in the population at high risk for disease who carry causal genetic mutations, and they hold great promise for early intervention and individualized medicine. While interest is high in building risk prediction models based on recent genetic findings, it is crucial to have appropriate statistical measures to assess the performance of a genetic risk prediction model. Predictiveness curves were recently proposed as a graphical tool for evaluating a risk prediction model on the basis of a single continuous biomarker. The curve evaluates a risk prediction model for classification performance as well as its usefulness when applied to a population. In this article, we extend the predictiveness curve to measure the collective contribution of multiple genetic variants. We further propose a nonparametric, U-statistics-based measure, referred to as the U-Index, to quantify the performance of a multi-locus predictiveness curve. In particular, a global U-Index can be used in the general population and a partial U-Index in a subpopulation of particular clinical interest. Through simulation studies, we demonstrate that the proposed U-Index has advantages over several existing summary statistics under various disease models. We also show that the partial U-Index can provide distinct information when rare variants contribute substantially to disease risk. Finally, we use the proposed predictiveness curve and its corresponding U-Index to evaluate the performance of a genetic risk prediction model for nicotine dependence.
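As a rough illustration of the predictiveness curve idea that the article builds on, the sketch below plots nothing but computes an empirical curve: the risk quantile R(v) against the population fraction v. It is not the authors' multi-locus estimator, and it does not implement the U-Index, whose definition is not given in this abstract. The function name `predictiveness_curve`, the three-locus additive log-odds model, and all allele frequencies and effect sizes are hypothetical values chosen only for the example.

```python
import numpy as np


def predictiveness_curve(risks, grid_size=101):
    """Return (v, R(v)): population fractions and the matching risk quantiles.

    Hypothetical helper for illustration; R(v) is the v-th empirical
    quantile of the predicted risks.
    """
    risks = np.asarray(risks, dtype=float)
    v = np.linspace(0.0, 1.0, grid_size)
    r_v = np.quantile(risks, v)
    return v, r_v


# Assumed toy data: three biallelic loci, one of them rare.
rng = np.random.default_rng(0)
n = 5000
maf = np.array([0.30, 0.10, 0.01])      # assumed minor allele frequencies
log_or = np.array([0.2, 0.5, 1.5])      # assumed per-allele log odds ratios
intercept = -2.0                        # assumed baseline log odds

# Genotypes coded 0/1/2 as minor-allele counts.
genotypes = rng.binomial(2, maf, size=(n, 3))

# Predicted risk under an assumed additive logistic model.
risks = 1.0 / (1.0 + np.exp(-(intercept + genotypes @ log_or)))

v, r_v = predictiveness_curve(risks)
```

Plotting `r_v` against `v` gives the curve; a steeper rise near v = 1 would indicate that a small fraction of the population, here mainly carriers of the rare variant, has markedly elevated predicted risk.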
Keyphrases
  • genome wide
  • copy number
  • high throughput
  • machine learning
  • gene expression
  • single cell
  • deep learning
  • artificial intelligence