Estimating the area under the ROC curve when transporting a prediction model to a target population.

Bing Li, Constantine Gatsonis, Issa J Dahabreh, Jon Arni Steingrimsson
Published in: Biometrics (2022)
We propose methods for estimating the area under the receiver operating characteristic (ROC) curve (AUC) of a prediction model in a target population that differs from the source population that provided the data used for original model development. If covariates that are associated with model performance, as measured by the AUC, have a different distribution in the source and target populations, then AUC estimators that only use data from the source population will not reflect model performance in the target population. Here, we provide identification results for the AUC in the target population when outcome and covariate data are available from the sample of the source population, but only covariate data are available from the sample of the target population. In this setting, we propose three estimators for the AUC in the target population and show that they are consistent and asymptotically normal. We evaluate the finite-sample performance of the estimators using simulations and use them to estimate the AUC in a nationally representative target population from the National Health and Nutrition Examination Survey for a lung cancer risk prediction model developed using source population data from the National Lung Screening Trial.
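To make the setting concrete, the following is a minimal sketch of one way an AUC could be reweighted toward a target population. It is not taken from the paper: the abstract does not name its three estimators, so the inverse-odds weighting shown here is an assumption, borrowed from the general transportability literature. The function names `weighted_auc` and `inverse_odds_weights`, the toy data, and the values in `p_source` (the probability of belonging to the source sample given covariates, which in practice would have to be estimated from the pooled source and target covariate data) are all hypothetical. Counting tied scores as half-concordant is likewise a convention chosen for this sketch.

```python
import numpy as np


def inverse_odds_weights(p_source):
    """Inverse-odds weights (1 - p) / p for source-sample observations.

    p_source: assumed probability of being in the source sample given
    covariates. Here it is supplied directly; estimating it is outside
    the scope of this sketch.
    """
    p = np.asarray(p_source, dtype=float)
    return (1.0 - p) / p


def weighted_auc(scores, outcomes, weights):
    """Weighted concordance over all (case, non-case) pairs.

    Each pair is weighted by the product of the two observation weights.
    Ties in the score count as 0.5 (a convention assumed for this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=int)
    weights = np.asarray(weights, dtype=float)

    case = outcomes == 1
    s1, w1 = scores[case], weights[case]
    s0, w0 = scores[~case], weights[~case]
    if s1.size == 0 or s0.size == 0:
        raise ValueError("need at least one case and one non-case")

    # Rows index cases, columns index non-cases.
    concordant = (s1[:, None] > s0[None, :]) + 0.5 * (s1[:, None] == s0[None, :])
    pair_weights = w1[:, None] * w0[None, :]
    return float((pair_weights * concordant).sum() / pair_weights.sum())


# Hypothetical source-sample data: model scores, observed outcomes, and
# assumed source-membership probabilities.
scores = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
p_source = [0.5, 0.5, 0.8, 0.8]

auc_source = weighted_auc(scores, outcomes, np.ones(len(scores)))
auc_target = weighted_auc(scores, outcomes, inverse_odds_weights(p_source))
```

With equal weights the function reduces to the usual source-population AUC. The inverse-odds weights down-weight observations whose covariate patterns are over-represented in the source sample relative to the target, which is why the two estimates can differ when the covariate distributions differ.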