Predicting absolute risk for a person with missing risk factors.

Bang Wang, Yu Cheng, Mitchell H Gail, Jason Fine, Ruth M Pfeiffer

Published in: Statistical methods in medical research (2024)

We compared methods to project absolute risk, the probability of experiencing the outcome of interest in a given projection interval while accommodating competing risks, for a person from the target population with missing predictors. Without missing data, a perfectly calibrated model gives unbiased absolute risk estimates in a new target population, even if the predictor distribution differs from that of the training data. However, if predictors are missing for target population members, a reference dataset with complete data is needed to impute them and to estimate absolute risk conditional only on the observed predictors. If the predictor distributions of the reference data and the target population differ, this approach yields biased estimates. We compared the bias and mean squared error of absolute risk predictions for seven methods that assume predictors are missing at random (MAR). Some methods imputed individual missing predictors; others imputed linear predictor combinations (risk scores). Simulations were based on real breast cancer predictor distributions and outcome data. We also analyzed a real breast cancer dataset. The largest bias for all methods arose when the predictor distributions of the reference and target populations differed. No method was unbiased in this situation. Surprisingly, violating the MAR assumption did not induce severe biases. Most multiple imputation methods performed similarly and were less biased (but more variable) than a method that used a single expected risk score. Our work shows the importance of selecting predictor reference datasets similar to the target population to reduce bias in absolute risk predictions with missing risk factors.
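
To make the idea of estimating absolute risk "conditional only on the observed predictors" concrete, the following is a minimal sketch, not the authors' implementation and not one of the seven methods compared. It assumes constant baseline hazards for the event of interest and the competing event, a relative-risk model with hypothetical coefficients `beta`, and a tiny made-up reference dataset with two binary predictors. The function names, parameter values, and data are all illustrative assumptions. It contrasts averaging the absolute risk over matching reference rows with plugging in a single expected risk score.

```python
import numpy as np

def absolute_risk(score, h1=0.002, h2=0.01, tau=5.0):
    """Absolute risk of the event of interest over [0, tau].

    Assumes (for illustration only) constant baseline hazards:
    h1 for the event of interest, h2 for the competing event.
    `score` is the linear predictor (risk score) beta'x.
    """
    lam = h1 * np.exp(score)      # cause-specific hazard of the event of interest
    total = lam + h2              # hazard of leaving the event-free state
    return lam / total * (1.0 - np.exp(-total * tau))

def risk_given_observed(x_obs, reference, beta, obs_idx):
    """Estimate risk for a person whose predictors are only partly observed.

    Uses the complete-data reference rows whose observed columns equal x_obs.
    Returns (averaged, plug_in):
      averaged: mean of the absolute risks of the matching rows
      plug_in:  absolute risk evaluated at the mean risk score of those rows
    """
    reference = np.asarray(reference, dtype=float)
    match = np.all(reference[:, obs_idx] == np.asarray(x_obs, dtype=float), axis=1)
    if not match.any():
        raise ValueError("no reference rows match the observed predictors")
    scores = reference[match] @ beta
    averaged = float(absolute_risk(scores).mean())
    plug_in = float(absolute_risk(scores.mean()))
    return averaged, plug_in

# Hypothetical log relative risks and reference data (columns: x1, x2).
beta = np.array([0.5, 1.0])
reference = np.array([[1, 0], [1, 1], [1, 1], [0, 0], [0, 1]])

# Person with x1 = 1 observed and x2 missing.
averaged, plug_in = risk_given_observed([1], reference, beta, obs_idx=[0])
print(f"averaged over reference: {averaged:.4f}")
print(f"single expected score:   {plug_in:.4f}")
```

In this toy setup the two estimates differ because absolute risk is a nonlinear function of the risk score. Both depend entirely on the reference rows, so a reference dataset whose predictor distribution differs from the target population would shift either estimate.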
