A comparison of different measures of the proportion of explained variance in multiply imputed data sets.

Joost R van Ginkel, Julian D Karch
Published in: British Journal of Mathematical and Statistical Psychology (2024)
The proportion of explained variance is an important statistic in multiple regression for determining how well the outcome variable is predicted by the predictors. Earlier research on 20 different estimators for the proportion of explained variance, including the exact Olkin-Pratt estimator and the Ezekiel estimator, showed that the exact Olkin-Pratt estimator produced unbiased estimates, and was recommended as a default estimator. In the current study, the same 20 estimators were studied in incomplete data, with missing data being treated using multiple imputation. In earlier research on the proportion of explained variance in multiply imputed data sets, an estimator called $\hat{R}_{\mathrm{SP}}^2$ was shown to be the preferred pooled estimator for regular $R^2$. For each of the 20 estimators in the current study, two pooled estimators were proposed: one where the estimator was the average across imputed data sets, and one where $\hat{R}_{\mathrm{SP}}^2$ was used as input for the calculation of the specific estimator. Simulations showed that estimates based on $\hat{R}_{\mathrm{SP}}^2$ performed best regarding bias and accuracy, and that the Ezekiel estimator was generally the least biased. However, none of the estimators were unbiased at all times, including the exact Olkin-Pratt estimator based on $\hat{R}_{\mathrm{SP}}^2$.
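
The two pooling strategies described in the abstract can be illustrated with a minimal Python sketch that uses the Ezekiel estimator (the familiar adjusted $R^2$) as the example. The function names, the sample size, the number of predictors, and all numeric inputs are hypothetical, and this is not the authors' code. In particular, $\hat{R}_{\mathrm{SP}}^2$ is treated as a value computed elsewhere; its own formula is not given in the abstract and is not reproduced here.

```python
from statistics import mean


def ezekiel(r2: float, n: int, p: int) -> float:
    """Ezekiel (adjusted) R^2 for n cases and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the Ezekiel estimator")
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def pooled_by_averaging(r2_per_imputation: list[float], n: int, p: int) -> float:
    """Strategy 1: compute the estimator in every imputed data set, then average."""
    return mean(ezekiel(r2, n, p) for r2 in r2_per_imputation)


def pooled_from_r2_sp(r2_sp: float, n: int, p: int) -> float:
    """Strategy 2: plug an already pooled R^2_SP (assumed given) into the estimator."""
    return ezekiel(r2_sp, n, p)


if __name__ == "__main__":
    # Hypothetical inputs: R^2 from three imputed data sets, and a made-up R^2_SP.
    r2_values = [0.40, 0.50, 0.60]
    print(pooled_by_averaging(r2_values, n=101, p=5))
    print(pooled_from_r2_sp(0.48, n=101, p=5))
```

Because the Ezekiel estimator is linear in $R^2$, the first strategy is equivalent to applying the estimator to the average $R^2$ across imputations, so for this estimator the two strategies differ only in which pooled $R^2$ is supplied. For estimators that are nonlinear in $R^2$, averaging after the transformation and transforming a pooled value are not interchangeable in general.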