Multiple imputation in data that grow over time: a comparison of three strategies.

Xynthia Kavelaars, Joost R van Ginkel, S van Buuren
Published in: Multivariate Behavioral Research (2021)
Multiple imputation is a recommended technique to deal with missing data. We study the problem where the investigator has already created imputations before the arrival of the next wave of data. The newly arriving data contain missing values that need to be imputed. The standard method (RE-IMPUTE) is to combine the new and old data before imputation, and re-impute all missing values in the combined data. We study the properties of two methods that impute the missing data in the new part only, thus preserving the historic imputations. Method NEST multiply imputes the new data m2 > 1 times, conditional on each filled-in old data set. Method APPEND is the special case of NEST with m2 = 1, thus appending a single imputation of the new data to each filled-in old data set. We found that NEST and APPEND have the same validity as RE-IMPUTE for monotone missing-data patterns. NEST and APPEND also work well when relations within waves are stronger than between waves and for moderate percentages of missing data. We do not recommend the use of NEST or APPEND when relations within time points are weak and when associations between time points are strong.
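
The difference between NEST and APPEND can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`impute_column`, `nest`, `append`), the simulated data, and the imputation model are all assumptions made for illustration. In particular, the sketch uses simple stochastic regression imputation without drawing the regression parameters, which is a simplification of proper multiple imputation, and it covers only a monotone pattern in which a single new variable arrives after the old data were imputed.

```python
import numpy as np


def impute_column(X, y, rng):
    """Fill NaNs in y by stochastic regression on complete predictors X.

    Simplification (assumption): regression parameters are not drawn from
    their posterior, so this is not a fully "proper" imputation method.
    """
    y = y.copy()
    miss = np.isnan(y)
    if not miss.any():
        return y
    A = np.column_stack([np.ones(len(y)), X])       # add intercept
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    resid = y[~miss] - A[~miss] @ beta
    dof = max(len(resid) - A.shape[1], 1)
    sigma = np.sqrt(resid @ resid / dof)            # residual SD
    y[miss] = A[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return y


def nest(old_filled, new_wave, m2, rng):
    """NEST: impute the new wave m2 times per filled-in old data set.

    The old imputations are left untouched; the result contains
    len(old_filled) * m2 completed data sets.
    """
    completed = []
    for old in old_filled:
        for _ in range(m2):
            new = impute_column(old, new_wave, rng)
            completed.append(np.column_stack([old, new]))
    return completed


def append(old_filled, new_wave, rng):
    """APPEND: the special case of NEST with m2 = 1."""
    return nest(old_filled, new_wave, 1, rng)


# Hypothetical example data: x is complete, y1 (old wave) and y2 (new wave)
# each have about 20% of values missing.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y1 = 0.5 * x + rng.normal(size=n)
y2 = 0.7 * y1 + rng.normal(size=n)
y1_obs = np.where(rng.random(n) < 0.2, np.nan, y1)
y2_obs = np.where(rng.random(n) < 0.2, np.nan, y2)

# Historic imputations: m1 filled-in versions of the old data (x, y1).
m1 = 5
old_filled = [
    np.column_stack([x, impute_column(x[:, None], y1_obs, rng)])
    for _ in range(m1)
]

nested = nest(old_filled, y2_obs, m2=3, rng=rng)   # m1 * m2 data sets
appended = append(old_filled, y2_obs, rng)         # m1 data sets
```

RE-IMPUTE would instead discard `old_filled` and impute `y1_obs` and `y2_obs` together from the combined data. Pooling of estimates across the completed data sets is not shown here.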
Keyphrases
  • electronic health record
  • big data
  • artificial intelligence
  • high intensity