A doubly robust method to handle missing multilevel outcome data with application to the China Health and Nutrition Survey.

Nicole M Butera, Donglin Zeng, Annie Green Howard, Penny Gordon-Larsen, Jianwen Cai
Published in: Statistics in medicine (2021)
Missing data are common in longitudinal cohort studies and can lead to bias, particularly in studies with informative missingness. Many common methods for handling informatively missing data in survey samples require correctly specifying a model for missingness. Although doubly robust methods exist to provide unbiased regression coefficients in the presence of missing outcome data, these methods do not account for correlation due to clustering inherent in longitudinal or cluster-sampled studies. In this work, we developed a doubly robust method to estimate the regression of an outcome on a predictor in the presence of missing multilevel data on the outcome, which results in consistent estimation of regression coefficients assuming correct specification of either (1) the probability of missingness or (2) the outcome model. This method involves specification of separate hierarchical models for missingness and for the outcome, conditional on observed auxiliary variables and cluster-specific random effects, to account for correlation among observations. We showed this proposed estimator is doubly robust and derived its asymptotic distribution, conducted simulation studies to compare the method to an existing doubly robust method developed for independent data, and applied the method to data from the China Health and Nutrition Survey, an ongoing multilevel longitudinal cohort study.
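
The double robustness property described in the abstract can be illustrated with a much simpler setting than the one the paper treats. The sketch below is not the authors' multilevel estimator: it assumes independent observations, estimates only a population mean rather than regression coefficients, and uses the standard augmented inverse probability weighted (AIPW) form. The function names (`aipw_mean`, `simulate`), the simulated data-generating model, and the deliberately misspecified working models are all assumptions made for illustration.

```python
import random


def aipw_mean(y, observed, p_obs, y_pred):
    """AIPW estimate of E[Y] when some outcomes are missing.

    y        : outcomes (entries for unobserved units are ignored, may be None)
    observed : True where the outcome was observed
    p_obs    : working-model probability that each outcome is observed
    y_pred   : working-model prediction of each outcome
    """
    total = 0.0
    for y_i, r_i, p_i, m_i in zip(y, observed, p_obs, y_pred):
        # Outcome-model prediction, corrected by a weighted residual
        # that is only available for observed units.
        residual = (y_i - m_i) if r_i else 0.0
        total += m_i + residual / p_i
    return total / len(y)


def simulate(n=20000, seed=0):
    """Hypothetical simulation: Y = 2 + 3X + noise, so E[Y] = 3.5."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y_full = [2.0 + 3.0 * x_i + rng.gauss(0.0, 1.0) for x_i in x]

    # Informative missingness: larger X means Y is more likely observed.
    true_p = [0.3 + 0.6 * x_i for x_i in x]
    observed = [rng.random() < p_i for p_i in true_p]
    y = [y_i if r_i else None for y_i, r_i in zip(y_full, observed)]

    true_m = [2.0 + 3.0 * x_i for x_i in x]  # correct outcome model
    wrong_p = [0.5] * n                      # misspecified missingness model
    wrong_m = [0.0] * n                      # misspecified outcome model

    return {
        "both_correct": aipw_mean(y, observed, true_p, true_m),
        "only_missingness_correct": aipw_mean(y, observed, true_p, wrong_m),
        "only_outcome_correct": aipw_mean(y, observed, wrong_p, true_m),
        "both_wrong": aipw_mean(y, observed, wrong_p, wrong_m),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

In this toy setup the estimate should stay close to the true mean of 3.5 whenever at least one of the two working models is correct, and drift away only when both are wrong. The paper's method extends this idea to clustered data by placing cluster-specific random effects in both the missingness model and the outcome model, which this sketch does not attempt.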
Keyphrases
  • electronic health record
  • big data
  • healthcare
  • cross sectional
  • mental health
  • physical activity
  • machine learning
  • data analysis
  • single cell
  • climate change
  • case control
  • cell fate