Analyzing evidence-based falls prevention data with significant missing information using variable selection after multiple imputation.
Yujia Cheng, Yang Li, Matthew Lee Smith, Changwei Li, Ye Shen
Published in: Journal of Applied Statistics (2021)
Falls are the leading cause of fatal and non-fatal injuries among older adults. Evidence-based fall prevention programs are delivered nationwide, largely supported by funding from the Administration for Community Living (ACL), to mitigate fall-related risk. This study uses data from 39 ACL grantees in 22 states from 2014 to 2017. The large number of missing values for falls efficacy in this national database may bias statistical results and makes reliable variable selection challenging. Multiple imputation is used to handle the missing values. To obtain consistent variable selection across the multiply imputed datasets, multiple imputation-stepwise regression (MI-stepwise) and multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) methods are used. Simulation studies were conducted to compare the performance of MI-stepwise and MI-LASSO. In particular, we extended prior work by considering several circumstances not covered in previous studies, including an extensive investigation of data with different signal-to-noise ratios and various missing data patterns across predictors, as well as a data structure in which the missingness mechanism is missing not at random (MNAR). In addition, we evaluated the performance of the MI-LASSO method with varying tuning parameters to address the overselection issue in cross-validation (CV)-based LASSO.
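The idea of selecting the same variables in every imputed dataset can be illustrated with a small sketch. The abstract does not give the estimation details, so everything below is an assumption made for illustration: the simulated data, the crude stochastic imputation in `impute_once`, the group-lasso-style penalty that ties each predictor's coefficients together across imputations, the proximal-gradient solver, and the tuning value `lam = 0.2`. All function and variable names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def impute_once(X, rng):
    """Crude stochastic imputation (an illustrative stand-in for a proper
    imputation model): fill each missing entry with a draw from a normal
    distribution fitted to that column's observed values."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        X_imp[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), miss.sum())
    return X_imp

def group_lasso_across_imputations(X_list, y, lam, n_iter=2000):
    """Minimise sum_m ||y - X_m b_m||^2 / (2n) + lam * sum_j ||(b_1j, ..., b_Mj)||_2
    by proximal gradient descent. Row j of the returned (p x M) matrix holds
    predictor j's coefficients in the M imputed datasets; the penalty zeroes
    a whole row at once, so a predictor is dropped from all datasets or none."""
    n, p = X_list[0].shape
    M = len(X_list)
    Xs = [(X - X.mean(0)) / X.std(0) for X in X_list]  # standardise predictors
    yc = y - y.mean()                                  # centring replaces the intercept
    # Step size from the largest Lipschitz constant of the per-dataset gradients.
    step = 1.0 / max(np.linalg.eigvalsh(X.T @ X / n).max() for X in Xs)
    B = np.zeros((p, M))
    for _ in range(n_iter):
        G = np.column_stack(
            [X.T @ (X @ B[:, m] - yc) / n for m, X in enumerate(Xs)]
        )
        Z = B - step * G
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Group soft-thresholding: shrink each row towards zero as a unit.
        B = Z * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return B

# Simulated example (assumed settings): 8 predictors, 3 of them truly active,
# with about 30% of values missing completely at random in the first 4 columns.
rng = np.random.default_rng(0)
n, p, M = 300, 8, 5
beta = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
X_full = rng.normal(size=(n, p))
y = X_full @ beta + rng.normal(size=n)

X_miss = X_full.copy()
mask = rng.random((n, 4)) < 0.3
X_miss[:, :4] = np.where(mask, np.nan, X_miss[:, :4])

X_list = [impute_once(X_miss, rng) for _ in range(M)]
B = group_lasso_across_imputations(X_list, y, lam=0.2)
selected = np.flatnonzero(np.linalg.norm(B, axis=1) > 0)
print("Selected predictor indices:", selected.tolist())
```

Raising `lam` shrinks more rows of `B` to zero and so selects fewer predictors, which is the lever a tuning-parameter study of overselection would vary. In practice the imputation step would come from a proper multiple imputation model rather than the marginal draws used here.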