Variable selection in social-environmental data: sparse regression and tree ensemble machine learning approaches.
Elizabeth A Handorf, Yinuo Yin, Michael Slifker, Shannon Lynch
Published in: BMC Medical Research Methodology (2020)
This analysis demonstrated the potential of empirical machine learning approaches to identify a small subset of census variables that have a true association with the outcome and that replicate across empirical methods. Sparse clustered regression models performed best because they identified many true positive variables while controlling false positive discoveries.
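The trade-off between finding true positives and limiting false positives can be illustrated with a minimal sketch on simulated data. It uses an ordinary lasso fit by cyclic coordinate descent, a simplified stand-in that does not model clustering and is not the authors' sparse clustered regression. The function names (`lasso_cd`, `selection_counts`), the penalty `lam=0.2`, and the simulated "census" variables are all hypothetical.

```python
import random


def soft_threshold(z: float, gamma: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Plain lasso by cyclic coordinate descent (no intercept, no clustering).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j's own fit).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def selection_counts(selected, truth):
    """True and false positive counts for a set of selected variable indices."""
    return len(selected & truth), len(selected - truth)


# Hypothetical simulated data: 20 candidate variables, only 3 truly associated.
random.seed(1)
n, p = 200, 20
true_coefs = {0: 2.0, 1: -1.5, 2: 1.0}
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(c * row[j] for j, c in true_coefs.items()) + random.gauss(0.0, 1.0) for row in X]

beta = lasso_cd(X, y, lam=0.2)
selected = {j for j, b in enumerate(beta) if b != 0.0}
tp, fp = selection_counts(selected, set(true_coefs))
print(f"selected={sorted(selected)} TP={tp} FP={fp}")
```

Raising `lam` shrinks more coefficients to exactly zero, which tends to reduce false positives at the risk of dropping weaker true signals; counting TP and FP against a known truth makes that trade-off visible.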