A classification model of homelessness using integrated administrative data: Implications for targeting interventions to improve the housing status, health and well-being of a highly vulnerable population.
Thomas H Byrne, Travis Baggett, Thomas Land, Dana Bernson, Maria-Elena Hood, Cheryl Kennedy-Perez, Rodrigo Monterrey, David Smelson, Marc Dones, Monica Bharel
Published in: PLOS ONE (2020)
Homelessness is poorly captured in most administrative data sets, making it difficult to understand how, when, and where this population can be better served. This study sought to develop and validate a classification model of homelessness. Our sample included 5,050,639 individuals aged 11 years and older who were included in a linked dataset of administrative records from multiple state-maintained databases in Massachusetts for the period 2011 to 2015. We used logistic regression to develop a classification model with 94 predictors and subsequently tested its performance. The model had high specificity (95.4%) and moderate sensitivity (77.8%) for predicting known cases of homelessness, and excellent overall classification properties (area under the receiver operating characteristic curve, 0.94; balanced accuracy, 86.4%). To demonstrate how such a modeling approach could be used to target interventions that mitigate the risk of adverse health outcomes, we also estimated the association between model-predicted homeless status and fatal opioid overdose, finding that model-predicted homeless status was associated with a nearly 23-fold increase in the risk of fatal opioid overdose. This study provides a novel approach for identifying homelessness using integrated administrative data. The strong performance of our model underscores the potential value of linking data from multiple service systems to improve the identification of housing instability and to assist governments in developing programs that seek to improve health and other outcomes for homeless individuals.
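The performance measures quoted in the abstract can be made concrete with a short sketch. It assumes binary labels (1 = known homeless case), a logistic model whose predicted probability is thresholded to produce a classification, and the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), balanced accuracy = the mean of the two, and AUC = the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, coefficients, records, and the 0.5 threshold are all hypothetical; this is not the study's 94-predictor model or its data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predict_proba(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy (their mean)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # share of known cases that are flagged
    specificity = tn / (tn + fp)  # share of non-cases that are not flagged
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2.

    Uses an O(n_pos * n_neg) pairwise loop for clarity; a rank-based formula scales better.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: a hypothetical 2-predictor model and made-up records (NOT study data).
INTERCEPT, COEFS = -2.0, [1.5, 2.5]
records: List[List[int]] = [[0, 0], [1, 0], [0, 1], [1, 1], [0, 0], [1, 1]]
labels = [0, 1, 1, 1, 0, 0]  # 1 = known homeless case in the toy data

probs = [predict_proba(INTERCEPT, COEFS, r) for r in records]
THRESHOLD = 0.5  # assumed cut-off; the abstract does not report the study's threshold
preds = [1 if p >= THRESHOLD else 0 for p in probs]

metrics = classification_metrics(labels, preds)
auc = roc_auc(labels, probs)
print(metrics, round(auc, 3))
```

In this toy data, sensitivity, specificity, and balanced accuracy all come out to 2/3 and the AUC to about 0.72. Note that sensitivity, specificity, and balanced accuracy depend on the chosen threshold, whereas the AUC summarizes ranking performance across all thresholds.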
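The abstract reports that model-predicted homeless status was associated with a nearly 23-fold increase in the risk of fatal opioid overdose, but it does not say how that estimate was computed (for example, whether it was adjusted for covariates or expressed as a risk, rate, or odds ratio). The sketch below shows only the simplest version of such a comparison, assuming a 2x2 table of model-flagged versus not-flagged individuals: a crude risk ratio with a log-scale Wald (Katz) 95% confidence interval. The function name and all counts are hypothetical.

```python
import math
from typing import Tuple


def risk_ratio(events_exposed: int, n_exposed: int,
               events_unexposed: int, n_unexposed: int) -> Tuple[float, float, float]:
    """Crude risk ratio with a 95% Wald confidence interval on the log scale.

    "Exposed" = flagged as homeless by the model; "unexposed" = not flagged.
    Requires at least one event in each group.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for a 2x2 table (Katz method).
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_unexposed - 1 / n_unexposed)
    z = 1.96  # normal quantile for a two-sided 95% interval
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Made-up counts for illustration only (NOT the study's overdose data).
rr, lo, hi = risk_ratio(events_exposed=20, n_exposed=1_000,
                        events_unexposed=10, n_unexposed=5_000)
print(f"RR = {rr:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

With these invented counts, a 2% overdose risk among flagged individuals versus 0.2% among everyone else gives a risk ratio of 10. In practice, an adjusted estimate from a regression model would typically be preferred to this crude ratio, because flagged and non-flagged groups may differ in age, sex, and other factors.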