On the development and validation of large language model-based classifiers for identifying social determinants of health.
Rodney A Gabriel, Onkar Litake, Sierra Simpson, Brittany N Burton, Ruth S Waterman, Alvaro Andrés Macías
Published in: Proceedings of the National Academy of Sciences of the United States of America (2024)
The assessment of social determinants of health (SDoH) within healthcare systems is crucial for comprehensive patient care and for addressing health disparities. Current challenges arise from the limited inclusion of structured SDoH information within electronic health record (EHR) systems, often due to the lack of standardized diagnosis codes. This study examines the potential of large language models (LLMs) to overcome these challenges. LLM-based classifiers, using Bidirectional Encoder Representations from Transformers (BERT) and A Robustly Optimized BERT Pretraining Approach (RoBERTa), were developed for SDoH concepts, including homelessness, food insecurity, and domestic violence. They were trained on synthetic datasets generated by generative pre-trained transformers, combined with authentic clinical notes. Models were then validated on separate datasets: Medical Information Mart for Intensive Care-III and our institutional EHR data. When the model was trained with a combination of synthetic and authentic notes, validation on our institutional dataset yielded an area under the receiver operating characteristic curve (AUROC) of 0.78 for detecting homelessness, 0.72 for detecting food insecurity, and 0.83 for detecting domestic violence. This study underscores the potential of LLMs in extracting SDoH information from clinical text. Automated detection of SDoH may be instrumental for healthcare providers in identifying at-risk patients, guiding targeted interventions, and contributing to population health initiatives aimed at mitigating disparities.
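The validation metric reported above, the area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen positive note receives a higher classifier score than a randomly chosen negative note. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = note mentions the SDoH concept, 0 = it does not.
# The scores stand in for a classifier's predicted probabilities.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(f"AUROC on toy data: {toy_auc:.2f}")
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positive and negative notes. The pairwise loop is quadratic in the number of notes, so a rank-based implementation would be preferable for large validation sets.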
Keyphrases
- healthcare
- electronic health record
- health information
- mental health
- public health
- human health
- ejection fraction
- clinical decision support
- autism spectrum disorder
- physical activity
- working memory
- deep learning
- machine learning
- newly diagnosed
- rna seq
- high throughput
- mental illness
- affordable care act
- climate change
- big data
- artificial intelligence
- risk assessment
- chronic kidney disease
- cancer therapy
- prognostic factors