Factors associated with resistance to SARS-CoV-2 infection discovered using large-scale medical record data and machine learning.
Kai-Wen K Yang, Chloé F Paris, Kevin T Gorman, Ilia Rattsev, Rebecca H Yoo, Yijia Chen, Jacob M Desman, Tony Y Wei, Joseph L Greenstein, Casey Overby Taylor, Stuart C Ray
Published in: PloS one (2023)
There have been over 621 million cases of COVID-19 worldwide with over 6.5 million deaths. Despite the high secondary attack rate of COVID-19 in shared households, some exposed individuals do not contract the virus. Moreover, little is known about whether COVID-19 resistance varies with health characteristics recorded in electronic health records (EHR). In this retrospective analysis, we developed a statistical model to predict COVID-19 resistance in 8,536 individuals with prior COVID-19 exposure, using demographics, diagnostic codes, outpatient medication orders, and the count of Elixhauser comorbidities in EHR data from the COVID-19 Precision Medicine Platform Registry. Cluster analyses identified five patterns of diagnostic codes that distinguished resistant from non-resistant patients in our study population. Our models showed modest performance in predicting COVID-19 resistance (best-performing model AUROC = 0.61). Monte Carlo simulations indicated that the AUROC results on the testing set are statistically significant (p < 0.001). We hope to validate the features found to be associated with resistance or non-resistance through more advanced association studies.
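The abstract does not describe how the Monte Carlo simulations were set up, so the following is only an illustrative sketch of one common way to check whether a modest AUROC differs from chance: a label-permutation test. The function names (`auroc`, `permutation_p_value`), the number of permutations, and the toy data are all assumptions made for illustration; none of them come from the study.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels reach the observed AUROC.

    Shuffling the labels breaks any link between labels and scores, so
    the shuffled AUROCs approximate the distribution expected by chance.
    """
    rng = random.Random(seed)
    observed = auroc(labels, scores)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auroc(shuffled, scores) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical toy data (NOT the study's data): scores for the
    # positive class are shifted slightly upward to mimic a weak signal.
    rng = random.Random(42)
    labels = [1] * 50 + [0] * 50
    scores = [rng.gauss(0.3 if y == 1 else 0.0, 1.0) for y in labels]
    observed, p_value = permutation_p_value(labels, scores)
    print(f"observed AUROC = {observed:.3f}, permutation p = {p_value:.4f}")
```

With the add-one correction, the smallest p-value this sketch can report is 1 / (n_permutations + 1), so supporting a claim such as p < 0.001 would require more than 1,000 permutations.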
Keyphrases
- coronavirus disease
- sars cov
- electronic health record
- machine learning
- respiratory syndrome coronavirus
- healthcare
- end stage renal disease
- monte carlo
- emergency department
- big data
- chronic kidney disease
- ejection fraction
- adverse drug
- newly diagnosed
- peritoneal dialysis
- climate change
- molecular dynamics
- artificial intelligence
- high throughput
- single cell
- social media