Variation in model performance by data cleanliness and classification methods in the prediction of 30-day ICU mortality, a US nationwide retrospective cohort and simulation study.

Theodore J Iwashyna, Cheng Ma, Xiao Qing Wang, Sarah Seelye, Ji Zhu, Akbar K Waljee
Published in: BMJ Open (2020)
Discrimination varied with data cleanliness, and logistic regression lost the most discrimination in the least clean data. Random forest and neural networks showed no such loss, even in naively extracted data. Data from a large nationwide health system revealed interactions between missing-data imputation techniques, data cleanliness and classification methods for predicting 30-day mortality.
Keyphrases
  • electronic health record
  • big data
  • neural network
  • cardiovascular disease
  • cross sectional
  • data analysis