Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction.

Keyvan Rahmani, Rahul Thapa, Peiling Tsou, Satish Casie Chetty, Gina Barnes, Carson Lam, Chak Foon Tso

Published in: medRxiv : the preprint server for health sciences (2022)

Our simulations suggest that retraining at intervals of a couple of months, or with several thousand patients at a time, is likely to be adequate for monitoring machine learning models that predict sepsis. A machine learning system for sepsis prediction will therefore probably need less infrastructure for performance monitoring and retraining than applications in which data drift is more frequent and continuous. Our results also show that a concept shift may require a full overhaul of the sepsis prediction model: such a shift indicates a discrete change in the definition of sepsis labels, and mixing labels from the old and new definitions for the sake of incremental training may not produce the desired results.
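The retraining guidance above can be read as a simple decision rule. The following is a minimal sketch of one such rule, not the authors' implementation. The names `RetrainPolicy` and `retraining_action` are hypothetical, and the thresholds of 60 days and 3000 patients are illustrative stand-ins for "a couple of months" and "several thousand patients", not values reported in the paper. It also assumes that a change in the sepsis label definition is known to the caller rather than detected automatically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetrainPolicy:
    """Hypothetical retraining thresholds (illustrative values only)."""
    max_days: int = 60             # assumption: stands in for "a couple of months"
    max_new_patients: int = 3000   # assumption: stands in for "several thousand patients"


def retraining_action(policy: RetrainPolicy,
                      last_trained: date,
                      today: date,
                      new_patients: int,
                      label_definition_changed: bool) -> str:
    """Return which retraining step, if any, the policy calls for."""
    # A concept shift (changed sepsis label definition) calls for a full
    # overhaul rather than mixing old and new labels incrementally.
    if label_definition_changed:
        return "full_retrain"

    # Otherwise retrain once enough time has passed or enough
    # new patients have accumulated since the last training run.
    days_elapsed = (today - last_trained).days
    if days_elapsed >= policy.max_days or new_patients >= policy.max_new_patients:
        return "incremental_retrain"

    return "no_action"
```

For example, with the default thresholds, a model last trained on 1 January 2022 and checked on 15 March 2022 would be flagged for incremental retraining, while a known change in the label definition would be flagged for a full retrain regardless of elapsed time.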
