Technical considerations for evaluating clinical prediction indices: a case study for predicting code blue events with MEWS.
Kais Gadhoumi, Alex Beltran, Christopher G Scully, Ran Xiao, David O Nahmias, Xiao Hu
Published in: Physiological Measurement (2021)
Objective. There have been many efforts to develop tools predictive of health deterioration in hospitalized patients, but the comprehensive evaluation of their predictive ability needed to guide implementation in clinical practice is often lacking. In this work, we propose new techniques and metrics for evaluating the performance of predictive alert algorithms and illustrate the advantage of capturing the timeliness and the clinical burden of alerts through the example of the modified early warning score (MEWS) applied to the prediction of in-hospital code blue events.

Approach. Different implementations of MEWS were calculated from available physiological parameter measurements collected from the electronic health records of adult ICU patients. The performance of MEWS was evaluated using conventional metrics and a set of non-conventional metrics and approaches that take into account the timeliness and practicality of alarms as well as the false alarm burden.

Main results. MEWS calculated using the worst-case measurement (i.e. values scoring 3 points in the MEWS definition) over 2 h intervals significantly reduced the false alarm rate by over 50% (from 0.19/h to 0.08/h) while maintaining sensitivity similar to that of MEWS calculated from raw measurements (∼80%). By considering a prediction horizon of 12 h preceding a code blue event, a significant improvement in the specificity (∼60%), the precision (∼155%), and the work-up to detection ratio (∼50%) could be achieved, at the cost of a relatively marginal decrease in sensitivity (∼10%).

Significance. Performance aspects pertaining to the timeliness and burden of alarms can aid in understanding the potential utility of a predictive alarm algorithm in clinical settings.
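
The abstract names its alert-level metrics without giving formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that an alert counts as a true positive when it falls within the prediction horizon before a code blue event, that any other alert before the event (or in a stay without an event) is a false alarm, that sensitivity is counted per event, and that the work-up to detection ratio is (TP + FP) / TP. The `Stay` structure, the `evaluate_alerts` function, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stay:
    """One patient stay (hypothetical structure); times are hours from admission."""
    alert_times: List[float]            # times at which the score raised an alert
    stay_hours: float                   # total monitored duration
    event_time: Optional[float] = None  # time of code blue, None if no event


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def evaluate_alerts(stays, horizon_h=12.0):
    """Compute alert-level metrics under the assumptions stated in the lead-in."""
    tp = fp = 0               # alert counts
    events = detected = 0     # event counts
    exposure_h = 0.0          # monitored hours outside any prediction horizon
    for s in stays:
        if s.event_time is None:
            # No event: every alert is a false alarm, all hours are exposure.
            fp += len(s.alert_times)
            exposure_h += s.stay_hours
            continue
        events += 1
        window_start = s.event_time - horizon_h
        in_window = [t for t in s.alert_times
                     if window_start <= t < s.event_time]
        tp += len(in_window)
        # Alerts earlier than the horizon are false alarms;
        # alerts at or after the event time are ignored.
        fp += sum(1 for t in s.alert_times if t < window_start)
        detected += 1 if in_window else 0
        exposure_h += max(window_start, 0.0)
    return {
        "sensitivity": safe_div(detected, events),
        "precision": safe_div(tp, tp + fp),
        "work_up_to_detection_ratio": safe_div(tp + fp, tp),
        "false_alarms_per_hour": safe_div(fp, exposure_h),
    }


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    demo = [
        Stay([2.0, 30.0, 40.0], 48.0, 44.0),
        Stay([5.0, 6.0], 20.0),
    ]
    print(evaluate_alerts(demo, horizon_h=12.0))
```

Changing `horizon_h` relabels the same alerts, which is why metrics such as precision and the false alarm rate should be reported together with the horizon used.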