Tree-Based Models for Predicting Mortality in Gram-Negative Bacteremia: Avoid Putting the CART before the Horse.
Nathaniel James Rhodes, J Nicholas O'Donnell, Bryan D Lizza, Milena M McLaughlin, John S Esterly, Marc H Scheetz
Published in: Antimicrobial Agents and Chemotherapy (2015)
Infectious disease studies increasingly employ tree-based approaches, e.g., classification and regression tree (CART) modeling, to identify clinical thresholds. We present tree-based-model-derived thresholds along with their measures of uncertainty. We explored individual and pooled clinical cohorts of bacteremic patients to identify modified Acute Physiology and Chronic Health Evaluation II (m-APACHE-II) score thresholds for mortality using a tree-based approach. Predictive performance measures were calculated for each candidate threshold. Candidate thresholds were examined according to binary logistic regression probabilities of the primary outcome, correct classification predictive matrices, and receiver operating characteristic curves. Three individual cohorts comprising a total of 235 patients were studied. Within the pooled cohort, the mean (± standard deviation) m-APACHE-II score was 13.6 ± 5.3, with an in-hospital mortality of 16.6%. The probability of death was significantly greater at higher m-APACHE-II scores in only one of three cohorts (odds ratio for cohort 1 [OR1] = 1.15, 95% confidence interval [CI] = 0.99 to 1.34; OR2 = 1.04, 95% CI = 0.94 to 1.16; OR3 = 1.18, 95% CI = 1.02 to 1.38) and was significantly greater at higher scores within the pooled cohort (OR4 = 1.11, 95% CI = 1.04 to 1.19). In contrast, tree-based models overcame power constraints and identified m-APACHE-II thresholds for mortality in two of three cohorts (P = 0.02, 0.1, and 0.008) and in the pooled cohort (P = 0.001). Predictive performance at each threshold was highly variable among cohorts. The selection of any one predictive threshold value resulted in fixed sensitivity and specificity. Tree-based models increased power and identified threshold values from continuous predictor variables; however, sample size and data distributions influenced the identified thresholds.
The provision of predictive matrices or graphical displays of predicted probabilities within infectious disease studies can improve the interpretation of tree-based model-derived thresholds.
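As a minimal sketch of two ideas in the abstract (a single tree split that selects a threshold from a continuous score, and a predictive matrix showing the sensitivity and specificity fixed by each candidate threshold), the following Python uses a small synthetic data set. The scores and outcomes are invented for illustration and are not study data. The function names `best_split` and `predictive_matrix` are hypothetical, and the Gini-impurity criterion is an assumption, since the abstract does not state which splitting rule was used.

```python
from typing import Dict, List, Tuple

# Synthetic, illustrative data only (NOT from the study):
# a severity score per patient and an in-hospital death indicator (1 = died).
scores: List[int] = [5, 7, 9, 10, 12, 13, 15, 16, 18, 20, 22, 25]
died: List[int] = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: List[int], y: List[int]) -> Tuple[int, float]:
    """Find the threshold t minimizing weighted Gini impurity for the
    split 'score < t' versus 'score >= t' (assumed splitting rule)."""
    n = len(x)
    best_t, best_impurity = x[0], float("inf")
    # Skip the smallest value: splitting there leaves the low group empty.
    for t in sorted(set(x))[1:]:
        low = [yi for xi, yi in zip(x, y) if xi < t]
        high = [yi for xi, yi in zip(x, y) if xi >= t]
        impurity = (len(low) * gini(low) + len(high) * gini(high)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


def predictive_matrix(x: List[int], y: List[int], t: int) -> Dict[str, float]:
    """Classify 'score >= t' as predicted death and tabulate performance."""
    tp = sum(1 for xi, yi in zip(x, y) if xi >= t and yi == 1)
    fp = sum(1 for xi, yi in zip(x, y) if xi >= t and yi == 0)
    fn = sum(1 for xi, yi in zip(x, y) if xi < t and yi == 1)
    tn = sum(1 for xi, yi in zip(x, y) if xi < t and yi == 0)
    return {
        "threshold": t, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    t_star, impurity = best_split(scores, died)
    print(f"tree-selected threshold: score >= {t_star} (impurity {impurity:.3f})")
    # Reporting every candidate threshold, not only the selected one,
    # shows how sensitivity and specificity trade off across cut points.
    for t in sorted(set(scores))[1:]:
        row = predictive_matrix(scores, died, t)
        print(f"t={t:>2}  sens={row['sensitivity']:.2f}  spec={row['specificity']:.2f}")
```

Printing the full table next to the tree-selected cut point illustrates the abstract's caution: choosing one threshold commits to a single sensitivity and specificity pair, and with a different sample the selected cut point could shift.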
Keyphrases
- infectious diseases
- gram negative
- end stage renal disease
- newly diagnosed
- ejection fraction
- chronic kidney disease
- cardiovascular events
- deep learning
- machine learning
- healthcare
- multidrug resistant
- prognostic factors
- public health
- coronary artery disease
- magnetic resonance imaging
- cardiovascular disease
- liver failure
- clinical trial
- palliative care
- patient reported outcomes
- climate change
- patient reported
- study protocol
- aortic dissection