A Data-Driven Pipeline to Discover Treatment Variations and the Associated Contributing Factors Balanced with Optimal Granularity.

Hao Fan, Kian-Huat Lim, Po-Yin Yen

Published in: AMIA ... Annual Symposium proceedings. AMIA Symposium (2023)

Evidence-based medicine uses research evidence from clinical trials to support treatment decisions. To take advantage of electronic health records and big data analysis methods, we developed a data-driven analytic pipeline that uses 1) agglomerative hierarchical clustering to define treatment variation at different levels of granularity, 2) feature selection and multinomial multivariate logistic regression analysis to identify variables (factors) associated with treatment variation, and 3) prognosis analysis to compare patient outcomes across the top treatment groups. We tested our approach on the diffuse large B-cell lymphoma patient population from the MIMIC-IV dataset and found that it helps determine the optimal granularity of treatment variation and identify factors that are associated with treatment variation but are not recognized in randomized controlled trials because of unbalanced patient cohorts. We also found patient cohort characteristics that could inspire hypothesis generation, such as the influence of ethnicity on treatment plans and subsequent prognoses.
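
As an illustration of the first pipeline step only, the following is a minimal sketch of agglomerative hierarchical clustering over treatment regimens, showing how the stopping threshold controls the granularity of the resulting treatment groups. The abstract does not state the distance metric, the linkage criterion, or the data representation, so the Jaccard distance, the average linkage, the function names, and the example regimens below are all assumptions made for illustration and are not taken from the paper or from MIMIC-IV.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of drug names (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, regimens):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    total = sum(jaccard_distance(regimens[i], regimens[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(regimens, threshold):
    """Merge the closest pair of clusters until the closest pair is farther than threshold."""
    clusters = [[i] for i in range(len(regimens))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (a, b), dist = min(
            (
                ((x, y), average_linkage(clusters[x], clusters[y], regimens))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical regimens, one set of drug names per patient (illustrative only).
regimens = [
    {"rituximab", "cyclophosphamide", "doxorubicin", "vincristine", "prednisone"},
    {"cyclophosphamide", "doxorubicin", "vincristine", "prednisone"},
    {"rituximab", "cyclophosphamide", "doxorubicin", "vincristine", "prednisone", "etoposide"},
    {"rituximab", "bendamustine"},
    {"bendamustine"},
]

fine = agglomerate(regimens, threshold=0.25)   # low threshold: many small groups
coarse = agglomerate(regimens, threshold=0.6)  # high threshold: few broad groups
print(len(fine), "groups:", fine)
print(len(coarse), "groups:", coarse)
```

In this sketch, a low threshold keeps closely related regimens apart, while a higher one folds them into broader groups. Choosing a cut between such extremes is one way to think about the granularity trade-off the abstract describes, although the selection criterion actually used in the paper is not given here.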
