Learning latent heterogeneity for type 2 diabetes patients using longitudinal health markers in electronic health records.
Jitong Lou, Yuanjia Wang, Lang Li, Donglin Zeng. Published in: Statistics in Medicine (2021)
Electronic health records (EHRs) of type 2 diabetes (T2D) patients contain health markers measured longitudinally and sparsely at clinical encounters. Our goal is to use such data to learn latent patterns that inform patients' T2D-related health status while accounting for the challenges of retrospectively collected EHRs. To handle correlated longitudinal measurements, irregular and informative encounter times, and mixed marker types, we propose multivariate generalized linear models to learn latent patient subgroups. In our model, covariate effects are time-dependent, and latent Gaussian processes are introduced to model between-marker correlations over time. Using the inferred latent processes, we integrated the irregularly measured health markers of mixed types into composite scores and applied hierarchical clustering to learn the latent subgroup structure among T2D patients. Application to an EHR dataset of T2D patients showed differing trends in the effects of age, sex, and race on hypertension/high blood pressure, total cholesterol, glycated hemoglobin, high-density lipoprotein, and medications. The associations among these markers varied over time during the study window. Clustering revealed four subgroups, each with a distinct health status. The same patterns were further confirmed using new EHR records from the same cohort. We developed a novel latent model that integrates longitudinal health markers in EHRs and characterizes latent patient heterogeneity. The analysis indicated distinct subgroups of T2D patients, suggesting that effective healthcare management for these patients should be tailored separately to each subgroup.
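The final clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each patient's composite score has already been inferred and evaluated on a common time grid, and it simulates hypothetical trajectories as a group-specific mean curve plus a Gaussian-process deviation with an assumed squared-exponential kernel. It then applies Ward-linkage hierarchical clustering from SciPy; the abstract does not state which linkage was used. The helper names (`se_kernel`, `simulate_composite_scores`, `cluster_patients`), the kernel parameters, and the simulated data are all hypothetical, and the number of clusters is fixed at four only to mirror the reported result.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def se_kernel(t, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance over time points t (an assumed kernel choice)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_composite_scores(n_per_group, mean_fns, grid, rng):
    """Hypothetical composite-score trajectories: a group mean curve plus
    a zero-mean Gaussian-process deviation drawn independently per patient."""
    cov = se_kernel(grid, length_scale=2.0, variance=0.1)
    cov += 1e-8 * np.eye(len(grid))  # small jitter for numerical stability
    scores, labels = [], []
    for g, mean_fn in enumerate(mean_fns):
        draws = rng.multivariate_normal(mean_fn(grid), cov, size=n_per_group)
        scores.append(draws)
        labels.extend([g] * n_per_group)
    return np.vstack(scores), np.array(labels)


def cluster_patients(scores, n_clusters):
    """Agglomerative (Ward) clustering of patients on their score trajectories;
    returns cluster labels in 1..n_clusters."""
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Usage: four hypothetical subgroups with different score trends over time.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 5.0, 11)  # common evaluation times (e.g., years)
mean_fns = [
    lambda t: np.zeros_like(t),      # stable, low score
    lambda t: np.full_like(t, 8.0),  # stable, high score
    lambda t: 2.0 * t,               # rising score
    lambda t: 10.0 - 2.0 * t,        # falling score
]
scores, labels = simulate_composite_scores(20, mean_fns, grid, rng)
clusters = cluster_patients(scores, n_clusters=4)

for g in range(len(mean_fns)):
    assigned = sorted(set(clusters[labels == g].tolist()))
    print(f"simulated group {g}: assigned clusters {assigned}")
```

Cutting the dendrogram at a fixed number of clusters with `criterion="maxclust"` is only one simple choice; with real EHR data the number of subgroups would more plausibly be chosen by inspecting the dendrogram or a cluster-validity criterion.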