Comparison of EHR data-completeness in patients with different types of medical insurance coverage in the United States.
Priyanka Anand, Yichi Zhang, David Merola, Yinzhu Jin, Shirley V Wang, Joyce Lii, Jun Liu, Kueiyu Joshua Lin. Published in: Clinical Pharmacology and Therapeutics (2023)
Prior studies have demonstrated that misclassification of study variables due to electronic health record (EHR) discontinuity can be mitigated by restricting EHR-based analyses to subjects with high predicted EHR continuity based on a simple algorithm. In this study, we compared EHR continuity in populations covered by Medicare, Medicaid, or commercial insurance. Using claims-linked EHRs from a multi-center network in Massachusetts, including Medicare (MA EHR-Medicare cohort) and Medicaid (MA EHR-Medicaid cohort) claims data, and claims-linked EHR data from 11 US-based healthcare organizations in TriNetX (TriNetX cohort), we assessed (1) EHR continuity, quantified by the proportion of encounters captured by the EHR (capture proportion, CP); (2) the area under the receiver operating characteristic curve (AUROC) of a previously validated model for identifying patients with high EHR continuity (CP > 0.6); and (3) misclassification of 40 patient characteristics, quantified by the average standardized absolute mean difference (ASAMD). Study participants were aged ≥65 years (Medicare) or ≥18 years (Medicaid, TriNetX) and had ≥365 days of continuous insurance enrollment overlapping with an EHR encounter. The mean CP was 0.30, 0.18, and 0.19, and the AUROC of the prediction model for identifying patients with high EHR continuity was 0.92, 0.89, and 0.77 in the MA EHR-Medicare, MA EHR-Medicaid, and TriNetX cohorts, respectively. Restricting to patients in the top 20%, 50%, and 50% of predicted EHR continuity in the MA EHR-Medicare, MA EHR-Medicaid, and TriNetX cohorts, respectively, resulted in acceptable levels of misclassification (ASAMD < 0.1). Using a prediction model to identify cohorts with high EHR continuity can improve validity, but the cut-offs needed to achieve this vary by population.
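The three quantities in the abstract (capture proportion, restriction to a top percentile of predicted continuity, and ASAMD) can be illustrated with a minimal Python sketch. It rests on stated assumptions rather than the authors' code: CP is taken as EHR-recorded encounters divided by claims-recorded encounters for each patient, the standardized absolute mean difference uses the common pooled-SD form, and the function names (`capture_proportion`, `top_fraction`, `samd`, `asamd`) and all toy data are hypothetical. The validated prediction model itself is not reproduced; its scores are simply given as input.

```python
from math import sqrt
from statistics import mean, pvariance


def capture_proportion(ehr_encounters: int, claims_encounters: int) -> float:
    # Assumed definition (hypothetical helper): share of a patient's
    # claims-recorded encounters that are also recorded in the EHR.
    if claims_encounters <= 0:
        raise ValueError("patient needs at least one claims-recorded encounter")
    return ehr_encounters / claims_encounters


def top_fraction(scores: dict[str, float], fraction: float) -> set[str]:
    # Keep patients whose predicted EHR-continuity score ranks in the top
    # `fraction` (e.g. 0.2 for the top 20%); the count is rounded to the
    # nearest integer and at least one patient is always kept.
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:keep])


def samd(ehr_only: list[float], linked: list[float]) -> float:
    # Standardized absolute mean difference with a pooled SD (assumed form).
    # Binary variables work as 0/1 lists, where pvariance gives p * (1 - p).
    diff = abs(mean(ehr_only) - mean(linked))
    pooled_sd = sqrt((pvariance(ehr_only) + pvariance(linked)) / 2)
    if pooled_sd == 0:
        # Both inputs are constant: identical means give 0, otherwise unbounded.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def asamd(ehr_only: dict[str, list[float]], linked: dict[str, list[float]]) -> float:
    # Average SAMD over all variables, each measured in the same patients
    # from EHR data alone and from claims-linked EHR data.
    return mean(samd(ehr_only[v], linked[v]) for v in linked)


# Toy data (hypothetical): (EHR encounters, claims encounters) per patient.
encounters = {"p1": (9, 10), "p2": (2, 10), "p3": (7, 10), "p4": (1, 10)}
cp = {pid: capture_proportion(*counts) for pid, counts in encounters.items()}
high_continuity = {pid for pid, value in cp.items() if value > 0.6}

# Hypothetical model scores; restrict to the top 50% of predicted continuity.
scores = {"p1": 0.95, "p2": 0.30, "p3": 0.80, "p4": 0.10}
restricted = top_fraction(scores, 0.5)

# Two variables for the restricted patients (p1, p3), measured both ways.
ehr_vars = {"diabetes": [1, 0], "age": [70, 80]}
linked_vars = {"diabetes": [1, 1], "age": [70, 80]}
print(sorted(high_continuity), sorted(restricted), round(asamd(ehr_vars, linked_vars), 3))
```

The `fraction` argument is the knob the abstract's conclusion points at: the top 20% sufficed in the MA EHR-Medicare cohort while the top 50% was needed in the other two, so in a sketch like this the cut-off is better treated as a per-population parameter than as a fixed constant.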