A multicenter evaluation of computable phenotyping approaches for SARS-CoV-2 infection and COVID-19 hospitalizations.
Rohan Khera, Bobak J Mortazavi, Veer Sangha, Frederick Warner, H Patrick Young, Joseph R Ross, Nilay D Shah, Elitza S Theel, William G Jenkinson, Camille Knepper, Karen H Wang, David Peaper, Richard A Martinello, Cynthia A Brandt, Zhenqiu Lin, Albert I Ko, Harlan M Krumholz, Benjamin D Pollock, Wade L Schulz
Published in: NPJ digital medicine (2022)
Diagnosis codes are used to study SARS-CoV-2 infections and COVID-19 hospitalizations in administrative and electronic health record (EHR) data. Using EHR data (April 2020-March 2021) from the Yale-New Haven Health System and the three hospital systems of the Mayo Clinic, we evaluated computable phenotype definitions based on the ICD-10 diagnosis code for COVID-19 (U07.1) against positive SARS-CoV-2 PCR or antigen tests. We included 69,423 patients at Yale and 75,748 at Mayo Clinic who had either a diagnosis code or a positive SARS-CoV-2 test. The precision and recall of a COVID-19 diagnosis code for a positive test were 68.8% and 83.3%, respectively, at Yale. Mayo Clinic had higher precision (95%) and lower recall (63.5%), with site-level values ranging from 59.2% in Rochester to 97.3% in Arizona. For hospitalizations with a principal COVID-19 diagnosis, 94.8% at Yale and 80.5% at Mayo Clinic had an associated positive laboratory test, while a secondary diagnosis of COVID-19 identified additional patients. These additionally identified patients had twofold higher in-hospital mortality than those identified by principal diagnosis alone. Standardization of coding practices is needed before diagnosis codes are used in clinical research and epidemiological surveillance of COVID-19.
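The precision and recall figures above compare two patient cohorts: those carrying the U07.1 code (the computable phenotype) and those with a positive SARS-CoV-2 test (the reference standard). The sketch below shows that calculation under stated assumptions. It is illustrative only and is not the study's code. It assumes each cohort has already been reduced to a set of de-identified patient IDs. The function name `evaluate_phenotype`, the `PhenotypeMetrics` fields and the toy IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeMetrics:
    precision: float  # share of coded patients who also had a positive test
    recall: float  # share of test-positive patients who also carried the code
    cohort_size: int  # patients with either a code or a positive test


def evaluate_phenotype(coded: set[str], test_positive: set[str]) -> PhenotypeMetrics:
    """Compare a diagnosis-code phenotype against a laboratory reference standard.

    coded: hypothetical patient IDs with an ICD-10 U07.1 diagnosis code
    test_positive: hypothetical patient IDs with a positive PCR or antigen test
    """
    if not coded or not test_positive:
        raise ValueError("both cohorts must be non-empty")
    # Patients who satisfy both the phenotype and the reference standard.
    true_positives = len(coded & test_positive)
    return PhenotypeMetrics(
        precision=true_positives / len(coded),
        recall=true_positives / len(test_positive),
        # Study population defined as the union: a code OR a positive test.
        cohort_size=len(coded | test_positive),
    )


if __name__ == "__main__":
    # Toy IDs for illustration only; not study data.
    coded_ids = {"p1", "p2", "p3", "p4"}
    positive_ids = {"p2", "p3", "p4", "p5", "p6"}
    print(evaluate_phenotype(coded_ids, positive_ids))
```

In this toy example, three of the four coded patients tested positive (precision 0.75), and three of the five test-positive patients were coded (recall 0.6). The cohort is defined as the union of both sets, matching the abstract's inclusion of patients with either a diagnosis code or a positive test.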