Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.
Bryan E Shepherd, Pamela A Shaw
Published in: Statistical Communications in Infectious Diseases (2020)
Objectives: Observational data derived from patient electronic health records (EHRs) are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. Using these data poses challenges, particularly with regard to data quality; some of these challenges are recognized, some are unrecognized, and some are recognized but ignored. The statistical community has a major opportunity to improve inference by incorporating validation subsampling into analyses of EHR data.
Methods: Methods that address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many existing statistical methods, those for measurement error being one example, address only relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study.
Results/Conclusion: We discuss some preliminary methods in this area, with a particular focus on time-to-event outcomes, and outline areas of future research.
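To make the idea of validation subsampling concrete, the following is a minimal sketch of regression calibration in the simplest possible setting: one error-prone predictor, classical additive error, a continuous outcome, and a random validation subsample in which the true value is known. It is not the authors' method, and it deliberately omits the harder features the abstract describes (correlated errors across predictors and outcomes, errors affecting study inclusion, time-to-event outcomes). All names, sample sizes, and simulation parameters are assumptions chosen for illustration.

```python
import random


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate(n=20000, n_valid=2000, beta=2.0, seed=0):
    """Simulate a hypothetical EHR-like dataset (all parameters are assumptions).

    x_true : true predictor, known only for the validation subsample
    w_obs  : error-prone version recorded for everyone (classical additive error)
    y      : continuous outcome that depends on x_true
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    w_obs = [x + rng.gauss(0, 1) for x in x_true]
    y = [beta * x + rng.gauss(0, 1) for x in x_true]
    # Phase two: a simple random subsample of records is validated.
    valid_idx = rng.sample(range(n), n_valid)
    return x_true, w_obs, y, valid_idx


def regression_calibration(x_true, w_obs, y, valid_idx):
    """Return (naive slope, calibrated slope) for the effect of the predictor on y."""
    # Naive analysis: ignore the error and regress y on the error-prone value.
    naive_slope, _ = ols_slope_intercept(w_obs, y)

    # Calibration model: in the validation subsample, regress truth on the
    # error-prone value to estimate E[X | W].
    w_valid = [w_obs[i] for i in valid_idx]
    x_valid = [x_true[i] for i in valid_idx]
    lam, intercept = ols_slope_intercept(w_valid, x_valid)

    # Replace the error-prone value with its calibrated prediction for all
    # records, then refit the outcome model.
    x_hat = [intercept + lam * w for w in w_obs]
    calibrated_slope, _ = ols_slope_intercept(x_hat, y)
    return naive_slope, calibrated_slope


x_true, w_obs, y, valid_idx = simulate()
naive, calibrated = regression_calibration(x_true, w_obs, y, valid_idx)
print(f"naive slope: {naive:.2f}, calibrated slope: {calibrated:.2f}")
```

With equal variances for the true predictor and the error, the naive slope is attenuated to roughly half of the simulated effect of 2, while the calibrated slope should land close to 2. A full analysis would also need standard errors that account for the estimated calibration model (for example via the bootstrap), which this sketch leaves out.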
Keyphrases
- electronic health record
- human immunodeficiency virus
- HIV/AIDS
- antiretroviral therapy
- clinical decision support
- adverse drug
- hepatitis c virus
- HIV infected
- mental health
- healthcare
- HIV positive
- big data
- case report
- emergency department
- insulin resistance
- single cell
- metabolic syndrome
- weight loss
- RNA-seq
- south africa
- deep learning