
An analytic framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records.

Lauren J Beesley, Lars G Fritsche, Bhramar Mukherjee
Published in: Statistics in Medicine (2020)
Large-scale association analyses based on observational health care databases such as electronic health records have attracted increasing interest in the scientific community. However, challenges arising from nonprobability sampling and phenotype misclassification in these data sources are often ignored in standard analyses, and the extent of the bias introduced by ignoring them is not well characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from a standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.
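The abstract does not state the framework's equations. As a rough illustration of the kind of bias calculation it describes, the sketch below assumes a binary genotype G, a logistic model for true disease D, selection into the EHR sample that depends only on true disease status (selection probabilities s1 for cases and s0 for controls), and nondifferential disease misclassification with sensitivity se and specificity sp. All function names, parameters, and values are hypothetical; this is a simplified sketch under those assumptions, not the authors' framework or their online tool.

```python
import math
from typing import Optional, Tuple


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sample_case_proportions(alpha: float, beta: float, se: float, sp: float,
                            s1: float, s0: float) -> Tuple[float, float]:
    """Expected P(D* = 1 | G = g, S = 1) for g = 0, 1 (hypothetical model).

    Assumes: logit P(D = 1 | G = g) = alpha + beta * g in the source population;
    selection S depends only on true disease D (P(S=1|D=1) = s1, P(S=1|D=0) = s0);
    recorded status D* has sensitivity se and specificity sp, independent of G.
    """
    props = []
    for g in (0, 1):
        p = expit(alpha + beta * g)                    # P(D=1 | G=g)
        q = s1 * p / (s1 * p + s0 * (1.0 - p))         # P(D=1 | G=g, S=1)
        props.append(se * q + (1.0 - sp) * (1.0 - q))  # P(D*=1 | G=g, S=1)
    return props[0], props[1]


def naive_log_or(p0_star: float, p1_star: float) -> float:
    """Log odds ratio a standard analysis of D* on G would target."""
    return logit(p1_star) - logit(p0_star)


def corrected_log_or(p0_star: float, p1_star: float,
                     se: float, sp: float) -> Optional[float]:
    """Undo nondifferential misclassification for assumed (se, sp).

    Returns None when the assumed values could not have produced the observed
    proportions (corrected probabilities fall outside (0, 1)).
    """
    j = se + sp - 1.0  # Youden's J; must be positive for the correction
    if j <= 0:
        return None
    qs = []
    for p_star in (p0_star, p1_star):
        q = (p_star - (1.0 - sp)) / j  # invert p* = (1 - sp) + j * q
        if not 0.0 < q < 1.0:
            return None
        qs.append(q)
    # Selection on D alone shifts both log-odds by log(s1/s0), so it cancels here.
    return logit(qs[1]) - logit(qs[0])


if __name__ == "__main__":
    # Hypothetical inputs: true OR 1.5, 5% prevalence among non-carriers,
    # cases twice as likely as controls to enter the EHR sample.
    alpha, beta = logit(0.05), math.log(1.5)
    p0s, p1s = sample_case_proportions(alpha, beta, se=0.8, sp=0.95, s1=0.2, s0=0.1)
    print(f"naive OR: {math.exp(naive_log_or(p0s, p1s)):.3f}")
    # Sensitivity analysis: plausible true ORs over a grid of assumed (se, sp).
    for se in (0.7, 0.8, 0.9):
        for sp in (0.90, 0.95, 0.99):
            est = corrected_log_or(p0s, p1s, se, sp)
            shown = "implausible" if est is None else f"{math.exp(est):.3f}"
            print(f"se={se:.2f} sp={sp:.2f} -> corrected OR {shown}")
```

Under these particular assumptions, selection on true disease alone leaves the odds ratio unchanged when status is recorded perfectly, while nondifferential misclassification pulls the naive odds ratio toward the null; because the sampled prevalence changes with selection, the size of that attenuation still depends on the sampling mechanism. The grid loop reports a corrected estimate for each assumed (se, sp) pair, a much simplified version of reporting plausible parameter values from summary results.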