Overcome the Limitation of Phenome-Wide Association Studies (PheWAS): Extension of PheWAS to Efficient and Robust Large-Scale ICD Codes Analysis.
Ya-Chen Lin, Siwei Zhang, Tess Vessels, Lisa Bastarache, Cosmin Adrian Bejan, Ryan S Hsie, Elizabeth J Philips, Doug M Ruderfer, Jill M Pulley, Todd L Edwards, Quinn S Wells, Jeremy L Warner, Joshua C Denny, Dan M Roden, Hakmook Kang, Yaomin Xu
Published in: medRxiv : the preprint server for health sciences (2024)
Phenome-wide association studies (PheWAS) have become widely used for efficient, high-throughput evaluation of the relationship between a genetic factor and a large number of disease phenotypes, typically extracted from a DNA biobank linked with electronic medical records (EMR). Phecodes, which encode disease case-control status derived from billing codes, are usually used as outcome variables in PheWAS, and logistic regression has been the standard analysis method. Because clinical diagnoses in EMR are often inaccurate, and such errors can bias odds ratio estimates, much effort has gone into accurately defining cases and controls. Specifically, to classify controls in the population correctly, an exclusion criteria list was manually compiled for each Phecode so that unbiased odds ratios could be obtained. However, the accuracy of these lists cannot be guaranteed without an extensive data curation process. This costly curation limits the efficiency of large-scale analyses that would take full advantage of all structured phenotypic information available in EMR. Here, we proposed to estimate relative risks (RR) instead. We first demonstrated, via a theoretical formula, the desirable property of RR that overcomes inaccuracy in the controls. With simulation and a real data application, we further confirmed that RR is unbiased without compiling exclusion criteria lists. With RR as the estimate, we are able to efficiently extend PheWAS to a larger-scale analysis of phenotypes that is agnostic to phenome construction, using ICD-9/10 codes, which preserve much more disease-related clinical information than Phecodes.
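The contrast between RR and the odds ratio when controls are contaminated can be illustrated with a small calculation. The sketch below is not the authors' formula; it assumes a simple misclassification model in which recorded cases are always true cases (specificity 1) while only a fraction of true cases are recorded, equally in carriers and non-carriers, so the unrecorded cases end up among the controls. The function name, the risks 0.30 and 0.15, and the sensitivity 0.6 are all hypothetical.

```python
def observed_measures(p_carrier, p_noncarrier, sensitivity):
    """Return (RR, OR) computed from recorded case status.

    Assumed model (illustrative, not taken from the paper):
    - p_carrier, p_noncarrier: true disease risks in carriers / non-carriers
    - sensitivity: P(recorded as case | true case), same in both groups
    - specificity is 1, so every recorded case is a true case
    """
    # Probability of being *recorded* as a case in each group
    q_carrier = sensitivity * p_carrier
    q_noncarrier = sensitivity * p_noncarrier

    # Relative risk: the common sensitivity factor cancels in the ratio
    rr = q_carrier / q_noncarrier

    # Odds ratio: the (1 - q) terms depend on sensitivity, so nothing cancels
    odds_ratio = (q_carrier / (1 - q_carrier)) / (q_noncarrier / (1 - q_noncarrier))
    return rr, odds_ratio


# Perfectly recorded outcome versus an outcome with 40% of true cases unrecorded
true_rr, true_or = observed_measures(0.30, 0.15, sensitivity=1.0)
obs_rr, obs_or = observed_measures(0.30, 0.15, sensitivity=0.6)

print(f"RR: true={true_rr:.3f}, observed={obs_rr:.3f}")
print(f"OR: true={true_or:.3f}, observed={obs_or:.3f}")
```

Under these assumptions the observed RR equals the true RR, whereas the observed odds ratio moves away from its true value, which is the kind of behaviour the abstract attributes to contaminated controls.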