The Gene Expression Landscape of Disease Genes.
Judit García-González, Saul Garcia-Gonzalez, Lathan Liou, Paul F O'Reilly
Published in: medRxiv: the preprint server for health sciences (2024)
Fine-mapping and gene-prioritisation techniques applied to the latest Genome-Wide Association Study (GWAS) results have prioritised hundreds of genes as causally associated with disease. Here we leverage these recently compiled lists of high-confidence causal genes to interrogate where in the body disease genes operate. Specifically, we combine GWAS summary statistics, gene prioritisation results and gene expression RNA-seq data from 46 tissues and 204 cell types in relation to 16 major diseases (including 8 cancers). In tissues and cell types with well-established relevance to the disease, the prioritised genes typically have higher absolute and relative (i.e. tissue/cell specific) expression compared to non-prioritised 'control' genes. Examples include brain tissues in psychiatric disorders (P-value < 1×10⁻⁷), microglia cells in Alzheimer's disease (P-value = 9.8×10⁻³) and colon mucosa in colorectal cancer (P-value < 1×10⁻³). We also observe significantly higher expression for disease genes in multiple tissues and cell types with no established links to the corresponding disease. While some of these results may be explained by cell types that span multiple tissues, such as macrophages in brain, blood, lung and spleen in relation to Alzheimer's disease (P-values < 1×10⁻³), the cause for others is unclear and motivates further investigation that may provide novel insights into disease etiology. Examples of such unexplained results include mammary tissue in Type 2 Diabetes (P-value < 1×10⁻⁷); reproductive tissues such as breast, uterus, vagina, and prostate in Coronary Artery Disease (P-value < 1×10⁻⁴); and motor neurons in psychiatric disorders (P-value < 3×10⁻⁴). In the GTEx dataset, tissue type is the major predictor of gene expression, but the contribution of each predictor (tissue, sample, subject, batch) varies widely among disease-associated genes.
Finally, we highlight genes with the highest levels of gene expression in relevant tissues to guide functional follow-up studies. Our results could offer novel insights into the tissues and cells involved in disease initiation, inform drug target and delivery strategies, highlight potential off-target effects, and exemplify the relative performance of different statistical tests for linking disease genes with tissue and cell type gene expression.
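The core comparison described above, expression of prioritised genes versus non-prioritised 'control' genes within one tissue, can be illustrated with a minimal sketch. The abstract does not state which statistical tests were used, so the one-sided permutation test on mean expression below is an assumption chosen for illustration, not the authors' method. The function name `permutation_test` and all expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(prioritised, control, n_perm=2000, seed=0):
    """One-sided permutation test (illustrative assumption, not the paper's test).

    Asks whether mean expression is higher in prioritised genes than in
    control genes within a single tissue or cell type.
    Returns (observed mean difference, permutation P-value).
    """
    rng = random.Random(seed)
    observed = mean(prioritised) - mean(control)
    pooled = list(prioritised) + list(control)
    k = len(prioritised)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the "prioritised" label to k genes.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical log-scale expression values for one tissue.
prioritised = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
control = [2.0, 2.5, 3.1, 1.8, 2.2, 2.9, 3.0, 2.4, 2.7, 1.9]

obs_diff, p_value = permutation_test(prioritised, control)
print(f"mean difference = {obs_diff:.2f}, P-value = {p_value:.4f}")
```

In practice the test would be repeated per tissue or cell type and corrected for multiple testing. A rank-based test could be substituted where expression distributions are heavily skewed.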
Keyphrases
- gene expression
- genome wide
- type 2 diabetes
- coronary artery disease
- single cell
- rna seq
- genome wide identification
- emergency department
- cell therapy
- cardiovascular disease
- adipose tissue
- bone marrow
- spinal cord injury
- inflammatory response
- risk assessment
- induced apoptosis
- blood brain barrier
- left ventricular
- percutaneous coronary intervention
- young adults
- copy number
- high resolution
- big data
- skeletal muscle
- PI3K/Akt
- ejection fraction
- coronary artery bypass grafting
- binding protein