Enhancing Disease Risk Gene Discovery by Integrating Transcription Factor-Linked Trans-located Variants into Transcriptome-Wide Association Analyses.
Jingni He, Wanqing Wen, Jie Ping, Qing Li, Zhishan Chen, D D B D Perera, Xiang Shu, Jirong Long, Qiuyin Cai, Xiao-Ou Shu, Wei Zheng, Quan Long, Xingyi Guo
Published in: medRxiv: the preprint server for health sciences (2023)
Transcriptome-wide association studies (TWAS) have been successful in identifying putative disease susceptibility genes by integrating gene expression predictions with genome-wide association study (GWAS) data. However, current TWAS models consider only cis-located variants when predicting gene expression. Here, we introduce transTF-TWAS, which also includes transcription factor (TF)-linked trans-located variants in model building. Using data from the Genotype-Tissue Expression project, we built prediction models for alternative splicing and gene expression and applied them to large GWAS datasets for breast, prostate, and lung cancers. Our analysis revealed 887 putative cancer susceptibility genes at Bonferroni-corrected P < 0.05, including 465 in regions not yet reported by previous GWAS and 137 in known GWAS loci that had not been previously reported. We demonstrate that transTF-TWAS surpasses other approaches both in building gene prediction models and in identifying disease-associated genes. These results shed new light on several genetically driven key regulators and their associated regulatory networks underlying disease susceptibility.
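To make the general idea concrete, the following is a minimal sketch of how a gene-level association statistic is commonly computed from GWAS summary statistics in TWAS-style analyses, followed by a Bonferroni cutoff. It is not taken from the transTF-TWAS paper: the z-score formula is the widely used summary-statistic form, and the function names, weights, GWAS z-scores, and LD matrix are hypothetical toy values. The only link to the text above is the assumption that a prediction model's weight vector may cover both cis-located and TF-linked trans-located variants.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Gene-level z-score from summary statistics: (w . z) / sqrt(w' R w).

    weights: prediction-model weights, one per variant (assumed here to
             include both cis and TF-linked trans variants)
    gwas_z:  GWAS z-scores for the same variants, in the same order
    ld:      variant correlation (LD) matrix R as a list of lists
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical toy gene: two cis variants plus one TF-linked trans variant.
weights = [0.5, 0.3, 0.2]
gwas_z = [4.0, 3.0, 5.0]
ld = [[1.0, 0.2, 0.0],
      [0.2, 1.0, 0.0],
      [0.0, 0.0, 1.0]]

z = twas_z(weights, gwas_z, ld)
p = two_sided_p(z)
flags = bonferroni_significant([p, 0.04, 0.5])
```

In a real analysis the number of tests in the Bonferroni denominator would be the number of genes (or gene and splicing models) tested, and the LD matrix would come from a reference panel; both are outside what the abstract specifies.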
Keyphrases
- gene expression
- genome wide
- transcription factor
- dna methylation
- copy number
- genome wide identification
- prostate cancer
- genome wide association
- rna seq
- small molecule
- genome wide analysis
- big data
- dna binding
- bioinformatics analysis
- young adults
- deep learning
- high throughput
- machine learning
- papillary thyroid
- quality improvement
- electronic health record
- case control
- benign prostatic hyperplasia