Integrating transcription factor occupancy with transcriptome-wide association analysis identifies susceptibility genes in human cancers.
Jingni He, Wanqing Wen, Alicia Beeghly, Zhishan Chen, Chen Cao, Xiao-Ou Shu, Quan Long, Xingyi Guo
Published in: Nature Communications (2022)
Transcriptome-wide association studies (TWAS) have successfully discovered many putative disease susceptibility genes. However, TWAS may suffer from inaccurate gene expression predictions because non-regulatory variants are included in the prediction models. By integrating prior knowledge of susceptible transcription factor occupied elements, we develop sTF-TWAS and demonstrate that it outperforms existing TWAS approaches in both simulation and real data analyses. Under the sTF-TWAS framework, we build genetic models to predict alternative splicing and gene expression in normal breast, prostate and lung tissues from the Genotype-Tissue Expression project and apply these models to data from large genome-wide association studies (GWAS) conducted among European-ancestry populations. At Bonferroni-corrected P < 0.05, we identify 354 putative susceptibility genes for these cancers, including 189 previously unreported genes located in GWAS loci and 45 genes in loci not reported by GWAS. These findings provide additional insight into the genetic susceptibility of human cancers. Additionally, we show that sTF-TWAS generalizes to non-cancer diseases.
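The Bonferroni-corrected threshold mentioned above can be illustrated with a minimal sketch. It assumes the usual form of the correction, in which a gene is called significant when its raw p-value falls below alpha divided by the number of genes tested. The function name, gene labels, and p-values are hypothetical and are not taken from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the sorted names of genes whose raw p-value is below alpha / m.

    p_values: dict mapping a gene name to its raw association p-value.
    m is the number of tests, taken here as the number of genes supplied.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # Bonferroni-adjusted per-test threshold
    return sorted(gene for gene, p in p_values.items() if p < threshold)


# Hypothetical p-values for illustration only (not results from the paper).
example = {"GENE_A": 1e-8, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.2}
significant = bonferroni_significant(example)
print(significant)
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so only the two hypothetical genes with p-values below that value are retained.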
Keyphrases
- genome wide
- dna methylation
- gene expression
- transcription factor
- copy number
- endothelial cells
- genome wide association
- genome wide association study
- genome wide identification
- induced pluripotent stem cells
- prostate cancer
- healthcare
- electronic health record
- dna binding
- poor prognosis
- pluripotent stem cells
- childhood cancer
- single cell
- case control
- quality improvement
- papillary thyroid
- rna seq
- binding protein
- bioinformatics analysis
- deep learning