Gene selection by incorporating genetic networks into case-control association studies.
Xuewei Cao, Xiaoyu Liang, Shuanglin Zhang, Qiuying Sha
Published in: European journal of human genetics : EJHG (2022)
Large-scale genome-wide association studies (GWAS) have successfully identified a wide range of genetic variants underlying complex diseases. The network-based regression approach has been developed to incorporate a biological genetic network and to overcome the computational challenges of analyzing high-dimensional genomic data. In this paper, we propose a gene selection approach that incorporates genetic networks into case-control association studies for DNA sequence data or DNA methylation data. Instead of using traditional dimension reduction techniques such as principal component analysis and supervised principal component analysis, we use a linear combination of genotypes at SNPs or methylation values at CpG sites in a gene to capture gene-level signals. We employ three linear combination approaches: optimally weighted sum (OWS), beta-based weighted sum (BWS), and LD-adjusted polygenic risk score (LD-PRS). OWS and LD-PRS are supervised approaches that depend on the effect of each SNP or CpG site on the case-control status, while BWS can be extracted without using the case-control status. After using one of these linear combinations of genotypes or methylation values in each gene to capture gene-level signals, we regularize the gene-level signals to perform gene selection based on the biological network. Simulation studies show that the proposed approaches have higher true positive rates than traditional dimension reduction techniques. We also apply our approaches to DNA methylation data and UK Biobank DNA sequence data to analyze rheumatoid arthritis. The results show that the proposed methods can select genes potentially related to rheumatoid arthritis that are missed by existing methods.
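The two-step idea in the abstract (collapse each gene to one weighted sum, then penalize the gene-level coefficients using the network) can be illustrated with a minimal Python sketch, assuming NumPy. The names `gene_level_scores`, `normalized_laplacian`, and `network_penalty` are hypothetical, and the weights are placeholders, not the OWS, BWS, or LD-PRS weights, which the abstract does not specify. The penalty shown (an L1 term plus a graph-Laplacian quadratic term) is one common form of network-constrained regularization and is an assumption here, not something the abstract confirms.

```python
import numpy as np

def gene_level_scores(X, gene_index, weights):
    """Collapse a (samples x features) matrix into (samples x genes).

    X          : genotypes at SNPs or methylation values at CpG sites
    gene_index : integer gene label (0..n_genes-1) for each column of X
    weights    : one weight per column (placeholder values, assumed given)
    """
    n_genes = int(gene_index.max()) + 1
    scores = np.zeros((X.shape[0], n_genes))
    for g in range(n_genes):
        cols = np.flatnonzero(gene_index == g)
        # Weighted sum of the features that belong to gene g
        scores[:, g] = X[:, cols] @ weights[cols]
    return scores

def normalized_laplacian(A):
    """Normalized graph Laplacian of a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    connected = d > 0
    # Isolated nodes get a zero row and column instead of a division by zero
    inv_sqrt = np.where(connected, 1.0 / np.sqrt(np.where(connected, d, 1.0)), 0.0)
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def network_penalty(beta, L, lam1, lam2):
    """Assumed penalty: lam1 * ||beta||_1 + lam2 * beta' L beta."""
    return lam1 * np.abs(beta).sum() + lam2 * (beta @ L @ beta)

# Toy data: 2 samples, 3 features; features 0 and 1 form gene 0, feature 2 is gene 1
X = np.array([[0.0, 1.0, 2.0],
              [2.0, 0.0, 1.0]])
gene_index = np.array([0, 0, 1])
weights = np.array([1.0, 0.5, 2.0])   # hypothetical weights
scores = gene_level_scores(X, gene_index, weights)

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # the two genes are linked in the network
L = normalized_laplacian(A)
smooth = network_penalty(np.array([1.0, 1.0]), L, lam1=0.1, lam2=0.5)
rough = network_penalty(np.array([1.0, -1.0]), L, lam1=0.1, lam2=0.5)
```

In this toy setup the Laplacian term is zero when linked genes share the same coefficient and grows when they differ, so `rough` exceeds `smooth`. In a full analysis a penalty of this kind would be added to a logistic-regression loss on the case-control status and minimized over `beta`; that fitting step is omitted here.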
Keyphrases
- case control
- genome wide
- DNA methylation
- copy number
- rheumatoid arthritis
- gene expression
- electronic health record
- big data
- machine learning
- computed tomography
- genome wide identification
- data analysis
- magnetic resonance
- magnetic resonance imaging
- systemic sclerosis
- single molecule
- interstitial lung disease
- systemic lupus erythematosus
- network analysis
- genome wide analysis
- cross sectional
- cell free