Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles.
Lan-Yun Chang, Meng-Zhan Lee, Yujia Wu, Wen-Kai Lee, Chia-Liang Ma, Jun-Mao Chang, Ciao-Wen Chen, Tzu-Chun Huang, Chia-Hwa Lee, Jih-Chin Lee, Yu-Yao Tseng, Chun-Yu Lin
Published in: Nucleic Acids Research (2023)
Pathway analysis, including non-topology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying differences in expression data between two phenotypes. By considering dependencies and interactions between genes, TB methods usually perform better than non-TB methods in identifying pathways that include genes closely related to, or directly causative of, a given phenotype. However, most TB methods may be limited by incomplete pathway data used as the reference network or by difficulties in selecting appropriate reference networks for different research topics. Here, we propose a gene set correlation enrichment analysis method, Gscore, based on an expression dataset-derived coexpression network to examine whether a differentially expressed gene (DEG) list (or each of its DEGs) is associated with a known gene set. Gscore is better able to identify target pathways in 89 human disease expression datasets than eight other state-of-the-art methods and offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. When applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means for studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful analytical approach for annotating individual DEGs, DEG lists, and genome-wide expression profiles based on existing biological knowledge.
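The general idea of testing a DEG list against a gene set through a dataset-derived coexpression network can be illustrated with a short sketch. This is not the Gscore algorithm, whose scoring and statistics are not specified in the abstract. It is a minimal stand-in that assumes Pearson correlation as the coexpression measure, the mean absolute correlation between DEGs and gene-set members as the score, and a random-gene-set permutation test for significance. The function name `coexpression_enrichment` and all of its parameters are hypothetical.

```python
import numpy as np


def coexpression_enrichment(expr, gene_names, deg_list, gene_set,
                            n_perm=1000, seed=0):
    """Illustrative coexpression-based association test (not Gscore itself).

    expr       : array of shape (n_genes, n_samples), one row per gene
    gene_names : gene identifiers, in the same order as the rows of expr
    deg_list   : differentially expressed genes to annotate
    gene_set   : known gene set (e.g. a pathway) to test against
    Returns (observed_score, permutation_p_value).
    """
    rng = np.random.default_rng(seed)
    index = {g: i for i, g in enumerate(gene_names)}
    deg_idx = np.array([index[g] for g in deg_list if g in index], dtype=int)
    set_idx = np.array([index[g] for g in gene_set if g in index], dtype=int)
    if deg_idx.size == 0 or set_idx.size == 0:
        raise ValueError("No DEGs or gene-set members found in gene_names")

    # Assumption: the coexpression network is the absolute Pearson
    # correlation between gene rows of the expression matrix.
    abs_corr = np.abs(np.corrcoef(expr))
    # Ignore self-correlations so genes shared by both lists do not
    # inflate the score.
    np.fill_diagonal(abs_corr, np.nan)

    def score(other_idx):
        # Mean coexpression strength between the DEGs and another gene group.
        return np.nanmean(abs_corr[np.ix_(deg_idx, other_idx)])

    observed = score(set_idx)

    # Null distribution: random gene sets of the same size as gene_set.
    null = np.empty(n_perm)
    for k in range(n_perm):
        rand_idx = rng.choice(len(gene_names), size=set_idx.size,
                              replace=False)
        null[k] = score(rand_idx)

    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Passing a single-gene `deg_list` gives a per-DEG score, mirroring the abstract's "or each of its DEGs". The network here comes from the expression data alone, so no curated reference network is needed, which is the limitation of TB methods that the abstract highlights.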
Keyphrases
- genome wide
- genome wide identification
- copy number
- rna seq
- mycobacterium tuberculosis
- dna methylation
- poor prognosis
- sars cov
- coronavirus disease
- electronic health record
- single cell
- big data
- endothelial cells
- genome wide analysis
- long non coding rna
- binding protein
- mass spectrometry
- oxidative stress
- machine learning
- case report
- deep learning
- artificial intelligence
- data analysis
- bioinformatics analysis
- drug induced