Machine Learning Predicts the Oxidative Stress Subtypes Provide an Innovative Insight into Colorectal Cancer.
Haitao Zhong, Le Yang, Qingshang Zeng, Weidong Chen, Haibo Zhao, Fengfeng Li, Lei Qin, Qing-Qing Yu
Published in: Oxidative Medicine and Cellular Longevity (2023)
So far, an academic consensus has been reached that molecular subtypes are distinguished by genomic heterogeneity and immune infiltration patterns. Considering that oxidative stress (OS) is involved in tumorigenesis and prognosis prediction, we propose an innovative classification of colorectal cancer (CRC) into OS subtypes. We obtained three datasets from The Cancer Genome Atlas (TCGA) program and the Gene Expression Omnibus (GEO) online databases. A total of 1399 OS-related genes were selected from the GeneCards database. We removed the batch effect before conducting differentially expressed gene (DEG) analyses between normal and tumor samples. Nonnegative matrix factorization (NMF) was used to perform unsupervised clustering. Lasso regression and Cox regression were used to construct the signature model. DEG analysis, robust rank aggregation (RRA), and protein-protein interaction (PPI) networks were used to select hub genes, which were then used to predict OS subtypes with a random forest algorithm. NMF identified two OS-related subtypes of CRC patients. An eight-gene OS-related signature was built to predict patient outcome, based on the DEGs between the two subtypes. A total of 61 DEGs overlapped across the datasets, the RRA analysis showed that 17 genes were important in the three datasets, and 15 genes were shared between the two methods. The PPI network identified five hub genes: SPP1, SERPINE1, CAV1, PDGFRB, and PLAU. These five hub genes predicted the OS-related subtype of CRC with an AUC of 0.771. In our study, we identify two OS-related subtypes, which provide an innovative insight into colorectal cancer.
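The abstract reports classifier performance as an AUC, so it may help to see what that number measures. The following is a minimal sketch, not the authors' code: `roc_auc` is a hypothetical helper, and the labels and scores are made-up toy values rather than study data. It computes AUC as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, counting ties as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for one subtype, 0 for the other (toy encoding, an assumption).
    scores: classifier scores, higher meaning "more likely label 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

Under this reading, an AUC of 0.771 would mean that a randomly chosen sample from one subtype is scored above a randomly chosen sample from the other subtype about 77% of the time; 0.5 corresponds to chance and 1.0 to perfect separation.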
Keyphrases
- bioinformatics analysis
- genome wide
- machine learning
- genome wide identification
- oxidative stress
- gene expression
- dna methylation
- protein protein
- genome wide analysis
- end stage renal disease
- ejection fraction
- healthcare
- small molecule
- emergency department
- newly diagnosed
- copy number
- network analysis
- prognostic factors
- transcription factor
- deep learning
- ischemia reperfusion injury
- single cell
- big data
- quality improvement
- single molecule
- dna damage
- diabetic rats
- rna seq