Identification and validation of a seven-gene prognostic marker in colon cancer based on single-cell transcriptome analysis.
Yang Zhou, Yang Guo, Yuanhe Wang
Published in: IET Systems Biology (2022)
Colon cancer (CC) is one of the most commonly diagnosed tumours worldwide. Single-cell RNA sequencing (scRNA-seq) can accurately reflect the heterogeneity within and between tumour cells and identify important genes associated with cancer development and growth. In this study, scRNA-seq was used to identify reliable prognostic biomarkers in CC.

First, scRNA-seq data of CC before and after 5-fluorouracil treatment were downloaded from the Gene Expression Omnibus database. The data were pre-processed, and dimensionality reduction was performed using principal component analysis and t-distributed stochastic neighbour embedding. Transcriptome data, somatic variant data, and clinical reports of patients with CC were also obtained from The Cancer Genome Atlas database. Seven key genes were identified using Cox regression analysis and the least absolute shrinkage and selection operator (LASSO) method to establish a signature associated with CC prognosis. The signature was validated on independent datasets, and somatic mutations and potential oncogenic pathways were further explored. Based on the gene signature and other clinical variables, a more effective predictive nomogram for patients with CC was constructed, and a decision curve analysis was performed to assess its utility.

A prognostic signature consisting of seven prognosis-related genes (CAV2, EREG, NGFRAP1, WBSCR22, SPINT2, CCDC28A, and BCL10) was constructed and validated. Its performance and reliability were verified in both internal and external datasets, and the results showed that the seven-gene signature could effectively predict the prognosis of patients with CC under various clinical conditions. A nomogram was then constructed from the RiskScore, patient age, neoplasm stage, and tumour (T), node (N), and metastasis (M) classification, and it showed good clinical utility. Higher RiskScores were associated with a higher tumour mutational burden, which was confirmed to be a prognostic risk factor. Gene set enrichment analysis showed that the high-score group was enriched in 'cytoplasmic DNA sensing', 'extracellular matrix receptor interactions', and 'focal adhesion', whereas the low-score group was enriched in 'natural killer cell-mediated cytotoxicity' and 'T-cell receptor signalling pathways', among other pathways.

A robust seven-gene marker for CC was identified based on scRNA-seq data and validated in multiple independent cohorts. These findings provide a new potential marker to predict the prognosis of patients with CC.
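The abstract does not state how the RiskScore is computed. As a minimal sketch, assuming the common convention for Cox/LASSO signatures (a weighted sum of gene expression values, with patients split into high- and low-score groups at the cohort median), the calculation could look like the following. The coefficients are hypothetical placeholders, not the values fitted in the study, and the median cutoff and the function names `risk_score` and `stratify` are assumptions made for illustration.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; the study's fitted
# values are not given in the abstract.
COEFS = {
    "CAV2": 0.20, "EREG": -0.10, "NGFRAP1": 0.15, "WBSCR22": 0.10,
    "SPINT2": -0.20, "CCDC28A": -0.05, "BCL10": -0.10,
}

def risk_score(expr, coefs=COEFS):
    """Weighted sum of expression values over the signature genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)

def stratify(samples, coefs=COEFS):
    """Score every sample and split at the cohort median (assumed cutoff).

    samples maps a sample ID to a dict of gene -> expression value.
    Returns (scores, cutoff, groups); scores above the cutoff are 'high'.
    """
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: "high" if s > cutoff else "low" for sid, s in scores.items()}
    return scores, cutoff, groups
```

In practice the coefficients would come from the fitted multivariate Cox model, and the cutoff might instead be chosen by another criterion; both choices affect which patients fall into each group.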
Keyphrases
- single cell
- genome wide
- rna seq
- copy number
- dna methylation
- electronic health record
- high throughput
- gene expression
- big data
- genome wide identification
- extracellular matrix
- lymph node metastasis
- papillary thyroid
- machine learning
- wastewater treatment
- risk factors
- deep learning
- end stage renal disease
- stem cells
- radiation therapy
- single molecule
- adverse drug
- cell death
- data analysis
- induced apoptosis
- bioinformatics analysis
- pseudomonas aeruginosa
- transcription factor
- oxidative stress
- signaling pathway
- climate change
- lymph node
- mesenchymal stem cells
- cell migration
- cell cycle arrest
- endoplasmic reticulum stress
- low grade