Identification of novel biomarkers to distinguish clear cell and non-clear cell renal cell carcinoma using bioinformatics and machine learning.

Chanita Panwoon, Wunchana Seubwai, Malinee Thanee, Sakkarn Sangkhamanon
Published in: PLOS ONE (2024)
Renal cell carcinoma (RCC), which accounts for 90% of all kidney cancers, is categorized into clear cell RCC (ccRCC) and non-clear cell RCC (non-ccRCC) for treatment purposes under the current NCCN Guidelines, so this classification has direct therapeutic implications. This study aims to identify novel biomarkers that differentiate ccRCC from non-ccRCC using bioinformatics and machine learning. Gene expression profiles of ccRCC and non-ccRCC subtypes, including papillary RCC (pRCC) and chromophobe RCC (chRCC), were obtained from TCGA. Differentially expressed genes (DEGs) were identified, and DEGs specific to ccRCC and to non-ccRCC were explored using a Venn diagram. Gene Ontology and pathway enrichment analyses were performed using DAVID. The top ten expressed genes in ccRCC were then selected for machine learning analysis, and feature selection was performed to identify a minimal, highly effective gene set for constructing a predictive model. Expression of the best-performing gene set was validated by immunohistochemistry on tissue samples from RCC patients, and machine learning models for diagnosing RCC were then developed using H-scores. A total of 910, 415, and 835 DEGs were specific to ccRCC, pRCC, and chRCC, respectively. DEGs specific to ccRCC were enriched in PD-1 signaling, the immune system, and cytokine signaling in the immune system, whereas DEGs specific to chRCC were enriched in the TCA cycle and respiratory electron transport, signaling by the insulin receptor, and metabolism. Feature selection based on a Decision Tree classifier showed that a two-gene model (NDUFA4L2 and DAT) achieved an accuracy of 98.89%. Among supervised classification models built on the H-scores of NDUFA4L2 and DAT, Decision Tree models performed best, with 82% accuracy and an AUC of 0.9. NDUFA4L2 expression was associated with lymphovascular invasion, pathologic stage, and pT stage in ccRCC.
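The Venn-diagram step for finding subtype-specific DEGs amounts to set arithmetic on per-subtype DEG lists. The sketch below is a minimal illustration, assuming each subtype already has its own DEG list (how those lists were derived, including the comparison group and cut-offs, is not stated in this abstract) and that a "specific" DEG is one that appears in that subtype's list only. The function name `specific_degs` and the gene names are hypothetical placeholders, not results from the study.

```python
# Hypothetical sketch: "specific" DEGs taken as the region of a Venn diagram
# that belongs to one subtype only. Gene names are toy placeholders.


def specific_degs(deg_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each subtype, return the DEGs absent from every other subtype's list."""
    specific = {}
    for subtype, genes in deg_sets.items():
        # Union of all DEGs reported for the *other* subtypes.
        others = set().union(*(g for s, g in deg_sets.items() if s != subtype))
        specific[subtype] = genes - others
    return specific


if __name__ == "__main__":
    toy = {
        "ccRCC": {"GENE_A", "GENE_B", "GENE_C"},
        "pRCC": {"GENE_B", "GENE_D"},
        "chRCC": {"GENE_C", "GENE_E"},
    }
    for subtype, genes in specific_degs(toy).items():
        print(subtype, sorted(genes))
```

Genes shared by two or more subtypes (GENE_B, GENE_C in the toy input) fall in overlapping Venn regions and are excluded from every subtype's specific set.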
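The H-scores used to build the diagnostic models are, in their conventional form, a weighted sum of the percentage of tumor cells staining at each intensity: H = 1 × (% weak) + 2 × (% moderate) + 3 × (% strong), giving a range of 0 to 300. The sketch below assumes that standard definition; the study's exact scoring protocol (intensity categories, number of cells or fields counted) is not described in this abstract, and `h_score` is a hypothetical helper name.

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Conventional IHC H-score (0-300) from the percentages of tumor cells
    staining at weak (1+), moderate (2+) and strong (3+) intensity."""
    pcts = (pct_weak, pct_moderate, pct_strong)
    # Negative percentages or a total above 100% indicate a data-entry error.
    if min(pcts) < 0 or sum(pcts) > 100 + 1e-9:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


if __name__ == "__main__":
    print(h_score(20, 30, 10))
```

For example, a section with 20% weak, 30% moderate, and 10% strong staining scores 20 + 60 + 30 = 110.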
Using integrated bioinformatics and machine learning analysis, NDUFA4L2 and DAT were identified as novel biomarkers for the differential diagnosis of ccRCC versus non-ccRCC.
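The two-marker diagnostic model can be pictured with a small decision tree trained on two features per sample, the NDUFA4L2 and DAT H-scores. This is a minimal CART-style sketch (greedy Gini splits, depth limit of 2) with accuracy and a Mann-Whitney form of ROC AUC; it is not the authors' implementation, whose library, hyperparameters, and validation scheme are not given here. All function names and the toy H-scores are hypothetical, and metrics computed on the training data, as in the demo, are optimistic.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Each sample is (NDUFA4L2 H-score, DAT H-score); label 1 = ccRCC, 0 = non-ccRCC.
Sample = Tuple[float, float]


@dataclass
class Node:
    prob: float                      # fraction of label-1 training samples at this node
    feature: Optional[int] = None    # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch for x[feature] <= threshold
    right: Optional["Node"] = None   # branch for x[feature] > threshold


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build(X: List[Sample], y: List[int], depth: int = 0, max_depth: int = 2) -> Node:
    """Grow a CART-style tree greedily, minimising weighted Gini impurity."""
    node = Node(prob=sum(y) / len(y))
    if depth >= max_depth or node.prob in (0.0, 1.0):
        return node                  # stop: depth limit reached or node is pure
    best = None                      # (weighted impurity, feature, threshold)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2        # candidate cut halfway between neighbours
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= gini(y):
        return node                  # no split improves impurity
    _, f, t = best
    go_left = [xi[f] <= t for xi in X]
    node.feature, node.threshold = f, t
    node.left = build([xi for xi, g in zip(X, go_left) if g],
                      [yi for yi, g in zip(y, go_left) if g], depth + 1, max_depth)
    node.right = build([xi for xi, g in zip(X, go_left) if not g],
                       [yi for yi, g in zip(y, go_left) if not g], depth + 1, max_depth)
    return node


def predict_proba(node: Node, x: Sample) -> float:
    """Walk to a leaf and return its ccRCC fraction."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prob


def accuracy(node: Node, X: List[Sample], y: List[int]) -> float:
    """Share of samples whose thresholded prediction (>= 0.5 -> ccRCC) is correct."""
    hits = sum((predict_proba(node, x) >= 0.5) == bool(yi) for x, yi in zip(X, y))
    return hits / len(y)


def auc(scores: Sequence[float], y: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney pairwise formulation (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up H-scores for illustration only; NOT data from the study.
    X = [(220, 180), (250, 150), (190, 210), (270, 120),
         (40, 20), (60, 10), (30, 50), (80, 30)]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    tree = build(X, y)
    print("split on feature", tree.feature, "at", tree.threshold)
    print("training accuracy:", accuracy(tree, X, y))
    print("training AUC:", auc([predict_proba(tree, x) for x in X], y))
```

In practice accuracy and AUC would be estimated on held-out samples or by cross-validation rather than on the training set, and a library implementation such as scikit-learn's `DecisionTreeClassifier` would typically replace the hand-written tree.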