The genetic algorithm-aided three-stage ensemble learning method identified a robust survival risk score in patients with glioma.
Sujie Zhu, Weikaixin Kong, Jie Zhu, Liting Huang, Shixin Wang, Suzhen Bi, Zhengwei Xie
Published in: Briefings in Bioinformatics (2022)
Ensemble learning is a machine learning approach that integrates multiple base learners to achieve higher accuracy. Recently, single machine learning methods have been established to predict survival for patients with cancer. However, a robust, high-accuracy ensemble learning model for identifying high-risk patients is still lacking. To address this, we propose a novel genetic algorithm-aided three-stage ensemble learning method (3S score) for survival prediction. In constructing the 3S score, double training sets were used to avoid over-fitting, the gene-pairing method was applied to reduce batch effects, and a genetic algorithm was employed to select the best combination of base learners. When used to predict the survival state of glioma patients, this model achieved the highest C-index (0.697) and the highest areas under the receiver operating characteristic curve (ROC-AUCs; first year = 0.705, third year = 0.825 and fifth year = 0.839) in the combined test set (n = 1191), compared with 12 other baseline models. Furthermore, the 3S score distinguished survival significantly in eight of nine independent test cohorts (P < 0.05), with significantly improved ROC-AUCs. Notably, ablation experiments demonstrated that the gene-pairing method, the double training sets and the genetic algorithm ensure the robustness and effectiveness of the 3S score. A performance exploration across cancer types showed that the 3S score predicts survival well in five kinds of cancer, as verified by Cox regression, survival curves and ROC curves together. To enable its clinical adoption, we implemented the 3S score and two other clinical factors as an easy-to-use web tool for risk scoring and therapy stratification in glioma patients.
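Two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gene_pair_features` and `concordance_index` are hypothetical, the pairing rule shown (a binary indicator of which gene in a pair is more highly expressed within a sample) is one common form of gene pairing and is assumed here, and the C-index is the standard Harrell definition. Because the pair features depend only on the within-sample ordering of genes, they are unchanged by any monotone shift or rescaling of a sample, which is why such features are often used to dampen batch effects.

```python
from itertools import combinations


def gene_pair_features(expression):
    """Turn one sample's {gene: expression} mapping into binary pair features.

    Assumed pairing rule (illustrative only): the feature "A>B" is 1 when
    gene A is expressed more highly than gene B in this sample, else 0.
    """
    genes = sorted(expression)
    return {
        f"{a}>{b}": int(expression[a] > expression[b])
        for a, b in combinations(genes, 2)
    }


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) strictly before time times[j]. The pair is
    concordant when i also has the higher risk score; ties in risk
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
sample = {"G1": 5.0, "G2": 2.0, "G3": 7.0}
rescaled = {gene: 10 * value + 3 for gene, value in sample.items()}
pairs = gene_pair_features(sample)
pairs_rescaled = gene_pair_features(rescaled)

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]          # 0 marks a censored patient
risk = [0.9, 0.7, 0.5, 0.1]    # higher score = higher predicted risk
c_index = concordance_index(times, events, risk)
```

In this toy case the risk scores order every comparable pair correctly, so the C-index is 1.0; a value of 0.5 corresponds to random ordering, which puts the reported 0.697 in context.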
Keyphrases
- machine learning
- genome wide
- end stage renal disease
- free survival
- copy number
- deep learning
- newly diagnosed
- ejection fraction
- neural network
- randomized controlled trial
- chronic kidney disease
- artificial intelligence
- dna methylation
- gene expression
- peritoneal dialysis
- squamous cell carcinoma
- stem cells
- convolutional neural network
- climate change
- squamous cell
- virtual reality