Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine.
JiaRui Li, Lin Lu, Yu-Hang Zhang, YaoChen Xu, Min Liu, KaiYan Feng, Lei Chen, XiangYin Kong, Tao Huang, Yu-Dong Cai
Published in: Cancer gene therapy (2019)
Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have previously been conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. We first identified 1159 features (genes) that can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. A subsequent literature review of these features/genes and the classification rules, together with functional enrichment analyses of the 1159 features/genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
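The incremental feature selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a feature ranking is already available (in the paper that ranking would come from MCFS), it swaps in a simple nearest-centroid classifier where the paper uses an SVM so that the example needs only the Python standard library, and it runs on synthetic toy data. All function names (`loo_accuracy`, `incremental_feature_selection`, `make_toy_data`) are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy using only the columns listed in feature_idx."""
    sub = [[row[j] for j in feature_idx] for row in X]
    correct = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]:
            correct += 1
    return correct / len(sub)


def incremental_feature_selection(X, y, ranked_features, step=1):
    """Score the top-k ranked features for growing k and keep the best k.

    Returns (best_k, best_accuracy, curve), where curve is a list of
    (k, accuracy) pairs. Ties are resolved in favour of the smaller k.
    """
    curve = []
    best_k, best_acc = 0, -1.0
    for k in range(step, len(ranked_features) + 1, step):
        acc = loo_accuracy(X, y, ranked_features[:k])
        curve.append((k, acc))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc, curve


def make_toy_data(n_per_class=20, n_features=10, n_informative=3, seed=0):
    """Synthetic two-class data; only the first n_informative columns differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 3.0 * label  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y


# Toy run: the ranking below is assumed, standing in for an MCFS ranking.
X, y = make_toy_data()
ranked = list(range(10))
best_k, best_acc, curve = incremental_feature_selection(X, y, ranked)
print("best number of features:", best_k, "accuracy:", best_acc)
```

In the study's setting, each row would be a cell sample, each column a gene, and the classifier inside the loop would be an SVM; the subset size with the best cross-validated performance plays the role of the 1159-gene set.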
Keyphrases
- machine learning
- genome wide
- stem cells
- deep learning
- acute myeloid leukemia
- bioinformatics analysis
- bone marrow
- gene expression
- genome wide identification
- monte carlo
- dna methylation
- poor prognosis
- artificial intelligence
- induced apoptosis
- genome wide analysis
- single cell
- chronic kidney disease
- signaling pathway
- squamous cell carcinoma
- cell therapy
- oxidative stress
- newly diagnosed
- big data
- rna seq