Biomarker identification by reversing the learning mechanism of an autoencoder and recursive feature elimination

Fuad Al Abir, S M Shovan, Md Al Mehedi Hasan, Abu Sayeed, Jungpil Shin
Published in: Molecular Omics (2022)
RNA-Seq has made significant contributions to many fields, particularly cancer research. Recent studies on differential gene expression analysis and on the discovery of novel cancer biomarkers have used RNA-Seq data extensively. Identifying new biomarkers is essential for advancing cancer research, and early cancer diagnosis improves patients' chances of recovery and increases life expectancy. Both areas urgently need, and have room for, improvement. In this paper, we developed an autoencoder-based biomarker identification method that reverses the learning mechanism of the trained encoders. We devised an explainable post hoc methodology for identifying influential genes that have a high likelihood of being biomarkers. We then applied recursive feature elimination to shorten the list further and present 17 potential biomarkers that identify cancer types with 99.93% accuracy using a support vector machine on the UCI gene expression cancer RNA-Seq dataset, which covers five cancerous tumor types. Our methodology outperforms the state-of-the-art methods, confirming both the potential of the newly identified biomarkers and the efficacy of the biomarker identification procedure. Moreover, we evaluated our methodology on six independent RNA-Seq gene expression datasets for several tasks, namely classifying tumors versus non-tumors, detecting the origin of circulating tumor cells (CTCs), and predicting whether metastasis occurs. Our methodology achieved promising results on these tasks as well. The source code of this project is available at https://github.com/fuad021/biomarker-identification.
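To make the second stage concrete, here is a minimal sketch of shrinking a gene list with recursive feature elimination (RFE) driven by a linear support vector machine and then scoring an SVM classifier on the retained genes. It is not the authors' implementation (see their repository for that) and does not reproduce the autoencoder-reversal step. Several choices are assumptions made only for illustration: the data are synthetic (from `make_classification`) rather than the UCI RNA-Seq dataset, the ranking SVM uses a linear kernel because scikit-learn's `RFE` needs `coef_`, ten features are removed per round (`step=10`), features are standardized first, and the helper names `build_pipeline` and `select_and_score` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(n_keep=17):
    """Hypothetical helper: scale, shrink the gene list with RFE, then classify."""
    # RFE ranks features by the estimator's coef_, so the ranker must be linear.
    ranker = SVC(kernel="linear")
    return make_pipeline(
        StandardScaler(),
        RFE(ranker, n_features_to_select=n_keep, step=10),  # drop 10 genes per round (assumed)
        SVC(kernel="linear"),  # final classifier; kernel choice is an assumption
    )


def select_and_score(X, y, n_keep=17, cv=5):
    """Hypothetical helper: return selected feature indices and mean CV accuracy."""
    pipe = build_pipeline(n_keep)
    # RFE is refit inside every fold, so held-out samples never influence selection.
    acc = cross_val_score(pipe, X, y, cv=cv).mean()
    # Refit on all samples to report the final reduced gene subset.
    pipe.fit(X, y)
    selected = np.flatnonzero(pipe.named_steps["rfe"].support_)
    return selected, acc


# Synthetic stand-in: 300 samples x 200 "genes", 5 classes (echoing five tumor types).
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=15, n_redundant=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
selected, acc = select_and_score(X, y)
print(f"kept {len(selected)} features, mean CV accuracy {acc:.3f}")
```

Running RFE inside the cross-validation pipeline, rather than selecting genes once on the full dataset, avoids optimistically biased accuracy; the accuracy printed here reflects synthetic data only and says nothing about the paper's reported results.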