A Computational Approach to Identification of Candidate Biomarkers in High-Dimensional Molecular Data.

Justin Gerolami, Justin Jong Mun Wong, Ricky Zhang, Tong Chen, Tashifa Imtiaz, Miranda Smith, Tamara Jamaspishvili, Madhuri Koti, Janice Irene Glasgow, Parvin Mousavi, Neil Renwick, Kathrin Tyryshkin
Published in: Diagnostics (Basel, Switzerland) (2022)
'-Omics' profiling frequently produces complex, high-dimensional datasets that are challenging to analyze. Typically, these datasets contain more genomic features than samples, which limits the use of multivariable statistical and machine learning-based approaches to analysis. Effective alternative approaches are therefore urgently needed to identify features of interest in '-omics' data. In this study, we present the molecular feature selection tool, a novel, ensemble-based feature selection application for identifying candidate biomarkers in '-omics' data. As proof of principle, we applied the molecular feature selection tool to identify a small set of immune-related genes as potential biomarkers of three prostate adenocarcinoma subtypes. We then tested the selected genes in a model to classify the three subtypes and compared the results to models built using all genes and all differentially expressed genes. The model built from genes identified with the molecular feature selection tool outperformed the other models in this study on all comparison metrics (accuracy, precision, recall, and F1-score) while using a significantly smaller set of genes. In addition, we developed a simple graphical user interface for the molecular feature selection tool, which is available for free download. This user-friendly interface supports the identification of potential biomarkers in gene expression datasets and is an asset for biomarker discovery studies.