Classification of early-stage non-small cell lung cancer by weighing gene expression profiles with connectivity information.
Ao Zhang, Suyan Tian
Published in: Biometrical journal. Biometrische Zeitschrift (2017)
Pathway-based feature selection algorithms, which use the biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different weights for each gene, each based on gene connectivity information, and then conducted feature selection on the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when data-driven connectivity information, constructed from data on the specific disease under study, is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only once this issue is resolved will wide utilization of the weighting methods be possible.
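To make the weighting idea concrete, the following is a minimal sketch of how gene expression profiles might be weighted by data-driven connectivity before feature selection. The abstract does not specify how the three weights are constructed, so everything here is an illustrative assumption rather than the authors' method: the connectivity measure (node degree in a co-expression network thresholded on absolute Pearson correlation), the threshold of 0.5, the function names `connectivity_weights` and `weight_profiles`, and the simulated data.

```python
import numpy as np


def connectivity_weights(expr, threshold=0.5):
    """Return one weight per gene from a correlation-thresholded network.

    expr: array of shape (n_samples, n_genes).
    Assumption: connectivity = number of other genes whose absolute
    Pearson correlation with this gene is at least `threshold`.
    """
    corr = np.corrcoef(expr, rowvar=False)   # genes x genes correlation
    adj = np.abs(corr) >= threshold          # boolean adjacency matrix
    np.fill_diagonal(adj, False)             # ignore self-connections
    degree = adj.sum(axis=0)
    # Add 1 so isolated genes keep a nonzero weight, then scale to (0, 1].
    w = degree + 1.0
    return w / w.max()


def weight_profiles(expr, weights):
    """Multiply each gene's expression column by its weight."""
    return expr * weights                    # broadcasts over samples


# Hypothetical simulated data: genes 0-2 are co-expressed, genes 3-5 are noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 100, 6
expr = rng.normal(size=(n_samples, n_genes))
shared = rng.normal(size=n_samples)
expr[:, :3] = shared[:, None] + 0.1 * rng.normal(size=(n_samples, 3))

weights = connectivity_weights(expr)
weighted = weight_profiles(expr, weights)
```

Any standard feature selection procedure could then be run on `weighted` instead of `expr`. Under this scheme, highly connected genes are amplified relative to isolated ones, which is why the quality of the estimated weights matters so much for the final classifier.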
Keyphrases
- machine learning
- genome wide
- genome wide identification
- deep learning
- copy number
- early stage
- health information
- big data
- resting state
- genome wide analysis
- white matter
- magnetic resonance
- artificial intelligence
- functional connectivity
- magnetic resonance imaging
- dna methylation
- squamous cell carcinoma
- electronic health record
- lymph node
- radiation therapy
- contrast enhanced
- computed tomography
- data analysis
- sentinel lymph node