Decomposition feature selection with applications in detecting correlated biomarkers of bipolar disorders.
Hailin Huang, Yuanzhang Li, Haili Lang, Colin O Wu
Published in: Statistics in Medicine (2019)
Feature selection is an important initial step of exploratory analysis in biomedical studies. Its main objective is to eliminate the covariates that are uncorrelated with the outcome. For highly correlated covariates, traditional feature selection methods, such as the Lasso, tend to select one of them and eliminate the others, although some of the eliminated ones are still scientifically valuable. To alleviate this drawback, we propose a feature selection method based on covariate space decomposition, referred to herein as the "Decomposition Feature Selection" (DFS), and show that this method can lead to scientifically meaningful results in studies with correlated high-dimensional data. The DFS consists of two steps: (i) decomposing the covariate space into disjoint subsets such that each of the subsets contains only uncorrelated covariates and (ii) identifying significant predictors by traditional feature selection within each covariate subset. We demonstrate through simulation studies that the DFS has superior practical performance over Lasso-type methods when multiple highly correlated covariates need to be retained. Application of the DFS is demonstrated through a study of bipolar disorders with correlated biomarkers.
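
The two-step structure described above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify how the covariate space is decomposed, so the greedy rule in `decompose`, the correlation `threshold`, and the `cutoff` are all assumptions made for illustration. The per-subset selector `marginal_screen` is a simple marginal-correlation screen used as a stand-in for a traditional method such as the Lasso, and every function name here is hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # treat a constant column as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def decompose(columns, threshold=0.5):
    """Step (i), under an ASSUMED greedy rule: put each covariate into the
    first subset whose members all have |correlation| < threshold with it;
    open a new subset if no existing one qualifies."""
    subsets = []
    for j, col in enumerate(columns):
        for subset in subsets:
            if all(abs(pearson(col, columns[k])) < threshold for k in subset):
                subset.append(j)
                break
        else:  # no compatible subset was found
            subsets.append([j])
    return subsets


def marginal_screen(columns, y, subset, cutoff=0.3):
    """Stand-in selector (NOT the Lasso): keep the covariates in `subset`
    whose absolute marginal correlation with the outcome exceeds `cutoff`."""
    return [j for j in subset if abs(pearson(columns[j], y)) > cutoff]


def dfs_select(columns, y, threshold=0.5, select=marginal_screen):
    """Step (ii): run the selector inside each subset and pool the results."""
    subsets = decompose(columns, threshold)
    selected = set()
    for subset in subsets:
        selected.update(select(columns, y, subset))
    return sorted(selected), subsets


# Toy data (made up for illustration): x0 and x1 are highly correlated and
# both drive the outcome; x2 is unrelated to the outcome.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
columns = [x0, x1, x2]
y = [a + b for a, b in zip(x0, x1)]

selected, subsets = dfs_select(columns, y)
print(subsets)   # [[0, 2], [1]]
print(selected)  # [0, 1]
```

In this toy example the two correlated covariates land in different subsets, so each is assessed without competing against the other and both are retained. Any selector with the same call signature could be passed as `select`, for example a Lasso fit restricted to the columns of one subset.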