Biobjective gradient descent for feature selection on high dimension, low sample size data.
Tina IssaEric AngelFarida ZehraouiPublished in: PloS one (2024)
Even though deep learning shows impressive results in several applications, its use on problems with High Dimensions and Low Sample Size, such as diagnosing rare diseases, leads to overfitting. One solution often proposed is feature selection. In deep learning, along with feature selection, network sparsification is also used to improve the results when dealing with high dimensions low sample size data. However, most of the time, they are tackled as separate problems. This paper proposes a new approach that integrates feature selection, based on sparsification, into the training process of a deep neural network. This approach uses a constrained biobjective gradient descent method. It provides a set of Pareto optimal neural networks that make a trade-off between network sparsity and model accuracy. Results on both artificial and real datasets show that using a constrained biobjective gradient descent increases the network sparsity without degrading the classification performances. With the proposed approach, on an artificial dataset, the feature selection score reached 0.97 with a sparsity score of 0.92 with an accuracy of 0.9. For the same accuracy, none of the other methods reached a feature score above 0.20 and sparsity score of 0.35. Finally, statistical tests validate the results obtained on all datasets.