Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study.

Malik Braik, Mohammed A Awadallah, Mohammed Azmi Al-Betar, Abdelaziz I Hammouri, Omar A Alzubi

Published in: Cognitive computation (2023)

Feature selection (FS) is a crucial area of cognitive computation that demands further study. It has recently received considerable attention from researchers in machine learning and data mining and is widely used across many applications. Many enhancement strategies have been developed to boost the performance of FS methods in cognitive computation. The goal of this paper is to present three adaptive versions of the capuchin search algorithm (CSA), each of which features a better search ability than the parent CSA. A binary version of each adapted algorithm is combined with the k-Nearest Neighbor (k-NN) classifier to select the optimal feature subset. These versions were developed by applying several strategies, including automated control of the inertia weight, acceleration coefficients, and other computational factors, to improve the search ability and convergence speed of CSA. In the velocity model of CSA, three growth functions, namely exponential, power, and S-shaped functions, were adopted to evolve three versions of CSA, referred to as exponential CSA (ECSA), power CSA (PCSA), and S-shaped CSA (SCSA), respectively. The proposed FS methods were evaluated on 24 benchmark datasets with different dimensions from various repositories and compared with other k-NN based FS methods from the literature. The results revealed that the proposed methods significantly outperformed CSA and other well-established FS methods on several relevant criteria. In particular, among the 24 datasets considered, the proposed binary ECSA, which yielded the best overall results of all the proposed versions, outperformed the others on 18 datasets in terms of classification accuracy, 13 datasets in terms of specificity, 10 datasets in terms of sensitivity, and 14 datasets in terms of fitness values.
In short, on 15, 9, and 5 of the 24 datasets studied, the binary ECSA, PCSA, and SCSA achieved performance levels above 90% in terms of specificity, sensitivity, and accuracy, respectively. Thorough comparisons demonstrate the efficiency of the proposed methods in improving classification accuracy relative to other methods, confirming their ability to explore the feature space and select the most useful features for classification.
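
To make the wrapper idea concrete, the following is a minimal Python sketch of how a binary feature mask might be scored with a k-NN classifier. It is not the authors' implementation: the abstract does not state the fitness formula, so the weighted sum of error rate and feature ratio, the weight alpha = 0.99, the sigmoid transfer step, the holdout split, the toy data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def s_shaped(x):
    """Numerically stable sigmoid mapping a real value into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask (assumed transfer rule)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def knn_predict(train_X, train_y, query, mask, k=5):
    """Majority vote among the k nearest training rows, using only the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    dists = sorted(
        (sum((row[j] - query[j]) ** 2 for j in cols), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def fitness(mask, train_X, train_y, test_X, test_y, alpha=0.99, k=5):
    """Lower is better: alpha * error rate + (1 - alpha) * fraction of features kept."""
    if not any(mask):
        return 1.0  # an empty subset cannot classify anything; assign the worst score
    wrong = sum(
        knn_predict(train_X, train_y, q, mask, k) != y
        for q, y in zip(test_X, test_y)
    )
    error = wrong / len(test_y)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 is noise.
train_X = [[(i % 2) + 0.01 * (i % 5), float((i * 7) % 11 - 5)] for i in range(40)]
train_y = [i % 2 for i in range(40)]
test_X = [[(i % 2) + 0.02, float((i * 3) % 11 - 5)] for i in range(10)]
test_y = [i % 2 for i in range(10)]

if __name__ == "__main__":
    rng = random.Random(0)
    mask = binarize([2.0, -2.0], rng)  # a candidate produced by the search
    print("candidate mask:", mask)
    print("fitness, feature 0 only:", fitness([1, 0], train_X, train_y, test_X, test_y))
    print("fitness, both features:", fitness([1, 1], train_X, train_y, test_X, test_y))
```

In this sketch, a search algorithm such as a binary CSA variant would repeatedly propose masks and keep those with lower fitness; because the score penalizes both misclassification and subset size, a mask that keeps only the informative feature scores better than one that also keeps the noisy feature.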

Keyphrases