Automatic Variable Selection Algorithms in Prognostic Factor Research in Neck Pain.
Bernard X W Liew, Francisco M Kovacs, David Rügamer, Juan-Antonio Vargas
Published in: Journal of Clinical Medicine (2023)
This study aims to compare the variable selection strategies of different machine learning (ML) and statistical algorithms in the prognosis of neck pain (NP) recovery. A total of 3001 participants with NP were included. Three dichotomous outcomes at 3 months' follow-up were used: improvement in NP, in arm pain (AP), and in disability. Twenty-five variables (twenty-eight parameters) were included as predictors; there were more parameters than variables because some categorical variables had more than two levels. Eight modelling techniques were compared: stepwise regression based on unadjusted p-values (stepP), on adjusted p-values (stepPAdj), and on the Akaike information criterion (stepAIC); best subset regression (BestSubset); the least absolute shrinkage and selection operator (LASSO); the minimax concave penalty (MCP); model-based boosting (mboost); and multivariate adaptive regression splines (MuARS). The algorithm that selected the fewest predictors was stepPAdj (4 to 8 predictors), followed by MuARS (9 to 14 predictors). Among the predictors selected by all algorithms, the one with the largest coefficient magnitude was "having undergone a neuroreflexotherapy intervention" for NP (β from 1.987 to 2.296) and AP (β from 2.639 to 3.554), and "Imaging findings: spinal stenosis" for disability (β from -1.331 to -1.763). Stepwise regression based on adjusted p-values resulted in the sparsest models, which enhanced clinical interpretability. MuARS appears to provide the best balance between model sparsity and predictive performance across outcomes. The algorithms achieved similar predictive performance but selected different numbers of variables. Rather than relying on any single algorithm, researchers may increase confidence in variable selection by using multiple algorithms.
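The gap between 25 variables and 28 parameters follows from how categorical predictors enter a regression model. Assuming treatment (dummy) coding, which the abstract does not state, a categorical variable with k levels needs k - 1 coefficients, while a continuous or binary variable needs one. The split used below (22 binary variables plus three hypothetical three-level variables, `cat_a` to `cat_c`) is purely illustrative; the abstract does not say which variables had more than two levels, and other splits also give 28.

```python
def n_parameters(levels_per_variable):
    """Coefficients (excluding the intercept) under treatment/dummy coding.

    A value of None marks a continuous variable (one slope); an integer k marks
    a categorical variable with k levels, which needs k - 1 dummy columns.
    """
    total = 0
    for levels in levels_per_variable.values():
        total += 1 if levels is None else levels - 1
    return total


# Hypothetical predictor set (not the study's): 22 binary + 3 three-level variables.
predictors = {f"binary_{i}": 2 for i in range(1, 23)}
predictors.update({"cat_a": 3, "cat_b": 3, "cat_c": 3})
print(len(predictors), "variables ->", n_parameters(predictors), "parameters")
# prints: 25 variables -> 28 parameters
```

The same count matters for selection: dropping one three-level variable removes two parameters at once.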
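One way to see why stepwise criteria differ in sparsity is to write out the AIC comparison for dropping a term with d parameters from a model with k parameters, where \hat{L} is the maximized likelihood:

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{reduced}} - \mathrm{AIC}_{\text{full}}
  = 2(k - d) - 2\ln\hat{L}_{\text{reduced}} - \bigl(2k - 2\ln\hat{L}_{\text{full}}\bigr) \\
  &= G^2 - 2d, \qquad G^2 = 2\bigl(\ln\hat{L}_{\text{full}} - \ln\hat{L}_{\text{reduced}}\bigr)
\end{aligned}
```

Dropping the term lowers AIC whenever G^2 < 2d. For a one-parameter term, stepAIC therefore keeps it only if G^2 > 2, which corresponds asymptotically (G^2 following a chi-squared distribution with 1 degree of freedom) to a likelihood-ratio p-value below about 0.157; for a two-parameter term the cut-off is G^2 > 4, or p below about 0.135. This is more lenient than a conventional 0.05 threshold, and a multiplicity-adjusted threshold is stricter still, which is consistent with stepPAdj yielding the sparsest models. The abstract does not report the significance level used by stepP or stepPAdj, so the 0.05 comparison is an assumption.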
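Adjusting p-values for multiplicity inflates them, so fewer predictors pass a fixed threshold. The abstract does not name the adjustment method used by stepPAdj; the sketch below assumes Holm's step-down method and uses hypothetical p-values for five hypothetical predictors `x1` to `x5`. It shows only the thresholding in a single elimination pass; an actual stepwise procedure refits the model after each removal, which changes the remaining p-values.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the (rank + 1)-th smallest p-value is multiplied by (m - rank)
        candidate = min(1.0, (m - rank) * pvalues[i])
        # adjusted p-values must not decrease as the raw p-values increase
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def retained(names, pvalues, alpha=0.05):
    """Predictors whose p-value falls below alpha (one elimination pass)."""
    return [n for n, p in zip(names, pvalues) if p < alpha]


# Hypothetical p-values from one fitted model (illustrative, not the study's data)
names = ["x1", "x2", "x3", "x4", "x5"]
raw = [0.001, 0.012, 0.030, 0.045, 0.20]
print("unadjusted:", retained(names, raw))
print("Holm-adjusted:", retained(names, holm_adjust(raw)))
```

Here four predictors pass at 0.05 without adjustment but only `x1` and `x2` survive after Holm adjustment.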
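LASSO and MCP both set weak coefficients exactly to zero but differ in how they treat strong ones. For a single standardized covariate in a least-squares problem, the LASSO solution is soft thresholding of the unpenalized estimate z, while MCP with concavity parameter gamma > 1 applies "firm" thresholding that stops shrinking once |z| exceeds gamma * lambda. The sketch below shows only these one-coordinate updates; the values lambda = 0.5 and gamma = 3 are illustrative, the abstract does not report the study's tuning, and logistic models such as those in this study would apply such updates inside an iteratively reweighted coordinate-descent loop that is not shown.

```python
def soft_threshold(z: float, lam: float) -> float:
    """LASSO update for one standardized coefficient: shrink |z| by lam, or set to 0."""
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    """MCP ("firm thresholding") update for one standardized coefficient."""
    if gamma <= 1:
        raise ValueError("MCP requires gamma > 1")
    if abs(z) <= gamma * lam:
        # same zeroing region as LASSO, but surviving estimates are rescaled upward
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    # beyond gamma * lam the MCP penalty is flat, so no shrinkage is applied
    return z


lam, gamma = 0.5, 3.0  # illustrative tuning values, not the study's
for z in (0.3, 1.0, 2.0, -2.0):
    print(z, soft_threshold(z, lam), mcp_threshold(z, lam, gamma))
```

With these values both methods zero out z = 0.3, but for z = 2.0 LASSO returns 1.5 while MCP leaves the estimate at 2.0, which is why MCP tends to bias large effects less.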
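Combining several algorithms can be as simple as counting how many of them selected each predictor and keeping those above a chosen agreement level. The sketch below assumes that approach; the algorithm keys, predictor names, selections and the `min_fraction` parameter are all hypothetical and are not the study's results.

```python
from collections import Counter


def selection_frequency(selections):
    """Count how many algorithms selected each predictor."""
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))  # set() guards against duplicates within one algorithm
    return counts


def consensus(selections, min_fraction=1.0):
    """Predictors selected by at least min_fraction of the algorithms, sorted by name."""
    n = len(selections)
    counts = selection_frequency(selections)
    return sorted(p for p, c in counts.items() if c / n >= min_fraction)


# Hypothetical selections from four algorithms (illustrative only)
selections = {
    "alg_1": ["A", "B"],
    "alg_2": ["A", "B", "C"],
    "alg_3": ["A", "C", "D"],
    "alg_4": ["A", "B", "C", "E"],
}
print("selected by all:", consensus(selections))
print("selected by at least half:", consensus(selections, 0.5))
```

Requiring unanimity keeps only predictor A, while a majority rule also keeps B and C; the agreement level trades sparsity against the risk of discarding useful predictors.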