An Ensemble Machine Learning and Data Mining Approach to Enhance Stroke Prediction.
Richard Wijaya, Faisal Saeed, Parnia Samimi, Abdullah M Albarrak, Sultan Noman Qasem
Published in: Bioengineering (Basel, Switzerland) (2024)
Stroke poses a significant health threat, affecting millions of people annually. Early and precise prediction is crucial to providing effective preventive healthcare interventions. This study applied an ensemble machine learning and data mining approach to enhance the effectiveness of stroke prediction. Following the cross-industry standard process for data mining (CRISP-DM) methodology, several techniques, including random forest, ExtraTrees, XGBoost, artificial neural network (ANN), and genetic algorithm with ANN (GANN), were applied to two benchmark datasets to predict stroke from parameters such as gender, age, various diseases, smoking status, BMI, HighCol, physical activity, hypertension, heart disease, lifestyle, and others. Because the datasets were imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was applied to them. Model hyperparameters were tuned using grid search and randomized search with cross-validation. The evaluation metrics were accuracy, precision, recall, F1-score, and area under the curve (AUC). The experimental results show that the ensemble ExtraTrees classifier achieved the highest accuracy (98.24%) and AUC (98.24%). Random forest also performed well, achieving 98.03% in both accuracy and AUC. Comparisons with state-of-the-art stroke prediction methods showed that the proposed approach performs better, indicating its potential as a promising method for stroke prediction with substantial benefits to healthcare.
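The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' code. The function name `smote`, its parameters, and the toy (age, BMI) rows are assumptions made for illustration, and base samples are drawn at random rather than each generating a fixed number of synthetic points.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: hypothetical (age, BMI) rows, not taken from the study's datasets.
stroke_rows = [[67.0, 36.6], [80.0, 32.5], [74.0, 27.4], [69.0, 22.8]]
no_stroke_count = 10  # assumed size of the majority class
new_rows = smote(stroke_rows, n_new=no_stroke_count - len(stroke_rows), k=3)
balanced_stroke_rows = stroke_rows + new_rows
```

Because every synthetic row is a convex combination of two real minority rows, it always lies within the range of the observed minority values. Oversampling of this kind is usually applied to the training split only, so that evaluation data contains no synthetic samples; the abstract does not state how the study handled this.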
Keyphrases
- neural network
- atrial fibrillation
- machine learning
- healthcare
- physical activity
- big data
- electronic health record
- blood pressure
- randomized controlled trial
- artificial intelligence
- pulmonary hypertension
- systematic review
- convolutional neural network
- deep learning
- type diabetes
- rna seq
- single cell
- open label
- gene expression
- insulin resistance
- health insurance
- data analysis
- phase iii