Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: Evidence from 2016 EDHS.
Alemu Birara Zemariam, Wondosen Abey, Abdulaziz Kebede Kassaw, Ali Yimer
Published in: Health Informatics Journal (2024)
Background: Diarrhea is a major cause of morbidity and mortality among under-five children globally, especially in developing countries such as Ethiopia. Few studies have used machine learning (ML) to predict childhood diarrhea. This study aimed to compare the predictive performance of ML algorithms for diarrhea in under-five children in Ethiopia.

Methods: The study used a dataset of 9,501 under-five children from the 2016 Ethiopia Demographic and Health Survey (EDHS). Five ML algorithms were used to build and compare predictive models, and model performance was evaluated in Python using metrics including accuracy, sensitivity, specificity, and the area under the curve (AUC). Boruta feature selection was applied. Data-balancing techniques, namely under-sampling, over-sampling, adaptive synthetic sampling (ADASYN), and the synthetic minority oversampling technique (SMOTE), as well as hyperparameter tuning methods, were explored. Association rule mining with the Apriori algorithm was conducted in R to identify relationships between the independent variables and the target variable.

Results: Overall, 10.2% of children had diarrhea. The Random Forest (RF) model performed best, with 93.2% accuracy, 98.4% sensitivity, 85.5% specificity, and an AUC of 0.916. The top predictors were residence, wealth index, child age, number of living children, deworming, wasting, mother's occupation, and education. Association rule mining identified the seven rules most strongly associated with diarrhea among under-five children in Ethiopia.

Conclusion: The RF model achieved the highest performance for predicting childhood diarrhea. Policymakers and healthcare providers can use these findings to develop targeted interventions to reduce diarrhea. Tailoring strategies to the identified association rules has the potential to improve child health and reduce the burden of diarrhea in Ethiopia.
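The abstract lists SMOTE among the balancing techniques, which matters because only 10.2% of children were positive cases. The sketch below is a simplified, from-scratch illustration of SMOTE's core idea (interpolating between a minority sample and one of its nearest minority-class neighbours) in pure Python on hypothetical numeric toy data. It is not the study's implementation; the function name `smote` and its parameters are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Features are assumed to be numeric.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda x: math.dist(base, x),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class (e.g., children with diarrhea), 2 numeric features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
balanced_minority = minority + new_points  # 10 minority rows after oversampling
```

In practice, resampling is usually fitted on the training split only, so synthetic records never leak into the evaluation data. The imbalanced-learn library provides tested `SMOTE`, `ADASYN`, `RandomOverSampler`, and `RandomUnderSampler` implementations, and `SMOTENC` handles mixed categorical and numeric features like those in the EDHS.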
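The abstract reports accuracy, sensitivity, specificity, and AUC without defining them. A minimal pure-Python sketch follows, assuming diarrhea is coded 1 (positive class) and 0 otherwise. The function names and toy data are hypothetical, and AUC is computed through its rank (Mann-Whitney) interpretation rather than by tracing the ROC curve.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating 1 (child had diarrhea) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of true diarrhea cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With these toy values, 2 of 3 cases and 4 of 5 non-cases are classified correctly, so accuracy is 0.75, sensitivity is about 0.667, specificity is 0.8, and AUC is 13/15 (about 0.867). Because accuracy depends on the class proportions in the evaluation set, it is best read alongside sensitivity and specificity when the positive class is rare.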
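The study ran Apriori in R; to keep one language for the examples here, the following is a from-scratch Python sketch of the same idea, not the authors' code. It finds frequent itemsets level by level, then keeps rules whose consequent is the outcome item and ranks them by lift. The item encoding, thresholds, function names, and toy records are hypothetical; the abstract does not state which support, confidence, or lift cut-offs were used to select the seven reported rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only frequent k-itemsets.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules_for_target(frequent, target, min_confidence):
    """Rules X -> {target}, as (X, support, confidence, lift), sorted by lift."""
    target_set = frozenset([target])
    if target_set not in frequent:
        return []
    rules = []
    for itemset, supp in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - target_set
            confidence = supp / frequent[antecedent]  # P(target | X)
            if confidence >= min_confidence:
                lift = confidence / frequent[target_set]  # relative to baseline P(target)
                rules.append((antecedent, supp, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Hypothetical toy records: one set of "variable=value" items per child.
transactions = [
    {"residence=rural", "wealth=poor", "diarrhea=yes"},
    {"residence=rural", "wealth=poor", "diarrhea=yes"},
    {"residence=rural", "wealth=poor", "diarrhea=no"},
    {"residence=urban", "wealth=rich", "diarrhea=no"},
    {"residence=urban", "wealth=poor", "diarrhea=no"},
    {"residence=rural", "wealth=rich", "diarrhea=no"},
]
frequent = apriori(transactions, min_support=0.3)
rules = rules_for_target(frequent, "diarrhea=yes", min_confidence=0.5)
top_antecedent, top_support, top_confidence, top_lift = rules[0]
```

On this toy data the top rule is {residence=rural, wealth=poor} → {diarrhea=yes}, with support 1/3, confidence 2/3, and lift 2.0, meaning diarrhea is twice as common among rural, poor records as in the toy sample overall. A real analysis would more likely rely on an established implementation such as R's `arules` package.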