Feature selection approaches for predictive modelling of cadmium sources and pollution levels in water springs.

Fatima K Abu Salem, Mey Jurdi, Mohamad Alkadri, Firas Hachem, Hassan R Dhaini
Published in: Environmental science and pollution research international (2021)
The World Health Organization lists cadmium (Cd) as one of the top ten chemicals of public health concern. Cd is toxic at relatively low exposure levels and has acute and chronic effects on both health and the environment. In this study, we investigate a suite of data-driven methods that could assist decision-makers in estimating Cd levels in water springs and in identifying polluting sources. We used machine learning (ML) regression models, namely support vector machines and a variety of tree-based models including Random Forests, M5Tree, CatBoost, and gradient boosting, to identify sources of contamination and predict Cd levels. Feature selection analysis revealed that heavy traffic and distance to a major power plant in the sampled area play a leading role in Cd contamination of springs, together with precipitation levels and the average slope of the closest waste dumps upstream of the sampled springs. Our best-performing ML model was the AdaBoost regression tree using all the features (RMSE = 19.36, R^2 = 0.64). Our findings highlight the effectiveness of predictive data-driven modeling in addressing environmental challenges, particularly in high-risk areas with low resources.
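For readers unfamiliar with the two reported metrics, the following minimal sketch shows how RMSE and R^2 are conventionally computed from observed and predicted values. It is not the authors' code: the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted Cd levels (illustrative only,
# not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the same units as the predicted quantity, while R^2 is unitless and describes the share of variance in the observations that the model accounts for.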