Interpreting optimised data-driven solution with explainable artificial intelligence (XAI) for water quality assessment for better decision-making in pollution management.
Javed Mallick, Saeed Alqadhi, Hoang Thi Hang, Majed Alsubih
Published in: Environmental science and pollution research international (2024)
In Saudi Arabia, water pollution and drinking water scarcity pose a major challenge and jeopardise the achievement of sustainable development goals. The urgent need for rapid and accurate monitoring and assessment of water quality requires sophisticated, data-driven solutions for better decision-making in water management. This study aims to develop optimised data-driven models for comprehensive water quality assessment to enable informed decisions that are critical for sustainable water resources management. We used an entropy-weighted arithmetic technique to calculate the Water Quality Index (WQI), which integrates the World Health Organization (WHO) standards for various water quality parameters. Our methodology incorporated advanced machine learning (ML) models, including decision trees, random forests (RF) and correlation analyses to select features essential for identifying critical water quality parameters. We developed and optimised data-driven models such as gradient boosting machines (GBM), deep neural networks (DNN) and RF within the H2O API framework to ensure efficient data processing and handling. Interpretation of these models was achieved through a three-pronged explainable artificial intelligence (XAI) approach: model diagnosis with residual analysis, model parts with permutation-based feature importance and model profiling with partial dependence plots (PDP), accumulated local effects (ALE) plots and individual conditional expectation (ICE) plots. The quantitative results revealed insightful findings: fluoride and residual chlorine had the highest and lowest entropy weights, respectively, indicating their differential effects on water quality. Over 35% of the water samples were categorised as 'unsuitable' for consumption, highlighting the urgency of taking action to improve water quality. 
Amongst the optimised models, the Random Forest (model 79) and the Deep Neural Network (model 81) proved to be the most effective and showed robust predictive abilities, with R² values of 0.96 and 0.97 respectively on the testing dataset. Model profiling as XAI highlighted the significant influence of key parameters such as nitrate, total hardness and pH on WQI predictions. These findings enable targeted water quality improvement measures that are in line with sustainable water management goals. Therefore, our study demonstrates the potential of advanced, data-driven methods to revolutionise water quality assessment in Saudi Arabia. By providing a more nuanced understanding of water quality dynamics and enabling effective decision-making, these models contribute significantly to the sustainable management of valuable water resources.
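The abstract names an entropy-weighted arithmetic WQI but does not give its formulas. The sketch below shows one common textbook form of that technique, not necessarily the authors' exact procedure: each parameter is min-max normalised, its Shannon entropy across samples is computed, weights are taken proportional to one minus the entropy, and the WQI is the weighted sum of concentration-to-limit ratios scaled by 100. The function names, the sample values and the limits are all hypothetical and chosen only for illustration; the limits are not quoted WHO standards, and the special rating formula often used for pH is omitted.

```python
import math


def entropy_weights(samples):
    """Return one entropy weight per parameter (column).

    samples: list of rows, each row a list of measured parameter values.
    Assumption: the common min-max / Shannon-entropy formulation.
    """
    m = len(samples)
    n = len(samples[0])
    divergence = []  # (1 - entropy) for each parameter
    for j in range(n):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column carries no information, so it gets zero weight.
            divergence.append(0.0)
            continue
        y = [(v - lo) / (hi - lo) for v in col]  # min-max normalisation
        total = sum(y)
        p = [v / total for v in y]               # share of each sample
        # Terms with p == 0 contribute 0 to the entropy by convention.
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1.0 - e)
    s = sum(divergence)
    if s == 0:
        raise ValueError("all parameters are constant; weights are undefined")
    return [d / s for d in divergence]


def wqi(row, limits, weights):
    """Weighted arithmetic index: sum of w_j * (C_j / S_j) * 100."""
    return sum(w * (c / s) * 100.0 for c, s, w in zip(row, limits, weights))


# Hypothetical measurements for three unnamed parameters (made-up numbers).
samples = [
    [10.0, 120.0, 0.5],
    [45.0, 300.0, 1.2],
    [60.0, 500.0, 2.0],
    [25.0, 200.0, 0.8],
]
# Illustrative permissible limits only, not authoritative standards.
limits = [50.0, 300.0, 1.5]

weights = entropy_weights(samples)
scores = [wqi(row, limits, weights) for row in samples]
```

Because the weights sum to one, a sample that sits exactly at every limit scores 100 under this formulation, which gives a convenient sanity check when wiring the index into a larger pipeline.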
Keyphrases
- water quality
- artificial intelligence
- machine learning
- neural network
- drinking water
- big data
- deep learning
- decision making
- quality improvement
- magnetic resonance
- single cell
- climate change
- high resolution
- nitric oxide
- magnetic resonance imaging
- public health
- mass spectrometry
- heavy metals
- network analysis
- health risk
- contrast enhanced