Determination of bioavailable arsenic threshold and validation of modeled permissible total arsenic in paddy soil using machine learning.
Jajati Mandal, Vinay Jain, Sudip Sengupta, Md Aminur Rahman, Kallol Bhattacharyya, Mohammad Mahmudur Rahman, Debasis Golui, Michael D Wood, Debapriya Mondal
Published in: Journal of Environmental Quality (2023)
Minimizing arsenic (As) intake from food is a key part of the public health response in As-contaminated regions, many of which depend on rice as the predominant staple food. Here, we present a validated maximum allowable concentration of total As in paddy soil and provide the first derivation of a maximum allowable soil concentration for bioavailable As. In previous work, we used meta-analysis to predict the maximum allowable total As in soil with decision tree (DT) and logistic regression (LR) models, which were defined using the maximum tolerable concentration (MTC) of As in rice grain recommended by the Codex Alimentarius. In the present study, we validated these models against three test data sets derived from purpose-collected field data. The DT model outperformed the LR model in terms of accuracy and the Matthews correlation coefficient (MCC); the DT-estimated maximum allowable total As in paddy soil of 14 mg kg⁻¹ could therefore be used with confidence as a guideline value. We then used the same field data to estimate the allowable concentration of bioavailable As in paddy soil with random forest (RF), gradient boosting machine (GBM), and LR models. The grain As category (<MTC or >MTC) was the dependent variable; bioavailable As (BAs), total As (TAs), pH, organic carbon (OC), available phosphorus (AvP), and available iron (AvFe) were the predictor variables. LR outperformed RF and GBM on accuracy, sensitivity, specificity, kappa, precision, log loss, F1 score, and MCC. In the better-performing LR model, BAs, TAs, AvFe, and OC were significant predictors of grain As. From the partial dependence plots (PDP) and individual conditional expectation (ICE) plots of the LR model, the limit for BAs in soil was estimated at 5.70 mg kg⁻¹.
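The abstract compares its models with confusion-matrix metrics and reads the BAs limit off the PDP and ICE plots of the LR model, but it does not report the confusion matrices, the fitted coefficients, or the exact crossing criterion. The Python sketch below illustrates both steps under stated assumptions: ">MTC" is taken as the positive class, the BAs limit is assumed to be the value at which the PDP (the average of the ICE curves) of P(grain As > MTC) crosses 0.5, and all function names, coefficients, and soil rows are hypothetical rather than values from the study.

```python
import math
from typing import Dict, List


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary grain-As classifier.

    Assumption: ">MTC" is the positive class. Degenerate matrices (an empty
    row or column) are only guarded in the MCC denominator.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # recall of the >MTC class
    specificity = tn / (tn + fp)  # recall of the <MTC class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed accuracy corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "mcc": mcc, "kappa": kappa}


def predict_proba(row: Dict[str, float], coefs: Dict[str, float], intercept: float) -> float:
    """P(grain As > MTC) from a fitted logistic model: sigmoid of the linear predictor."""
    z = intercept + sum(coefs[k] * row[k] for k in coefs)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def partial_dependence(rows: List[Dict[str, float]], coefs: Dict[str, float],
                       intercept: float, feature: str, value: float) -> float:
    """PDP at `value`: fix `feature` in every row (one ICE point each) and average."""
    return sum(predict_proba({**row, feature: value}, coefs, intercept)
               for row in rows) / len(rows)


def pdp_crossing(rows, coefs, intercept, feature, lo, hi, level=0.5, tol=1e-6):
    """Bisection for the feature value at which the PDP reaches `level`.

    Every ICE curve of a logistic model is a sigmoid in `feature` with the
    same direction, so the PDP is monotone and a bracketed crossing is unique.
    """
    f_lo = partial_dependence(rows, coefs, intercept, feature, lo) - level
    f_hi = partial_dependence(rows, coefs, intercept, feature, hi) - level
    if f_lo * f_hi > 0:
        raise ValueError("level is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = partial_dependence(rows, coefs, intercept, feature, mid) - level
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical confusion matrix, coefficients, and soil rows (not from the study)
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
    coefs = {"BAs": 0.9, "TAs": 0.05, "AvFe": -0.01, "OC": -0.4}
    rows = [
        {"BAs": 2.1, "TAs": 9.0, "AvFe": 120.0, "OC": 0.6},
        {"BAs": 6.3, "TAs": 15.2, "AvFe": 85.0, "OC": 0.8},
    ]
    limit = pdp_crossing(rows, coefs, intercept=-4.0, feature="BAs", lo=0.0, hi=50.0)
    print(f"BAs at which the PDP crosses 0.5: {limit:.2f} mg/kg")
```

Because each ICE curve of a logistic model is a sigmoid in the chosen feature, the PDP is monotone and bisection suffices; for RF or GBM models the PDP need not be monotone, so a grid scan over the feature range would be the safer choice there.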