Siamese Network-Based Transfer Learning Model to Predict Geogenic Contaminated Groundwaters.
Hailong Cao, Xianjun Xie, Jian-Bo Shi, Gui-Bin Jiang, Yanxin Wang. Published in: Environmental Science & Technology (2022)
Exposure to geogenic contaminated groundwaters (GCGs) is a significant public health concern. Machine learning models are powerful tools for discovering potential GCGs. However, groundwater quality data are often insufficient, and GCGs are typically a minority class in the data; both factors hinder models from producing meaningful GCG predictions. To address this issue, a deep learning method, Siamese network-based transfer learning (SNTL), is used to estimate the probability that hazardous substances are present in groundwater above a threshold, based on limited and class-imbalanced data. SNTL greatly reduces the amount of training data required and eliminates the negative effects of class imbalance on prediction model performance. Predictions for three typical GCGs (high-arsenic, high-fluoride, and high-iodine groundwater) show that the SNTL models provide higher (about 80%) and more balanced sensitivity and specificity than benchmark Random Forest models, indicating that SNTL models can predict both GCGs and non-GCGs. SNTL can therefore help protect populations from GCG exposure in areas where other prediction methods fail to provide risk information because of poor groundwater quality data.
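
The abstract evaluates models by sensitivity and specificity rather than overall accuracy, which matters when GCGs are a minority class. The following is a minimal sketch of how the two metrics are computed and why accuracy alone can mislead on imbalanced data. The function name, the toy labels, and both sets of predictions are hypothetical illustrations; they are not data or code from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = GCG (contaminated), 0 = non-GCG.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # GCG correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # GCG missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-GCG correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-GCG wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 10 wells, only 2 of which are GCG (minority class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate model that never predicts GCG.
always_negative = [0] * 10

# A model that finds both GCG wells at the cost of two false alarms.
flags_more = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for name, y_pred in [("always_negative", always_negative),
                     ("flags_more", flags_more)]:
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(name, "accuracy:", accuracy(y_true, y_pred),
          "sensitivity:", sens, "specificity:", spec)
```

In this toy case both models reach the same accuracy (0.8), yet the first has zero sensitivity and would miss every contaminated well. This is the kind of imbalance that reporting sensitivity and specificity together is meant to expose.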