Comparing the performance of machine learning algorithms for remote and in situ estimations of chlorophyll-a content: A case study in the Tri An Reservoir, Vietnam.

Hao Quang Nguyen, Nam Thang Ha, Lam Nguyen-Ngoc, Thanh-Luu Pham
Published in: Water environment research : a research publication of the Water Environment Federation (2021)
Chlorophyll-a (Chl-a) is one of the most important indicators of the trophic status of inland waters, and its continued monitoring is essential. The Sentinel-2 MSI satellite, which recently entered operation, offers high-spatial-resolution images for remote water quality monitoring. In this study, we tested the performance of three well-known machine learning (ML) models (random forest [RF], support vector machine [SVM], and Gaussian process [GP]) and two novel ML models (extreme gradient boosting [XGB] and CatBoost [CB]) for estimating a wide range of Chl-a concentrations (10.1-798.7 μg/L) using Sentinel-2 MSI data and in situ water quality measurements in the Tri An Reservoir (TAR), Vietnam. GP was the most reliable model for predicting Chl-a from water quality parameters (R² = 0.85, root-mean-square error [RMSE] = 56.65 μg/L, Akaike's information criterion [AIC] = 575.10, and Bayesian information criterion [BIC] = 595.24). With water surface reflectance as the model input, CB was the superior model for Chl-a retrieval (R² = 0.84, RMSE = 46.28 μg/L, AIC = 229.18, and BIC = 238.50). Our results indicated that GP and CB are the two best models for the prediction of Chl-a in TAR. Overall, Sentinel-2 MSI coupled with ML algorithms is a reliable, inexpensive, and accurate instrument for monitoring Chl-a in inland waters.
PRACTITIONER POINTS:
  • Machine learning algorithms were used for both remote sensing data and in situ water quality measurements.
  • The performance of five well-known machine learning models was tested.
  • Gaussian process was the most reliable model for predicting Chl-a from water quality parameters.
  • CatBoost was the best model for Chl-a retrieval from water surface reflectance.
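The four scores reported above (R², RMSE, AIC, BIC) can be computed from paired observed and predicted Chl-a values. The sketch below is illustrative only: the abstract does not state which AIC/BIC formulation the authors used, so it assumes the common least-squares forms AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), where k is the number of fitted parameters. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred, n_params):
    """Return R^2, RMSE, AIC and BIC for paired observations and predictions.

    Assumption: AIC and BIC use the least-squares (Gaussian error) forms,
    which may differ from the formulation used in the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_y = sum(y_true) / n
    # Residual sum of squares and total sum of squares
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    if rss == 0 or tss == 0:
        raise ValueError("metrics are undefined for a perfect fit or constant y_true")

    log_term = n * math.log(rss / n)
    return {
        "r2": 1.0 - rss / tss,
        "rmse": math.sqrt(rss / n),
        "aic": log_term + 2 * n_params,
        "bic": log_term + n_params * math.log(n),
    }


# Made-up Chl-a values in ug/L, for illustration only
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted, n_params=2)
print(metrics)
```

Lower AIC and BIC indicate a better trade-off between fit and model complexity, but the values are only comparable between models fitted to the same dataset. This is why the AIC/BIC figures for the water-quality models and the reflectance models in the abstract should not be compared against each other.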
Keyphrases
  • water quality
  • machine learning
  • deep learning
  • big data
  • artificial intelligence
  • climate change
  • electronic health record
  • healthcare
  • convolutional neural network
  • health information
  • mass spectrometry