Prediction of Geosmin at Different Depths of Lake Using Machine Learning Techniques.
Yong-Su Kwon, In-Hwan Cho, Ha-Kyung Kim, Jeong-Hwan Byun, Mi-Jung Bae, Baik-Ho Kim
Published in: International Journal of Environmental Research and Public Health (2021)
Geosmin is a major concern in the management of water sources worldwide. We therefore predicted geosmin concentration categories at three depths of lakes (surface, middle, and bottom) and analyzed the relationships between geosmin concentration and factors such as phytoplankton abundance and environmental variables. Data were collected monthly from three major lakes in Korea (Uiam, Cheongpyeong, and Paldang) from May 2014 to December 2015. Before prediction, we grouped geosmin concentration into four categories using the boxplot method. We then applied multivariate adaptive regression splines, classification and regression trees, and random forest (RF) to identify the most appropriate modelling approach. In all three models and in every layer, environmental variables predicted the four geosmin categories more accurately than phytoplankton abundance did, as judged by the area under the curve (AUC) and accuracy. The RF model had the highest predictive power of the three models. The relative importance of environmental variables and phytoplankton abundance in the sensitivity analysis differed among the three water layers. Water temperature and the abundance of Cyanophyceae were the most important factors for predicting geosmin concentration categories in the surface layer, whereas total phytoplankton abundance was relatively more important in the bottom layer.
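The grouping of concentrations into four categories before modelling can be illustrated with a short sketch. The abstract does not state the exact cut points of the boxplot method, so this example assumes the categories are split at the three quartiles (Q1, median, Q3) that define a boxplot. The function name `boxplot_categories` and the sample values are hypothetical and are not data from the study.

```python
from statistics import quantiles


def boxplot_categories(values):
    """Assign each value to one of four groups split at the quartiles.

    Assumption: the "boxplot method" is read here as cutting at
    Q1, the median, and Q3. The study may define the groups differently.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")

    def category(v):
        if v <= q1:
            return 1
        if v <= q2:
            return 2
        if v <= q3:
            return 3
        return 4

    return [category(v) for v in values], (q1, q2, q3)


# Hypothetical concentrations, for illustration only (not study data).
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
labels, cuts = boxplot_categories(sample)
```

The resulting labels could then serve as the target classes for a classifier such as random forest, with environmental variables or phytoplankton abundance as predictors.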