Neural network and cubist algorithms to predict fecal coliform content in treated wastewater by multi-soil-layering system for potential reuse.
Sofyan Sbahi, Naaila Ouazzani, Abdessamed Hejjaj, Laila Mandi. Published in: Journal of environmental quality (2020)
This study aims to identify the machine learning algorithm that most accurately predicts fecal coliform (FC) concentration in the effluent of a multi-soil-layering (MSL) system, using linear regression as a baseline, and to identify the input variables affecting FC removal from domestic wastewater. The effluent quality of two MSL designs was evaluated and compared across several parameters for potential reuse in agriculture. The first system was a single-stage MSL (MSL-SS), and the second was a two-stage MSL (MSL-TS). The concentration of FC in the effluent of the MSL-TS system was estimated by three models: artificial neural network (ANN), Cubist, and multiple linear regression (MLR). Significant (p < .001) improvements in pollutant removal were noted for the MSL-TS system compared with the MSL-SS system. Overall, the water quality parameters investigated complied with FAO irrigation standards. The predictive performance of the models was evaluated with several metrics by comparing the real and predicted values. The ANN model yielded the best predictive performance (R² = .953), followed by the Cubist model (R² = .946) and the MLR technique (R² = .481). Based on the most accurate model (ANN), the degree of influence of each predictor was investigated; total suspended solids and pH proved more useful than the other predictors for estimating FC concentrations.
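As a rough illustration of how real and predicted values can be compared through R², here is a minimal Python sketch. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used. The function name `r_squared` and all numbers are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up log10 FC values for illustration only (not from the study)
observed = [2.0, 2.5, 3.0, 3.5, 4.0]
close_fit = [2.1, 2.4, 3.1, 3.4, 4.0]   # hypothetical well-fitting model
mean_only = [3.0, 3.0, 3.0, 3.0, 3.0]   # always predicts the mean

print(round(r_squared(observed, close_fit), 3))
print(round(r_squared(observed, mean_only), 3))
```

Under this definition, a model that always predicts the mean of the observations scores 0, so values near 1 indicate that most of the variance in the measurements is explained by the predictions.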