Virtual measurements in a wastewater treatment plant: Machine learning models for predicting the PO4 concentration in the effluent
Due to the underlying complexity of wastewater treatment plant (WWTP) processes, it can be challenging to respond appropriately and promptly to dynamic process conditions to ensure effluent quality, particularly when operational costs are a major consideration. To avoid various limitations of conventional mechanistic models, machine learning (ML) methods have been used to model WWTP processes. However, the time lags between process steps have been neglected, making it difficult to explain the relationships between operational factors and effluent quality. In this study, multiple ML models were therefore developed to improve effluent quality control in WWTPs by clarifying the relationships between operational parameters and effluent parameters. Specifically, the objective is to predict the concentration of phosphate (PO4) in the effluent of the HIAS wastewater treatment plant. The ML algorithms investigated comprise regression models (Linear Regression, Lasso Regression, and Ridge Regression), data-driven multi-class classification models (Decision Trees, Gradient Boosting Decision Tree, and XGBoost), and a Long Short-Term Memory (LSTM) model designed for time-series data analysis. The dataset contains time-series data of historical operational variables and effluent parameters from the HIAS wastewater treatment plant in Hamar, comprising 8662 samples. One effluent parameter, phosphate in the effluent (PO4), and 19 operational parameters are studied. Before training, the data were preprocessed by handling missing values and outliers to ensure reliable and consistent analysis. 
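The abstract does not state which techniques were used for missing values, outliers, or time lags. The following is a minimal sketch under explicit assumptions: interior gaps are filled by linear interpolation (edge gaps take the nearest valid value), outliers are clipped to Tukey's IQR fences with k = 1.5, and time lags are represented by feeding lagged copies of an operational parameter as features for the PO4 value at time t. The helper names `interpolate_missing`, `clip_outliers_iqr`, and `make_lagged` are hypothetical, and pure Python is used so the example runs without third-party libraries.

```python
import statistics
from typing import Optional, Sequence


def interpolate_missing(values: Sequence[Optional[float]]) -> list[float]:
    """Fill None gaps by linear interpolation.

    Assumption: linear interpolation is an illustrative choice, not the
    thesis's documented method. Gaps at either end take the nearest valid value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no valid values")
    filled: list[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        # Nearest valid samples before and after the gap.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def clip_outliers_iqr(values: Sequence[float], k: float = 1.5) -> list[float]:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: clipping (winsorizing) is used here; removing outlier rows
    would be an equally plausible alternative.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def make_lagged(feature: Sequence[float], target: Sequence[float], lag: int):
    """Pair feature values at t-1 .. t-lag with the target at t.

    Assumption: one simple way to expose process time lags to a model.
    """
    X, y = [], []
    for t in range(lag, len(target)):
        X.append([feature[t - j] for j in range(1, lag + 1)])
        y.append(target[t])
    return X, y


# Hypothetical series, for illustration only.
cleaned = clip_outliers_iqr(interpolate_missing([1.0, None, 3.0, 4.0, 5.0, 60.0]))
X, y = make_lagged(cleaned, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], lag=2)
```

In this sketch the lag length is a free choice; in practice it would be tuned per operational parameter, and the same windowing idea underlies the input sequences fed to an LSTM.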
The ML models were trained, validated, and evaluated using the coefficient of determination (R²), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE) to assess their performance. The results indicate that ML models can support effluent quality control. Among the regression models, Linear and Ridge Regression performed best, achieving a moderate fit with an R² score of 0.527, whereas Lasso Regression performed poorly. Among the data-driven multi-class classification models, the Gradient Boosting Decision Tree outperformed the others with an R² score of 0.869, indicating a good fit. Of all the ML models in this study, the LSTM model showed the most promise for accurately predicting the PO4 concentration in the effluent, achieving a strong fit with an R² score of 0.926. These results could support the development of more advanced control strategies to improve PO4 removal.
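For reference, the four evaluation metrics can be computed from observed and predicted PO4 values as below. This is a minimal pure-Python sketch; `regression_metrics` is a hypothetical helper rather than code from the thesis, and libraries such as scikit-learn provide the same quantities through `r2_score`, `mean_squared_error`, and `mean_absolute_error`.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Return R², MSE, RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # sum of squared residuals
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,  # R² = 1 - SS_res / SS_tot
        "MSE": mse,
        "RMSE": math.sqrt(mse),       # same units as PO4
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical observed and predicted PO4 values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

R² is scale-free and compares the model against always predicting the mean, while RMSE and MAE are expressed in the units of the PO4 measurement, so reporting both gives complementary views of model quality.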