Journal Article
Research on Water Quality Prediction Based on Machine Learning
by Xudong Chen, Fengtian Pei, Minghao Liu, Zejun Chen, Keqin Li and Jingcheng Xie
Abstract
At present, some urban water plants in China have started using chloramine disinfection, which raises the question of how to determine whether the disinfected water is drinkable. This article collects a water quality dataset that includes indicators such as chloramine and trihalomethanes. First, descriptive statistics and Pearson correlation analysis are carried out on the chloramine and trihalomethane data and the target variable (whether the water is drinkable). Because water quality cannot be judged from these two indicators alone, additional indicators such as pH value are also used. To establish a more accurate prediction model, the dataset is first preprocessed, including statistical analysis of missing values, identification of outliers with box plots, and imputation with the KNN algorithm. Feature engineering is then performed, including the Yeo-Johnson transformation, correlation analysis, and the calculation of SHAP values. The processed data are subsequently fed into the Stacking, Voting, and attention-based CNN-LSTM classification models. Random search and cross-validation are used to train each model and obtain its optimal hyperparameters. Finally, the relevant evaluation metrics are calculated for each model to measure its accuracy.
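The preprocessing steps named in the abstract (box plot outlier detection and KNN filling) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `iqr_bounds` and `knn_impute`, the 1.5 × IQR fence, the choice of k, the distance definition, the reading of "filling" as missing-value imputation, and the sample rows (pH, chloramine, trihalomethanes) are all assumptions made for illustration only.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) box-plot fences for the observed values.

    Values outside the fences are the points a box plot draws as outliers.
    The multiplier k=1.5 is the common convention, assumed here.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over the columns that
    both rows have observed (an assumed, simple definition).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical rows: [pH, chloramine, trihalomethanes]; None marks a gap.
rows = [
    [7.0, 7.3, 66.0],
    [7.2, 7.1, None],
    [6.8, 7.5, 70.0],
    [9.5, 3.0, 40.0],
]
filled = knn_impute(rows, k=2)
print(filled[1][2])  # 68.0, the mean of the two most similar rows

low, high = iqr_bounds([1, 2, 3, 4, 5, 100])
print(low, high)  # fences; 100 lies above the upper fence
```

In practice, a library routine such as scikit-learn's `KNNImputer` would normally be used instead of hand-written loops; the sketch only makes the underlying idea explicit.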