Environmental Science and Ecotechnology. 2022 Sep 24;13:100207. doi: 10.1016/j.ese.2022.100207

Deep learning-based prediction of effluent quality of a constructed wetland

Bowen Yang a, Zijie Xiao a, Qingjie Meng b, Yuan Yuan c, Wenqian Wang a, Haoyu Wang d, Yongmei Wang a, Xiaochi Feng a,
PMCID: PMC9529666  PMID: 36203649

Abstract

Data-driven approaches that make timely predictions about pollutant concentrations in the effluent of constructed wetlands are essential for improving the treatment performance of constructed wetlands. However, the effects of meteorological conditions and flow changes in real scenarios are generally neglected in water quality prediction. To address this problem, we propose an approach based on multi-source data fusion that considers water quality indicators, water quantity indicators, and meteorological indicators. We establish four representative methods to simultaneously predict the concentrations of three representative pollutants in the effluent of a practical large-scale constructed wetland: (1) multiple linear regression; (2) backpropagation neural network (BPNN); (3) genetic algorithm combined with the BPNN to solve the local minima problem; and (4) long short-term memory (LSTM) neural network to consider the influence of past results on the present. The results suggest that the LSTM model performed considerably better than the other neural network-based models and the linear method, with a satisfactory R2. Additionally, given the large fluctuations of pollutant concentrations in the effluent, we used a moving average method to smooth the original data, which improved the accuracy of the traditional and hybrid neural networks. The results of this study indicate that a hybrid modeling concept that combines scientific data preprocessing methods with deep learning algorithms is a feasible approach for forecasting effluent water quality in actual engineering.

Keywords: LSTM, Constructed wetlands, Water quality prediction, Deep learning, Multi-source data fusion

Highlights

  • Four prediction models are successfully established through multisource data fusion.

  • The LSTM shows a satisfactory forecast accuracy.

  • The moving average method markedly improves the accuracy of the GA-BPNN.

1. Introduction

Compared with wastewater treatment plants, constructed wetlands (CWs) are widely applied in developing countries for the advanced purification of polluted urban water because of their low construction and operation costs, excellent treatment capacity, and high ecological benefits [1,2]. Additionally, in the context of global warming, a new requirement has emerged for wastewater treatment, that is, the reduction of greenhouse gas (GHG) emissions [3,4]. In this case, CWs are widely used as a low-carbon and green sewage treatment method to address various point and non-point source pollution [5]. To maximize the treatment efficiency of CWs, it is necessary to make timely predictions about potential changes in the effluent and adjust the operation parameters of CWs to guarantee the safety of urban water systems [6]. Therefore, establishing a satisfactory model that uses previous effluent quality data from a CW to predict sudden future changes will provide an effective strategy for the regulation of CWs, thereby indirectly providing an approach to control urban water pollution [[7], [8], [9]].

Mathematical models have been used frequently not only to simulate CW purification mechanisms but also to predict effluent quality [10,11]. However, to predict the effluent quality of CWs based on mathematical models, it is necessary not only to continuously monitor a series of key water quality indicators (biochemical oxygen demand in five days (BOD5), chemical oxygen demand (COD), ammonia nitrogen (NH4+-N), and total phosphorus (TP)) but also to measure the absorption of wetland plants and the activity of bacteria, which consumes a large amount of time and energy [12,13]. For example, [14] established a physical-mathematical water quality model to simulate the interaction between overland and subsurface flow that occurs in horizontal flow CWs. The process not only required a series of specific formulas to simulate biochemical processes but also needed a hydraulic model to be established, which was extremely tedious. Therefore, time-consuming sampling and measurement are a major obstacle to water quality perception and the timely adjustment of CWs.

Meanwhile, various data-driven models have been used to predict the purification capacity of CWs [15]. Although a data-driven model requires as many data points as a mechanistic or mathematical model, it does not require detailed fundamental and mechanistic knowledge. Therefore, data-driven models have the potential for wider application and can achieve better prediction performance than mathematical models when forecasting the water quality of practical CWs [16,17].

Among the diverse data-driven methods, deep learning has become a widely used technology in hydrological time series prediction because of its strong nonlinear mapping and prediction capabilities, higher error tolerance, and better generalizability [18]. For example, [19] optimized energy consumption and effluent quality during wastewater treatment using novel dynamic optimization control based on multi-objective ant lion optimization and deep learning algorithms. [20] applied an artificial neural network (ANN) to simulate the denitrification rate of CWs and concluded that the ANN achieved a much better simulation effect than the traditional multiple linear regression (MLR) model or a simplified mechanistic model because of its excellent regression capabilities for nonlinear problems. [21] used a genetic algorithm (GA) combined with an ANN model to simulate and predict paper-making wastewater treatment. The results demonstrated that, through its excellent global search ability, the GA can substantially reduce the BPNN's error and improve accuracy, which makes it a powerful tool for predicting complex problems [22]. Additionally, [23] used a long short-term memory (LSTM) model combined with the wavelet domain threshold denoising method to simulate historical changes in chlorophyll A in lake water and predict future concentration changes. Furthermore, [24] proposed an integrated empirical mode decomposition (EMD)-LSTM model to predict water quality in urban drainage networks, which combined an EMD-centric data preprocessing module and an LSTM neural network prediction module to improve the accuracy of the model-based detection method. These results demonstrated that LSTM performed well in multi-time-step prediction problems.

To date, the large-scale application of deep learning methods for predicting effluent quality in real vertical flow CWs has not been investigated systematically. Previous applications have either been in small-scale CWs in the laboratory or mostly focused on predicting the concentration of specific pollutants based on several accessible parameters, such as temperature, flow rate, and dissolved oxygen [[25], [26], [27], [28], [29]]. However, considering that the water influent concentration of CWs under actual conditions is highly volatile and that a large number of parameters affect the processing capacity of CWs in large-scale applications, such as temperature, rainfall, atmospheric pressure, and humidity, it remains a challenge to establish a suitable method to predict multiple pollutants simultaneously with the help of multi-source data.

Therefore, our purpose in this study is to simulate and predict the effluent quality of large-scale CWs in time through a combination of deep learning algorithms and multi-source data-driven methods. First, given the multi-source data that affect the processing capacity of CWs, we investigate the mapping relationship between the data of the previous day and the concentration of pollutants in the CW effluent of the next day. Then, we develop various typical approaches for predicting the concentrations of three conventional pollutants and compare them with each other so that we can identify the best model for this complex environment at large spatial scales. Finally, because of the high volatility of the effluent concentration of CWs, we propose a data preprocessing module that can smooth the original data, remove high-frequency noise, and effectively increase model prediction accuracy. Our research provides new methods and ideas for improving the prediction accuracy of the large-scale application of water quality models in practical scenarios.

2. Materials and methods

2.1. Preprocessing of raw data

In this study, data preprocessing consists of two parts: a moving average and normalization. The moving average is a data smoothing method that suppresses high-frequency noise and makes the underlying pattern more visible than in the original data, which helps ensure stable model performance [30]. The smoothing formula is shown in Equation (1). Because the indicators differ in dimension, some indicators would otherwise be effectively ignored in the modeling process; therefore, the original variables are normalized through a linear transformation of the raw data (Zhou 2020). For example, if there are i indicators, v1, v2, …, vi, that represent the attributes of j objects, then the raw dataset is as shown in Equation (2). “Min” and “max” are the minimum and maximum values of an index, respectively. Min-max normalization maps the original value vij of an index to the value v'ij in the interval [0, 1], as shown in Equation (3):

$$Y_t = \frac{X_t + X_{t-1} + X_{t-2} + \cdots + X_{t-n}}{n} \tag{1}$$

where Xt is the effluent concentration on day t, Yt is the effluent concentration on day t after averaging, and n is the number of days over which the average is taken;

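$$V_{i \times j} = \begin{pmatrix} V_{11} & \cdots & V_{1j} \\ \vdots & \ddots & \vdots \\ V_{i1} & \cdots & V_{ij} \end{pmatrix}, \tag{2}$$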

where i represents the number of indicators and j represents the number of attributes of each indicator; and

$$V'_m = \frac{V_m - \min(V_m)}{\max(V_m) - \min(V_m)} \tag{3}$$

where V'm represents the normalized value, and max(Vm) and min(Vm) are the maximum and minimum values of the sample, respectively.
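The two preprocessing steps can be illustrated with a short NumPy sketch. It is not the authors' MATLAB code. The function names are hypothetical, and the moving average assumes a trailing window of exactly n values (day t and the n−1 days before it), which is one reading of Equation (1). Days without a full window are returned as NaN, and constant columns are mapped to 0 to avoid division by zero; both choices are assumptions.

```python
import numpy as np


def moving_average(x, n):
    """Trailing n-day mean of a 1-D series (assumed reading of Equation (1)).

    The first n-1 entries have no full window and are set to NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, np.nan)
    if n <= len(x):
        # A uniform kernel of length n gives the mean of each full window.
        y[n - 1:] = np.convolve(x, np.ones(n) / n, mode="valid")
    return y


def min_max_normalize(v):
    """Column-wise min-max scaling to [0, 1], as in Equation (3).

    Rows are samples (days) and columns are indicators.
    Constant columns are mapped to 0 (an assumption, to avoid 0/0).
    """
    v = np.asarray(v, dtype=float)
    v_min = v.min(axis=0)
    v_range = v.max(axis=0) - v_min
    safe_range = np.where(v_range == 0, 1.0, v_range)
    return (v - v_min) / safe_range
```

For example, `moving_average(cod_eff, 3)` would smooth a daily effluent COD series with a three-day window, where `cod_eff` is a placeholder name for the measured series.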

2.2. Prediction models

2.2.1. Multiple linear regression (MLR)

In regression analysis, when more than one independent variable (input variables xj) is used to predict a dependent variable (output variable Y) through linear regression, the method is called MLR [31], which can be expressed as follows:

$$Y = k_1 x_1 + k_2 x_2 + k_3 x_3 + \cdots + k_j x_j + k_0 \tag{4}$$

where k1, k2, …, kj are the regression coefficients and k0 is the intercept of MLR. The coefficient of each variable reflects its effect on the predicted result.
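As an illustration of Equation (4), the coefficients can be estimated by ordinary least squares. The sketch below uses NumPy rather than the SPSS workflow used in this study, and the function names `fit_mlr` and `predict_mlr` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimate of k_1..k_j and the intercept k_0 in Equation (4).

    X has one row per sample and one column per input variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so that the last coefficient is the intercept.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_mlr(X, k, k0):
    """Evaluate Y = k_1 x_1 + ... + k_j x_j + k_0 for each row of X."""
    return np.asarray(X, dtype=float) @ k + k0
```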

Multicollinearity is a common problem in MLR. When there is strong collinearity between variables, the prediction performance of the model decreases. Therefore, it is necessary to calculate the variance inflation factor (VIF) value between the variables. The VIF value of each independent variable is calculated as

$$\mathrm{VIF} = \frac{1}{1 - R_k^2} \tag{5}$$

where Rk2 is the coefficient of determination obtained by regressing the independent variable xk on the remaining independent variables (Rk is the corresponding multiple correlation coefficient). The larger the VIF, the greater the possibility of collinearity among independent variables. Therefore, variables with a high VIF (VIF > 5) must be eliminated to ensure that the variables in the final model are independent of each other [32].
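Equation (5) can be computed directly by regressing each input on the others. The following NumPy sketch is an illustration under that definition, not the SPSS procedure used here; the function name `vif` is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of every column of X, following Equation (5)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_vars = X.shape
    out = np.empty(n_vars)
    for k in range(n_vars):
        target = X[:, k]
        others = np.delete(X, k, axis=1)
        # Regress x_k on the remaining variables plus an intercept.
        design = np.column_stack([others, np.ones(n_samples)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # A perfect fit means x_k is an exact linear combination of the others.
        out[k] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return out
```

Columns whose value exceeds 5 would be candidates for removal under the threshold stated above.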

2.2.2. Backpropagation neural network (BPNN)

As shown in Fig. 1a, the BPNN is a neural network with a large number of neurons. All neurons in each layer are directly connected to the neurons in the next layer; hence, the BPNN can also be called a fully connected neural network. The BPNN contains an input layer, an output layer, and a series of intermediate (hidden) layers, each of which contains one or more neurons. The weights and biases of the BPNN are updated by gradient descent during training [33]. The value of each neuron is

$$Y = f\left(\sum_{i=1}^{n} X_i \ast W_{ij} + b_j\right) \tag{6}$$

where Xi is the input variable, n is the number of neurons in the current layer, Wij is the weight of the connection between the neuron and the next layer of neurons, bj is the bias of the neuron, ∗ represents the scalar product of two vectors, and f is the activation function. The neurons in the previous layer are all connected to each neuron in the current layer. A sigmoid function is a commonly used activation function that has an output value between 0 and 1. The specific formula is as follows:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}. \tag{7}$$
Fig. 1. Structure of the deep learning neural network models. a, Backpropagation neural network (BPNN). b, Genetic algorithm (GA). c, Long short-term memory (LSTM) network.

Backpropagation is a widely used training algorithm, and the BPNN is the most basic neural network model trained with it. The input signal is propagated forward and the error is propagated backward. With the help of the returned error, the weights and biases are updated, which optimizes the model. For the backpropagation of errors, the gradient descent method is generally used to update the weights. The first-order and second-order partial derivatives of the error function with respect to all of its variables are computed to obtain the direction and speed of descent, determine the fastest descent direction, and correct the weights and thresholds of the network.
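The forward pass of Equation (6), the sigmoid of Equation (7), and one gradient descent update can be sketched as follows. This is a minimal illustration and not the MATLAB Neural Network Toolbox model used in this study: it assumes a single hidden layer, a linear output layer, a mean-squared-error loss, first-order gradients only, and an arbitrary learning rate. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Equation (7)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hidden, n_out, seed=0):
    """Random initial weights and zero biases (an assumed initialization)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Forward propagation: sigmoid hidden layer, linear output layer."""
    hidden = sigmoid(X @ params["W1"] + params["b1"])
    output = hidden @ params["W2"] + params["b2"]
    return hidden, output


def train_step(params, X, Y, lr=0.05):
    """One forward pass, one backward pass, and one gradient descent update.

    Returns the mean squared error measured before the update.
    """
    hidden, output = forward(params, X)
    error = output - Y
    loss = float(np.mean(error ** 2))
    # Backward propagation of the error through the two layers.
    grad_out = 2.0 * error / error.size
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_hidden = grad_out @ params["W2"].T
    grad_pre = grad_hidden * hidden * (1.0 - hidden)  # sigmoid derivative
    grad_W1 = X.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0)
    # Gradient descent: move each parameter against its gradient.
    params["W1"] -= lr * grad_W1
    params["b1"] -= lr * grad_b1
    params["W2"] -= lr * grad_W2
    params["b2"] -= lr * grad_b2
    return loss
```

Calling `train_step` repeatedly on normalized training data lowers the training error step by step; a network with random initial weights of this kind is also the starting point that the GA in the next section replaces.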

2.2.3. Genetic algorithm-backpropagation neural network (GA-BPNN)

In this study, we adopt a GA as an optimization method to adjust the initial weights and biases of the BPNN. A GA imitates biological evolution to select the most suitable result among all possible solutions. The optimization process mainly consists of generating a large number of candidate individuals through selection, crossover, and mutation, and then selecting the individuals with the best fitness, as shown in Fig. 1b.

Selection: The selection process is based on the fitness evaluation of individuals in the group: the fitter the individuals, the more offspring they produce, as shown in Equation (8).

Crossover: Crossover is the process of recombining two separate chromosomes to create a new individual. The calculation process is shown in Equation (9).

Mutation: The mutation operation randomly changes some of the values on the chromosome to create new individuals. Its calculation is shown in Equations (10), (11):

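$$P_i = \frac{f_i}{\sum_{j=1}^{n} f_j} \tag{8}$$

$$a_{ij} = \begin{cases} a_{kj}(1 - b) + a_{ij} b \\ a_{ij}(1 - b) + a_{kj} b \end{cases} \tag{9}$$

$$a_{ij} = \begin{cases} a_{ij} + (a_{ij} - a_{\max})\, f(g), & r > 0.5 \\ a_{ij} + (a_{\min} - a_{ij})\, f(g), & r \le 0.5 \end{cases} \tag{10}$$

$$f(g) = r_2 \left(1 - \frac{G}{G_{\max}}\right)^2 \tag{11}$$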

where Pi is the selection probability of individual i, fi is the fitness of individual i, and n is the number of individuals in the population. aij is the jth gene of the ith individual, akj is the jth gene of the kth individual, and amax and amin are the upper and lower bounds of the gene, respectively. G is the current iteration number, Gmax is the maximum generation number, and r is a random number in the interval [0,1].

The optimization process consists of encoding and decoding the input, creating the initial population, calculating fitness, iterative operations, and adjusting the parameters. After the first generation is obtained, the most suitable individuals are selected from each generation according to the fitness result, and then a new generation is obtained using iterative operations until the set number of generations is reached. Therefore, the GA-BPNN is a method that first uses a GA to optimize the weights and biases that need to be set in advance for the BPNN, and then uses the most suitable coefficients set in advance to complete the training and testing of the BPNN.
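The three genetic operators can be sketched in a few lines of NumPy. This is an illustration, not the MATLAB Genetic Algorithm toolbox routine used in this study. Several points are assumptions: Equation (9) is read as the pair of updated genes for the two parents, b is taken as a mixing factor in [0, 1], the factor r2 in Equation (11) is drawn as a second random number, and the mutated gene is clipped to its bounds as an added safeguard. All function names are hypothetical.

```python
import numpy as np


def selection_probabilities(fitness):
    """Equation (8): selection probability proportional to fitness."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()


def crossover(parent_i, parent_k, j, b):
    """Equation (9), read as an arithmetic crossover of gene j of two parents."""
    child_i = np.array(parent_i, dtype=float)
    child_k = np.array(parent_k, dtype=float)
    child_i[j] = parent_i[j] * (1.0 - b) + parent_k[j] * b
    child_k[j] = parent_k[j] * (1.0 - b) + parent_i[j] * b
    return child_i, child_k


def mutate(individual, j, generation, max_generation, a_min, a_max, rng):
    """Equations (10) and (11): non-uniform mutation of gene j.

    The step size shrinks to zero as generation approaches max_generation.
    """
    child = np.array(individual, dtype=float)
    r = rng.random()
    r2 = rng.random()  # assumed to be an independent random number
    f_g = r2 * (1.0 - generation / max_generation) ** 2
    if r > 0.5:
        child[j] = child[j] + (child[j] - a_max) * f_g
    else:
        child[j] = child[j] + (a_min - child[j]) * f_g
    # Added safeguard (not in the equations): keep the gene within its bounds.
    child[j] = np.clip(child[j], a_min, a_max)
    return child
```

In a GA-BPNN, each individual would hold the flattened weights and biases of the network, and the fitness would be derived from the prediction error of the network built from that individual.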

2.2.4. Long short-term memory (LSTM)

The data flow of LSTM is similar to that of other recurrent neural networks (RNNs) in that the data pass through each neuron and the network is trained using backpropagation. The structural difference between LSTM and other RNNs lies in the structure and functions of its neurons, which makes it an excellent solution to the problems of vanishing and exploding gradients [34], as shown in Fig. 1c.

The core aspects of the LSTM neural network are its memory cell and gate structure. The memory cell is a way of passing on previous data and can be considered the memory of the network. The gate structure can be roughly divided into three types of gates: input gates, output gates, and forget gates. The gates and the memory cell are described in detail as follows:

Input Gate (I): The information input from the input layer at each moment first passes through the input gate, and the switch of the input gate determines whether the information is input into the memory cell at this moment, as shown in Equation (12).

Output Gate (O): The information output from the memory cell at each moment is determined by this gate, and its calculation is shown in Equations (13), (14).

Forget Gate (F): At each moment, the forget gate decides whether the value in the memory cell is kept or forgotten. If the data are marked to be forgotten, the value in the memory cell is cleared. The calculation process is shown in Equation (15).

Memory Cell (M): The information in the memory cell depends on the input at the previous moment and the forget gate. Additionally, at this moment, the information is input into the training process through the output gate. Its calculation is shown in Equation (16):

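$$I_t = f\left(X_t \ast W_i + H_{t-1} \ast W_{ih} + M_{t-1} \ast W_{im} + b_i\right) \tag{12}$$

$$O_t = f\left(X_t \ast W_o + H_{t-1} \ast W_{oh} + M_{t-1} \ast W_{om} + b_o\right) \tag{13}$$

$$H_t = O_t \ast \tanh\left(M_{t-1}\right) \tag{14}$$

$$F_t = f\left(X_t \ast W_f + H_{t-1} \ast W_{fh} + M_{t-1} \ast W_{fm} + b_f\right) \tag{15}$$

$$M_t = F_t \ast M_{t-1} + I_t \ast \tanh\left(X_t \ast W_m + H_{t-1} \ast W_{mh} + b_m\right), \tag{16}$$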

where Xt represents the input variables; f is the activation function (in this model, we choose the sigmoid function shown in Equation (7)); Wf, Wi, Wm, and Wo are the weights of Xt in the forget gate, input gate, memory cell state, and output gate, respectively; Wfh, Wih, Wmh, and Woh are the weights of Ht-1 at the forget gate, input gate, memory cell state, and output gate, respectively; Wfm, Wim, and Wom are the weights of the connections between the memory cell state and the respective gates; bf, bi, bm, and bo are the biases of each structure, respectively; and ∗ represents the scalar product of two vectors. (The other variables not given were defined in previous equations.)

The backpropagation algorithm is used throughout the training process of the LSTM, and the associated variable matrix is continuously optimized to finally determine the optimal set of variables. The gate structure of LSTM largely mitigates the problems of exploding and vanishing gradients during training and learning [35].
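One time step of the cell described by Equations (12) to (16) can be written out as a NumPy sketch. It follows the equations as printed, including the use of the previous memory state in Equation (14); many LSTM formulations use the updated state there instead. Treating the memory-cell connection weights (Wim, Wom, Wfm) as full matrices, the parameter names, and the random initialization are assumptions, and this is not the MATLAB Deep Learning Toolbox implementation used in this study.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an assumed initialization)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "o", "f", "m"):
        p["W" + gate] = rng.normal(0.0, 0.1, (n_in, n_hidden))
        p["W" + gate + "h"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    for gate in ("i", "o", "f"):
        # Connections from the memory cell state to each gate.
        p["W" + gate + "m"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
    return p


def lstm_step(x_t, h_prev, m_prev, p):
    """Advance the cell by one time step; returns (H_t, M_t)."""
    # Equation (12): input gate.
    i_t = sigmoid(x_t @ p["Wi"] + h_prev @ p["Wih"] + m_prev @ p["Wim"] + p["bi"])
    # Equation (13): output gate.
    o_t = sigmoid(x_t @ p["Wo"] + h_prev @ p["Woh"] + m_prev @ p["Wom"] + p["bo"])
    # Equation (15): forget gate.
    f_t = sigmoid(x_t @ p["Wf"] + h_prev @ p["Wfh"] + m_prev @ p["Wfm"] + p["bf"])
    # Equation (14): hidden output, using the previous memory state as printed.
    h_t = o_t * np.tanh(m_prev)
    # Equation (16): new memory state.
    m_t = f_t * m_prev + i_t * np.tanh(x_t @ p["Wm"] + h_prev @ p["Wmh"] + p["bm"])
    return h_t, m_t
```

Feeding the daily input vectors through `lstm_step` in order, and carrying `h_t` and `m_t` forward, is what lets the network take past days into account.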

2.3. Model performance evaluation

In this study, we use two performance evaluation metrics: root-mean-square error (RMSE) and coefficient of determination (R2). The RMSE measures the deviation between the predicted and actual values; the formula is shown in Equation (17). R2 is generally used in regression models to evaluate the conformity between the predicted and actual values, and is calculated as shown in Equation (18):

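$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(y_t^{act} - y_t^{pre}\right)^2} \tag{17}$$

where $y_t^{act}$ is the actual value and $y_t^{pre}$ is the predicted value; and

$$R^2 = 1 - \frac{\sum_{t=1}^{n} \left(y_t^{act} - y_t^{pre}\right)^2}{\sum_{t=1}^{n} \left(y_t^{act} - \bar{y}^{act}\right)^2} \tag{18}$$

where $y_t^{act}$ represents the actual value, $y_t^{pre}$ represents the predicted value, and $\bar{y}^{act}$ represents the average of the actual values.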
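Both metrics are straightforward to compute. The following NumPy sketch implements Equations (17) and (18) directly; the function names are hypothetical.

```python
import numpy as np


def rmse(y_act, y_pre):
    """Root-mean-square error, Equation (17)."""
    y_act = np.asarray(y_act, dtype=float)
    y_pre = np.asarray(y_pre, dtype=float)
    return float(np.sqrt(np.mean((y_act - y_pre) ** 2)))


def r_squared(y_act, y_pre):
    """Coefficient of determination, Equation (18)."""
    y_act = np.asarray(y_act, dtype=float)
    y_pre = np.asarray(y_pre, dtype=float)
    ss_res = np.sum((y_act - y_pre) ** 2)
    ss_tot = np.sum((y_act - y_act.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A model that always predicts the mean of the actual values has an R2 of 0, and a perfect prediction has an R2 of 1 and an RMSE of 0.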

2.4. Description of the experimental data

The plant data used in this study originated from a CW located in a city in southern China, with a total construction area of 42,500 m2, of which 31,000 m2 is a vertical flow CW (as shown in Fig. 2a). It receives 20,000 m3 of tail water from the first phase of the upstream Longhua Wastewater Treatment Plant every day.

Fig. 2. Diagram and model description of the constructed wetland. a, Satellite photo. b, Prediction model.

We collected samples at 10:00 every morning. The dataset included the following environmental indicators: meteorological indicators (temperature, relative humidity, and rainfall), a water quantity indicator (flow velocity), and water quality indicators (NH4+-Ninf, TPinf, CODinf, SSinf, pH, BOD5-inf, NH4+-Neff, TPeff, and CODeff). We obtained the meteorological indicators for the sampling points from the local meteorological bureau, whereas the water quality and water quantity indicators were sampled and measured at the sampling points. Data were collected for a cumulative total of 186 days (from January 28, 2021 to August 31, 2021). However, some raw data exhibited diverse and irregular patterns, which implied that data-driven modeling on the raw data alone might fail to achieve good performance. The structure of our model is shown in Fig. 2b.

We performed moving average processing on each effluent indicator using Equation (1). The three moving average indicators plus the 13 environmental indicators gave a total of 16 indicators and 2960 data points. Table 1 lists the maximum, minimum, average, and standard deviation of the 16 indicators.

Table 1.

Summary statistics for the 16 variables.

| Variable | Indicator | Max value | Min value | Average value | Standard deviation |
|---|---|---|---|---|---|
| v1 | Temperature | 32.8 | 14.4 | 25.99 | 4.72 |
| v2 | Relative humidity | 100 | 30 | 64.58 | 11.98 |
| v3 | Rainfall | 84.8 | 0 | 4.0135 | 11.82 |
| v4 | Flow velocity | 35,442 | 10,576 | 17,372.816 | 4211.12 |
| v5 | NH4+-Ninf | 0.9 | 0.009 | 0.1585 | 0.1456 |
| v6 | TPinf | 0.83 | 0.004 | 0.09427 | 0.073 |
| v7 | CODinf | 25.31 | 0.053 | 15.14 | 3.277 |
| v8 | SSinf | 7 | 1 | 3.357 | 0.88 |
| v9 | pH | 7.98 | 5.61 | 7.21 | 0.327 |
| v10 | BOD5-inf | 5.6 | 0.8 | 3.0196 | 0.637 |
| v11 | NH4+-Neff | 0.546 | 0.006 | 0.121 | 0.104 |
| v12 | TPeff | 0.325 | 0.012 | 0.0889 | 0.039 |
| v13 | CODeff | 22 | 0.076 | 13.935 | 3.14 |
| v14 | NH4+-Neff(ma) | 0.351 | 0.033 | 0.1211 | 0.0694 |
| v15 | TPeff(ma) | 0.213 | 0.0253 | 0.089 | 0.0257 |
| v16 | CODeff(ma) | 17.393 | 6.58 | 13.903 | 1.943 |

We divided the dataset chronologically into two subsets, that is, the training set (January 29, 2021 to July 13, 2021, 166 days) and the testing set (July 14, 2021 to August 31, 2021, 19 days), which corresponded to approximately 90% and 10% of the data, respectively. We used the training set to fit the parameters of each input-output model and the testing set to verify model performance. After training on the training set, we compared and assessed the performance of each model using the testing set.
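The pairing of each day's inputs with the next day's effluent values, followed by a chronological split, can be sketched as below. Whether the authors built the pairs in exactly this way is an assumption, and the function names are hypothetical. The sizes in the usage note simply reproduce the counts reported above.

```python
import numpy as np


def make_next_day_pairs(inputs, targets):
    """Pair the inputs of day t with the effluent targets of day t+1.

    Both arrays have one row per day in chronological order; the last day
    has no following target, so one row is lost.
    """
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return inputs[:-1], targets[1:]


def chronological_split(X, Y, n_test):
    """Keep the last n_test samples for testing, without shuffling."""
    return X[:-n_test], Y[:-n_test], X[-n_test:], Y[-n_test:]
```

With 186 days of records, `make_next_day_pairs` yields 185 samples, and `chronological_split(X, Y, 19)` leaves 166 samples for training and 19 for testing.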

2.5. Computing environment

We implemented the MLR model using SPSS 22.0 software. We implemented the BPNN, GA-BPNN, and LSTM models in MATLAB 2020b using the Neural Network Toolbox, Genetic Algorithm Optimization Toolbox, and Deep Learning Toolbox.

3. Results and discussion

3.1. Raw data analysis

Through continuous monitoring of the influent and effluent, we analyzed the basic variation patterns of water quality in the large-scale CW. Fig. 3 shows the concentrations of TP, COD, and NH4+-N, and the removal efficiency for each indicator (Text S1). In most cases, the CW removed pollutants to some extent; however, there were still cases in which there was no removal effect. There may be three main reasons for these results: (1) The concentration of pollutants in the influent water was too low, which led to the desorption of substances from the original soil of the wetland and increased the pollutant concentration in the wetland. For example, the concentrations of TP and NH4+-N in the water were too low, which resulted in the low removal rate of the wetland on the 145th to 180th days. (2) The COD:TP ratio in the tail water of the sewage treatment plant was too low. For example, the COD:TP ratio was significantly lower than 100:1 around the 5th and 40th days, which resulted in an insufficient carbon source that was not conducive to the removal of phosphorus from the water. (3) The pollutant removal efficiencies of CWs are greatly affected by external conditions, such as temperature and rainfall. During strong rainfall, the concentrations of pollutants in water are affected. Despite these effects, the effluent quality of the CW in the actual environment was similar to that in the specific environment of the laboratory; that is, it was generally still lower than the discharge standard. However, the complexity of the data and model characteristics in the actual environment was much higher than that in the laboratory during the construction of the data-driven model.

Fig. 3. Water quality parameters measured from the influent and effluent of the studied constructed wetland. a, NH4+-N. b, COD. c, TP.

3.2. Structure determination and model results

3.2.1. MLR modeling result

For MLR models, it is necessary to ensure that the variables are independent of each other and not affected by multicollinearity problems. The VIF values of the ten independent variables in the MLR model were all small; for example, those of NH4+-Ninf and pH were 1.11 and 1.07, respectively. The remaining VIF values were between 1.17 and 2.12, that is, all less than 5. This demonstrates that the correlation between independent variables was small and there was no multicollinearity problem. All the results are shown in Table 2. Therefore, we used the two subsets described in Section 2.4 to train and test the model, and calculated the regression coefficients of the model using regression analysis. The detailed results of MLR modeling are shown in Table 3.

Table 2.

Multicollinearity analysis results of independent variables in MLR model.

| Input indicator | Temp | RH | Rainfall | Flow | NH4+-Ninf | TPinf | CODinf | SSinf | pHinf | BOD5-inf |
|---|---|---|---|---|---|---|---|---|---|---|
| Independent variable | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
| VIF | 1.18 | 1.30 | 1.34 | 1.17 | 1.11 | 1.25 | 2.12 | 1.17 | 1.07 | 2.09 |

Table 3.

The MLR model equations.

| Output indicator | Response variable | Model equation |
|---|---|---|
| NH4+-N (ma) | YNH4+-Neff(ma) | −0.002x1 + 0.001x2 − 0.001x3 − 1.904 × 10^−6 x4 + 0.184x5 + 0.001x6 − 0.003x7 + 0.007x8 + 0.018x9 + 0.025x10 − 0.065 |
| NH4+-N | YNH4+-Neff | −0.002x1 + 0.001x2 − 0.001x3 − 9.459 × 10^−7 x4 + 0.253x5 + 0.164x6 − 0.001x7 + 0.009x8 + 0.052x9 + 0.008x10 − 0.346 |
| COD (ma) | YCODeff(ma) | −0.079x1 + 0.038x2 + 0.011x3 + 4.994 × 10^−5 x4 − 1.479x5 + 0.734x6 + 0.448x7 − 0.077x8 + 1.337x9 − 0.697x10 − 5.918 |
| COD | YCODeff | 0.139x1 + 0.036x2 + 0.003x3 + 4.222 × 10^−5 x4 − 3.669x5 + 1.933x6 + 0.119x7 + 0.046x8 + 0.214x9 + 0.086x10 + 3.939 |
| TP (ma) | YTPeff(ma) | 0.000314x1 − 0.001x2 + 0.00046x3 + 5.143 × 10^−7 x4 + 0.009x5 + 0.108x6 + 0.001x7 − 0.000359x8 − 0.002x9 − 0.001x10 + 0.112 |
| TP | YTPeff | 0.000348x1 − 0.000482x2 + 0.001x3 + 6.324 × 10^−7 x4 + 0.025x5 + 0.274x6 − 1.138 × 10^−5 x7 + 0.001x8 − 0.001x9 + 0.001x10 + 0.069 |

3.2.2. Neural network modeling results

Two of the neural network models (BPNN and LSTM) were trained with the backpropagation algorithm throughout the training process. Additionally, we used a GA to optimize the initial weights and biases of the BPNN, which gave the third network model. The structure of a network model is determined by the number of layers, the number of neurons in each layer, and the characteristics of the transfer functions, and is a vital part of model development. Increasing the number of neurons can improve the accuracy of nonlinear fitting; however, an overly complex network leads to overfitting and prolongs the training time. All models in this study had an input layer with ten neurons, corresponding to Temp, RH, Rainfall, Flow, NH4+-Ninf, TPinf, CODinf, SSinf, pHinf, and BOD5-inf. The output layer was composed of six neurons, corresponding to the effluent concentrations of NH4+-Neff, TPeff, CODeff, NH4+-Neff(ma), TPeff(ma), and CODeff(ma). Additionally, for the three models, we conducted experiments on structures with one to four hidden layers, where we tried 3–30 neurons in each hidden layer.

Considering the training efficiency and prediction accuracy, the resulting optimal topology of the hidden layers for the BPNN model was a three-layer structure, with 18 neurons in hidden layer 1, 14 neurons in hidden layer 2, and six neurons in hidden layer 3 (Fig. S1). Additionally, the best performing GA-BPNN had three hidden layers, with 16 neurons in layer 1, 11 neurons in layer 2, and 8 neurons in layer 3 (Figs. S2 and S3). The optimal structure of LSTM had three hidden layers, with 17 neurons in layer 1, 14 neurons in layer 2, and 12 neurons in layer 3 (Figs. S4 and S5).

3.3. Prediction performance on the raw testing set

A comparison of predicted versus measured data for three water quality indicators (CODeff, NH4+-Neff, and TPeff) is shown in Fig. 4. Different types of models had very different prediction results. The MLR predictions had a high degree of oscillation, and their R2 values were all less than 0.32 (as shown in Fig. 4). For NH4+-Neff (as shown in Fig. 4a), R2 was only 0.225, which means that the prediction of the effluent quality of CWs is not a simple linear problem. In comparison, the prediction results of the BPNN were much better, and its R2 was greater than 0.7; however, this is still far from satisfactory. In predicting CODeff (as shown in Fig. 4b), the BPNN underestimated the peak CODeff concentration, which resulted in a smooth line. The inconsistency of the BPNN suggests that it performed poorly compared with LSTM. When we added a GA to optimize the BPNN, the GA-BPNN was still unable to match the accuracy of LSTM, but it achieved an R2 of 0.81, which was higher than that of the BPNN alone. As shown in Fig. 4, the network initialized with the weights and biases generated by the GA to reduce the RMSE predicted much better than the network initialized with randomly generated weights and biases. LSTM outperformed the other models in the prediction of all metrics, particularly in the prediction of CODeff (as shown in Fig. 4b), where it reached an R2 of 0.93. The reason for the satisfactory performance of LSTM may be that it can take into account the influence of past results on the present, which plays an important role in time series problems.

Fig. 4. Comparison of the three water quality indices predicted by the MLR model, the BPNN model, the GA-BPNN model, and the LSTM model with the measured results and their corresponding R2 values. a, b, Scatter plot (a) and line plot (b) for NH4+-Neff. c, d, Scatter plot (c) and line plot (d) for CODeff. e, f, Scatter plot (e) and line plot (f) for TPeff.

3.4. Effect of the moving average on prediction performance

A comparison of predicted versus measured data for three water quality indicators after the moving average (CODeff(ma), NH4+-Neff(ma), and TPeff(ma)) is shown in Fig. 5. After we applied the moving average method, the processed data were much smoother than the original data. We rebuilt the models using the processed data, and the accuracy of each model improved considerably. The improvement of the GA-BPNN was the most substantial among the four models, and its R2 for the three water quality indicators was close to 0.9 or even higher. The accuracy of LSTM also improved; however, the increase was not as pronounced as for the other models. For LSTM, only the prediction of NH4+-Neff(ma) showed an increase in R2, of 0.013, compared with the original data (as shown in Fig. 4a and Fig. 5a). We speculate that the moving average method enabled the three models other than LSTM to consider the influence of past results so that high-frequency errors were eliminated, thereby improving accuracy.

Fig. 5. Comparison of the three water quality indices after the moving average predicted by the MLR model, the BPNN model, the GA-BPNN model, and the LSTM model with the measured results and their corresponding R2 values. a, b, Scatter plot (a) and line plot (b) for NH4+-Neff(ma). c, d, Scatter plot (c) and line plot (d) for CODeff(ma). e, f, Scatter plot (e) and line plot (f) for TPeff(ma).

3.5. Comparison of the models

By comparing RMSE and R2 (as shown in Fig. 6), we can more intuitively identify the predictive strength of the four models. For the original dataset, relative to the MLR model, the RMSE of the BPNN model decreased considerably, and R2 for the CODeff, TPeff, and NH4+-Neff prediction results increased by 49.1%, 47.2%, and 43.2%, respectively. This suggests that traditional machine learning can solve multiple regression problems better than linear methods because machine learning can fit more complex functions and achieve higher accuracy. However, because of the possible local minimum problem, the accuracy of the prediction results obtained by the BPNN alone was still not satisfactory. After we optimized the BPNN using a GA, the RMSE of each model further decreased, and the R2 of the three predictors increased by 8.55%, 6.4%, and 7.31%, respectively. The reason for this is that we optimized the weights and biases of the network with the goal of reducing the RMSE of the prediction results. Compared with the GA-BPNN, LSTM reduced the RMSE more substantially, and R2 for the three indicators increased by 9.9%, 10.49%, and 7.8%, respectively. This is because water quality data are complex time series data, and LSTM considers the effect of past results on the present, thereby achieving higher prediction accuracy.

Fig. 6. Accuracy evaluations for the MLR, BPNN, GA-BPNN, and LSTM models. a, R2 comparison. b, RMSE comparison. c, RMSE comparison for NH4+-N and TP in more detail.

Finally, after we processed the original data using the moving average method, the accuracy of each model improved because some noise was removed. The improvement was most notable for the GA-BPNN, whose R2 increased by more than 8% on average, whereas the R2 of LSTM increased by only 2%. We assume that this is because we averaged three days of data in the smoothing process, which carried the influence of previous days into the other models; LSTM already considered the influence of previous data and therefore gained only a marginal improvement.
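A three-day moving average of this kind can be sketched as follows. This is a hedged illustration rather than the preprocessing code of the study: the text states only that three days of data were averaged, so the trailing alignment (each value averaged with the two preceding days), the function name, and the sample series are assumptions.

```python
import numpy as np

def trailing_moving_average(values, window=3):
    """Average each value with the (window - 1) values before it.

    Positions without a full window are dropped, so the output has
    len(values) - window + 1 entries; output[i] covers days i .. i + window - 1.
    """
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up daily concentrations with a strong day-to-day fluctuation.
raw = np.array([10.0, 16.0, 10.0, 16.0, 10.0, 16.0])
smoothed = trailing_moving_average(raw, window=3)

print("raw std:     ", raw.std())
print("smoothed std:", smoothed.std())
```

Each smoothed value depends on the current day and the two days before it, so a model without memory, such as the BPNN, indirectly receives information about previous days. This is consistent with the explanation given above for why the smoothing helped LSTM less than the other models.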

3.6. Future perspectives

In the future, we will attempt to develop a hybrid RNN algorithm to achieve higher accuracy or faster model construction. The predictive performance of the neural network correlated strongly with the amount of input data; however, collecting an excessively large amount of data consumes considerable human and material resources. Therefore, provided that the predictive performance of the model is not affected, we will also attempt to reduce the amount of data used. In addition, we will further improve the forecast model of CWs to analyze GHG emissions. The timely prediction of carbon emissions from, or carbon absorption by, CWs is important for helping the entire urban system to achieve carbon neutrality and for further improving the intelligent management of urban water environments.

4. Conclusion

The deep learning network successfully predicted the next-day effluent quality of large-scale CWs and revealed the mapping relationship between the collected multi-source datasets and effluent quality. By comparing the prediction performance of the four models for three effluent quality indicators, we reached three main conclusions: (1) For original data with large fluctuations, the moving average method can be used to remove high-frequency noise in an actual large-scale application, and the smoothed data improve the prediction performance. (2) Compared with MLR, the BPNN, and the GA-optimized BPNN, a deep learning neural network (LSTM) that can take previous results into account achieves better prediction performance on time series problems such as water quality prediction. (3) A deep learning network can be quickly established to predict water quality in a real scenario by collecting a large number of simple and easy-to-obtain water quality indicators. The LSTM neural network avoids the time and cost of the small-scale experiments otherwise needed to obtain the various parameters for modeling CWs. With the widespread application of CWs for sewage treatment, the prediction of CW effluent quality not only plays a crucial role in the regulation of the urban water environment but also provides a feasible basis for addressing urban non-point source pollution.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This investigation was funded by the National Natural Science Foundation of China (Nos. 51908161 and 52100044), the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515010807), the State Key Laboratory of Urban Water Resource and Environment (Harbin Institute of Technology) (2021TS30), and the Shenzhen Science and Technology Program (Nos. KQTD20190929172630447, KCXFZ20211020163404007, and GXWD20201230155427003-20200824100026001).

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ese.2022.100207.




Articles from Environmental Science and Ecotechnology are provided here courtesy of Elsevier
