. 2022 Dec 8;30(12):33504–33515. doi: 10.1007/s11356-022-24604-2

Water consumption prediction and influencing factor analysis based on PCA-BP neural network in karst regions: a case study of Guizhou Province

Zhicheng Yang 1, Bo Li 1,2, Huang Wu 1, MengHua Li 1, Juan Fan 3, Mengyu Chen 1, Jie Long 4
PMCID: PMC9734345  PMID: 36480138

Abstract

Water consumption prediction is an integral part of water resource planning and management. Constructing a highly precise water consumption prediction model is of great significance for promoting regional water resource planning and high-quality socio-economic development. This paper focuses on the typical karst region of Guizhou Province, China. Based on data on water consumption and its influencing factors spanning 2000–2020, principal component analysis (PCA) was applied to reduce the dimensionality of 16 influencing factors of water consumption in Guizhou; the extracted principal components were used as input samples of a BP neural network, and a PCA-BP neural network water consumption prediction model was constructed to predict the water consumption of Guizhou Province over the next 10 years. The results show that the mean absolute error and mean relative error of the PCA-BP neural network predictions were 2.8 (10⁸ m³) and 2.9%, respectively, with superior performance in terms of prediction error and trend compared with other models. This paper also discusses the main influencing factors of water consumption and analyzes their influence on the forecasting model, so that the parameters of the model can be selected more efficiently, providing a reference for regional water consumption analysis and water resource planning and management.

Supplementary Information

The online version contains supplementary material available at 10.1007/s11356-022-24604-2.

Keywords: Karst region, Water consumption prediction, Principal component analysis, BP neural network prediction, Influencing factor

Introduction

Water is an indispensable resource for humans. With economic development and population growth around the world, the contradiction between water supply and demand has become aggravated, and against the background of the current COVID-19 pandemic, the pattern of global carbon emissions has changed. The rational use of water resources is closely tied to carbon emissions, so water resource planning is increasingly crucial (Wang et al. 2018a, b; Liu et al. 2021; Wang and Su 2020; Li et al. 2022). Precisely predicting water consumption is the first and foremost task of water resource planning and management, and the prediction results directly influence the reliability and practicality of water resource planning and decision-making (Hao et al. 2009; Sivapalan et al. 2012). At the same time, water consumption is influenced by many uncertain factors such as the total amount of water resources, climate, population, and economy, which significantly increases the difficulty of predicting it accurately (Fan et al. 2017).

Previous researchers have made many contributions to water consumption prediction, proposing a series of prediction models that include the autoregressive integrated moving average model (ARIMA), the support vector regression model (SVR), the gray theory model GM(1,1), the random forest regression model, and the neural network model. He et al. selected the most influential factors to establish an improved coupled model of the grey system and multiple regression to predict water consumption in Wuhan. The application showed that the improved coupled model forecast well, with a relative error of less than 1%, and that it successfully predicted the water consumption of Wuhan in 2015 (He and Tao 2014). Wang et al. combined the projection pursuit algorithm and the real-coded accelerated genetic algorithm to establish a comprehensive model, which used 17 high-dimensional, non-normal, and nonlinear complex indices to evaluate renewable energy sustainability and achieved good results (Wang and Su 2020). Dos Santos et al. used an artificial neural network (ANN) approach to predict water consumption in the metropolitan area of São Paulo with low prediction error, considering the influence of weather and environmental factors on water consumption (Dos Santos and Pereira Filho 2014). Lopez Farias et al. used the Qualitative Multi-Model Predictor Plus (QMMP+) model to predict water use in Barcelona and compared it with the Radial Basis Function Artificial Neural Network (RBF-ANN), the statistical Autoregressive Integrated Moving Average (ARIMA), and the Double Seasonal Holt-Winters (DSHW) models, finding that QMMP+ had the higher prediction accuracy (Lopez Farias et al. 2018). Pu et al. proposed a variable structure support vector regression (VS-SVR) water use prediction model, and the results showed that the VS-SVR model reduced the prediction error by 1.2% compared with the SVR model (Pu et al. 2015).
Wang et al. developed two combined simulation methods, ARIMA-BPNN and BPNN-ARIMA, to simulate carbon emissions in China, India, the USA, and the European Union under a scenario without the COVID-19 pandemic. The average relative error of the simulation is about 1% (Wang et al. 2021). Almanjahie et al. proposed a multiplicative seasonal autoregressive integrated moving average model (SARIMA), which adequately takes into account the seasonal characteristics of water consumption and obtains better prediction results (Almanjahie et al. 2019). Sebri forecast the quarterly time series of household water consumption in Tunisia through a comparative analysis of the traditional Box-Jenkins method and an artificial neural network approach. The results indicate that the traditional Box-Jenkins method has higher prediction accuracy than the neural network model and is closer to the actual data (Sebri 2013). Wang et al. developed single-linear, hybrid-linear, and non-linear forecasting techniques based on grey theory to forecast energy demand in China and India more accurately (Wang et al. 2018a, b). Piasecki et al. predicted water use on the Czerniewice estate using a multilayer perceptron (MLP) artificial neural network in combination with nine factors, including meteorological ones, with good prediction results (Piasecki et al. 2016). Chen et al. proposed a multiple random forests model that integrates wavelet transform and random forests regression (W-RFR) for the prediction of daily urban water consumption in southwest China; the results showed that the W-RFR model not only meets the required prediction accuracy but also has higher prediction accuracy than the RFR and FFNN models (Chen et al. 2017).

Despite the achievements of the above water consumption prediction methods, their applications are not without limitations. For example, the autoregressive integrated moving average (ARIMA) model focuses on the changing patterns in the yearly water consumption series without taking into account the effect of water consumption influencing factors on the series (Wu et al. 2021). Although the BP neural network considers the effect of other factors on water consumption, the method tends to produce local optimal solutions (Xu et al. 2020). Existing research on water consumption tends to focus on the prediction models themselves without considering the combined effect of influencing factors on water consumption. In addition, given the large number of water consumption influencing factors, introducing too many of them would complicate the model, making it difficult to ensure prediction accuracy (Azimi et al. 2018; Lili et al. 2021). Moreover, water consumption influencing factors entail complicated relationships and overlapping information; to eliminate the correlation in such complex data, it is necessary to simplify the influencing factor data (Wu et al. 2021). This paper proposes using principal component analysis to reduce the dimensionality of water consumption influencing factors and to extract highly representative principal components that replace the complicated original factors. Using the extracted principal components as inputs of the BP neural network effectively alleviates the model's tendency to produce local optimal solutions, thereby increasing the precision of the prediction model. As such, based on the PCA-BP neural network model, this paper employs data on water consumption and its influencing factors spanning 2000–2020 in the karst region of Guizhou, China to predict water consumption and analyze the effect of influencing factors on the prediction model.
The proposed method is conducive to alleviating regional contradictions of water supply and demand and improving water resource planning and management.

Methodology and data

Principal component analysis (PCA)

Initially proposed by K. Pearson in 1901 and improved by Hotelling in 1933, principal component analysis is a data dimensionality reduction method that transforms multiple indicators into a few composite indicators (Castura et al. 2022; Heo et al. 2009). Given a certain level of information overlap in the raw data, PCA performs a linear projection of the raw dataset based on an analysis of its matrix characteristics and converts the raw data into a new feature space to extract the principal linear components. A few principal component variables that best represent the information contained in the original variables are then used to replace the original variables, thereby reducing the dimensionality of a multi-variable system and transforming it into a system of independent, uncorrelated variables (Wu et al. 2021; Liu et al. 2003). The algorithmic steps are as follows:

  • Step 1: Conduct standardization processing over the raw data matrix to obtain a new data matrix;

  • Step 2: After the standardization, construct the correlation coefficient matrix R of the variables;

  • Step 3: Compute the eigenvalues of the correlation matrix R and sort them in descending order; find the eigenvector corresponding to each eigenvalue;

  • Step 4: Find the contribution rate Pm and the cumulative contribution rate αm. The following principle is usually followed for the selection of principal components: the eigenvalue is larger than 1 and the cumulative contribution rate is about 85%;

  • Step 5: Calculate the loading of each principal component, denoted by Zm, that is, the correlation coefficient between the principal component and the variables;

  • Step 6: Obtain the principal component data after dimensionality reduction based on the eigenvalues and their corresponding eigenvectors.
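
The six steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `pca_by_correlation`, the use of the sample standard deviation, and the synthetic demonstration data are all assumptions introduced here, and the Guizhou data are not reproduced.

```python
import numpy as np

def pca_by_correlation(X, min_eigenvalue=1.0):
    """PCA on the correlation matrix, following Steps 1-6 above.

    X is an (m samples x n variables) array. Returns eigenvalues, contribution
    rates (%), cumulative rates (%), loadings and scores of the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each column (sample standard deviation: an assumption).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation coefficient matrix R of the variables.
    R = np.corrcoef(Z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of R, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: contribution rate and cumulative contribution rate.
    rate = 100.0 * eigvals / eigvals.sum()
    cumulative = np.cumsum(rate)
    k = max(1, int(np.sum(eigvals >= min_eigenvalue)))  # eigenvalue >= 1 rule
    # Step 5: loadings, i.e. correlations between components and variables.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # Step 6: principal component data after dimensionality reduction.
    scores = Z @ eigvecs[:, :k]
    return eigvals, rate, cumulative, loadings, scores

# Synthetic demonstration data (NOT the Guizhou data): 21 samples, 6 variables
# driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(21, 2))
X_demo = hidden @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(21, 6))
eigvals, rate, cumulative, loadings, scores = pca_by_correlation(X_demo)
print(np.round(rate, 2), scores.shape)
```

In practice, the cumulative contribution rate of the kept components should also be checked against the roughly 85% guideline mentioned in Step 4; the sketch applies only the eigenvalue rule.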

BP neural network

Neural networks construct data processing models by imitating the structure of brain neurons and their reflection process (Xu et al. 2018). As one type of neural network, the BP neural network is a multi-layer feedforward network consisting of an input layer, one or more hidden layers, and an output layer; it is characterized by its input vector, weights, thresholds, and the activation functions of the hidden and output layers. Neurons in a layer only receive signals from the previous layer, and neurons in the same layer are not connected to each other. The number of hidden layers, the number of neurons in each layer, and the network learning rate can be adjusted or set according to specific needs (Jia and Wu 2020; Zhu et al. 2021), as shown in Fig. 1. During training, forward propagation of the signal and back propagation of the error between the actual and expected outputs are cycled repeatedly to adjust the weights and thresholds, until the error between the network output and the expected output of the samples is reduced to an acceptable level or the preset number of cycles is reached, so that the model is fitted as accurately as possible (Wu et al. 2021).

Fig. 1. The working principle of the BP neural network
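
As an illustration of the training cycle described above, the following is a minimal pure-Python sketch of a BP network with a single hidden layer. It is not the authors' implementation: the function name `train_bp`, the tanh hidden activation, the linear output, the learning rate, and the toy data are assumptions made here for illustration. The default hidden size, error target, and iteration cap mirror settings reported later in the paper.

```python
import math
import random

def train_bp(samples, targets, n_hidden=7, lr=0.05,
             max_epochs=1000, target_error=1e-5, seed=0):
    """Train a one-hidden-layer feedforward network by error back propagation.

    Hidden layer: tanh activation; output layer: linear (both assumptions).
    Returns a predict function and the mean squared error of each epoch.
    """
    rnd = random.Random(seed)
    n_in = len(samples[0])
    # Weights start as small random values; thresholds (biases) start at zero.
    w_h = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j])
             for j in range(n_hidden)]
        return h, sum(w * hj for w, hj in zip(w_o, h)) + b_o

    history = []
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, t in zip(samples, targets):
            h, y = forward(x)              # forward propagation of the signal
            err = y - t
            sq_err += err * err
            # Back propagation of the error: gradients of 0.5 * err**2.
            for j in range(n_hidden):
                grad_h = err * w_o[j] * (1.0 - h[j] ** 2)
                w_o[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * grad_h * x[i]
                b_h[j] -= lr * grad_h
            b_o -= lr * err
        history.append(sq_err / len(samples))
        if history[-1] <= target_error:    # stop once the error is acceptable
            break
    return (lambda x: forward(x)[1]), history

# Toy data (hypothetical, not the Guizhou data): 14 samples with 4 inputs each.
rnd = random.Random(1)
X_toy = [[rnd.uniform(-1, 1) for _ in range(4)] for _ in range(14)]
y_toy = [0.5 * x[0] - 0.3 * x[1] + 0.2 * x[2] * x[3] for x in X_toy]
predict, history = train_bp(X_toy, y_toy)
print(f"first-epoch MSE {history[0]:.4f}, last-epoch MSE {history[-1]:.6f}")
```

Because plain gradient descent of this kind can settle in a local optimum, results depend on the random initial weights, which is the weakness the paper attributes to the BP network.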

Combined prediction model of PCA-BP neural network

Using the principle of error correction, the PCA-BP neural network combination method is formed by combining principal component analysis with the BP neural network model. The combined model draws on the advantages of both methods and makes up for their respective shortcomings. When the BP neural network makes predictions, the number and complexity of the input variables are among the key factors affecting the prediction result. Many factors affect water consumption, and inputting all of them would inevitably harm prediction accuracy, so the number of input variables needs to be reduced. To preserve as much of the original information as possible while reducing the number of input variables, principal component analysis is used to reduce the dimension of the influencing factors of water consumption, and the resulting principal component data are input into the BP neural network model to obtain the final prediction. The specific steps are as follows:

  • Step 1: Conduct a principal component analysis on the influencing factors of water consumption.

  • Step 2: Select the newly formed principal components to make their cumulative variance contribution rate greater than 85%.

  • Step 3: Input the principal components selected in Step 2 into the BP neural network model to predict water consumption.

Model accuracy test

To test model errors, this paper adopts the mean absolute error (MAE) and the mean relative error (MRE) to analyze the prediction results of different models. The MAE is the mean of the absolute errors between predicted values and real values, while the MRE is the mean of the ratios of the absolute errors to the real values. Both indicators can effectively reflect model accuracy.

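$$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|x_i-y_i\right| \tag{1}$$

$$\mathrm{MRE}=\frac{1}{m}\sum_{i=1}^{m}\left|\frac{x_i-y_i}{y_i}\right|\times 100\% \tag{2}$$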
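
As a quick check of Eqs. (1) and (2), the sketch below computes both indicators for the PCA-BP forecasts and the real water consumption of 2014–2020 listed in Table 4. The function names are illustrative choices made here, and the real values are assumed to be the denominator in Eq. (2).

```python
def mean_absolute_error(predicted, actual):
    """Eq. (1): mean of the absolute errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mean_relative_error(predicted, actual):
    """Eq. (2): mean of absolute errors relative to the real values, in percent."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# 2014-2020 values copied from Table 4 (unit: 10^8 m^3)
real = [95.31, 97.49, 100.31, 103.51, 106.79, 108.06, 90.08]
pca_bp = [93.26, 96.53, 98.46, 97.59, 106.13, 106.97, 97.5]

mae = mean_absolute_error(pca_bp, real)
mre = mean_relative_error(pca_bp, real)
print(f"MAE = {mae:.2f}, MRE = {mre:.2f}%")
```

The computed values are roughly 2.85 and 2.94%, in line with the 2.8 and 2.90% reported for the PCA-BP model in Table 4.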

Overview of the researched region

Located in inland southwestern China, Guizhou belongs to one of the three contiguous karst regions in the world and lies at the center of the East Asian karst region, covering an area of about 176,200 km². Situated on the watershed between the Yangtze and Zhujiang river basins, Guizhou Province has a dense river network and an average annual precipitation of about 1200 mm. Atmospheric precipitation is the main source of water resources, and surface water and groundwater replenish each other in frequent hydrologic cycles. Although water resources are abundant thanks to the Yangtze and Zhujiang river systems, karst landforms are well developed in the region, and the special binary structure between surface water and groundwater causes severe seepage, making it difficult to form contiguous catchment areas. Because the water holding capacity is extremely poor, some rivers may dry up during seasonal droughts. Furthermore, deeply buried groundwater has diverse occurrence modes, making it difficult to exploit. For a typical karst region like Guizhou (see Fig. 2), accurately predicting water consumption helps the government formulate well-informed water resource management policies, addressing the contradiction between water demand and supply and promoting the sustainable development of regional water resources.

Fig. 2. Geographical location of Guizhou Province

Data source

This research adopts statistical data derived from the Guizhou Water Resources Bulletin (Guizhou Provincial Department of Water Resources 2020), the Statistical Yearbook of Guizhou Province (Guizhou Provincial Bureau of Statistics 2020), and the Statistical Bulletin of National Economic and Social Development of Guizhou Province (Guizhou Provincial Bureau of Statistics 2020). Seventeen indicators, including water consumption, precipitation, the total amount of water resources, population, GDP, and total industrial output, were selected. See Table S1 and Fig. 3 for more details.

Fig. 3. Water consumption influence factors

Analysis of influencing factors

Water consumption influencing factors are an organic collection of multiple indicators. Based on precedent studies in combination with the actual conditions of Guizhou, 16 influencing factors for water consumption in Guizhou were selected from aspects of agricultural, industrial, domestic, and eco-environmental water consumption (Guizhou Provincial Water Resources Bulletin 2020; Chen et al. 2012; Sandiford et al. 1990; Duan and Chen 2020; Keshavarzi et al. 2006). Specifically, agricultural water consumption is related to effective irrigation area, total agricultural output value, value-added of farming, forestry, animal husbandry, and fishery, and total grain production; industrial water consumption is closely related to total industrial output value and value-added of industry; domestic water consumption is subject to influences of population, GDP, and water supply penetration rate; eco-environmental water consumption is reflected by precipitation, the total amount of water resources, groundwater resources, temperature, annual sunshine hours, forestry area, and total wastewater discharge (Romano et al. 2016; Guizhou 2020; Fan et al. 2017).

Results and analysis

Data standardization

Influencing factors for water consumption in Guizhou are numerous and have different dimensions (units). To reduce the influence of the different dimensions on the final prediction results, the data were standardized first. The equation is as follows:

$$Q_{xy}=\frac{q_{xy}-\bar{q}_{y}}{s_{y}},\quad x=1,2,\ldots,m;\; y=1,2,\ldots,n \tag{3}$$

where $Q_{xy}$ denotes the standardized variable; $q_{xy}$ denotes the value of the y-th influencing factor for the x-th sample; $\bar{q}_{y}$ denotes the sample mean of the y-th influencing factor; and $s_{y}$ denotes the standard deviation of the y-th influencing factor.
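
The standardization of Eq. (3) can be sketched as follows. The function name `standardize_columns` and the small input matrix are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the paper does not state which form was used.

```python
import statistics

def standardize_columns(rows):
    """Z-score each column: (value - column mean) / column standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]  # sample std (assumption)
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]

# Hypothetical values for two factors measured on very different scales
raw = [[1100.0, 0.35], [1250.0, 0.41], [1180.0, 0.48], [1320.0, 0.52]]
standardized = standardize_columns(raw)
for row in standardized:
    print([round(v, 3) for v in row])
```

After this step every column has zero mean and unit standard deviation, so factors with large numeric ranges no longer dominate the analysis.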

For brevity, the influencing factors are denoted by the variables X1–X16, which correspond, in order, to: precipitation; the total amount of water resources; total grain production; total industrial output value; population; GDP; water supply penetration rate; annual sunshine hours; groundwater resources; added-value of industry; added-value of farming, forestry, animal husbandry and fishery; temperature; total wastewater discharge; forestry area; effective irrigation area; and total agricultural output value. The results are shown in Table S2.

Principal component analysis

Given that each variable influences the predicted target to a different degree, it is difficult to obtain ideal results if all 16 influencing factors are entered into the prediction model as features. Therefore, this research adopts the PCA method to perform dimensionality reduction and feature selection on the data. Based on the standardized data, the covariance matrix, its eigenvalues and eigenvectors, and the contribution rate and cumulative contribution rate of each principal component were calculated.

As shown in Table 1, the KMO statistic is 0.695 and the significance of Bartlett's test is 0.000, indicating that the water consumption influencing factors are correlated with one another rather than mutually independent, so the PCA method can be used to reduce dimensionality.

Table 1. The KMO and the Bartlett's test

KMO statistic | Bartlett's test: approximate chi-square | Bartlett's test: degrees of freedom | Bartlett's test: significance
0.695 | 661.922 | 120 | 0.000

As Table 2 shows, when 4 principal components are extracted, the eigenvalue of the 4th component is 1.015 (≥ 1) and the cumulative variance contribution rate is 90.62%, indicating that these components contain the majority of the information on the water consumption influencing factors. Therefore, 4 principal components were selected to replace the water consumption influencing factors.

Table 2. Principal component eigenvalue and variance contribution rate

Serial number | Initial eigenvalue | Variance contribution rate (%) | Cumulative variance contribution rate (%)
1 | 8.882 | 55.512 | 55.512
2 | 3.217 | 20.105 | 75.618
3 | 1.384 | 8.651 | 84.269
4 | 1.015 | 6.346 | 90.615
5 | 0.790 | 4.935 | 95.549
6 | 0.321 | 2.004 | 97.553
7 | 0.189 | 1.180 | 98.734
8 | 0.096 | 0.599 | 99.333
9 | 0.046 | 0.290 | 99.622
10 | 0.028 | 0.172 | 99.795
11 | 0.016 | 0.100 | 99.895
12 | 0.009 | 0.054 | 99.949
13 | 0.006 | 0.035 | 99.984
14 | 0.002 | 0.015 | 99.999
15 | 0.000 | 0.001 | 100.000
16 | 0.000 | 0.000 | 100.000
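
The selection rule can be checked directly against the eigenvalues in Table 2. The sketch below assumes that each contribution rate is the eigenvalue divided by the number of variables (16), which holds for a correlation matrix because its eigenvalues sum to the number of variables; the variable names are illustrative.

```python
# Eigenvalues copied from Table 2
eigenvalues = [8.882, 3.217, 1.384, 1.015, 0.790, 0.321, 0.189, 0.096,
               0.046, 0.028, 0.016, 0.009, 0.006, 0.002, 0.000, 0.000]

n_vars = len(eigenvalues)
# Contribution rate of each component, in percent.
rates = [100.0 * ev / n_vars for ev in eigenvalues]

# Running total gives the cumulative contribution rate.
cumulative = []
total = 0.0
for r in rates:
    total += r
    cumulative.append(total)

# Keep the components whose eigenvalue is at least 1.
n_components = sum(1 for ev in eigenvalues if ev >= 1.0)
print(n_components, round(cumulative[n_components - 1], 2))
```

Four components satisfy the eigenvalue rule and together exceed the 85% guideline, whereas the first three alone reach only about 84.3%.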

The component score coefficients reflect the degree of correlation between the principal components and the water consumption influencing factors. As shown in Table 3, for the 1st principal component (F1), the loadings of total agricultural output value, effective irrigation area, forestry area, total wastewater discharge, GDP, value-added of industry, and water supply penetration rate are relatively large, and thus F1 can be generalized as the socio-economic development and intra-provincial eco-environmental factor. The 2nd principal component (F2) mainly reflects the condition of water resources in Guizhou Province: it is significantly positively correlated with precipitation, total amount of water resources, and groundwater resources but negatively correlated with annual sunshine hours and effective irrigation area, because sunshine and irrigation consume water and thereby reduce the total amount of water resources. Principal component 3 (F3) has a relatively large loading on total grain production and thus mainly represents the agricultural influencing factor. Principal component 4 (F4) has a large loading on annual sunshine hours and can be generalized as the weather factor. Principal components 1 through 4 reflect the comprehensive conditions of water consumption in Guizhou from different perspectives. These 4 components can be used as the major influencing factors of water consumption in Guizhou for further prediction, and based on the eigenvectors corresponding to the eigenvalues, the principal component data of F1, F2, F3, and F4 can be obtained, as shown in Table S3.

Table 3. Component score coefficient

Influencing factor | Component 1 | Component 2 | Component 3 | Component 4
Precipitation | 0.508 | 0.826 | 0.106 | 0.044
The total amount of water resources | 0.440 | 0.851 | 0.161 | −0.012
Total grain production | 0.502 | −0.250 | 0.606 | −0.273
Total industrial output value | 0.961 | −0.157 | 0.032 | −0.032
Population | −0.352 | 0.623 | −0.373 | 0.393
GDP | 0.981 | −0.073 | −0.113 | 0.117
Water supply penetration rate | 0.898 | −0.345 | 0.113 | −0.167
Annual sunshine hours | −0.033 | −0.536 | 0.200 | 0.736
Groundwater resources | 0.339 | 0.857 | 0.102 | −0.134
Added-value of industry | 0.987 | −0.121 | −0.067 | 0.064
Added-value of farming | 0.725 | 0.155 | −0.307 | 0.008
Temperature | 0.070 | 0.318 | 0.785 | 0.335
Total wastewater discharge | 0.97 | −0.047 | −0.134 | 0.027
Forestry area | 0.96 | 0.023 | −0.152 | 0.198
Effective irrigation area | 0.961 | −0.221 | −0.006 | −0.062
Total agricultural output value | 0.979 | −0.007 | −0.094 | 0.148

BP neural network

The F1, F2, F3, and F4 data obtained from the PCA were used as input layer data, with the number of input nodes set to 4, the target error to 10⁻⁵, and the maximum number of training iterations to 1000. Using the annual water consumption of Guizhou as the prediction object, the water consumption data of Guizhou spanning 2000–2020 were divided into a training set and a test set. Following the 7:3 division principle, the data spanning 2000–2013 were used as the training set and the data spanning 2014–2020 as the test set. The BP neural network prediction model was then constructed to test the prediction performance of the PCA-BP neural network model.
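
The chronological 7:3 division described above can be sketched as follows. Truncating 21 × 0.7 to 14 training years is an assumption about how the ratio was applied, but it reproduces the 2000–2013 and 2014–2020 ranges stated in the text.

```python
years = list(range(2000, 2021))      # 21 annual samples, 2000-2020
n_train = int(len(years) * 0.7)      # 7:3 division, truncated to 14 years

# Split in time order so that the test years follow the training years.
train_years, test_years = years[:n_train], years[n_train:]
print(train_years[0], train_years[-1], test_years[0], test_years[-1])
```

Keeping the split in time order, rather than shuffling, means the model is always evaluated on years later than those it was trained on.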

As the fitting results in Figs. 4 and 5 show, after using PCA to reduce the dimensionality of the water consumption influencing factors, the BP neural network predicted water consumption in Guizhou with fairly good fitting performance, the fitting values of training and prediction being 0.99084 and 0.96045, respectively. As for the number of neurons in the hidden layer, an ideal fitting performance can usually be obtained when the number is set to around 2k + 1, where k is the number of input nodes. According to the tests conducted in this research, the optimal effect is obtained when the number of hidden-layer neurons is 7. The prediction results are shown in Table 4.

Fig. 4. Training results fitting

Fig. 5. Prediction results fitting

Table 4. Prediction results of different models

Year | Real water consumption (10⁸ m³) | PCA-BP forecast (10⁸ m³) | BP forecast (10⁸ m³) | ARIMA forecast (10⁸ m³) | GM(1,1) with fractional order accumulation (10⁸ m³)
2014 | 95.31 | 93.26 | 99.24 | 93.4 | 97.6071
2015 | 97.49 | 96.53 | 100.82 | 97.2 | 97.6102
2016 | 100.31 | 98.46 | 104.51 | 97.7 | 97.612
2017 | 103.51 | 97.59 | 104.7 | 99.6 | 97.613
2018 | 106.79 | 106.13 | 104.75 | 101.6 | 97.6136
2019 | 108.06 | 106.97 | 95.94 | 103.4 | 97.6139
2020 | 90.08 | 97.5 | 98.53 | 102.9 | 97.6142
MAE | - | 2.8 | 5.03 | 4.48 | 5.45
MRE | - | 2.90% | 5.10% | 4.60% | 5.36%

In the meantime, to demonstrate that the PCA-BP neural network has a higher prediction accuracy, a comparison with the BP neural network, the GM(1,1) with fractional order accumulation, and the ARIMA model was performed. Table 4 shows the results of 7-year water consumption prediction and errors in Guizhou based on different models.

According to Table 4 and Fig. 6, the overall trend of the prediction results based on the PCA-BP neural network is closer to the real water consumption, with an MAE of 2.8 and an MRE of 2.9%, both lower than those of the other prediction models. Notably, although the error of the BP neural network with directly input influencing factors is moderate (MAE 5.03, MRE 5.10%), its prediction trend is poorer. For example, the real water consumption rose gradually over 2014–2019 and then dropped suddenly in 2020, whereas the BP neural network predicted gradual growth in 2014–2018, followed by a sudden decrease in 2019 and then a rise in 2020. This indicates a poorer prediction trend than that obtained by entering the dimensionality-reduced factors generated by the PCA method. The reason is that the massive amount of information in the influencing factors and the numerous input factors complicated the BP neural network model and affected its prediction trend to some extent. The comparison demonstrates that entering too many influencing factors affects the prediction results, thus pointing to the importance of reducing the dimensionality of water consumption influencing factors.

Fig. 6. Prediction results of different models

The above PCA-BP neural network model can reasonably predict future water consumption in Guizhou. Based on the average growth rates and mean values of water consumption influencing factors in 2000–2020 in Guizhou, as shown in Table S3, the influencing factor data in 2021–2030 were predicted. Furthermore, water consumption data of Guizhou in 2021–2030 were predicted using the PCA-BP neural network model, with the results shown in Table 5.

Table 5. Water consumption forecast of Guizhou in 2021–2030

Year | Predicted value (10⁸ m³) | Year | Predicted value (10⁸ m³)
2021 | 97.48 | 2026 | 101.19
2022 | 98.58 | 2027 | 100.96
2023 | 99.59 | 2028 | 100.31
2024 | 100.43 | 2029 | 99.33
2025 | 100.99 | 2030 | 98.02

As can be seen from the prediction data in Table 5 and Fig. 7, water consumption in Guizhou over the next 10 years first increases and then decreases, exhibiting a "heap"-shaped trend. This first-increasing, then-decreasing trend results from the combined effects of socioeconomic advancement, development, and greater attention to eco-environmental construction in recent years. As can be seen from Table S4, the 16 influencing factors of water consumption in Guizhou all showed positive average growth over the past 20 years; thus, it can be deduced that water consumption in the province will continue to grow over the short term. In addition, during the 13th five-year plan period, Guizhou Province made vigorous efforts to develop the Guizhou Jiayan Water Diversion Project and the Guizhou Huangjiawan Key Water Resources Project in order to promote the scientific development of water resources and comprehensively increase the level of water security. This will inevitably cause continuous growth in industrial and eco-environmental water consumption, thereby resulting in an increasing trend in water consumption in Guizhou. Despite the positive average growth rates of all influencing factors, the average growth rates of population, total grain production, water supply penetration rate, and temperature are relatively small at 0.15%, 2.22%, 2.22%, and 0.08%, respectively. The guiding policies of Guizhou Province, for example the 14th five-year plan, point out that the province will base its development on the carrying capacity of resources and the environment and deploy ecologically functional spaces in an orderly fashion. With industrial restructuring and increasingly sophisticated water conservation facilities in the province, a decline in domestic and eco-environmental water consumption will be inevitable, further driving the long-term decline of water consumption in Guizhou.

Fig. 7. Water consumption forecast of Guizhou in 2021–2030

Discussion

The overall water consumption in Guizhou Province has shown a growing trend in recent years, whereas factors like precipitation and the total amount of water resources have remained relatively stable. This has strained the supply–demand relationship, and thus accurate prediction of water consumption is urgently needed to provide a reasonable basis for future water resource planning. As water consumption is subject to the effect of numerous factors like precipitation, sorting out the interactive relationship between water consumption and its influencing factors is the key to accurate prediction. The effects of the influencing factors and of the input factors on the prediction model are discussed as follows:

  • To determine the effect of influencing factors on water consumption, the sums of the absolute values of the F1-4 scores corresponding to each influencing factor shown in the principal component scoring coefficient table (Table 3) were calculated based on the PCA principle, and then the influencing factors were sequenced based on the magnitude of the value. See Table S5 for more details. The value reflects the magnitude of the correlation between an influencing factor and water consumption; the larger the value, the higher the degree of correlation. As can be seen from Table S5, factors having a relatively higher correlation with water consumption are population, total grain production, water supply penetration rate, and temperature.

Compared with previous detailed studies on water consumption by population and water price, the influence indicators selected in this paper and the main influencing factors of water consumption are undoubtedly more comprehensive (Nosvelli et al. 2009).

It can also be seen from Fig. 8 that water consumption follows almost the same trend as population, total grain production, water supply penetration rate, and temperature, that is, increasing first and then decreasing over time. All of them reach minimum values around 2011 and then gradually grow. This further validates that the above influencing factors are significantly correlated with water consumption in Guizhou.

  • As shown in Fig. 6, the gap between the 2019 prediction and the real water consumption is far larger for the single BP neural network prediction model than for the PCA-BP neural network. A comparison of the prediction processes of the two models reveals that the way the influencing factors are entered is the main cause of this difference.

Fig. 8. Trends in influencing factors

Unlike other studies, this paper screens the indicators fed into the BP model and reduces the number of input indicators while preserving as much of the information as possible (Chen et al. 2020).

Between 2018 and 2019, the values of total agricultural output value, value-added of industry, annual sunshine hours, temperature, total grain production, and effective irrigation area all decreased to varying degrees. These decreases affected the 2019 prediction of the BP neural network model. According to the analysis presented earlier, these 6 influencing factors largely have a low correlation with water consumption in Guizhou, yet they still shifted the overall prediction trend and reduced the accuracy of the model. In the PCA-BP model, the indicators with low correlation with water consumption are eliminated, which also makes the influencing indicators better suited to the characteristics of karst areas. This points to the importance of performing dimensionality reduction on influencing factors for model accuracy.
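One common way to decide how far to reduce the dimensionality is to keep the smallest number of principal components whose cumulative variance contribution rate reaches a chosen threshold. The sketch below illustrates that rule only. The eigenvalues and the 0.85 threshold are hypothetical assumptions, not values taken from this study, and the selection criterion actually used by the authors may differ.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Return (k, cumulative_rate) for the smallest k whose cumulative
    variance contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    # Threshold never reached (e.g. threshold > 1): keep every component.
    return len(ordered), cumulative


# Hypothetical eigenvalues for 16 standardized factors (they sum to about 16).
# Placeholder numbers, NOT results from the paper.
eigenvalues = [7.5, 3.6, 2.1, 1.2, 0.5, 0.4, 0.3, 0.15,
               0.1, 0.05, 0.04, 0.03, 0.01, 0.01, 0.005, 0.005]

k, rate = components_needed(eigenvalues, threshold=0.85)
print(f"components kept: {k}, cumulative contribution: {rate:.2%}")
```

The retained components, rather than the original indicators, would then serve as the inputs of the prediction model.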

The PCA-BP neural network model also has some drawbacks in predicting water consumption. First, due to various restrictions, only the 2000–2020 data on water consumption and its 16 influencing factors were collected. On the one hand, the data samples are relatively few; on the other hand, the data may not cover all the influencing factors of water consumption. Second, there is still room for improving model accuracy. Nevertheless, the model proposed in this paper is capable of providing technical support for water resources management in Guizhou Province. In the future, the comprehensiveness of the data should be improved and the influencing factors of water consumption should be analyzed based on socioeconomic principles.

Conclusion

This paper proposes a PCA-BP neural network prediction method to improve prediction accuracy while incorporating the indicators that affect water consumption into the prediction model. Water consumption prediction is affected by numerous influencing factors with complicated relationships. The BP neural network performs fairly well when multiple factors are involved but is prone to local convergence and high randomness, which makes it necessary to reduce the number of input nodes. A PCA-BP neural network model was therefore used to predict water consumption in Guizhou Province.

  • With full consideration of the actual conditions of Guizhou, 16 major influencing factors of water consumption were selected, and the PCA method was introduced to reduce their dimensionality, yielding 4 principal components with a cumulative variance contribution rate of 90.62%.

  • The principal components and water consumption data were fed into the BP neural network, and the model was validated against water consumption in 2014–2020. A comparison with the grey GM(1,1), time series ARIMA, and single BP neural network models indicates that the MAE and MRE of the PCA-BP model are 2.8% and 2.9%, respectively, lower than those of the other prediction models. In particular, compared with the single BP neural network, PCA effectively reduced data redundancy and improved prediction accuracy.

  • On that basis, the PCA-BP neural network model was used to predict water consumption in Guizhou Province in 2021–2030. The predicted water consumption exhibits a “heap”-shaped trend, first increasing and then decreasing.
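The model comparison summarized above rests on two error measures, MAE and MRE. A minimal sketch of such a comparison follows, assuming the standard definitions (mean of absolute errors, and mean of absolute errors divided by the actual values); the exact formulas used in the paper are not restated in this section, and the series and model names below are hypothetical placeholders.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Mean of |actual - predicted| / |actual| over all samples."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed values and model outputs (placeholders, not study data).
actual = [100.0, 104.0, 108.0]
predictions = {
    "model A": [98.0, 106.0, 105.0],
    "model B": [90.0, 110.0, 120.0],
}

# Compute both error measures for every model.
errors = {
    name: (mean_absolute_error(actual, values), mean_relative_error(actual, values))
    for name, values in predictions.items()
}

# Pick the model with the lowest mean relative error.
best = min(errors, key=lambda name: errors[name][1])
for name, (mae, mre) in errors.items():
    print(f"{name}: MAE={mae:.2f}, MRE={mre:.2%}")
print("lowest MRE:", best)
```

Reporting both measures side by side, as done here, makes it easier to see whether one model is better in absolute terms, in relative terms, or both.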

The PCA-BP model used in this paper offers a new approach to water consumption prediction and provides a reference for the planning and management of water resources in karst areas. However, the selection of water consumption impact indicators in this paper reflects the characteristics of water resource consumption in karst areas. Therefore, if the model is to be used for planning and analyzing water consumption and water resources over a wider area, its general applicability can be improved by further data mining of the impact indicators, so that regional water consumption is predicted more accurately and reasonably.

Author contribution

Zhicheng Yang: writing-original draft and formal analysis. Bo Li: writing-review and editing and project administration. Huang Wu: software, investigation, and writing-original draft. MengHua Li: writing-review and editing and software. Juan Fan: investigation and formal analysis. Mengyu Chen: writing-review and editing. Jie Long: data analysis and investigation.

Funding

This research was financially supported by Natural Science Foundation (42162022; 41702270), Guizhou Province Excellent Youth Science and Technology Talent Project (Qian Ke He Ping Tai Ren Cai [2021]5626), Guizhou Science and Technology Department Project (Qian Ke He Ji Chu [2019]1413; Qian Ke He Zhi Cheng [2020]4Y048; Qian Ke He Zhi Cheng [2020]4Y007; Qian Ke He Zhi Cheng [2020]4Y005), Guizhou Provincial Department of Education Foundation ([2018]113), and Shanxi Province Coal Mine Water Hazard Prevention and Control Technology Open Fund (2020SKMS01).

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Declarations

Ethics approval

Not applicable. This manuscript does not involve research on humans or animals.

Consent to participate

All of the authors consented to participate in the drafting of this manuscript.

Consent for publication

All of the authors consented to publish this.

Conflict of interest

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Almanjahie IM, Chikr-Elmezouar Z, Bachir A. Modeling and forecasting the household water consumption in Saudi Arabia. Appl Ecol Environ Res. 2019;17(1):1299–1309. doi: 10.15666/aeer/1701_12991309.
  2. Azimi S, Azhdary M, Hashemi MSA. Prediction of annual drinking water quality reduction based on Groundwater Resource Index using the artificial neural network and fuzzy clustering. J Contam Hydrol. 2018;220(2019):6–17. doi: 10.1016/j.jconhyd.2018.10.010.
  3. Castura JC, Rutledge DN, Ross CF, Naes T. Discriminability and uncertainty in principal component analysis (PCA) of temporal check-all-that-apply (TCATA) data. Food Qual Prefer. 2022;96:104370. doi: 10.1016/j.foodqual.2021.104370.
  4. Chen H, Zhang Y, Ma L, Liu F, Zheng W, Shen Q, Zhang H, Wei X, Tian D, He G. Change of water consumption and its potential influential factors in Shanghai: a cross-sectional study. BMC Public Health. 2012;12(1):1–9. doi: 10.1186/1471-2458-12-450.
  5. Chen G, Long T, Xiong J, Bai Y. Multiple random forests modelling for urban water consumption forecasting. Water Resour Manage. 2017;31(15):4715–4729. doi: 10.1007/s11269-017-1774-7.
  6. Chen MT, Luo YF, Shen YY, Han ZZ, Cui YL. Driving force analysis of irrigation water consumption using principal component regression analysis. Agric Water Manag. 2020;234:106089. doi: 10.1016/j.agwat.2020.106089.
  7. Dos Santos CC, Pereira Filho AJ. Water demand forecasting model for the metropolitan area of São Paulo, Brazil. Water Resources Management. 2014;28(13):4401–4414. doi: 10.1007/s11269-014-0743-7.
  8. Duan C, Chen B. Driving factors of water-energy nexus in China. Appl Energy. 2020;257:113984. doi: 10.1016/j.apenergy.2019.113984.
  9. Fan L, Gai L, Tong Y, Li R. Urban water consumption and its influencing factors in China: evidence from 286 cities. J Clean Prod. 2017;166:124–133. doi: 10.1016/j.jclepro.2017.08.044.
  10. Guizhou Water Resources Bulletin (2020) Guizhou Provincial Department of Water Resources 2020. China Water Resources and Hydropower Press, Beijing. http://www.gzmwr.gov.cn/slgb/slgb1/
  11. Guizhou Province Statistical Yearbook (2020) Guizhou Provincial Bureau of Statistics 2020. China Statistics Press, Beijing. http://stjj.guizhou.gov.cn/tjsj_35719/sjcx_35720/gztjnj_40112/
  12. Hao W, GuiYu Y, YangWen J, DaYong Q, Hong G, JianHua W, ChunMiao H. Necessity and feasibility for an ET-based modern water resources management strategy: a case study of soil water resources in the Yellow River Basin. Science in China Series E. 2009;10(52):3004–3016. doi: 10.1007/s11431-009-0102-8.
  13. He F, Tao T. An improved coupling model of grey-system and multivariate linear regression for water consumption forecasting. Pol J Environ Stud. 2014;23(4):1165–1174.
  14. Heo G, Gader P, Frigui H. RKF-PCA: robust kernel fuzzy PCA. Neural Netw. 2009;22(5–6):642–650. doi: 10.1016/j.neunet.2009.06.013.
  15. Jia D, Wu Z. Intelligent evaluation system of government emergency management based on BP neural network. IEEE Access. 2020;8:199646–199653. doi: 10.1109/access.2020.3032462.
  16. Keshavarzi AR, Sharifzadeh M, Haghighi A, Amin S, Keshtkar S, Bamdad A. Rural domestic water consumption behavior: a case study in Ramjerd area Fars province, I.R. Iran. Water Res. 2006;40(6):1173–1178. doi: 10.1016/j.watres.2006.01.021.
  17. Li B, Zhang HL, Long J, Fan J, Wu P, Chen MY, Liu P, Li T. Migration mechanism of pollutants in karst groundwater system of tailings impoundment and management control effect analysis: gold mine tailing impoundment case. J Clean Prod. 2022;350:131434. doi: 10.1016/j.jclepro.2022.131434.
  18. Lili Z, Weijian R, Liqun S, Fengcai H. Well logging prediction and uncertainty analysis based on recurrent neural network with attention mechanism and Bayesian theory. J Petrol Sci Eng. 2021;208(2022):109458. doi: 10.1016/j.petrol.2021.109458.
  19. Liu RX, Kuang J, Gong Q, Hou XL. Principal component regression analysis with spss. Comput Methods Programs Biomed. 2003;71(2003):141–147. doi: 10.1016/S0169-2607(02)00058-5.
  20. Liu ZJ, Li B, Chen M, Li T. Evaluation on sustainability of water resource in karst area based on the emergy ecological footprint model and analysis of its driving factors: a case study of Guiyang city, China. Environmental Science and Pollution Research. 2021;28(35):49232–49243. doi: 10.1007/s11356-021-14162-4.
  21. Lopez Farias R, Puig V, Rodriguez Rangel H, Flores JJ. Multi-model prediction for demand forecast in water distribution networks. Energies. 2018;11(3):660. doi: 10.3390/en11030660.
  22. Nosvelli M, Musolesi A. Water consumption and long-run socio-economic development: an intervention and a principal component analysis for the city of Milan. Environmental Modeling & Assessment. 2009;14(3):303–314. doi: 10.1007/s10666-007-9127-1.
  23. Piasecki A, Jurasz J, Marszelewski W. Application of multilayer perceptron artificial neural networks to mid-term water consumption forecasting - a case study. Ochrona Srodowiska. 2016;38(2):17–22.
  24. Pu W, Yun B, Chuan L, Ying W, Jingjing X. Urban daily water consumption forecasting based on variable structure support vector machine. J Basic Sci Eng. 2015;23(5):895–901. doi: 10.16058/j.issn.1005-0930.2015.05.005.
  25. Romano G, Salvati N, Guerrini A. An empirical analysis of the determinants of water demand in Italy. Journal of Cleaner Production. 2016;130(sep.1):74–81. doi: 10.1016/j.jclepro.2015.09.141.
  26. Sandiford P, Gorter AC, Orozco JG, Pauw JP. Determinants of domestic water use in rural Nicaragua. Journal of Tropical Medicine & Hygiene. 1990;93(6):383.
  27. Sebri M. ANN versus SARIMA models in forecasting residential water consumption in Tunisia. J Water Sanit Hyg Dev. 2013;3(3):330–340. doi: 10.2166/washdev.2013.031.
  28. Sivapalan M, Savenije HHG, Bloeschl G. Socio-hydrology: a new science of people and water. Hydrol Process. 2012;8(26):1270–1276. doi: 10.1002/hyp.8426.
  29. Statistical Bulletin of National Economic and Social Development of Guizhou Province (2020) Guizhou Provincial Bureau of Statistics 2020. China Statistics Press, Beijing. https://www.guizhou.gov.cn/zwgk/zfsj/tjgb/202109/t20210913_70088474.html
  30. Wang Q, Su M. A preliminary assessment of the impact of COVID-19 on environment? A case study of China. Sci Total Environ. 2020;728:138915. doi: 10.1016/j.scitotenv.2020.138915.
  31. Wang Q, Zhan LN. Assessing the sustainability of renewable energy: an empirical analysis of selected 18 European countries. Sci Total Environ. 2019;629:529–545. doi: 10.1016/j.scitotenv.2019.07.170.
  32. Wang Q, Li S, Li R. Evaluating water resource sustainability in Beijing, China: combining PSR model and matter-element extension method. J Clean Prod. 2018;206:171–179. doi: 10.1016/j.jclepro.2018.09.057.
  33. Wang Q, Li SY, Li RR. Forecasting energy demand in China and India: using single-linear, hybrid-linear, and non-linear time series forecast techniques. Energy. 2018;161:821–831. doi: 10.1016/j.energy.2018.07.168.
  34. Wang Q, Li SY, Li RR, Jiang F. Underestimated impact of COVID-19 on carbon emission reduction in developing countries-a novel assessment based on scenario analysis. Environ Res. 2021;204:111990. doi: 10.1016/j.envres.2021.111990.
  35. Wu J, Wang Z, Dong L. Prediction and analysis of water resources demand in Taiyuan City based on principal component analysis and BP neural network. Journal of Water Supply: Research and Technology. 2021;70(8):1272–1286. doi: 10.2166/aqua.2021.205.
  36. Xu ZJ, Zhang Y, Xiao Y. Training behavior of deep neural network in frequency domain. arXiv-CS-Information Theory. 2018;11953:264–274. doi: 10.1007/978-3-030-36708-4_22.
  37. Xu X, Cao D, Zhou Y, Gao J. Application of neural network algorithm in fault diagnosis of mechanical intelligence. Mech Syst Signal Process. 2020;141:106625. doi: 10.1016/j.ymssp.2020.106625.
  38. Zhu W, Wang H, Zhang X. Synergy evaluation model of container multimodal transport based on BP neural network. Neural Comput Appl. 2021;9(32):4087–4095. doi: 10.1007/s00521-020-05584-1.


Articles from Environmental Science and Pollution Research International are provided here courtesy of Nature Publishing Group
