Abstract
The excavation of subterranean coal has led to a range of ecological and environmental problems that seriously restrict the sustainable development of society. Soil moisture content, one of the important physical indicators of soil, needs to be monitored in a scientific, real-time, and comprehensive manner. Because manual measurement is inefficient, methods based on remote sensing data inversion have received widespread attention and in-depth research in recent years. In this study, a new ReMPDI index (Red edge Modified Perpendicular Drought Index) is constructed, six soil moisture content retrieval models based on machine learning algorithms are compared and analyzed, and their accuracy is verified against measured sampling data. The following conclusions were obtained: (1) Using the red edge band as the horizontal axis and the near-infrared (NIR) band as the vertical axis is the optimal band combination of the spectral feature space for constructing soil lines; (2) The ReMPDI index, based on the REdge-NIR spectral feature space with an added vegetation cover factor, has the strongest correlation with measured soil moisture content, with a significant Pearson correlation coefficient of −0.798, and it outperforms the MPDI and PDI indices; (3) The inversion accuracy of the RF model is higher than that of SVM, BPNN, PLSR, CNN, and RBFNN, with an error of only 9.52% relative to the measured results. The results of this study can provide a theoretical basis and technical support for the fine monitoring of surface soil moisture content on a large scale in mining areas.
Keywords: Soil moisture, Inversion model, PDI, UAV, Mining area
Subject terms: Environmental impact, Computer science, Hydrology
Introduction
The development and utilization of coal play a crucial role in social stability and economic development, but at the same time they inevitably bring a series of ecological and environmental problems1. In particular, the physicochemical properties of surface soil undergo significant changes under the influence of coal mining2–4. Environmental problems caused by coal mining seriously restrict the sustainable development of society. Therefore, it is necessary to conduct scientific, real-time, and comprehensive monitoring of the soil in coal mining subsidence areas5,6. Soil moisture content, as one of the important physical properties of soil, is a key parameter for hydrological cycling and energy exchange between the surface and the atmosphere, and has significant implications for the surface ecological environment7.
At present, the monitoring of soil moisture content mainly relies on two methods: manual field sampling and remote sensing image inversion. The manual measurement method is simple to operate and has high accuracy, but it is time-consuming and labor-intensive, and cannot meet the real-time and large-scale monitoring requirements of soil moisture content. In recent years, with the improvement of remote sensing data resolution and the rapid development of monitoring methods, the inversion method of soil moisture content based on remote sensing image data has received increasing attention8. By obtaining soil spectral data through remote sensing technology and establishing a quantitative relationship model between soil moisture content and spectral data, the quantitative estimation of soil moisture content can be achieved9. However, due to the low resolution of satellite remote sensing, the inversion accuracy of soil moisture content is limited. The UAV, with its advantages of flexibility, high resolution, and low operating cost, can be used as an ideal data source for soil moisture content inversion10.
The drought index is an indicator of the degree of dryness and wetness and is widely used in monitoring soil moisture content. As early as 2007, Zhan et al. proposed the perpendicular drought index (PDI)11 by constructing a NIR-Red feature space, which can better invert the soil moisture content. Subsequently, Ghulam et al. introduced a vegetation coverage factor and constructed the modified perpendicular drought index (MPDI)12, effectively reducing the impact of vegetation on the inversion accuracy of soil moisture content and achieving good results in areas with abundant vegetation13. In addition, scholars have proposed many other drought indices, such as CDI (Combined Drought Index)14, TVDI (Temperature Vegetation Dryness Index)15, VSDI (Visible optical and Short-wave infrared Drought Index)16, TVMDI (Temperature-Vegetation-soil Moisture Dryness Index)17, and VAPDI (Vegetation Adjusted Perpendicular Drought Index)4, which have effectively estimated soil moisture content in different research areas. However, the surface soil moisture content of coal mining subsidence areas is affected by the complex surface geological environment, the disorderly distribution of vegetation, the undulating topography, and the location and speed of underground coal mining. How to improve and optimize an index suitable for mining areas remains to be further studied18. Existing studies usually use the red band when constructing drought indices and soil lines. However, the red band mainly reflects the color and reflection characteristics of plant leaves and is usually used for plant classification and recognition, whereas the red edge is more closely related to vegetation growth, which in turn is closely related to soil moisture. Therefore, this study uses the red edge band instead of the red band when constructing the drought index, with the aim of further enhancing the correlation between the drought index and soil moisture.
Based on the selection of appropriate drought indices, it is still necessary to construct suitable inversion models to achieve effective prediction of soil moisture content19. In recent years, with the rapid development of machine learning methods and the advantages of intelligence, many scholars have applied them to the construction of soil moisture inversion models20. Bertalan et al.21 compared the inversion effects of four machine learning algorithms, including RF (Random Forest), EN (Elastic Network Regression), GLM (General Linear Model), and RLM (Robust Linear Model), on soil moisture content, and found that RF is the best model when using multispectral data. Adab et al.22 compared the RF, ENR, SVM (Support Vector Machine), and ANN (Artificial Neural Network) algorithms, and also found that RF has the highest accuracy, reaching more than 73%. In addition, partial least squares algorithm23, GA-BP (Genetic Algorithm Back Propagation) neural network algorithm24, SEM-ANN (Structural Equation Models-Artificial Neural Network) algorithm25, and constrained multi-channel algorithm CMCA (Constrained Multi-Channel Algorithm)26 have also been applied to the construction of soil moisture inversion models. With the continuous development of remote sensing technology, the inversion method of soil moisture content based on remote sensing images has become increasingly mature, but there are certain differences in accuracy between different models27. Therefore, for the geological environment characteristics of coal mining subsidence areas in mining areas, it is of great significance to obtain a stable and highly accurate soil moisture estimation model by comparing and analyzing the accuracy differences between different models for monitoring and mapping the distribution of soil moisture content in coal mining subsidence areas, as well as for land reclamation and vegetation restoration in mining areas.
In view of this, to explore a real-time, efficient and low-cost monitoring method of surface soil moisture in mining area, a new ReMPDI index (Red edge Modified Perpendicular Drought Index) is constructed using UAV image data, and six inversion models of soil moisture based on machine learning algorithms (SVM, BPNN, PLSR, CNN, RF, RBFNN) are compared and analyzed. The accuracy is verified by measured sampling data, and the best index and inversion model are finally selected. At the same time, combined with mining data, we analyzed the influence of coal mining on surface soil moisture content in mining areas, which provided theoretical basis and technical support for fine monitoring and treatment of large-scale surface soil moisture content in arid and semi-arid mining areas.
Materials and methods
Data sources and research methods
Data sources
The research area is located in the Erlintu Coal Mine in Inner Mongolia Autonomous Region, China, which belongs to the arid and semi-arid regions. Its geographical location is shown in Fig. 1. Figure 1 is the result of geographic data processing and mapping using Esri’s ArcGIS Pro software (version 3.4.0; Esri, 2024, URL: https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview), including boundary extraction, coordinate system transformation, and thematic map rendering28. The vegetation in the mining area is mainly composed of typical grassland vegetation and sandy vegetation, mainly distributed with coarse soil, sandy soil, and chestnut soil, and the land shows a trend of desertification. Due to factors such as coal mining and climate conditions, the exposed surface area increases and water loss intensifies. The parameters of the UAV images in the mining area are shown in Table 1.
Fig. 1.
Geographical location and topographic map of the study area.
Table 1.
Parameter information of UAV images.
| Parameter | Value |
|---|---|
| Data type | Multispectral image |
| Flight date | May 8th, 2023 |
| Flight altitude | 80 m |
| UAV model | M210RTK |
| Camera model | MS600pro |
| Focal length | 6 mm |
| Band | 450 nm, 555 nm, 660 nm, 710 nm, 840 nm and 940 nm |
| Ground spatial resolution (GSD) | 5 cm |
Research method
This research takes the surface soil moisture content in the mining area as the research object and obtains measured sampling data of surface soil moisture content together with UAV remote sensing image data. Firstly, the perpendicular drought index is improved, and a new ReMPDI index is constructed by using the red edge band. Then, the best modified perpendicular drought index is selected through band optimization and correlation analysis. Finally, based on machine learning algorithms, six inversion models of soil moisture content are constructed, and their accuracy is verified against the measured results. The technical roadmap is shown in Fig. 2.
Fig. 2.
Technical roadmap.
Field sampling of soil moisture content
Underground coal mining in mining areas can cause changes in the moisture content of surface soil29. In this study, surface soil samples were collected from areas of the Erlintu Coal Mine mined in different years and from unmined areas. The distribution map of soil sampling points is shown in Fig. 3. The satellite image in Fig. 3 was created using free data from 91weitu software (https://www.91weitu.com/) and the coordinates of the mining face30. M1-M5 are the surface sampling areas affected by coal mining, and U1-U3 are the control areas not affected by coal mining. The five-point sampling method was used, with a soil depth of 10 cm. Five samples were collected at each point, giving 45 samples for each working face. At the same time, the geographical coordinates of each sampling point were recorded using a handheld GPS. The soil moisture content was determined using the indoor drying method, and the average value was taken as the soil moisture content of that sampling point. The working faces are close to one another and their geological and soil conditions are similar, so the samples can be analyzed together statistically.
Fig. 3.
Soil Sampling Area of Erlintu Coal Mine. M1-M5 is the surface sampling area affected by coal mining, and U1-U3 is the control area not affected by coal mining.
The construction method of ReMPDI index
When the soil is under stress, the reflectivity of the red and near-infrared bands will increase. According to research, the scatter plot of reflectivity of the red and near-infrared bands presents a shape similar to a triangle. Based on this phenomenon, Zhan et al. proposed the PDI11 to describe the dry and wet state of the soil. The principle is shown in Fig. 4. L represents the normal of the soil baseline, and line BC represents the soil line; PDI is measured perpendicular to line L and describes the distribution of water content in the NIR-Red spectral feature space. The closer a point is to line L, the higher the soil water content, and the farther away it is, the lower the soil water content. For a selected point E, the distance EF from this point to line L is the Perpendicular Drought Index (PDI). The key to calculating the PDI is the construction of the spectral feature space and the extraction of the soil line. PDI is calculated as shown in Formula (1):
$$PDI = \frac{1}{\sqrt{M^{2}+1}}\left(R_{Red} + M \cdot R_{NIR}\right) \tag{1}$$

Fig. 4.
Reflectance feature space of the red and near-infrared bands.

where $R_{Red}$ is the reflectance of the horizontal band in the spectral feature space; $R_{NIR}$ is the reflectance of the perpendicular band in the spectral feature space; M is the slope of the soil line.
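The perpendicular drought index described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly used form PDI = (R_h + M·R_v) / sqrt(M² + 1), where R_h and R_v are the reflectances on the horizontal and vertical axes of the feature space; the function name and the reflectance values are hypothetical and are not taken from the study.

```python
import math

def pdi(r_horizontal, r_vertical, slope):
    """Perpendicular drought index for a single pixel.

    r_horizontal: reflectance of the band on the horizontal axis (e.g. red)
    r_vertical:   reflectance of the band on the vertical axis (NIR)
    slope:        slope M of the soil line in that feature space
    """
    return (r_horizontal + slope * r_vertical) / math.sqrt(slope ** 2 + 1.0)

# Hypothetical reflectance values, for illustration only.
wet_pixel = pdi(0.08, 0.12, slope=1.2)
dry_pixel = pdi(0.25, 0.35, slope=1.2)
print(round(wet_pixel, 4), round(dry_pixel, 4))
```

Under this form, brighter (drier) pixels receive a larger index value, which matches the negative relationship between the index and soil moisture content.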
PDI is applicable to surfaces without vegetation or with low vegetation coverage. Since vegetation itself contains water, its coverage can interfere with the calculation of PDI. In order to monitor the dry and wet conditions of the surface when vegetation coverage is high, Ghulam et al. introduced the fractional vegetation cover (FVC) to improve PDI and obtain the MPDI12. This enables MPDI to better adapt to the needs of soil moisture monitoring in areas with abundant vegetation. The calculation methods of FVC and MPDI are shown in formulas (2) and (3), respectively:
![]() |
2 |
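$$MPDI = \frac{R_{Red} + M \cdot R_{NIR} - FVC\,\left(R_{v,Red} + M \cdot R_{v,NIR}\right)}{\left(1 - FVC\right)\sqrt{M^{2}+1}} \tag{3}$$

where $R_{Red}$ and $R_{NIR}$ are the reflectances of the red and near-infrared bands, M is the slope of the soil line, and $R_{v,Red}$ and $R_{v,NIR}$ are empirical parameters that can be considered as fixed values for the reflectance of vegetation in the red and near-infrared bands, respectively.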
Research has shown that the red edge band has extremely high sensitivity in vegetation monitoring31. In areas with vegetation, the red edge peak can be used as an indicator to assess severe water stress. In order to effectively reduce the disturbance of vegetation on soil moisture content in mining area, a new ReMPDI was constructed by using the Red Edge band. The calculation method is as shown in Formula (4):
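$$ReMPDI = \frac{R_{RE} + M \cdot R_{NIR} - FVC\,\left(R_{v,RE} + M \cdot R_{v,NIR}\right)}{\left(1 - FVC\right)\sqrt{M^{2}+1}} \tag{4}$$

where $R_{RE}$ and $R_{NIR}$ are the reflectances of the red edge and near-infrared bands, M is the slope of the soil line in the red edge-NIR feature space, and $R_{v,RE}$ and $R_{v,NIR}$ are set to 0.05 and 0.5, respectively.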
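A vegetation-corrected index of this kind can be sketched as follows. The sketch assumes the usual MPDI structure, in which the vegetation contribution FVC·(R_v,h + M·R_v,v) is subtracted from the mixed-pixel term and the result is rescaled by (1 − FVC); it also assumes that the values 0.05 and 0.5 stated in the text apply to the red edge and NIR bands. The function names are illustrative, not the authors' code.

```python
import math

def modified_pdi(r_horizontal, r_vertical, slope, fvc, rv_horizontal, rv_vertical):
    """Vegetation-corrected perpendicular drought index for a single pixel."""
    if not 0.0 <= fvc < 1.0:
        raise ValueError("fvc must lie in [0, 1)")
    mixed = r_horizontal + slope * r_vertical
    vegetation = fvc * (rv_horizontal + slope * rv_vertical)
    return (mixed - vegetation) / ((1.0 - fvc) * math.sqrt(slope ** 2 + 1.0))

def rempdi(r_red_edge, r_nir, slope, fvc):
    # 0.05 and 0.5 are the fixed vegetation reflectances given in the text
    # (assumed here to refer to the red edge and NIR bands, respectively).
    return modified_pdi(r_red_edge, r_nir, slope, fvc, 0.05, 0.5)

# Hypothetical pixel: reflectances, soil line slope and fractional vegetation cover.
print(rempdi(0.20, 0.30, slope=1.1, fvc=0.25))
```

With FVC equal to zero the expression reduces to the uncorrected perpendicular index, which is a convenient sanity check on any implementation.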
Finally, the best index is selected by correlation analysis. The Pearson, Spearman, and Kendall correlation coefficients were calculated with SPSS software, and two collinearity diagnostics, the variance inflation factor (VIF) and tolerance (Tol), were used to select the best index.
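For readers without SPSS, the three correlation coefficients can be reproduced with a short standard-library sketch. The data below are hypothetical, and the Kendall function implements tau-a (no tie correction), which coincides with the tau-b reported by most statistical packages only when there are no tied values.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical drought index values and measured soil moisture content (%).
index_values = [0.10, 0.15, 0.22, 0.30, 0.41]
moisture = [11.8, 9.1, 9.6, 4.2, 2.0]
print(pearson(index_values, moisture),
      spearman(index_values, moisture),
      kendall_tau_a(index_values, moisture))
```

A negative value from all three functions indicates that the index rises as the soil dries, which is the behaviour expected of a drought index.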
Construction and accuracy verification of soil moisture inversion model based on ReMPDI
In this study, a total of 360 sets of soil moisture content data were obtained through field sampling. They were randomly divided into a training set of roughly 80% (290 sets) and a validation set of roughly 20% (70 sets).
Using the optimal modified perpendicular drought index obtained above and the 290 sets of field-sampled soil moisture content data as model training inputs, six machine learning algorithms, namely support vector machine (SVM), backpropagation neural network (BPNN), partial least squares regression (PLSR), convolutional neural network (CNN), random forest (RF), and radial basis function neural network (RBFNN), were used to construct soil moisture content inversion models based on the perpendicular drought index. The accuracy was verified using the 70 sets of validation data.
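The random division into 290 training and 70 validation samples can be sketched as below. The seed, the function name and the use of integer sample identifiers are assumptions made for illustration; the source does not state how the random selection was implemented.

```python
import random

def split_samples(samples, n_train, seed=0):
    """Shuffle a copy of the samples and cut it into training and validation parts."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must be between 1 and len(samples) - 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]

# 360 hypothetical sample identifiers, split 290 / 70 as in the text.
sample_ids = list(range(360))
train_ids, validation_ids = split_samples(sample_ids, n_train=290)
print(len(train_ids), len(validation_ids))
```

Fixing the seed makes the split repeatable, so that all six models are trained and validated on exactly the same samples.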
The regression model is introduced as follows:
(1) Support Vector Machine (SVM) is a widely used machine learning algorithm for classification and regression problems32. Its core principle is to map data into a high-dimensional space in which the data become linearly separable, and then to find the hyperplane with the largest margin. In this study, we used the ε-SVR support vector regression model33 and set the p parameter of the loss function to 0.01. The kernel function was set to PK and optimized using the parameters of the training set. The other two important SVM parameters, the penalty parameter and the kernel parameter, were determined according to the principle of minimizing the mean square error.
(2) Back Propagation Neural Network (BPNN) is a common artificial neural network and a supervised learning algorithm3, which can be used for classification, regression, and prediction tasks. A BP neural network consists of an input layer, a hidden layer, and an output layer34. The input layer receives data, the hidden layer processes data, and the output layer produces the results. The BP neural network is trained with the backpropagation algorithm. Compared with traditional data processing methods, it has many advantages, including the ability to handle nonlinear problems, strong adaptability, and good generalization ability.
(3) Partial Least Squares Regression (PLSR) is a multivariate statistical method35 that combines principal component analysis and linear regression to analyze the correlation between two sets of variables and achieve regression modeling. Unlike traditional multivariate regression methods, PLSR performs dimensionality reduction on both independent and dependent variables, thereby reducing the correlation between independent variables and improving the predictive power of the model. The basic idea of PLSR is to project both independent and dependent variables into a new low-dimensional space, maximizing the correlation between the projected independent and dependent variables36.
(4) Convolutional Neural Networks (CNN) are essentially multi-layer perceptrons37. They are a type of deep neural network with a convolutional structure that reduces the memory footprint of deep networks. Their three key operations, namely local receptive fields, weight sharing, and pooling, can effectively reduce the number of network parameters, thereby alleviating the problem of overfitting. Convolutional Neural Networks differ from conventional neural networks in that their neurons are arranged in three dimensions: width, height, and depth38.
(5) Random Forest (RF) is a classification and regression algorithm proposed by Breiman and Cutler in 200139. It is an ensemble learning method in which each decision tree is treated as a classifier that classifies input data to predict output results. During the training of the random forest model, each decision tree is trained on different random samples and random features, thereby reducing the risk of overfitting and improving the generalization ability of the model. In the prediction stage, the random forest model performs statistical voting on the prediction results of the decision trees and selects the result with the highest number of votes as the final prediction40; for regression tasks, the outputs of the trees are averaged instead.
(6) Radial Basis Function Neural Network (RBFNN) is a special type of neural network41. RBFNN has only one hidden layer, and the neurons in this hidden layer use a radial basis function (RBF) as their activation function. RBF neural networks are widely applied in many fields, such as pattern recognition, function approximation, regression modeling, and time series prediction. In this study, the widely used Gaussian function is chosen as the radial basis function42.
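The ensemble idea behind random forest (bootstrap samples, one tree per sample, combined predictions) can be illustrated with a deliberately simplified sketch: bagging of depth-1 regression trees ("stumps") on a single input variable. This is a stand-in for illustration, not the model used in the study; with only one feature there is no random feature selection, and the data, tree count and seed are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold and one mean value per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: always predict the overall mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    # Regression ensembles average the tree outputs; classification would vote.
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical drought index values and soil moisture content (%).
index_values = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
moisture = [11.0, 10.2, 9.1, 7.8, 6.0, 4.9, 3.5, 2.2]
stumps = fit_bagged_stumps(index_values, moisture)
print(predict_bagged(stumps, 0.12), predict_bagged(stumps, 0.42))
```

Averaging many trees fitted to different bootstrap samples smooths out the coarse step function of any single stump, which is the variance-reduction effect the text attributes to random forest.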
The accuracy evaluation indicators of the model are measured using the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE)40. When R2 is close to 1 and RMSE and MAE are small, it indicates that the model has a good prediction effect. The calculation formulas for these evaluation indicators are as follows:
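$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}} \tag{5}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{i} - y_{i}\right)^{2}} \tag{6}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{i} - y_{i}\right| \tag{7}$$

where $\hat{y}_{i}$ is the predicted value of soil moisture content; $y_{i}$ is the measured value of soil moisture content; $\bar{y}$ is the average value of the measured values of soil moisture content; n is the number of samples.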
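The three evaluation indicators are straightforward to compute. The sketch below assumes the standard definitions (R² as one minus the ratio of residual to total sum of squares); the sample values are hypothetical.

```python
import math

def r_squared(measured, predicted):
    mean = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    squared = sum((p - y) ** 2 for y, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))

def mae(measured, predicted):
    return sum(abs(p - y) for y, p in zip(measured, predicted)) / len(measured)

# Hypothetical measured and predicted soil moisture content (%).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(measured, predicted), rmse(measured, predicted), mae(measured, predicted))
```

RMSE and MAE carry the unit of the measured variable (percentage points of moisture content here), whereas R² is dimensionless.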
Results
The results of index correlation analysis
Taking M1 sampling area as an example, the reflectivity information of UAV image is transformed into spectral feature space by using ENVI 5.3 software. Among them, the spectral feature space of PDI and MPDI takes the red band as the horizontal axis and the near infrared band as the vertical axis. The spectral feature space of ReMPDI takes the red edge band as the horizontal axis and the near infrared band as the vertical axis. The spatial distribution of spectral characteristics of PDI, MPDI and ReMPDI is shown in Fig. 5. It can be seen that the spectral feature space R2 of ReMPDI is the largest, which is 0.924.
Fig. 5.
Spectral feature spaces of the three indices in the M1 sampling area.
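The soil line and its fitted R² can be obtained with an ordinary least-squares fit once the soil pixels have been selected. The sketch below assumes that selection has already been done (the study performs this step in ENVI) and uses hypothetical reflectance pairs; the function name is illustrative.

```python
def fit_soil_line(x_band, nir):
    """Least-squares line nir = slope * x_band + intercept, plus its R^2."""
    n = len(x_band)
    mx, my = sum(x_band) / n, sum(nir) / n
    sxx = sum((x - mx) ** 2 for x in x_band)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_band, nir))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(x_band, nir))
    ss_tot = sum((y - my) ** 2 for y in nir)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical soil pixels: red edge reflectance (x) and NIR reflectance (y).
red_edge = [0.10, 0.18, 0.25, 0.31, 0.40]
nir = [0.16, 0.27, 0.34, 0.43, 0.52]
slope, intercept, r2 = fit_soil_line(red_edge, nir)
print(slope, intercept, r2)
```

The returned slope is the parameter M used by the perpendicular drought indices, so the quality of this fit directly affects every index value computed from it.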
In this study, there are 8 sampling areas, and the spatial distribution map of spectral characteristics is drawn for each sampling area, and the fitted R2 is counted. The results are shown in Table 2. It can be seen that the mean R2 of ReMPDI is obviously higher than that of PDI and MPDI. Furthermore, compared to the control areas, the factors affecting soil moisture content in coal mining-affected areas are more complex. For example, coal mining can cause surface subsidence, leading to soil compaction and changes in soil moisture content. Therefore, the correlation between soil moisture content and ReMPDI index in coal mining-affected areas is lower.
Table 2.
Summary of soil line slopes and fitted R2 values for one-dimensional linear fits.
| R2 | M1 | M2 | M3 | M4 | M5 | U1 | U2 | U3 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| PDI | 0.814 | 0.915 | 0.912 | 0.875 | 0.870 | 0.827 | 0.821 | 0.833 | 0.858 |
| MPDI | 0.873 | 0.932 | 0.849 | 0.921 | 0.877 | 0.853 | 0.838 | 0.862 | 0.875 |
| ReMPDI | 0.924 | 0.822 | 0.905 | 0.928 | 0.868 | 0.944 | 0.915 | 0.842 | 0.894 |
The Pearson, Spearman, and Kendall correlation coefficients, together with the collinearity diagnostics variance inflation factor (VIF) and tolerance (Tol), are shown in Table 3.
Table 3.
Statistical table of indicators of correlation between perpendicular dryness index and measured values.
| Index | Pearson | Spearman | Kendall | VIF | Tol |
|---|---|---|---|---|---|
| PDI | −0.568 | −0.679** | −0.487 | 12.943 | 0.072 |
| MPDI | −0.662** | −0.766** | −0.582** | 12.265 | 0.078 |
| ReMPDI | −0.798** | −0.883** | −0.724** | 3.114 | 0.321 |
Note: ** indicates a significant test p < 0.01.
Overall, all of the perpendicular drought indices are negatively correlated with soil moisture content, so the magnitude of each index can represent the level of soil moisture content.
According to the Pearson, Spearman, and Kendall correlation coefficients, MPDI and ReMPDI, which include the vegetation cover factor, correlate more strongly with measured soil moisture than PDI and characterize surface soil moisture better. Among them, ReMPDI has the strongest correlation with soil moisture, with Pearson, Spearman, and Kendall correlation coefficients of −0.798, −0.883, and −0.724, respectively.
Generally, collinearity exists when the variance inflation factor (VIF) is greater than 10 or the tolerance (Tol) is less than 0.1. It can be seen from Table 3 that ReMPDI has the lowest VIF among the three perpendicular drought indices, at 3.114, and the highest Tol, at 0.321.
Table 4.
Training set, validation set, and cross-validation result statistics.
| Model algorithm | Training set R2 | Training set RMSE | Training set MAE | Validation set R2 | Validation set RMSE | Validation set MAE | Cross validation R2 | Cross validation RMSE | Cross validation MAE |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.686 | 1.488 | 0.987 | 0.692 | 1.192 | 0.780 | 0.693 | 1.436 | 0.947 |
| BPNN | 0.693 | 1.438 | 1.014 | 0.744 | 1.302 | 0.965 | 0.703 | 1.412 | 1.004 |
| PLSR | 0.625 | 1.646 | 1.247 | 0.721 | 1.288 | 0.683 | 0.627 | 1.583 | 1.137 |
| CNN | 0.664 | 1.464 | 0.990 | 0.730 | 1.476 | 1.105 | 0.679 | 1.467 | 1.013 |
| RF | 0.740 | 1.326 | 0.917 | 0.882 | 0.845 | 0.653 | 0.768 | 1.247 | 0.866 |
| RBFNN | 0.708 | 1.431 | 1.001 | 0.753 | 1.121 | 0.831 | 0.718 | 1.376 | 0.968 |
Based on the above analysis, it can be concluded that the ReMPDI proposed in this study is the best index applied to the soil moisture inversion model in this study area.
Optimization and accuracy verification results of soil moisture content inversion model
With the ReMPDI index and the measured soil data as model training inputs, six machine learning algorithms, namely support vector machine (SVM), back propagation neural network (BPNN), partial least squares regression (PLSR), convolutional neural network (CNN), random forest (RF), and radial basis function neural network (RBFNN), were used to construct soil moisture inversion models based on the perpendicular drought index. The training set, validation set, and cross-validation results are shown in Table 4. The scatter plot of measured and predicted values is shown in Fig. 6.
Fig. 6.
Scatter plot of predicted values of six machine learning models and measured values of soil moisture content. (a) is the SVM prediction result, (b) is the BPNN prediction result, (c) is the PLSR prediction result, (d) is the CNN prediction result, (e) is the RF prediction result, and (f) is the RBFNN prediction result.
The cross-validation method used in this study is leave-one-out cross-validation, the limiting case of k-fold cross-validation in which k equals the number of samples. The full dataset, including the training and validation sets, is divided into k subsets; k-1 of them are used as the training set and the remaining one as the test set. The next subset is then selected as the test set, with the remaining k-1 as the training set, and so on. The k accuracy results obtained in this way are averaged to give the final accuracy for the dataset.
As shown in Table 4 and Fig. 6, among the six machine learning algorithms used to construct the inversion model, the RF model achieved the best results in the modeling process, with an R2 of 0.740, an RMSE of 1.326, and an MAE of only 0.917 on the training set. On the validation set, the RF model also had the best prediction results, with an R2 of 0.8819, an RMSE of 0.8450, and an MAE of 0.6531. In the cross-validation results of the six models, the R2 of the RF model was higher than that of the other five models, and both its RMSE and MAE were lower. Figure 7 is a box plot of the percentage error between measured and predicted values in the training and testing sets.
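Leave-one-out cross-validation can be sketched as a loop that withholds one sample at a time. The example uses a simple straight-line regression as a placeholder model, which is an assumption for brevity; any of the six algorithms could be substituted for `fit_line`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def leave_one_out_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # every sample except the i-th
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

# Hypothetical drought index values and soil moisture content (%).
index_values = [0.10, 0.16, 0.21, 0.29, 0.35, 0.42]
moisture = [11.2, 9.8, 8.1, 6.3, 4.4, 2.9]
held_out = leave_one_out_predictions(index_values, moisture)
print([round(p, 2) for p in held_out])
```

The held-out predictions can then be compared with the measured values using R2, RMSE, and MAE, giving one set of cross-validation statistics per model.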
Fig. 7.
The box plot of the percentage error between measured and predicted values in the training and testing sets. (a) is training set and (b) is testing set.
Meanwhile, six machine learning models were used to predict soil moisture content. The error statistics box plot of the true value and the predicted value is shown in Fig. 8. The average error of the BPNN, CNN, PLSR, RBFNN, RF, and SVM prediction models was 14.19%, 18.75%, 13.56%, 13.32%, 9.52%, and 19.24%, respectively. The RF machine learning model had the smallest prediction error of only 9.52%. Based on the above analysis, the RF-based soil moisture inversion model is the best in this study area.
Fig. 8.
Statistical box plots of errors in predicting soil moisture content using six machine learning models.
Soil moisture content inversion results based on UAV images
The soil moisture content of the 8 sampling areas was inverted, as shown in Fig. 9. The inversion results show that the surface soil moisture content of the sampling areas ranged from 1.71 to 12.13%, with an average of 5.07% and a standard deviation of 2.36%. Table 5 presents the statistical characteristics of the measured samples and of the inversion results. The measured soil moisture content ranges from 1.82 to 12.33%, with an average of 5.22% and a standard deviation of 2.59%. The inversion results are thus consistent with the descriptive statistics of the measured soil samples in the study area, which supports the reliability of the soil moisture content inversion model based on the modified perpendicular drought index constructed in this study.
Fig. 9.
Inversion maps of soil moisture content (SMC) in the sampling areas. (a)-(h) are the sampling areas M1-M5 and U1-U3, respectively.
Table 5.
Table of relevant statistical characteristics of measured samples and inversion results moisture content data.
| Statistical characteristics | SMC average value (%) | SMC maximum value (%) | SMC minimum value (%) | SMC standard deviation (%) |
|---|---|---|---|---|
| Measured samples | 5.22 | 12.33 | 1.82 | 2.59 |
| Inversion results | 5.07 | 12.13 | 1.71 | 2.36 |
By comparing the soil moisture content in each sampling area, it can be seen that the surface soil moisture content in the sampling area that has been disturbed by mining is lower than that in the sampling area that has not been disturbed by mining to a certain extent, which is consistent with the actual situation and further proves that the soil moisture content inversion model based on the modified perpendicular drought index proposed in this study is reliable.
Analysis and discussion
The mining of underground coal leads to the collapse of the overlying rock strata, which affects the physical and chemical properties of the surface soil, especially the soil moisture content, and causes a series of ecological and environmental problems. Therefore, it is necessary to conduct scientific, real-time, and comprehensive monitoring of the soil in coal mining subsidence areas. Considering factors such as monitoring cost and data accuracy, UAV images can be used as an ideal data source to meet the monitoring needs of the soil in mining areas. In recent years, the inversion of soil moisture content from remote sensing image data has received widespread attention from scholars, and a series of studies have been conducted. Since the concept of the soil line was proposed, it has been applied in indices such as PDI, MPDI, and SPSI to effectively invert soil moisture content.
At present, the establishment of soil lines is mainly based on visible red and near-infrared bands. However, due to the presence of vegetation on the surface of the mining area, and the red edge band has been proven to be highly sensitive to vegetation, this study attempts to explore the possibility of using the red edge band instead of the visible red band to construct soil lines. Based on the vegetation coverage factors and soil line constructed in Red Edge band and Near Infrared band, a novel ReMPDI index is constructed. Through correlation analysis, it is found that ReMPDI is obviously superior to MPDI and PDI. This also shows from the side that the red edge band can better reduce the interference of vegetation to soil moisture content in areas with more vegetation.
In terms of the inversion model for soil moisture content, the random forest algorithm achieves higher accuracy compared to other algorithms. Random forest is an ensemble learning method in which each decision tree is treated as a classifier that predicts output results by classifying input data. During the training process of the random forest model, each decision tree is trained on different random samples and random features, thereby reducing the risk of overfitting and improving the generalization ability of the model. The ensemble learning method can solve the inherent defects of a single model or a group of parameter models, thereby integrating more models. At the same time, due to the sensitivity of support vector machines to missing data and the choice of parameters and kernel functions, the choice of kernel function is important and requires repeated parameter adjustments. The disadvantage of BP neural networks is that they are prone to falling into local optima, so multiple training and parameter adjustments are required. The model established by PLSR may be over-fitted. CNN uses a gradient descent algorithm, which is prone to convergence to local minima rather than global minima, and the pooling layer loses a lot of valuable information, ignoring the correlation between local and global. RBFNN has high requirements for training samples, requiring sufficient coverage of the input space and including sufficient sample size. Therefore, the random forest algorithm achieves the best accuracy in the inversion of soil moisture content, which is consistent with the research results of Bertalan21 and Adab22.
The advantage of random forests in soil moisture remote sensing inversion lies in their adaptability to non-linear, noisy, and small-sample data, which suits the needs of remote sensing inversion in mining areas. However, their application in remote or resource-limited mining regions still has limitations. Mining areas are mostly located in remote places with inconvenient transportation, so ground sampling is costly and large-scale measured samples cannot be obtained. The resulting lack of sample representativeness directly affects the accuracy of the model. In terms of computational cost, the training phase requires considerable computing power, and the total computational load grows with the number of samples and features. Prediction requires traversing all decision trees; although single-sample prediction is fast, the computational load is still large for batch inversion of full-scene images. Optimizing sampling strategies, simplifying model parameters, selecting low-cost data sources, and balancing computational cost against inversion accuracy can provide feasible technical solutions for water resource management and ecological protection in mining areas.
Current research on the ReMPDI index method for remote sensing inversion of soil moisture focuses mainly on the surface of arid and semi-arid mining areas. Further research is needed on the applicability of the model under humid and hot climates, high vegetation cover, and frequent precipitation. Experiments under multiple climatic conditions are needed to clarify the scope of the method's applicability and the areas for improvement, and so enhance its universality. In addition, existing work relies mainly on optical remote sensing data for soil moisture inversion; in the future, the integration of multi-source information such as radar remote sensing data and thermal infrared data can be explored. The inversion process also contains multiple sources of uncertainty, such as remote sensing data noise and model structure uncertainty. Future work should study uncertainty quantification methods in depth, for example Monte Carlo simulation and Bayesian inference, to evaluate the uncertainty range of the inversion results.
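A minimal sketch of the Monte Carlo idea mentioned above: band reflectances are perturbed with Gaussian noise and the spread of the resulting index values gives an empirical uncertainty range. The noise level, the soil-line slope, the reflectance values, and the function names are all hypothetical assumptions chosen for illustration; a full analysis would also propagate the uncertainty through the inversion model itself.

```python
import math
import random
import statistics


def perpendicular_index(red_edge, nir, slope):
    """Perpendicular-type index built on a soil line with the given slope."""
    return (red_edge + slope * nir) / math.sqrt(slope ** 2 + 1)


def monte_carlo_interval(red_edge, nir, slope, noise_sd=0.01, n_draws=5000, seed=42):
    """Return (mean, lower, upper) of the index under Gaussian reflectance noise.

    lower and upper bound an approximate 95% empirical interval.
    """
    rng = random.Random(seed)
    draws = sorted(
        perpendicular_index(
            rng.gauss(red_edge, noise_sd),  # noisy red edge reflectance
            rng.gauss(nir, noise_sd),       # noisy NIR reflectance
            slope,
        )
        for _ in range(n_draws)
    )
    k = int(0.025 * n_draws)                # number of draws cut from each tail
    return statistics.mean(draws), draws[k], draws[n_draws - 1 - k]


mean_value, lower, upper = monte_carlo_interval(0.25, 0.35, slope=1.2)
print(mean_value, lower, upper)
```

The same resampling loop can wrap any deterministic step of the workflow, so the interval can be reported alongside each inverted soil moisture value.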
Conclusion
In this study, a new ReMPDI index is constructed for surface soil moisture content in arid and semi-arid mining areas. Six retrieval models of soil moisture content based on machine learning algorithms (SVM, BPNN, PLSR, CNN, RF, RBFNN) are compared and analyzed, and their accuracy is verified against measured sampling data. The following conclusions are obtained:
(1) Taking the red edge band as the horizontal axis and the near-infrared band as the vertical axis gives the optimal band combination of the spectral feature space for constructing soil lines.
(2) Based on the REdge-NIR spectral feature space and incorporating the vegetation coverage factor, the ReMPDI has the strongest correlation with surface soil moisture content, with a correlation coefficient of −0.798, indicating a significant negative correlation. The VIF is 3.114, which is less than 10, and the Tol is 0.321, which is greater than 0.1, indicating strong representativeness for surface soil moisture content.
(3) Among the six inversion models of soil moisture content constructed by machine learning algorithms, the RF model has the highest coefficient of determination R2, and both its RMSE and MAE are lower than those of the other five models. Compared with the measured results, its error is only 9.52%, making it the best soil moisture inversion model.
(4) Based on the optimal soil moisture inversion model, soil moisture inversion was conducted for each sampling area in the study area. The surface soil moisture ranged from 1.71% to 12.13%, with an average of 5.07% and a standard deviation of 2.36%. This is consistent with the descriptive statistics of the measured soil samples in the study area, and the soil moisture in the mining-disturbed area is lower than that in the undisturbed area, which is consistent with the actual situation. This supports the reliability of the soil moisture inversion model based on the red edge modified perpendicular drought index (ReMPDI) developed in this study.
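The accuracy measures used to rank the models can be computed as in the following sketch. R2, RMSE, and MAE follow their standard definitions; the mean relative error is included only as one plausible reading of the percentage error reported above, which is an assumption. The function name and the sample values are hypothetical and are not data from this study.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE and mean relative error (%) for paired values."""
    n = len(measured)
    residuals = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # Assumed definition: mean of |predicted - measured| / measured.
        "mre_percent": 100 * sum(abs(r) / m for r, m in zip(residuals, measured)) / n,
    }


# Hypothetical soil moisture contents in percent.
measured = [4.0, 5.0, 6.0, 8.0]
predicted = [4.4, 4.5, 6.6, 7.2]
print(regression_metrics(measured, predicted))
```

A model is preferred when its R2 is higher and its RMSE and MAE are lower on the same validation samples, which is the criterion applied in conclusion (3).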
This study has obtained a relatively reliable inversion model for surface soil moisture content in mining areas, which can provide a theoretical basis and technical support for the fine monitoring of surface soil moisture content in large-scale arid and semi-arid mining areas and has practical significance. However, whether the model applies to other research areas still needs further study, and the amount of data collected on-site is limited. In the future, more sampling should be conducted and more inversion models compared in order to study the long-term surface soil moisture content in mining areas.
Acknowledgements
Supported by the Basic Research Operations in Higher Education (N25XQD015).
Author contributions
Fan Zhang: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Conceptualization, Funding acquisition. Yusheng Liang: Writing – review & editing, Supervision. Zhenqi Hu: Resources.
Data availability
The data used in this paper come from two sources. One portion comes from the free data source of 91 Weitu satellite (https://www.91weitu.com/), and the other was collected through our own drone-based surveys. Those interested in accessing the drone-collected data may contact the corresponding author.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Zhang, F. et al. A new identification method for surface cracks from UAV images based on machine learning in coal mining areas. Remote Sens. 12, 20 (2020).
- 2. Wu, Z. Y., Cui, F. & Nie, J. L. Surface soil water content before and after coal mining and its influencing factors-a case study of the Daliuta coal mine in Shaanxi province, China. Mine Water Environ. 41, 790–801 (2022).
- 3. Cui, L. G. et al. BBO-BPNN and AMPSO-BPNN for multiple-criteria inventory classification. Expert Syst. Appl. 175 (2021).
- 4. Nie, Y., Tan, Y., Deng, Y. Q. & Yu, J. Suitability evaluation of typical drought index in soil moisture retrieval and monitoring based on optical images. Remote Sens. 12 (2020).
- 5. Liu, C. Z. & Shi, J. C. Estimation of vegetation parameters of water cloud model for global soil moisture retrieval using time-series L-band Aquarius observations. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 9, 5621–5633 (2016).
- 6. Zhang, F., Hu, Z., Liang, Y., Fu, Y. & Yang, K. An optimal approach for crack extraction from UAV sub-images after cutting. Int. J. Remote Sens. 43, 7 (2022).
- 7. Anagnostopoulos, V., Petropoulos, G. P., Ireland, G. & Carlson, T. N. A modernized version of a 1D soil vegetation atmosphere transfer model for improving its future use in land surface interactions studies. Environ. Model. Softw. 90, 147–156 (2017).
- 8. Ye, N., Walker, J. P., Gao, Y., PopStefanija, I. & Hills, J. Comparison between thermal-optical and L-band passive microwave soil moisture remote sensing at farm scales: towards UAV-based near-surface soil moisture mapping. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 17, 633–642 (2024).
- 9. Pijl, A., Quarella, E., Vogel, T. A., D’Agostino, V. & Tarolli, P. Remote sensing vs. field-based monitoring of agricultural terrace degradation. Int. Soil. Water Conserv. Res. 9, 1–10 (2021).
- 10. Wigmore, O., Mark, B., McKenzie, J., Baraer, M. & Lautz, L. Sub-metre mapping of surface soil moisture in proglacial valleys of the tropical Andes using a multispectral unmanned aerial vehicle. Remote Sens. Environ. 222, 104–118 (2019).
- 11. Zhan, Z., Qin, Q., Ghulam, A. & Wang, D. NIR-red spectra space based new method for soil moisture monitoring. Sci. China Ser. D-Earth Sci. 50, 283–289 (2007).
- 12. Ghulam, A., Qin, Q. M., Teyip, T. & Li, Z. L. Modified perpendicular drought index (MPDI): a real-time drought monitoring method. ISPRS J. Photogramm. Remote Sens. 62, 150–164 (2007).
- 13. Ghulam, A., Li, Z. L., Qin, Q. M. & Tong, Q. X. Exploration of the spectral space based on vegetation index and albedo for surface drought estimation. J. Appl. Remote Sens. 1, 013529 (2007).
- 14. Danodia, A., Kushwaha, A. & Patel, N. R. Remote sensing-derived combined index for agricultural drought assessment of Rabi pulse crops in Bundelkhand region, India. Environ. Dev. Sustain. 23, 15432–15449 (2021).
- 15. Zormand, S., Jafari, R. & Koupaei, S. S. Assessment of PDI, MPDI and TVDI drought indices derived from MODIS Aqua/Terra level 1B data in natural lands. Nat. Hazards 86, 757–777 (2017).
- 16. Wang, S. N., Wang, W. J., Wu, Y. J. & Zhao, S. X. Surface soil moisture inversion and distribution based on spatio-temporal fusion of MODIS and Landsat. Sustainability 14 (2022).
- 17. Amani, M., Salehi, B., Mahdavi, S., Masjedi, A. & Dehnavi, S. Temperature-vegetation-soil moisture dryness index (TVMDI). Remote Sens. Environ. 197, 1–14 (2017).
- 18. Zhao, Y. L. et al. Monitoring of soil moisture in coal mining subsidence with high ground-water level by remote sensing. Disaster Adv. 6, 139–144 (2013).
- 19. Gao, Y. R., Lian, X. G. & Ge, L. L. Inversion model of surface bare soil temperature and water content based on UAV thermal infrared remote sensing. Infrared Phys. Technol. 125 (2022).
- 20. He, L. et al. An improved method for soil moisture monitoring with ensemble learning methods over the Tibetan plateau. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 14, 2833–2844 (2021).
- 21. Bertalan, L. et al. UAV-based multispectral and thermal cameras to predict soil water content - a machine learning approach. Comput. Electron. Agric. 200 (2022).
- 22. Adab, H., Morbidelli, R., Saltalippi, C., Moradian, M. & Ghalhari, G. A. F. Machine learning to estimate surface soil moisture from remote sensing data. Water 12, 11 (2020).
- 23. Li, X. X., Zhu, C. G., Ze-Tian, F., Hai-Jun, Y., Yao-Qi, P. & Yong-Jun, Z. Rapid detection of soil moisture content based on UAV multispectral image. Spectrosc. Spectr. Anal. 40, 1238–1242 (2020).
- 24. Liang, Y. J., Ren, C., Wang, H. Y., Huang, Y. B. & Zheng, Z. T. Research on soil moisture inversion method based on GA-BP neural network model. Int. J. Remote Sens. 40, 2087–2103 (2019).
- 25. Wang, S. A., Li, R. P., Wu, Y. J. & Wang, W. J. Estimation of surface soil moisture by combining a structural equation model and an artificial neural network (SEM-ANN). Sci. Total Environ. 876, 162558 (2023).
- 26. Ebtehaj, A. & Bras, R. L. A physically constrained inversion for high-resolution passive microwave retrieval of soil moisture and vegetation water content in L-band. Remote Sens. Environ. 233, 111346 (2019).
- 27. Lv, W. T. et al. Multi-model comprehensive inversion of surface soil moisture from Landsat images based on machine learning algorithms. Sustainability 16, 3509 (2024).
- 28. Esri. ArcGIS Pro, version 3.4.0 (Esri Inc., 2024).
- 29. He, T. T. et al. Identifying coal mining subsidence impacts by soil moisture based on optical trapezoid model in Google Earth Engine. Land Degrad. Dev. 34, 4990–5003 (2023).
- 30. 91 Weitu, version 19.4.0. https://www.91weitu.com/.
- 31. Li, X. et al. Effects of RapidEye imagery’s red-edge band and vegetation indices on land cover classification in an arid region. Chin. Geogra. Sci. 27, 827–835 (2017).
- 32. Deng, J., Chen, X., Du, Z. & Zhang, Y. Soil water simulation and predication using stochastic models based on LS-SVM for red soil region of China. Water Resour. Manage. 25, 2823–2836 (2011).
- 33. Cheng, K. & Lu, Z. Z. Active learning Bayesian support vector regression model for global approximation. Inf. Sci. 544, 549–563 (2021).
- 34. Wang, L., Zeng, Y. & Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 42, 855–863 (2015).
- 35. Lin, L. X. & Liu, X. X. Soil-moisture-index spectrum reconstruction improves partial least squares regression of spectral analysis of soil organic carbon. Precision Agric. 23, 1707–1719 (2022).
- 36. Khaledian, Y., Kiani, F., Ebrahimi, S., Brevik, E. C. & Aitkenhead-Peterson, J. Assessment and monitoring of soil degradation during land use change using multivariate analysis. Land Degrad. Dev. 28, 128–141 (2017).
- 37. Petersen, P. C. & Sepliarskaia, A. VC dimensions of group convolutional neural networks. Neural Netw. 169, 462–474 (2024).
- 38. Zhao, W. D., Wu, Z. L., Yin, Z. D. & Li, D. S. Reducing moisture effects on soil organic carbon content estimation in vis-NIR spectra with a deep learning algorithm. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 16, 7733–7748 (2023).
- 39. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
- 40. Pahlavan-Rad, M. R. et al. Prediction of soil water infiltration using multiple linear regression and random forest in a dry flood plain, eastern Iran. Catena 194 (2020).
- 41. Mustafa, M. R., Rezaur, R. B., Rahardjo, H. & Isa, M. H. Prediction of pore-water pressure using radial basis function neural network. Eng. Geol. 135, 40–47 (2012).
- 42. Sandberg, I. W. Gaussian radial basis functions and the approximation of input-output maps. Int. J. Circuit Theory Appl. 31, 443–452 (2003).