Abstract
Spatial variability and uncertainty in soil volumetric moisture content (SVMC) are crucial to the accuracy of moisture prediction, and this paper addresses them by developing a data-driven model of SVMC. Grid samples of SVMC covering an approximately 3-ha field in the North China Plain, China, were collected during the jointing growth stage of winter wheat, and SVMC was measured by Time Domain Reflectometry (TDR). Bayesian inference was performed to explore the spatial heterogeneity, robustness, transparency, interpretability and uncertainty related to SVMC using the Python-based PyMC3 package combined with the Integrated Nested Laplace Approximation with the Stochastic Partial Differential Equation (INLA-SPDE) model. The results showed that the prediction surface of SVMC, together with the lower and upper limits of the 95% credible intervals, quantified the uncertainty associated with SVMC; that the cauchy prior, owing to its flexibility and adaptability, was more robust than the gaussian prior for SVMC prediction; and that the transparency and interpretability of the SVMC prediction model were revealed by MCMC (Markov chain Monte Carlo) trace plots, KDE (kernel density estimate) plots and rank plots. The uncertainty associated with SVMC can be described explicitly using the highest-posterior density interval and the lower and upper prediction limits.
Keywords: Soil moisture content, INLA-SPDE model, Uncertainty, Transparency and interpretability, Cauchy prior
Subject terms: Ecology, Environmental sciences
Introduction
Soil moisture links soil processes, and surface soil moisture in the top layer (0–20 cm depth) in particular plays a fundamental role in land–atmosphere exchange processes1–3. Better spatial estimation of surface soil moisture can help improve climate prediction accuracy and support water resource decision-makers4,5. Improving soil moisture across different scales of resolution to increase water-use efficiency and to respond to environmental stress is one of the critical challenges identified by the National Academies of Sciences, Engineering, and Medicine (NASEM)6.
Geostatistics has been widely applied in digital soil mapping7, and kriging-based techniques have become important tools for studying the variability of soil properties8. Geostatistical methods, including kriging, co-kriging, regression kriging, probability kriging and universal kriging, quantify estimation uncertainty while reducing investigation costs3. However, their estimation accuracy is usually limited by the density and distribution of sample sites, and these approaches also cause "smoothing effects"3,9,10. Bayesian maximum entropy (BME), a spatio-temporal geostatistical method used to estimate soil texture and soil textural fractions, has been shown to be more accurate than conventional geostatistics11,12. BME based on soft information has the advantage of a sound theoretical basis3. However, constructing limited soft information and achieving highly efficient prediction remain challenges at present13. Additionally, sequential gaussian simulation and quantile regression forest produced more accurate uncertainty models to quantify the spatial uncertainty of soil organic carbon stock in Hungary14. Apart from geostatistical approaches, Bayesian inference is also used to assess spatial uncertainty in digital soil mapping; this approach has tended to rely on flexible MCMC simulation, which is computationally intensive and time-consuming15. A computationally efficient alternative to MCMC, the so-called INLA-SPDE model, was developed16. As an emerging model with potential in digital soil mapping, INLA-SPDE was used to map soil organic matter, with the advantage of good uncertainty assessment14. INLA-SPDE was also adopted to model the spatio-temporal evolution of soil organic matter at the regional scale17, and the robustness of INLA-SPDE for mapping soil pH with sparse datasets was demonstrated at the farm scale18.
The INLA-SPDE model allows soil moisture maps to be constructed in a more sensible and efficient way19. It not only provides a flexible and robust approach that accounts for spatio-temporal correlation and describes uncertainty20, but is also computationally efficient for latent gaussian models, a wide and flexible class ranging from (generalized) linear mixed models to spatial and spatio-temporal models21,22. We therefore chose the INLA-SPDE model to study the spatial variability and uncertainty associated with SVMC.
Bayesian probabilistic programming is an alternative stochastic simulation technique for assessing interpretability, uncertainty and interactions through the highest posterior density interval23. A growing body of literature explores Bayesian probabilistic programming and state-of-the-art advances in machine learning and artificial intelligence, and this approach is conducive to addressing the robustness, transparency and interpretability of models24–29. Theano, a core component of PyMC3 built on deep-learning principles, offers unique advantages for Bayesian probabilistic programming, including model flexibility and transparent, interpretable results derived by integrating prior and posterior probabilities from a probabilistic perspective30,31. Additionally, MCMC trace plots, KDE plots and rank plots produced in Python can effectively reveal the dynamic changes of parameters in a transparent and interpretable manner, different prior distributions specified in PyMC3 (with Theano as its back end) reflect the robustness of data prediction, and the highest-posterior density interval can explicitly describe the uncertainty associated with model variables31. Therefore, the robustness, transparency and interpretability of the SVMC prediction model can be addressed by Bayesian probabilistic programming with PyMC3.
Traditionally, TDR and gravimetric measurements remain accurate at the point scale, and soil moisture also interacts with the groundwater table depth and its variations32. Soil moisture during winter wheat growth reflects soil processes and strongly influences wheat growth. The jointing stage is more important than other growth stages because the crop needs water to promote growth, and SVMC in the topsoil is a reliable indicator of the correct volume of irrigation water during the winter wheat jointing stage10. Thus, the top soil layer (0–20 cm depth) is critical to precise moisture management. Characterizing the spatial variability, transparency, interpretability and uncertainty associated with SVMC is therefore important for improving the accuracy of data-driven SVMC models and smart irrigation.
Against this background, this study examines the spatial variability and uncertainty of SVMC during the winter wheat jointing growth stage. The paper aims to analyze the spatial variability and quantify the spatial uncertainty of SVMC using the INLA-SPDE model. Furthermore, the study explores the robustness, transparency and interpretability of SVMC models using PyMC3 probabilistic programming.
Materials and methods
Study area and TDR field sampling
The study area (117°04.130′E, 36°42.979′N), covering approximately three hectares, is located on the north side of the Xiaoqinghe River in Shandong Province, China, an important area of the North China Plain. Grid sampling was used to measure SVMC, with adjacent sampling points about 10 m apart in the north–south direction and 8 m apart in the east–west direction. The region has a semi-humid monsoon climate, and the soil is sandy clay loam. A wheat–maize rotation is practised yearly in the study fields.
In situ measurements of soil moisture provide invaluable information that facilitates the study of the spatial variability of soil moisture at different scales4,33. Several techniques are available for measuring soil moisture content in situ, such as the gravimetric method, neutron probes, cosmic-ray neutron sensing and electromagnetic techniques. Among these, TDR probes, which use electromagnetic techniques, are non-destructive and can easily be set up for automated operation with a data logger. A number of sampling sites are established in the field; at each site a TDR reading is taken, followed by the extraction of a known volume of soil whose wet weight is then determined. The volumetric water content is calculated as follows:
$$\theta_v = \frac{M_w - M_d}{\rho_w V} \tag{1}$$
where $M_w$ and $M_d$ denote the mass (g) of wet and dry soil, respectively, $V$ represents the total soil volume (ml), and $\rho_w$ is the density of water (1 g/ml).
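To make Eq. (1) concrete, the sketch below turns the wet mass, dry mass and volume of a soil core into a volumetric water content (water volume divided by soil volume). The function name and the core values are hypothetical illustrations, not data from this study.

```python
def volumetric_water_content(wet_mass_g, dry_mass_g, volume_ml, rho_w=1.0):
    """Volumetric water content (ml of water per ml of soil), following Eq. (1)."""
    if volume_ml <= 0:
        raise ValueError("soil volume must be positive")
    # Mass of water lost on drying, converted to a volume with rho_w in g/ml.
    water_volume_ml = (wet_mass_g - dry_mass_g) / rho_w
    return water_volume_ml / volume_ml


# Hypothetical core: 100 ml of soil weighing 160 g wet and 140 g dry.
theta = volumetric_water_content(160.0, 140.0, 100.0)
print(f"theta_v = {theta:.2f} ({100 * theta:.0f}%)")  # theta_v = 0.20 (20%)
```

The same value expressed as a percentage is the form in which SVMC is reported in the rest of the paper.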
In our experiment, SVMC was defined as the ratio of the volume of water in a given volume of soil to the total soil volume, expressed either as a decimal or as a percentage. At each sampling point, five SVMC measurements were made within a circle 1 m in diameter. The SVMC for each sample point was displayed on the LCD (liquid crystal display) screen of TDR10. We saved the SVMC data as Excel files for subsequent analysis. Each SVMC measurement was geo-referenced using a Differential Global Positioning System (DGPS). In total, 231 SVMC samples were collected from the surface layer (0–20 cm) at the jointing stage of winter wheat in 2017.
INLA-SPDE model
Basic principles of INLA-SPDE model
A Gaussian random field (GRF), a collection of random variables whose observations occur in a continuous domain, is widely applied in geostatistics, ecology, epidemiology and environmental risk assessment21,22. Matérn covariance functions are the most common type in geostatistical models34; the covariance matrix of the GRF is constructed from the Matérn covariance function, given as:
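$$C(s_i, s_j) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\left(\kappa \lVert s_i - s_j \rVert\right)^{\nu} K_{\nu}\left(\kappa \lVert s_i - s_j \rVert\right) \tag{2}$$
Here, $\lVert s_i - s_j \rVert$ denotes the Euclidean distance between the points $s_i$ and $s_j$, $\sigma^2$ denotes the marginal variance of the spatial field, $\Gamma$ is the Gamma function, $K_{\nu}$ is the modified Bessel function of the second kind, $\kappa > 0$ is the scale parameter, which is related to the range, and the smoothness parameter $\nu > 0$ determines the mean-square differentiability of the process20,22,35. A GRF with a Matérn covariance matrix thus depends mainly on the scale parameter $\kappa$ and the smoothness parameter $\nu$.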
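The Matérn covariance of Eq. (2) can be evaluated numerically to see how it decays with distance. The sketch below is a minimal pure-Python version that assumes the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ for the modified Bessel function of the second kind (libraries such as SciPy provide this function directly); the function names and the parameter values in the example are hypothetical.

```python
import math


def bessel_k(nu, x, t_max=20.0, n=4000):
    """Modified Bessel function of the second kind K_nu(x) for x > 0.

    Trapezoidal rule on K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt,
    truncated at t_max (the integrand decays double-exponentially).
    Intended for moderate nu; cosh(nu * t) overflows for very large nu.
    """
    step = t_max / n
    total = 0.5 * math.exp(-x)  # t = 0 endpoint, where cosh(0) = 1
    for k in range(1, n + 1):
        t = k * step
        weight = 0.5 if k == n else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * step


def matern_cov(h, sigma2, kappa, nu):
    """Matérn covariance of Eq. (2) at Euclidean distance h >= 0."""
    if h == 0.0:
        return sigma2  # the limit h -> 0 is the marginal variance
    u = kappa * h
    return sigma2 / (2.0 ** (nu - 1.0) * math.gamma(nu)) * u ** nu * bessel_k(nu, u)


# Hypothetical parameters: sigma^2 = 9 (%^2), kappa = 0.1 per m, nu = 1 as in this study.
for h in (0.0, 5.0, 10.0, 20.0):
    print(h, round(matern_cov(h, sigma2=9.0, kappa=0.1, nu=1.0), 3))
```

With $\nu = 1/2$ the function reduces to $\sigma^2 e^{-\kappa h}$, which is a convenient correctness check for the numerical Bessel function.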
Let $Z(s_1), \dots, Z(s_n)$ denote the observations of a spatial variable $Z(s)$ at locations $s_1, \dots, s_n$, with $n = 231$; the SVMC samples are located at the 231 sites $s_1, \dots, s_{231}$. Here, $Z(s)$ denotes a Matérn field36, where the spatial domain $D$ is a fixed subset of $\mathbb{R}^d$ and the spatial index $s$ varies continuously throughout $D$20,22,37. GRF models with Matérn covariance functions can be expressed as solutions to the following SPDE:
$$(\kappa^2 - \Delta)^{\alpha/2}\left(\tau Z(s)\right) = \mathcal{W}(s), \quad s \in D \subseteq \mathbb{R}^d \tag{3}$$
Here, $\kappa$ is the spatial scale parameter, associated with the range in geostatistics20,22; the mesh on which the SPDE is solved is set by the parameters max.edge, cutoff and offset of the INLA-SPDE model, where offset extends the domain of interest by a given distance to avoid boundary effects and sharp corners20,21. $\Delta$ is the Laplacian; $\alpha = \nu + d/2$, here a positive integer related to the parameter $\nu$, controls the smoothness of the SPDE solution; the variance is controlled by the parameter $\tau$; $Z(s)$ is a GRF; and $\mathcal{W}(s)$ is a gaussian spatial white noise process. The marginal variance $\sigma^2$ of the Matérn covariance function is related to the SPDE through
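$$\sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\alpha)\,(4\pi)^{d/2}\,\kappa^{2\nu}\,\tau^2} \tag{4}$$
The exponential covariance is recovered as the special case of the Matérn function with $\nu = 1/2$.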
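Eq. (4) links the SPDE parameters κ and τ to the marginal variance of the Matérn field. The sketch below evaluates it for the setting used in this study (ν = 1, d = 2, hence α = 2). It also reports the practical range √(8ν)/κ, a convention from the SPDE literature that is not stated in the text above, and it uses hypothetical values of κ and τ.

```python
import math


def spde_marginal_variance(kappa, tau, nu=1.0, d=2):
    """Marginal variance sigma^2 of the Matérn field from Eq. (4), with alpha = nu + d/2."""
    alpha = nu + d / 2.0
    return math.gamma(nu) / (
        math.gamma(alpha) * (4.0 * math.pi) ** (d / 2.0) * kappa ** (2.0 * nu) * tau ** 2
    )


def practical_range(kappa, nu=1.0):
    """Empirical 'practical range' sqrt(8 nu) / kappa, where correlation is roughly 0.1."""
    return math.sqrt(8.0 * nu) / kappa


kappa, tau = 0.1, 0.3  # hypothetical hyperparameters (kappa in 1/m)
print(round(spde_marginal_variance(kappa, tau), 2))
print(round(practical_range(kappa), 1), "m")
```

For ν = 1 and d = 2 the Gamma terms cancel and Eq. (4) simplifies to σ² = 1 / (4π κ² τ²), which is what the first assertion below checks.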
$$Z(s) \approx \sum_{k=1}^{G} \psi_k(s)\, w_k \tag{5}$$
Here, $G$ is the number of triangulation vertices, $\psi_k$ are the basis functions and $w_k$ the associated weights; the basis functions provide the link between the GRF and the GMRF (gaussian Markov random field), which makes computation easier as implemented in the R-INLA package20–23. Specifically, the finite element method, which leads to a triangulated mesh with $G$ nodes and $G$ basis functions, was used to obtain an approximate solution of the SPDE. Each basis function $\psi_k$ is piecewise linear on each triangle, equal to 1 at vertex $k$ and equal to 0 at all other vertices. The GRF $Z(s)$ is then represented as a GMRF through the basis functions defined on the triangulated mesh. The weight vector $w = (w_1, \dots, w_G)^{\top}$ is assigned a gaussian distribution $w \sim N(0, Q^{-1})$ that approximates the solution $Z(s)$ of the SPDE at the mesh nodes, and the approximation at the mesh nodes is transferred to other spatial locations through the basis functions. Here, the precision matrix $Q$ of the weights is sparse, because it is built from piecewise linear basis functions defined by a triangulation of the domain of interest, whether two-dimensional or one-dimensional20–22.
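For a point inside one triangle, the piecewise linear basis functions of Eq. (5) reduce to barycentric coordinates, and these are the non-zero entries of one row of the projection matrix. The sketch below assumes a single hypothetical triangle and hypothetical vertex weights; it illustrates the idea and is not R-INLA's implementation.

```python
def barycentric_weights(p, a, b, c):
    """Values of the three hat functions of triangle (a, b, c) at point p."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2  # the three values always sum to 1


def project(p, triangle, vertex_weights):
    """Z(p) ~= sum_k psi_k(p) w_k over the three vertices of the triangle containing p."""
    psi = barycentric_weights(p, *triangle)
    return sum(ps * w for ps, w in zip(psi, vertex_weights))


tri = ((0.0, 0.0), (7.0, 0.0), (0.0, 7.0))   # hypothetical triangle, edges <= 7 m
w = (18.0, 22.0, 25.0)                        # hypothetical weights (SVMC, %) at the vertices
print(barycentric_weights((0.0, 0.0), *tri))  # (1.0, 0.0, 0.0): psi is 1 at its own vertex
print(round(project((7 / 3, 7 / 3), tri, w), 3))  # centroid: mean of the three weights
```

Because each location touches only the three vertices of its triangle, each row of the projection matrix has at most three non-zero entries, which keeps the computation sparse.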
Key points using the INLA-SPDE model
SVMC varies continuously in space. As a spatially continuous variable, SVMC can be modeled as a GRF, and the SPDE approach implemented in the R-INLA package can be used to fit a spatial model and predict SVMC at unsampled locations.
Triangulated mesh construction
A triangulated mesh was created with the inla.mesh.2d() function of the R-INLA package to obtain an approximate finite element solution of the SPDE21,22. The parameters offset, max.edge and cutoff of this function need to be set. Balancing computational cost and modeling accuracy, we specified offset = c(-0.15, 70): the negative first value sets an inner extension equal to 15% of the approximate diameter of the data region, and the second value adds an outer extension of 70 m around the locations, which avoids boundary effects and sharp corners22. We set cutoff = 1 to avoid building many small triangles, and max.edge = c(7, 20) to use small triangles (maximum edge length 7 m) within the region and larger triangles (maximum edge length 20 m) in the extension22. Once the mesh was constructed, the Matérn function defined the spatial correlation structure of the SPDE15,17.
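The offset arithmetic can be checked by hand. The sketch below assumes, as we understand the inla.mesh.2d() convention, that a negative offset entry is a fraction of the data diameter and a positive entry an absolute distance. It approximates the diameter by the largest pairwise distance between locations, which may differ from R-INLA's own estimate, and it uses a hypothetical 150 m × 200 m (3 ha) rectangle rather than the real sampling sites.

```python
import itertools
import math


def mesh_extensions(coords, offset=(-0.15, 70.0)):
    """Inner and outer extension distances implied by an inla.mesh.2d()-style offset.

    Assumption: negative entries are fractions of the data diameter and positive
    entries are absolute distances; the diameter is the largest pairwise distance.
    """
    diameter = max(math.dist(p, q) for p, q in itertools.combinations(coords, 2))
    return tuple(-v * diameter if v < 0 else float(v) for v in offset)


# Corners of a hypothetical 150 m x 200 m field; its diagonal is 250 m.
corners = [(0.0, 0.0), (150.0, 0.0), (0.0, 200.0), (150.0, 200.0)]
print(mesh_extensions(corners))  # (37.5, 70.0): 15% of 250 m inside, 70 m outside
```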
INLA-SPDE model construction
The INLA-SPDE model was used to predict SVMC; the components of the prediction model can be expressed as follows:
$$Y_i \sim N(\mu_i, \sigma_e^2) \tag{6}$$
$$\mu_i = \beta_0 + S(s_i) \tag{7}$$
Here, $N(\mu_i, \sigma_e^2)$ denotes a gaussian distribution with mean $\mu_i$ and variance $\sigma_e^2$ for the SVMC observation $Y_i$ at location $s_i$, and the mean $\mu_i$ is expressed as the sum of the intercept $\beta_0$ and $S(s_i)$, a zero-mean, spatially structured gaussian random effect with Matérn covariance function that captures the spatial variability of SVMC. In this step, the parameter $\alpha$ ($\alpha = \nu + d/2$) was set with the inla.spde2.matern() function of the R-INLA package to build the SPDE model on the mesh, where $\nu$ is the smoothness parameter of the process16. In our study, we set the smoothness parameter $\nu$ equal to 1 and $d = 2$, thus $\alpha = 2$.
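To see what the model in Eqs. (6) and (7) implies, the sketch below simulates SVMC as an intercept plus a zero-mean spatial random effect plus gaussian noise on a small grid with the 8 m east–west and 10 m north–south spacing described earlier. For brevity it uses the exponential member of the Matérn family (ν = 1/2) rather than ν = 1, draws the field by Cholesky factorization instead of the SPDE/GMRF approximation, and uses hypothetical parameter values; it is a forward simulation, not the INLA fit.

```python
import math
import random


def exp_cov(h, sigma2, kappa):
    """Exponential covariance, the Matérn case nu = 1/2 (used here for brevity)."""
    return sigma2 * math.exp(-kappa * h)


def cholesky(a):
    """Lower-triangular L with L L^T = a for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_svmc(coords, beta0, sigma2, kappa, sigma_e, seed=0):
    """Draw Y_i ~ N(beta0 + S(s_i), sigma_e^2), with S a zero-mean GRF."""
    rng = random.Random(seed)
    n = len(coords)
    cov = [[exp_cov(math.dist(coords[i], coords[j]), sigma2, kappa) for j in range(n)]
           for i in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spatial = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # S = L z
    return [beta0 + s + rng.gauss(0.0, sigma_e) for s in spatial]


# 5 x 5 grid: 8 m east-west, 10 m north-south; all parameter values are hypothetical.
coords = [(8.0 * i, 10.0 * j) for i in range(5) for j in range(5)]
y = simulate_svmc(coords, beta0=20.25, sigma2=9.0, kappa=0.05, sigma_e=1.0)
print(len(y), round(sum(y) / len(y), 2))
```

Neighbouring simulated values tend to be similar because the covariance decays with distance, which is the spatial structure that $S(s_i)$ is meant to capture.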
Space mapping and plotting of INLA-SPDE model
The index set for the SPDE model was generated with the function inla.spde.make.index() of the R-INLA package, and a projection matrix was constructed with inla.spde.make.A(), passing the triangulated mesh and the coordinates, to project the GRF from the observations to the triangulation vertices20,22. Non-informative prior distributions and the default parameters and hyperparameters were adopted in the model. A matrix with the coordinates of the prediction locations was constructed by building a 50 × 50 grid with expand.grid() from vectors of coordinates spanning the study boundary, and the inla.stack() function was used to organize the data and projection matrices. The INLA-SPDE model formula for Bayesian inference was then specified by including the fixed and random effects. Finally, the posterior mean of SVMC (pred_mean) and the lower and upper limits of the 95% credible intervals (pred_ll and pred_ul, respectively) were computed to quantify the uncertainty associated with SVMC.
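The prediction grid and the interval summaries can be sketched as follows. The grid builder mimics expand.grid() ordering (x varying fastest) over hypothetical bounds. R-INLA reports the 2.5% and 97.5% quantiles from its approximated posterior marginals, so the sample-based summary here only illustrates what pred_mean, pred_ll and pred_ul represent; the function names are our own.

```python
def prediction_grid(xmin, xmax, ymin, ymax, nx=50, ny=50):
    """nx * ny regular grid of (x, y) points; x varies fastest, as in expand.grid(x, y)."""
    xs = [xmin + i * (xmax - xmin) / (nx - 1) for i in range(nx)]
    ys = [ymin + j * (ymax - ymin) / (ny - 1) for j in range(ny)]
    return [(x, y) for y in ys for x in xs]


def quantile(sorted_vals, q):
    """Linear-interpolation quantile (the rule of R's default quantile type 7)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(samples):
    """(pred_mean, pred_ll, pred_ul) from posterior samples at one location."""
    s = sorted(samples)
    return sum(s) / len(s), quantile(s, 0.025), quantile(s, 0.975)


grid = prediction_grid(0.0, 150.0, 0.0, 200.0)  # hypothetical field bounds in m
print(len(grid), grid[:2])                       # 2500 points, starting along x
print(summarize(range(101)))                     # (50.0, 2.5, 97.5)
```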
PyMC3 bayesian probabilistic programming
PyMC3 is a flexible probabilistic programming platform for building complex statistical models with custom likelihood functions. Theano, a core component of PyMC3, encapsulates the gradient calculations and automatic differentiation required by the No-U-Turn Sampler (NUTS) and was used for Bayesian probabilistic programming31,38,40. NUTS automatically selects an appropriate path length, overcoming the sensitivity of Hamiltonian Monte Carlo (HMC) to parameters such as the step size and the required number of steps31,38; it uses a recursive algorithm to identify candidate points heuristically39. By exploiting gradient information of the log posterior density, NUTS therefore samples from models with continuous parameters more efficiently and quickly than traditional methods40,41. In the present study, we used NUTS to sample from the posterior distributions.
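NUTS builds on the leapfrog integrator of Hamiltonian Monte Carlo, which uses the gradient of the log posterior density to make long moves that nearly conserve the Hamiltonian (potential plus kinetic energy). The sketch below shows only that building block for a hypothetical one-dimensional gaussian-shaped target; it omits NUTS's tree doubling and U-turn criterion and is not PyMC3's implementation.

```python
def leapfrog(q, p, grad_log_p, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with unit mass."""
    p = p + 0.5 * step * grad_log_p(q)      # half step for momentum
    for i in range(n_steps):
        q = q + step * p                    # full step for position
        if i < n_steps - 1:
            p = p + step * grad_log_p(q)    # full step for momentum
    p = p + 0.5 * step * grad_log_p(q)      # final half step for momentum
    return q, p


def hamiltonian(q, p, log_p):
    """Potential energy -log p(q) plus kinetic energy p^2 / 2."""
    return -log_p(q) + 0.5 * p * p


MU, SD = 20.25, 0.22  # hypothetical gaussian-shaped target for mean SVMC


def log_p(q):
    return -0.5 * ((q - MU) / SD) ** 2


def grad_log_p(q):
    return -(q - MU) / SD ** 2


q0, p0 = 20.0, 1.0
q1, p1 = leapfrog(q0, p0, grad_log_p, step=0.02, n_steps=20)
print(round(hamiltonian(q0, p0, log_p), 4), round(hamiltonian(q1, p1, log_p), 4))
```

The two printed energies should be close; HMC accepts or rejects the endpoint based on that small difference, and NUTS additionally decides how many such steps to take.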
In this work, Bayesian inference was implemented with the PyMC3 package (Theano back end) in Python. NUTS explores the target distribution efficiently and converges quickly. Both a cauchy prior and a gaussian prior were used to analyze the robustness of the SVMC model.
Bayesian inference with NUTS was performed using the data from the 231 SVMC sampling points in the study region. For posterior analysis, PyMC3 provides plotting and summarization functions for inspecting the sampling output, including trace plots. KDE and trace plots of mu and sigma were obtained from the NUTS samples generated after 1100 iterations in this study. The gaussian prior model is expressed as follows:
$$\mu \sim U(a, b) \tag{8}$$
$$\sigma \sim HN(\sigma_\sigma) \tag{9}$$
$$y \sim N(\mu, \sigma) \tag{10}$$
where the parameter mu ($\mu$) follows a uniform distribution with lower and upper bounds $a$ and $b$, the parameter sigma ($\sigma$) follows a half-normal distribution with standard deviation $\sigma_\sigma$, and the SVMC observations $y$ follow a gaussian distribution with parameters mu and sigma. The values of $a$, $b$ and $\sigma_\sigma$ were set according to our previous knowledge42,43.
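As a rough, library-free illustration of the gaussian model in Eqs. (8) to (10), the sketch below writes its unnormalized log posterior and samples it with random-walk Metropolis, a simpler stand-in for NUTS. Because the study's values of a, b and σ_σ are not given here, the bounds (0, 50) and σ_σ = 10 are hypothetical, and the data are synthetic draws rather than the 231 SVMC measurements.

```python
import math
import random
import statistics


def log_post_gaussian(mu, sigma, y, a, b, sigma_sigma):
    """Unnormalized log posterior: mu ~ U(a, b), sigma ~ HalfNormal(sigma_sigma), y ~ N(mu, sigma)."""
    if not (a < mu < b) or sigma <= 0:
        return -math.inf                              # outside the prior support
    log_prior = -0.5 * (sigma / sigma_sigma) ** 2     # half-normal, up to a constant
    ss = sum((yi - mu) ** 2 for yi in y)
    return log_prior - len(y) * math.log(sigma) - 0.5 * ss / sigma ** 2


def metropolis(log_post, init, step, n_iter, seed=0):
    """Random-walk Metropolis sampler (a simple stand-in for NUTS)."""
    rng = random.Random(seed)
    current = list(init)
    lp_current = log_post(*current)
    draws = []
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        lp_proposal = log_post(*proposal)
        if math.log(1.0 - rng.random()) < lp_proposal - lp_current:  # accept or reject
            current, lp_current = proposal, lp_proposal
        draws.append(tuple(current))
    return draws


data_rng = random.Random(1)
y = [data_rng.gauss(20.25, 3.4) for _ in range(231)]  # synthetic, not the field data


def target(mu, sigma):
    return log_post_gaussian(mu, sigma, y, a=0.0, b=50.0, sigma_sigma=10.0)


draws = metropolis(target, init=(20.0, 3.0), step=(0.3, 0.2), n_iter=5000)[1000:]  # drop burn-in
mu_hat = statistics.fmean(d[0] for d in draws)
sigma_hat = statistics.fmean(d[1] for d in draws)
ybar, s_y = statistics.fmean(y), statistics.stdev(y)
print(round(mu_hat, 2), round(ybar, 2), round(sigma_hat, 2), round(s_y, 2))
```

With a flat prior on mu and a weak prior on sigma, the posterior means should sit close to the sample mean and standard deviation, which is a quick sanity check on the sampler.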
The histogram in Fig. 1(a) shows a heavy-tailed distribution of SVMC, meaning that some observations (outliers) deviate far from the mean. A heavy-tailed distribution such as the cauchy distribution is more effective in this situation because, unlike the gaussian distribution, its probability mass is not tightly clustered near the mean44. Thus, the cauchy prior was used to replace the gaussian prior, and we rewrote the model accordingly as follows:
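$$\mu \sim U(a, b) \tag{11}$$
$$\sigma \sim HN(\sigma_\sigma) \tag{12}$$
$$\nu \sim \operatorname{Exp}(\lambda) \tag{13}$$
$$y \sim T(\mu, \sigma, \nu) \tag{14}$$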
Fig. 1.
(a) Histogram and kernel density curve of SVMC; the green solid line is the kernel density curve. (b) Q-Q plot of SVMC from the 231 sample points; the blue points denote observed SVMC data, and the red solid line is the theoretical normal line.
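Compared with the gaussian prior model, the cauchy prior model has one additional parameter, $\nu$, the degrees of freedom of the Student's t distribution: $\nu = 1$ gives the cauchy distribution, and the distribution approaches the gaussian as $\nu$ grows. Here, $\nu$ was given an exponential prior with a mean of 20 ($\lambda = 1/20$)43; this weakly informative prior indicates that $\nu$ should be roughly around 20.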
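The cauchy prior model replaces the gaussian likelihood by a Student's t likelihood with an exponential prior on ν (mean 20). The sketch below writes these log densities in plain Python. As a usage example, it evaluates the largest observed SVMC value (34.8%) under both models, using the posterior means of Tables 1 and 2 as plug-in values; the function names and the half-normal scale argument are our own, hypothetical choices.

```python
import math


def student_t_logpdf(y, mu, sigma, nu):
    """Log density of the location-scale Student's t distribution, as in Eq. (14)."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def exponential_logpdf(x, mean):
    """Log density of an exponential distribution with the given mean (rate 1 / mean)."""
    return -math.inf if x < 0 else -math.log(mean) - x / mean


def log_post_t(mu, sigma, nu, y, a, b, sigma_sigma, nu_mean=20.0):
    """Unnormalized log posterior of the Student's t ('cauchy prior') model, Eqs. (11)-(14)."""
    if not (a < mu < b) or sigma <= 0 or nu <= 0:
        return -math.inf
    log_prior = -0.5 * (sigma / sigma_sigma) ** 2 + exponential_logpdf(nu, nu_mean)
    return log_prior + sum(student_t_logpdf(yi, mu, sigma, nu) for yi in y)


# Plug-in posterior means from Tables 1 and 2 for the largest observed value, 34.8%.
y_max = 34.8
log_gauss = (-0.5 * math.log(2.0 * math.pi) - math.log(3.41)
             - 0.5 * ((y_max - 20.26) / 3.41) ** 2)
log_t = student_t_logpdf(y_max, 20.03, 2.71, 6.37)
print(round(log_gauss, 2), round(log_t, 2))
```

The t model assigns this extreme value a noticeably higher log density than the gaussian model, so such points pull its mu and sigma estimates less, which matches the smaller sigma in Table 2.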
Evaluation and validation of SVMC
The deviance information criterion (DIC), a commonly used index of model performance for INLA-SPDE models, balances the fit of the model to the data against model complexity, and smaller DIC values indicate a better model fit15,17,22. Moreover, the conditional predictive ordinate (CPO) and the probability integral transform (PIT), which are effective indices for evaluating predictions, were also calculated in our study22. To assess the validity of the estimated INLA-SPDE model of SVMC, we also performed a simple residual analysis by calculating the root mean squared error (RMSE) between observed and predicted SVMC at the 69 validation sites.
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2} \tag{15}$$
Here, $m$ is the number of validation sites, and $\hat{y}_i$ and $y_i$ are the predicted and observed SVMC values at the $i$-th validation site.
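Eq. (15) translates directly into code. The sketch below uses three hypothetical validation pairs rather than the study's 69 sites.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values, Eq. (15)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / m)


# Hypothetical predicted and observed SVMC (%) at three validation sites.
print(round(rmse([20.0, 22.0, 18.0], [21.0, 20.0, 18.0]), 3))
```

Because the errors are squared before averaging, RMSE is expressed in the same unit as SVMC (%) and weights large misses more heavily than small ones.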
Results and analysis
Statistics characteristics of SVMC
Descriptive statistics (such as variance and coefficient of variation) of the 231 SVMC samples collected during the jointing stage of winter wheat in 2017 were calculated with the R (version 4.3.0) package “pastecs”. SVMC (percent volumetric moisture content) has a variance of 11.533 (%²) and a standard deviation of 3.396%. SVMC ranged from 12.6% to 34.8%, with a mean of 20.25%. The coefficient of variation (CV) of SVMC was 0.168, indicating medium variability. The asymmetry of SVMC, measured by the skewness (a departure from normality), was 1.0, while the peakedness of the distribution, expressed by the kurtosis, whose interpretation is relative to the normal distribution, was 2.31 (Fig. 1(a)). A Q-Q plot and a histogram were used to explore whether the data are normally distributed (a bell-shaped curve). As illustrated in Fig. 1(b), the points lie close to a straight line, indicating approximate normality, although the main departure from this line occurs at high values of SVMC. Notably, the histogram in Fig. 1(a) shows a heavy-tailed distribution of SVMC, which provided the basis for choosing a more reasonable prior distribution in the subsequent analysis.
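The summary statistics above can be reproduced with a few lines of Python. The sketch computes the sample variance, standard deviation, CV, moment-based skewness and excess kurtosis (normal = 0); pastecs may apply different bias corrections, and the text does not state which kurtosis convention was used, so small differences are to be expected. It also checks that the reported standard deviation and CV follow from the reported variance and mean; the six input values in the last line are hypothetical.

```python
import math


def describe(x):
    """Mean, sample variance and sd, CV, moment skewness and excess kurtosis of x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / (n - 1)   # sample variance
    sd = math.sqrt(var)
    m2 = sum(d ** 2 for d in dev) / n         # central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return {"mean": mean, "var": var, "sd": sd, "cv": sd / mean,
            "skew": m3 / m2 ** 1.5, "kurt": m4 / m2 ** 2 - 3.0}


# Consistency of the reported summary: sd = sqrt(variance) and CV = sd / mean.
print(round(math.sqrt(11.533), 3))  # 3.396
print(round(3.396 / 20.25, 3))      # 0.168
print(describe([12.6, 18.0, 19.5, 20.2, 21.0, 34.8]))  # hypothetical values
```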
Spatial uncertainty associated with SVMC based on INLA-SPDE model
The triangulated mesh, built with the boundary of the study region (the green line), is shown in Fig. 2 and has 2330 vertices.
Fig. 2.
Triangulated mesh used to build the SPDE model. The 231 points denote the sampling sites, comprising 162 training samples (red) and 69 validation samples (blue), randomly divided in a 7:3 ratio; the green line denotes the boundary of the study area. Units of both the horizontal and vertical coordinates are meters. The figure was generated as described in the "Triangulated mesh construction" section using open-source R 4.3.0 (https://www.r-project.org/) with R-INLA_23.06.12 (www.r-inla.org).
Finally, a sequential color gradient from green (low) to red (high) was used for the maps. The three maps, drawn in the same plot with equal units on the x- and y-axes, show the posterior mean prediction of SVMC and the lower and upper limits of the 95% credible intervals derived with the INLA-SPDE model (Fig. 3). The INLA-SPDE results show a consistent spatial pattern of SVMC across the three maps, and the predicted SVMC is spatially uneven. The posterior mean of SVMC is 20.253%, with a standard deviation of 0.216%; the 2.5% and 97.5% percentiles (19.828% and 20.677%, respectively) correspond to the lower and upper limits of the SVMC prediction.
Fig. 3.
Posterior predictions of SVMC and the lower and upper limits of the 95% credible intervals: pred_mean is the posterior mean of SVMC, and pred_ll and pred_ul are the lower and upper limits of the 95% credible intervals, respectively. Units of the horizontal and vertical coordinates are meters. The figure was created with open-source R 4.3.0 (https://www.r-project.org/) and R-INLA_23.06.12 (www.r-inla.org).
We adjusted the parameter values iteratively to improve prediction accuracy, running the model multiple times and comparing the results. The final model had a mean CPO of 0.719, a mean PIT of 0.498, a DIC of -1106.054 and an RMSE of 1.705 between observed and predicted SVMC. Figure 4 shows the scatter plot and regression line between observed and predicted SVMC at the 69 validation sites (red points). The regression equation is y = 2.881 + 0.829x (where y denotes predicted SVMC and x observed SVMC; R² = 0.701, p < 0.01), indicating good performance of the INLA-SPDE model for SVMC prediction.
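The regression line and R² of Fig. 4 come from an ordinary least-squares fit of predicted on observed SVMC, which can be sketched as below with hypothetical inputs. A slope below 1 with a positive intercept, as reported here, suggests that high SVMC values tend to be under-predicted and low values over-predicted, consistent with the smoothing effect of spatial prediction; the last line computes the observed value at which the reported line crosses y = x.

```python
def linear_fit(x, y):
    """Ordinary least squares y = a + b x, returning (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


observed = [14.0, 18.5, 20.0, 22.5, 27.0]   # hypothetical validation values (%)
predicted = [15.1, 18.0, 20.3, 21.4, 25.6]
a, b, r2 = linear_fit(observed, predicted)
print(f"predicted = {a:.3f} + {b:.3f} * observed, R^2 = {r2:.3f}")
# Observed SVMC at which the reported line y = 2.881 + 0.829x crosses y = x (about 16.8%).
print(round(2.881 / (1 - 0.829), 1))
```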
Fig. 4.
Scatter plot and regression line (green) of observed and predicted SVMC at the 69 validation sites from the INLA-SPDE model. Units of both axes are %.
Transparency, interpretability and uncertainty related to SVMC
Transparency, interpretability related to SVMC
As described above, the maps of the lower and upper limits of the SVMC prediction derived from the INLA-SPDE model present the spatial distribution and quantify the uncertainty of SVMC. In this section, we mainly use the cauchy prior and the gaussian prior to explore the transparency, interpretability, robustness and uncertainty associated with SVMC.
The marginal distributions of the parameters $\mu$ and $\sigma$ were generated using PyMC3. In the left panels of Fig. 5(a), the KDEs of two Markov chains computed by PyMC3 are shown, with each plotted line (solid or dashed) representing a single independent chain run in parallel. The KDE and trace plots differ only slightly between the chains: for $\mu$, both chains belong to the same distribution, indicating convergence of the MCMC method. Similarly, the KDE and trace plots of $\sigma$ belong to the same distribution, with only small (random) differences between the chains, which indicates good mixing of both chains for $\mu$ and $\sigma$42,43. The trace plots in the right panels of Fig. 5(a), drawn at each iteration for both chains, resemble those expected from well-behaved chains.
Fig. 5.
(a) Kernel density estimates (KDE) and sampled traces of mu and sigma in the gaussian model. (b) Rank plots of mu and sigma; the bar heights are compared with the dashed line representing a uniform distribution.
Evaluating MCMC convergence is important and necessary to check whether the gaussian prior model is sensible. Rank plots are used for convergence diagnosis together with the effective sample size (ESS), the potential scale reduction factor $\hat{R}$ and the Monte Carlo standard error (MCSE)44–46. In Fig. 5(b), the ranks are very close to uniform, and both chains look similar to each other with no distinctive patterns, which shows good mixing of both chains and supports the gaussian prior model47.
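A rank plot pools the draws of all chains, ranks them, and histograms the ranks separately for each chain; under good mixing each chain's histogram is close to uniform. The sketch below computes those per-chain counts for two tiny hypothetical examples; it conveys the idea and is not ArviZ's plotting code.

```python
def rank_counts(chains, n_bins=10):
    """Per-chain histogram of pooled ranks, the quantity displayed in a rank plot."""
    pooled = sorted(
        (value, c, i) for c, chain in enumerate(chains) for i, value in enumerate(chain)
    )
    total = len(pooled)
    counts = [[0] * n_bins for _ in chains]
    for rank, (_, c, _) in enumerate(pooled):
        counts[c][rank * n_bins // total] += 1
    return counts


well_mixed = [[0, 2, 4, 6, 8, 10, 12, 14, 16, 18], [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]]
stuck = [list(range(10)), list(range(10, 20))]  # second chain never visits low values
print(rank_counts(well_mixed))  # every bin holds one draw from each chain
print(rank_counts(stuck))       # first chain fills the low bins, second the high bins
```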
Figure 6(a) shows the marginal posterior distributions of the parameters $\mu$, $\sigma$ and $\nu$ from the cauchy prior model.
Fig. 6.
(a) Kernel density estimates (KDE) and sampled traces of mu, sigma and nu in the cauchy prior model. (b) Rank plots of mu, sigma and nu; the bar heights are compared with the dashed line representing a uniform distribution.
Similarly, rank plots (Fig. 6(b)) are used to evaluate the convergence of the MCMC method.
In this study, ESS_bulk mainly assesses how well the center of the posterior distribution was resolved under the gaussian and cauchy priors, while ESS_tail is the minimum of the effective sample sizes for the 5% and 95% quantiles. Values of $\hat{R} \lessapprox 1.01$ are considered safe and indicate reasonable samples42. The posterior summaries of the gaussian prior and cauchy prior models, including the mean, standard deviation (sd), 94% HDI (hdi_3% and hdi_97%), ESS, $\hat{R}$ and MCSE, are compared in Tables 1 and 2.
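The R-hat diagnostic compares between-chain and within-chain variances. The sketch below implements the classic split R-hat on synthetic chains; ArviZ, which PyMC3 uses for its summaries, reports a rank-normalized variant by default, so its values may differ slightly from this simpler version.

```python
import math
import random


def split_rhat(chains):
    """Classic split R-hat for one parameter; chains are equal-length lists of draws."""
    halves = []
    for chain in chains:
        h = len(chain) // 2
        halves += [chain[:h], chain[h:2 * h]]        # split each chain in two
    m, n = len(halves), len(halves[0])
    means = [sum(c) / n for c in halves]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(halves, means)) / m
    var_plus = (n - 1) / n * within + between / n  # pooled posterior variance estimate
    return math.sqrt(var_plus / within)


rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(2)]
shifted = [[rng.gauss(0.0, 1.0) for _ in range(1000)], [rng.gauss(5.0, 1.0) for _ in range(1000)]]
print(round(split_rhat(mixed), 3), round(split_rhat(shifted), 3))
```

Chains that sample the same distribution give values near 1.0, as in Tables 1 and 2, whereas chains stuck in different regions inflate R-hat well above the 1.01 threshold.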
Table 1.
Posterior summary of parameters from gaussian prior.
| Parameter | Mean | SD | hdi_3% | hdi_97% | mcse_sd | ESS_bulk | ESS_tail | R_hat |
|---|---|---|---|---|---|---|---|---|
| mu | 20.26 | 0.225 | 19.86 | 20.71 | 0.002 | 3819 | 2915 | 1.0 |
| sigma | 3.41 | 0.16 | 3.12 | 3.71 | 0.002 | 3883 | 3136 | 1.0 |
Table 2.
Posterior summary of parameters from cauchy prior.
| Parameter | Mean | SD | hdi_3% | hdi_97% | mcse_sd | ESS_bulk | ESS_tail | R_hat |
|---|---|---|---|---|---|---|---|---|
| mu | 20.03 | 0.20 | 19.66 | 20.41 | 0.002 | 2421 | 2354 | 1.0 |
| sigma | 2.71 | 0.22 | 2.31 | 3.11 | 0.004 | 2083 | 2136 | 1.0 |
| nu | 6.37 | 3.01 | 2.63 | 11.18 | 0.065 | 1914 | 2089 | 1.0 |
Comparing the posterior summary of the gaussian prior model (Table 1) with that of the cauchy prior model (Table 2), the estimates of mu are similar, differing by ≈ 0.2. The estimate of sigma changes from ≈ 3.41 to ≈ 2.71, mainly because the cauchy prior model gives less weight to values far from the mean42. A value of nu ≈ 6 indicates tails noticeably heavier than gaussian (nu = 1 corresponds to the cauchy distribution, whereas large nu approaches the gaussian), consistent with the heavy-tailed distribution of SVMC. MCMC trace plots, KDE plots and rank plots thus make the SVMC prediction model transparent and interpretable through both numbers and plots.
Robustness and uncertainty associated with SVMC
We then performed posterior predictive checks for the cauchy prior and gaussian prior models, generating 100 predictions from each posterior to check how consistent the simulated SVMC values are with the measured values. In Fig. 7(a), the blue solid line, a KDE of the measured SVMC, represents the observed data, and the semitransparent red lines represent the 100 predictions from the gaussian model and reflect their uncertainty. The mean of the simulated values is shifted slightly to the right, and their spread is larger than that of the measured SVMC, a direct consequence of some measurements lying far from the bulk of the data. Although the gaussian prior provides a reasonable and useful representation of SVMC, the model does not correctly handle the heavy-tailed distribution, so we explored how to obtain predictions that match the data more closely. As shown in Fig. 7(b), the cauchy prior model fits SVMC better in terms of the peak and shape of the distribution, and the predicted values are very close to the measured data, particularly in the tails. The posterior predictive checks confirmed that the cauchy prior model is more robust and has higher prediction accuracy, because outliers have less influence on its parameters, so that the mean is estimated mainly from the central bulk of the measured data43,44.
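Posterior predictive checks simulate new datasets from the fitted model and compare them with the observations. The sketch below draws replicated SVMC datasets from either model, generating Student's t values as a normal draw divided by the square root of a scaled chi-square draw. The parameter tuples are placeholders based on the posterior means in Tables 1 and 2, not actual posterior samples, and the tail statistic (share of values above 30%) is a hypothetical choice.

```python
import math
import random


def draw_y(rng, mu, sigma, nu=None):
    """One replicated observation: gaussian if nu is None, otherwise Student's t."""
    z = rng.gauss(0.0, 1.0)
    if nu is None:
        return mu + sigma * z
    chi2 = rng.gammavariate(nu / 2.0, 2.0)   # chi-square with nu degrees of freedom
    return mu + sigma * z / math.sqrt(chi2 / nu)


def posterior_predictive(draws, n_obs, n_rep=100, seed=0):
    """n_rep replicated datasets, each generated from a randomly chosen posterior draw."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        params = rng.choice(draws)
        reps.append([draw_y(rng, *params) for _ in range(n_obs)])
    return reps


gauss_draws = [(20.26, 3.41)]     # placeholders from Table 1, not real posterior samples
t_draws = [(20.03, 2.71, 6.37)]   # placeholders from Table 2
for name, draws in (("gaussian", gauss_draws), ("student-t", t_draws)):
    reps = posterior_predictive(draws, n_obs=231)
    share_high = sum(v > 30.0 for rep in reps for v in rep) / (100 * 231)
    print(name, round(share_high, 4))
```

In a real check, each replicated dataset would be summarized by a KDE and overlaid on the observed KDE, as in Fig. 7(a) and (b).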
Fig. 7.

(a) Uncertainty and posterior predictive checks of the gaussian prior model: the blue solid line represents the observed data and the semitransparent red lines the predictions from the gaussian prior model. (b) Uncertainty and posterior predictive checks of the cauchy prior model: the blue solid line represents the observed data and the semitransparent red lines the predictions from the cauchy prior model. (c) The HDI reported for the cauchy prior model.
The uncertainty associated with SVMC can be described explicitly using the highest-posterior density interval (HDI). In Fig. 7(c), the posterior distribution is represented by a KDE, together with its mean and the limits of the 94% HDI48–51. The 94% HDI is drawn as a black line at the bottom of the plot and can support decisions based on the posterior results of the cauchy prior model. A vertical (orange) line at a reference value, together with the proportions of the posterior below and above it, expresses the uncertainty associated with SVMC: for a reference SVMC value of 20.1, 66% of the posterior lies below this value and only 34% above it, which reflects the uncertainty of SVMC in probabilistic terms.
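The HDI and the proportion of the posterior below a reference value can both be computed directly from posterior samples. The sketch below finds the shortest interval containing 94% of the sorted draws, the same idea used for sample-based HDIs in ArviZ. The draws are synthetic stand-ins generated from the posterior mean and standard deviation of mu in Table 2, not the study's samples, so the printed proportion is only expected to be in the vicinity of the reported 66%.

```python
import math
import random


def hdi(samples, prob=0.94):
    """Shortest interval containing a fraction `prob` of the sorted samples."""
    s = sorted(samples)
    n = len(s)
    inc = int(math.floor(prob * n))
    if not 0 < inc < n:
        raise ValueError("prob * len(samples) must lie strictly between 0 and n")
    widths = [s[i + inc] - s[i] for i in range(n - inc)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return s[best], s[best + inc]


def prob_below(samples, ref):
    """Share of posterior samples below a reference value."""
    return sum(v < ref for v in samples) / len(samples)


rng = random.Random(0)
mu_draws = [rng.gauss(20.03, 0.20) for _ in range(4000)]  # synthetic stand-in draws
low, high = hdi(mu_draws)
print(round(low, 2), round(high, 2), round(prob_below(mu_draws, 20.1), 2))
```

For a skewed posterior the shortest interval differs from the equal-tailed 3% to 97% interval, which is why the HDI is preferred for describing where most of the posterior mass lies.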
Discussion
Bayesian inference provides a probabilistic description built with probabilities, and using probability to model the uncertainty of SVMC is a reasonable methodological approach. Generally, the Bayesian inference of SVMC can be described in three steps: (1) based on the SVMC data, a model of SVMC is designed by combining and transforming probability distributions; (2) the model is conditioned on the SVMC data using Bayes' theorem, and the posterior, a probability distribution of the model parameters rather than a single value, is calculated from the prior distribution and the likelihood function; and (3) the model is checked and its convergence diagnosed according to different parameters and criteria42,44,48.
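Step (2) can be illustrated with a grid approximation: evaluate prior times likelihood on a grid of parameter values and normalize. The sketch below does this for mu with sigma treated as known and a flat prior; the observations and sigma are hypothetical, and grid approximation is a teaching device rather than the NUTS sampling used in the study.

```python
import math


def grid_posterior(y, sigma, grid, log_prior):
    """Normalized posterior p(mu | y) on a grid: prior times gaussian likelihood, sigma known."""
    log_post = [log_prior(m) - 0.5 * sum((yi - m) ** 2 for yi in y) / sigma ** 2
                for m in grid]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]  # shift by the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def flat_prior(m):
    return 0.0  # constant log prior over the grid


grid = [10.0 + 0.01 * i for i in range(2001)]  # mu from 10 to 30
y = [18.0, 20.0, 22.0]                         # hypothetical SVMC values (%)
post = grid_posterior(y, sigma=3.4, grid=grid, log_prior=flat_prior)
post_mean = sum(m * p for m, p in zip(grid, post))
print(round(post_mean, 3))
```

Swapping flat_prior for an informative log prior shows directly how the prior pulls the posterior, which is the balance between prior and likelihood discussed next.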
The posterior, a balance between the prior and the likelihood, offers flexibility, adaptability and simplicity and provides the posterior mean of the variables. Bayesian inference can clearly be influenced by priors. Non-informative priors (also known as flat or vague priors) and weakly informative priors have the least possible impact on the inference, and weakly informative priors are the better choice following the recommendations of Gelman, McElreath and Kruschke42,44,49. For the Bayesian inference of SVMC, the gaussian and cauchy prior distributions influence the posterior distribution of SVMC differently. Bayesian inference (with both the cauchy and gaussian priors) provides a robust model of the uncertainty of the SVMC observations during the winter wheat jointing growth stage. In the cauchy prior model (Fig. 7(b)), the predictions (semitransparent red lines) match the observed data (blue solid line) more accurately than in the gaussian prior model; the cauchy prior model performs well in MCMC simulation of SVMC, and its predictions are more robust than those of the gaussian model. The Bayesian inference uncertainty and the highest-posterior density interval of SVMC can thus be revealed and described explicitly, which benefits intelligent decision-making in smart agriculture. The result of Bayesian inference is a posterior distribution over all model parameters; thus, by summarizing the posterior, we summarize the logical consequences of a model and the data, so the choice of prior and the resulting posterior for each model parameter is an important issue. An alternative is to use more powerful and flexible models, such as Dirichlet process mixture models, which add flexibility by mixing simpler distributions into more complex ones, to simulate and predict the target variables in more depth.
A limitation of our study is that the model only incorporates the Cauchy prior, chosen for the heavy-tailed SVMC data, and does not consider Dirichlet process mixture models for the posterior prediction of SVMC.
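For readers unfamiliar with Dirichlet process mixture models, the sketch below shows the truncated stick-breaking construction that generates their mixture weights, followed by draws from a mixture of Normal components. It is a hypothetical illustration of the model class the limitation refers to, not something fitted in this study; the concentration alpha, the truncation level k_max, the base-measure parameters and the component standard deviation are all assumed values.

```python
import random

def stick_breaking_weights(alpha, k_max, rng):
    """Truncated stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(k_max):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick broken off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

def sample_dp_mixture(n, alpha, k_max, base_mu, base_sd, comp_sd, seed=0):
    """Draw n values from a truncated DP mixture of Normal components."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, k_max, rng)
    # Component means come from the Normal base measure G0 = Normal(base_mu, base_sd).
    means = [rng.gauss(base_mu, base_sd) for _ in range(k_max)]
    # random.choices normalizes the weights, absorbing the small truncated remainder.
    labels = rng.choices(range(k_max), weights=weights, k=n)
    return weights, [rng.gauss(means[k], comp_sd) for k in labels]

weights, draws = sample_dp_mixture(200, alpha=1.0, k_max=50, base_mu=25.0,
                                   base_sd=5.0, comp_sd=1.0, seed=1)
print("components with weight > 0.01:", sum(w > 0.01 for w in weights))
```

A smaller alpha concentrates the weight on a few components, while a larger alpha spreads it over many, which is how such a model adapts its complexity to the data.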
Bayesian inference is emerging as an important and powerful framework for expressing and understanding next-generation deep neural networks48. Quantifying the uncertainty of soil properties and soil moisture offers unique opportunities through big data analysis and machine learning approaches14,50. Hierarchical models address the problem structurally to improve Bayesian inference: by partially “pooling” information across groups, they yield shrinkage estimates in which different groups share information through a hyper-prior, which leads to more stable inference44. Another limitation of this study is that shrinkage, over-fitting and under-fitting were not examined in the Bayesian inference of SVMC.
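Partial pooling and shrinkage can be shown with a Normal-Normal hierarchical model in which, for simplicity, the within-group standard deviation sigma, the between-group standard deviation tau and the population mean mu are treated as known (a full hierarchical model would place hyper-priors on them). The groups could be, hypothetically, management zones of the field; the function name and all numbers are invented for illustration.

```python
def partial_pooling(group_means, group_sizes, sigma, tau, mu):
    """Posterior mean of each group effect theta_j under
    y_ij ~ Normal(theta_j, sigma), theta_j ~ Normal(mu, tau), with sigma, tau, mu known."""
    prior_prec = 1.0 / tau ** 2                  # how strongly groups are tied to mu
    shrunk = []
    for ybar, n in zip(group_means, group_sizes):
        data_prec = n / sigma ** 2               # grows with the group's sample size
        shrunk.append((data_prec * ybar + prior_prec * mu) / (data_prec + prior_prec))
    return shrunk

# Hypothetical zone means (% SVMC) and numbers of samples per zone.
shrunk = partial_pooling([18.0, 21.0, 30.0], [2, 10, 2], sigma=4.0, tau=2.0, mu=24.0)
print([round(x, 2) for x in shrunk])  # [22.0, 21.86, 26.0]
```

The two small groups (n = 2) move strongly toward the population mean of 24, while the well-sampled group (n = 10) moves much less; this is the shrinkage that makes hierarchical inference more stable.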
The robustness, transparency, interpretability and uncertainty of models are currently seeing important developments at the cutting edge of machine learning and deep learning; for example, variational autoencoders, which are hierarchical probabilistic models, explain data at multiple levels and thereby accelerate learning51–53. The INLA-SPDE model covers a class of models ranging from (generalized) linear mixed models to spatial and spatio-temporal models17,20,22. Covariates based on remote sensing (RS) indices may serve as effective predictors of SVMC. However, we did not develop spatial predictions of SVMC that combine RS-based covariates with machine learning algorithms and the INLA-SPDE model.
Conclusions
Bayesian inference based on open-source tools was performed to explore the spatial heterogeneity, transparency, interpretability and uncertainty associated with SVMC using the Python-based PyMC3 library combined with the INLA-SPDE model. The conclusions are as follows:
1. Spatial variability and uncertainty associated with SVMC based on the INLA-SPDE model during the jointing growth stage of winter wheat.
We created maps of the SVMC prediction mean and of the lower and upper limits of the 95% credible intervals of the predictions, derived using the INLA-SPDE model. These maps exhibit a consistent spatial pattern of SVMC and describe its uneven distribution, and the maps of the lower and upper limits of the 95% credible intervals quantify the uncertainty associated with SVMC.
2. Robustness, transparency, interpretability and uncertainty associated with SVMC based on PyMC3 probabilistic programming prediction during the jointing growth stage of winter wheat.
This paper uses Bayesian inference to provide the flexibility and adaptability needed to obtain state-of-the-art predictive performance for SVMC. Maps of 95% credible intervals based on the INLA-SPDE model quantify the uncertainty associated with SVMC. The predictions of SVMC with a Cauchy prior are more robust than those with a Gaussian prior. The transparency and interpretability of the SVMC prediction model were revealed by MCMC trace plots, kernel density estimates (KDE) and rank plots. The uncertainty associated with SVMC in the Bayesian inference can be explicitly revealed and described using the highest-posterior density interval.
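Both kinds of interval mentioned in the conclusions, the 95% credible limits mapped per prediction cell and the highest-posterior density (HPD) interval, can be computed from posterior draws. The sketch below assumes the draws are available as plain Python lists (one list per prediction cell). R-INLA reports quantiles such as the 2.5% and 97.5% limits directly, and ArviZ offers a highest-density interval function, so this pure-Python version only illustrates the definitions; the function names are hypothetical.

```python
import math

def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def equal_tailed_interval(draws, level=0.95):
    """Central credible interval: cut (1 - level)/2 of the draws from each tail."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(xs, tail), quantile(xs, 1.0 - tail)

def hpd_interval(draws, level=0.95):
    """Shortest interval containing `level` of the draws (a single interval,
    so a unimodal posterior is assumed)."""
    xs = sorted(draws)
    n = len(xs)
    k = min(n, max(1, math.ceil(level * n)))  # number of draws inside the interval
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return xs[best], xs[best + k - 1]

def summarize_cells(samples_per_cell, level=0.95):
    """Posterior mean plus lower/upper credible limits for each prediction cell."""
    return [(sum(d) / len(d), *equal_tailed_interval(d, level)) for d in samples_per_cell]

skewed = [float(i * i) for i in range(100)]  # hypothetical right-skewed draws
print("equal-tailed:", equal_tailed_interval(skewed))
print("HPD:", hpd_interval(skewed))
```

For a skewed posterior the HPD interval is shorter than the equal-tailed interval and can place its boundary at the mode, which is why the two summaries can differ.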
Acknowledgements
We thank our colleagues at the School of Civil Engineering and Geomatics, Shandong University of Technology, for their help and support.
Author contributions
Writing—Original Draft Preparation, Investigation, Data Curation, and Analysis, Y.Y.; Writing—Review and Editing, Investigation, Data Curation, and Analysis, X.T.
Funding
This research was funded by the project “Preliminary application of smart agriculture based on 3S technology” (project code: 4041/421024) from Shandong University of Technology, 2021.
Data availability
All data included in this study are available from the corresponding author upon request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Xueqin Tong: Co-first author
References
- 1. Schewe, J. et al. State-of-the-art global models underestimate impacts from climate extremes. Nat. Commun. 10, 1005 (2019). https://www.nature.com/articles/s41467-019-08745-6
- 2. Hand, E. NASA’s new soil moisture satellite could improve forecasts. Science. 10.1126/science.aaa6407 (2015).
- 3. Gao, S. G. et al. Estimating the spatial distribution of soil moisture based on Bayesian maximum entropy method with auxiliary data from remote sensing. Int. J. Appl. Earth Obs. Geoinf. 32, 54–66. 10.1016/j.jag.2014.03.003 (2014).
- 4. Dorigo, W. A. et al. The International Soil Moisture Network: a data hosting facility for global in situ soil moisture measurements. Hydrol. Earth Syst. Sci. 15(5), 1675–1698 (2011).
- 5. Dirmeyer, P. A. et al. GSWP-2: multimodel analysis and implications for our perception of the land surface. Bull. Am. Meteorol. Soc. 87, 1381–1397. 10.1175/BAMS-87-10-1381 (2006).
- 6. National Academies of Sciences, Engineering, and Medicine. Science Breakthroughs to Advance Food and Agricultural Research by 2030 (National Academies Press, 2019). 10.17226/25059
- 7. Martínez-Murillo, J. F., Hueso-González, P. & Ruiz-Sinoga, J. D. Topsoil moisture mapping using geostatistical techniques under different Mediterranean climatic conditions. Sci. Total Environ. 595, 400–412. 10.1016/j.scitotenv.2017.03.291 (2017).
- 8. Goovaerts, P. Geostatistical modelling of uncertainty in soil science. Geoderma 103(1–2), 3–26. 10.1016/S0016-7061(01)00067-2 (2001).
- 9. Diggle, P. J. & Ribeiro, P. J. Model-Based Geostatistics 1st edn, 12–57 (Springer Series in Statistics, Springer, 2007).
- 10. Yujian, Y., Yanbo, H., Yong, Z. & Xueqin, T. Optimal irrigation mode and spatio-temporal variability characteristics of soil moisture content in different growth stages of winter wheat. Water 10(9), 1180 (2018).
- 11. Douaik, A., Van Meirvenne, M. & Tóth, T. Soil salinity mapping using spatio-temporal kriging and Bayesian maximum entropy with interval soft data. Geoderma 128(3–4), 234–248. 10.1016/j.geoderma.2005.04.006 (2005).
- 12. Christakos, G., Serre, M. L. & Kovitz, J. L. BME representation of particulate matter distributions in the state of California on the basis of uncertain measurements. J. Geophys. Res. Atmos. 106(9), 9717–9731. 10.1029/2000JD900780 (2001).
- 13. Chutian, Z. & Yong, Y. Can the spatial prediction of soil organic matter be improved by incorporating multiple regression confidence intervals as soft data into BME method? Catena 178, 322–334 (2019).
- 14. Szatmári, G. & Pásztor, L. Comparison of various uncertainty modelling approaches based on geostatistics and machine learning algorithms. Geoderma 337, 1329–1340. 10.1016/j.geoderma.2018.09.008 (2019).
- 15. Poggio, L., Gimona, A., Spezia, L. & Brewer, M. J. Bayesian spatial modelling of soil properties and their uncertainty: the example of soil organic matter in Scotland using R-INLA. Geoderma 277, 69–82. 10.1016/j.geoderma.2016.04.026 (2016).
- 16. Rue, H., Martino, S. & Chopin, N. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. J. R. Stat. Soc. B 71, 319–392 (2009).
- 17. Chenconghai, Y., Lin, Y., Lei, Z. & Chenghu, Z. Soil organic matter mapping using INLA-SPDE with remote sensing based soil moisture indices and Fourier transforms decomposed variables. Geoderma 437, 116571. 10.1016/j.geoderma.2023.116571 (2023).
- 18. Huang, J., Malone, B. P., Minasny, B., McBratney, A. B. & Triantafilis, J. Evaluating a Bayesian modelling approach (INLA-SPDE) for environmental mapping. Sci. Total Environ. 609, 621–632. 10.1016/j.scitotenv.2017.07.201 (2017).
- 19. Carbó, E. et al. Modeling influence of soil properties in different gradients of soil moisture: the case of the Valencia Anchor Station Validation Site, Spain. Remote Sens. 13, 5155. 10.3390/rs13245155 (2021).
- 20. Moraga, P. Spatial Statistics for Data Science: Theory and Practice with R 20–208 (Chapman & Hall/CRC Data Science Series, 2023).
- 21. Lindgren, F., Rue, H. & Lindström, J. An explicit link between Gaussian fields and Gaussian Markov random fields: the SPDE approach. J. R. Stat. Soc. B 423–498 (2011).
- 22. Gómez-Rubio, V. Bayesian Inference with INLA 1–257 (Chapman & Hall/CRC, 2020).
- 23. Maeder, P. et al. Soil fertility and biodiversity in organic farming. Science 296(5573), 1694–1697 (2002).
- 24. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015).
- 25. Doucet, A., Freitas, J. F. G. & Gordon, N. J. Sequential Monte Carlo Methods in Practice 23–98 (Springer, 2000).
- 26. Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. How to grow a mind: statistics, structure, and abstraction. Science 331, 1279–1285 (2011).
- 27. Neal, R. M. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (eds Brooks, S., Gelman, A., Jones, G. J. & Meng, X. L.) (Chapman & Hall/CRC, 2010).
- 28. Pekel, J. F., Cottam, A., Gorelick, N. & Belward, A. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
- 29. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
- 30. Krapu, C. & Borsuk, M. Probabilistic programming: a review for environmental modellers. Environ. Model. Softw. 114, 40–48. 10.1016/j.envsoft.2019.01.014 (2019).
- 31. Wang, G. Bayesian regression models for ecological count data in PyMC3. Ecol. Inform. 63, 101301. 10.1016/j.ecoinf.2021.101301 (2021).
- 32. Bradford, M. A. et al. Managing uncertainty in soil carbon feedbacks to climate change. Nat. Clim. Change 6, 751–758 (2016).
- 33. Brocca, L., Melone, F., Moramarco, T., Wagner, W. & Hasenauer, S. ASCAT soil wetness index validation through in situ and modeled soil moisture data in central Italy. Remote Sens. Environ. 114, 2745–2755 (2010).
- 34. Gelfand, A. E., Diggle, P. J., Fuentes, M. & Guttorp, P. (eds) Handbook of Spatial Statistics (Chapman & Hall/CRC, 2010).
- 35. Lindgren, F. & Rue, H. Bayesian spatial modelling with R-INLA. J. Stat. Softw. 63. 10.18637/jss.v063.i19 (2015).
- 36. Cameletti, M., Lindgren, F., Simpson, D. & Rue, H. Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Adv. Stat. Anal. 97(2), 109–131. 10.1007/s10182-012-0196-3 (2012).
- 37. Carbó, E. et al. Modeling influence of soil properties in different gradients of soil moisture: the case of the Valencia Anchor Station Validation Site, Spain. Remote Sens. 13, 5155. 10.3390/rs13245155 (2021).
- 38. Robert, C. P., Cornuet, J. M., Marin, J. M. & Pillai, N. S. Lack of confidence in approximate Bayesian computation model choice. Proc. Natl. Acad. Sci. USA 108(37), 15112–15117 (2011).
- 39. Brooks, S., Gelman, A., Jones, G. & Meng, X. L. Handbook of Markov Chain Monte Carlo (Chapman & Hall/CRC Handbooks of Modern Statistical Methods, CRC, 2011).
- 40. Patil, A., Huard, D. & Fonnesbeck, C. PyMC: Bayesian stochastic modelling in Python. J. Stat. Softw. 35(4), 1–81 (2010).
- 41. Salvatier, J., Wiecki, T. V. & Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55 (2016).
- 42. Martin, O. Bayesian Analysis with Python: Introduction to Statistical Modeling and Probabilistic Programming Using PyMC3 and ArviZ 2nd edn (Packt Publishing, 2018).
- 43. Martin, O., Kumar, R. & Lao, J. Bayesian Modeling and Computation in Python (CRC, 2022).
- 44. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76(1), 1–32 (2017).
- 45. Hoffman, M. D. & Gelman, A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623 (2014).
- 46. Vehtari, A. et al. Rank-normalization, folding, and localization: an improved R̂ for assessing convergence of MCMC. Bayesian Anal. 1–30 (2021).
- 47. McElreath, R. rethinking: Statistical Rethinking course and book package. R package version 1.59 (2017). https://github.com/rmcelreath/rethinking
- 48. Yujian, Y. & Yingqiang, S. Application of Poisson process to drought prediction: the case study of Yucheng city. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-3/W1, 73–78 (2022).
- 49. Davidson-Pilon, C. Probabilistic Programming and Bayesian Methods for Hackers (2015). http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
- 50. Vereecken, H., Amelung, W. & Bauke, S. L. Soil hydrology in the Earth system. Nat. Rev. Earth Environ. 3, 573–587 (2022).
- 51. Bergen, K. J., Johnson, P. A., de Hoop, M. V. & Beroza, G. C. Machine learning for data-driven discovery in solid Earth geoscience. Science 363(6433), eaau0323. 10.1126/science.aau0323 (2019).
- 52. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. arXiv:1312.6114 [stat.ML] (2013).
- 53. Patel, A. B., Nguyen, M. T. & Baraniuk, R. A probabilistic framework for deep learning. Adv. Neural Inf. Process. Syst. 29, 2558–2566 (2016).