Integrating random forest-based regression kriging for analyzing spatial variability of rainfall in arid and semi-arid regions

Marwa Manaf; Zulfiqar Ali; Miklas Scholz

doi:10.1038/s41598-026-36074-4

2026 Jan 16;16:5298. doi: 10.1038/s41598-026-36074-4

PMCID: PMC12880969 PMID: 41545471

Abstract

Understanding the spatial variability of precipitation is essential for water resource management and climate adaptation, especially in arid and semi-arid regions with strong spatiotemporal heterogeneity. Traditional geostatistical methods, such as ordinary kriging, often struggle to capture nonlinear relationships between rainfall and spatial coordinates, whereas machine learning (ML) techniques integrated with regression kriging (RK) are widely used to capture complex spatial patterns. This study compares ML–RK methods for spatial interpolation using only latitude and longitude as predictors, rather than developing a full rainfall prediction model. It evaluates RK combined with six regression models: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Network (NN), Elastic Net (EN), and Polynomial Regression (PR). We used monthly and decadal averages of precipitation from 42 meteorological stations in Pakistan (2001–2021). To identify the optimal spatial structure, four theoretical variogram models (exponential, circular, spherical, and linear) were tested using Leave-One-Out Cross-Validation (LOOCV), with performance assessed using the root mean square error (RMSE) and mean absolute error (MAE). The results show that RF–RK consistently outperformed the other ML–RK combinations, indicating that combining ensemble learning with geostatistical interpolation effectively captured both nonlinear relationships and spatial dependencies. The resulting high-resolution rainfall maps can support climate adaptation planning, irrigation scheduling, and sustainable management of water resources in data-scarce regions such as Pakistan.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-36074-4.

Keywords: Precipitation, Machine learning, Random forest, Regression kriging, LOOCV, RF–RK

Subject terms: Climate sciences, Environmental sciences, Hydrology

Introduction

Global warming and climate change are the major environmental challenges of the twentieth century. They significantly affect physical, biological, and socioeconomic systems^1–3. In recent decades, rising greenhouse gas emissions have increased global temperatures and disrupted hydrological cycles. Consequently, extreme events such as floods, droughts, and irregular rainfall have been reported more frequently⁴. The recurrence of these events threatens agriculture, water resources, and livelihoods^5,6. Among climate variables, precipitation is key to regulating the water balance and supporting ecosystems⁷. Therefore, understanding the spatial and temporal variability of rainfall is important for managing climate risks and planning adaptation. In this context, accurate information on rainfall can help identify vulnerable regions and support exploratory water-resource assessments.

Spatial modeling of rainfall helps investigators understand and analyze environmental patterns. Various statistical and geostatistical methods have been applied to study the distribution of rainfall. For example, one study⁸ applied IDW, OK, GWR, and MGWR to study spatio-temporal rainfall changes in Peninsular Malaysia. Similarly, another⁹ compared Thiessen, IDW, TPS, OK, and Co-Kriging using ERA5 rainfall data (2008–2018) in Emilia-Romagna, Italy, and a third¹⁰ applied OK, UK, and Co-Kriging in the Central Himalayas. These methods improve the understanding of spatial rainfall, but they are often linear and may not fully capture complex spatial relationships. To address this, advanced geostatistical and ML-assisted approaches can provide more flexible spatial mapping.

Machine learning algorithms have increasingly been applied in geospatial studies to model complex and nonlinear spatial relationships that are difficult to capture using traditional interpolation methods. In rainfall mapping, ML models are often used to complement geostatistical approaches by improving the representation of spatial gradients under data-scarce conditions. Although ML techniques have been widely applied across environmental domains, the present study focuses specifically on their role in spatial rainfall interpolation. For example, one study¹¹ applied RF and XGBoost to monitor drought using spatial predictors, while others^12,13 demonstrated the ability of ML models to represent nonlinear spatial patterns in environmental variables.

Random Forest is a popular ML algorithm due to its robustness, ability to handle high-dimensional data, and fast computation¹⁴. It works well with categorical and continuous predictors. For example, one study¹⁵ used RF to estimate flood quantiles in Quebec, Canada, another¹⁶ applied RF for extreme rainfall forecasting in Bangladesh, and a third¹⁷ applied RF to forecast rainfall in the Paute River Basin, showing good performance for environmental estimation.

Several studies have combined RF or other ML algorithms with geostatistical interpolation to improve spatial estimation^18–20. These studies demonstrate the use of ML-RK combinations in environmental mapping. However, systematic comparisons for rainfall mapping in data-scarce regions, such as Pakistan, remain limited.

To address this gap, the present study evaluates the performance of six ML algorithms (PR, SVM, RF, NN, KNN, EN) integrated with regression kriging for spatial rainfall mapping in Pakistan. The analysis uses monthly mean precipitation data from 42 stations obtained from NASA POWER (2001–2021). Data are aggregated into two decades (2001–2011, 2011–2021) and each month is analyzed separately to capture spatial variability. Latitude and longitude are used as predictors to model spatial relationships, while external environmental covariates are not included.

The main objective is to systematically compare six ML–RK combinations (PR–RK, SVM–RK, RF–RK, NN–RK, KNN–RK, EN–RK) for decade-wise rainfall mapping in Pakistan. This study provides insight into the relative performance of these methods and helps to understand spatial rainfall patterns under data-scarce conditions.

The paper is organized into five sections. Section 2 describes materials and methods, Section 3 presents the results, Section 4 discusses the findings and Section 5 concludes the study.

Materials and methods

Precipitation data and processing

In this research, we used monthly precipitation data from 42 meteorological stations in Pakistan. The station network covers humid, sub-humid, arid, and semi-arid regions. The station coordinates range from 24°N to 37°N latitude and from 61°E to 76°E longitude. Data were obtained from NASA POWER (https://power.larc.nasa.gov/data-access-viewer/) for the period 1981–2021. The data set provides spatially consistent climate observations that are bias-corrected; therefore, it is suitable for regional-scale precipitation analysis.

Outliers at each station were identified using exploratory data analysis. Detected outliers were replaced with the monthly mean across all stations to maintain data continuity. The monthly data were then aggregated into decadal averages (2001–2010 and 2011–2021) to examine long-term patterns rather than year-to-year variability.
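The preprocessing described above can be illustrated with a minimal sketch. The 1.5 × IQR rule, the function names, and the toy values are assumptions made for illustration only; the study states that outliers were found by exploratory data analysis without naming a specific rule. The replacement value is computed here from the non-flagged stations, which is also an assumption.

```python
from statistics import mean, quantiles

def replace_outliers(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)
    with the mean of the values that were not flagged."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    fill = mean(v for v in values if low <= v <= high)
    return [v if low <= v <= high else fill for v in values]

def decadal_mean(yearly, start, end):
    """Average the yearly values of one month over the years start..end."""
    return mean(v for year, v in yearly.items() if start <= year <= end)

# Toy example: one month's values across nine hypothetical stations.
july = [0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 9.0]
cleaned = replace_outliers(july)

# Toy example: yearly values of one month at one hypothetical station.
yearly = {2001: 1.0, 2002: 2.0, 2011: 4.0, 2012: 6.0}
decade_1 = decadal_mean(yearly, 2001, 2010)
decade_2 = decadal_mean(yearly, 2011, 2021)
```

In this toy case only the value 9.0 is flagged and replaced; all other values are left unchanged.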

In the statistical modeling, latitude and longitude were used as predictors. Other covariates (e.g., elevation, climate indices, and land use) were not included, since this study focuses on comparing ML–RK methods using a minimal set of predictors. Model evaluation was performed using leave-one-out cross-validation. No independent gauge validation was conducted, in line with the study's focus on spatial interpolation rather than full hydrological assessment.

Figure 1 shows the station locations, while Table 1 reports monthly statistics, including the minimum, first quartile, median, mean, third quartile, and maximum values. These statistics illustrate seasonal and decadal variability.

Fig. 1 — Map of Pakistan showing the spatial distribution of 42 meteorological stations used in the study.

Table 1.

Summary statistics of monthly average precipitation for meteorological stations of Pakistan for two decades: Decade-1 (2001–2011) and Decade-2 (2011–2021).

Month	Min	Q1	Median	Mean	Q3	Max
Decade-1 (2001–2011)
Jan	0.0400	0.2652	0.5825	0.6287	0.9030	1.4890
Feb	0.0620	0.3317	1.0385	1.2858	1.7905	3.4580
Mar	0.1030	0.2973	0.7420	0.9361	1.7272	2.2900
Apr	0.0120	0.1412	0.7010	0.8924	1.5517	2.3640
May	0.0110	0.1590	0.5335	0.5106	0.8400	1.2890
Jun	0.0230	0.5440	0.8455	1.0456	1.2715	2.6050
Jul	0.0310	0.7582	1.4435	2.1431	3.1190	7.8750
Aug	0.0770	0.7668	1.4000	1.7995	2.2950	5.7860
Sep	0.0050	0.3272	0.5355	0.7716	1.2337	2.1640
Oct	0.0120	0.0968	0.2760	0.2921	0.3863	0.6970
Nov	0.0240	0.1113	0.1975	0.3046	0.5000	0.7860
Dec	0.0430	0.3508	0.4820	0.5299	0.6703	1.0810
Decade-2 (2011-2021)
Jan	0.0210	0.2705	0.6205	0.8237	1.4163	2.2180
Feb	0.0230	0.3683	0.9990	1.5359	2.5640	4.5080
Mar	0.0220	0.6823	1.6115	1.8649	2.8748	4.4150
Apr	0.0560	0.4295	1.1415	1.4406	2.5002	4.4460
May	0.0590	0.5533	0.9770	1.1638	1.7145	3.4370
Jun	0.0000	0.3227	1.0180	1.2526	1.9503	4.2020
Jul	0.0220	0.7960	1.2640	2.4020	3.8960	7.8250
Aug	0.1050	1.0810	1.8830	2.3340	3.0700	6.7410
Sep	0.0020	0.5650	1.2820	1.4900	2.3120	3.7440
Oct	0.0480	0.1615	0.5105	0.6907	1.2325	2.0110
Nov	0.0280	0.2052	0.5200	0.6207	1.0140	1.4460
Dec	0.0260	0.1032	0.2050	0.4429	0.5927	1.6440

Regression kriging

RK is a hybrid spatial prediction technique that integrates a deterministic regression model with a geostatistical interpolation model^21,22. The fundamental concept of RK is to model the deterministic trend component and the spatially correlated residuals separately and then add the two estimates. The deterministic component captures large-scale spatial variations using auxiliary predictors such as elevation, land cover, and climatic parameters, while the stochastic component employs ordinary kriging to model the spatial autocorrelation of regression residuals²³. This framework accounts for both deterministic relationships and localized spatial dependencies in spatial data. Mathematically, RK can be expressed as follows:

$$\hat{z}(s_0) = \hat{m}(s_0) + \hat{e}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \, q_k(s_0) + \sum_{i=1}^{n} \lambda_i \, e(s_i)$$

where $\hat{z}(s_0)$ denotes the predicted value at location $s_0$, $\hat{m}(s_0)$ represents the estimated deterministic trend derived from the auxiliary predictors $q_k(s_0)$ with their corresponding regression coefficients $\hat{\beta}_k$, and $\hat{e}(s_0)$ refers to the kriged residual obtained by applying kriging weights $\lambda_i$ to the observed residuals $e(s_i)$. By combining regression modeling with kriging of residuals, RK integrates deterministic and stochastic spatial components²⁴. For example, one study²⁵ applied RK to map soil properties in Central Vietnam by combining environmental variables such as land use type, topographic wetness index, and vegetation index.
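The two-part structure of RK (a regression trend plus ordinary-kriged residuals) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the study: the linear trend function, the exponential variogram parameters, the station coordinates, and the rainfall values are all hypothetical, and distances are taken directly in degrees for simplicity.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ok_weights(coords, target, gamma):
    """Ordinary kriging weights for one target location.
    The last row/column enforces that the weights sum to one."""
    n = len(coords)
    A = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    return solve(A, b)[:n]

def rk_predict(coords, z, target, trend, gamma):
    """Regression kriging: trend at the target plus kriged residual."""
    residuals = [zi - trend(c) for c, zi in zip(coords, z)]
    weights = ok_weights(coords, target, gamma)
    return trend(target) + sum(w * e for w, e in zip(weights, residuals))

# Hypothetical stations (lon, lat) and rainfall values.
coords = [(61.0, 25.0), (67.0, 30.0), (73.0, 34.0), (70.0, 26.0)]
z = [0.3, 0.8, 2.1, 0.5]
trend = lambda c: 0.1 * (c[1] - 24.0)          # assumed trend in latitude
gamma = lambda h: 1.0 - math.exp(-h / 5.0)     # assumed exponential variogram

estimate = rk_predict(coords, z, (68.0, 29.0), trend, gamma)
```

Because the assumed variogram has no nugget, predicting at a station location reproduces the observed value there, which is a convenient sanity check for this kind of sketch.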

Machine learning models

Random forest

The RF ensemble learning technique was originally introduced in an earlier work²⁶. RF combines bootstrap aggregation (bagging) with random feature selection to increase prediction accuracy and stability. It constructs many decision trees, each trained on a bootstrap sample of the data, and considers only a random subset of predictors at each split node. This added randomization reduces the correlation between trees, which in turn reduces overfitting and improves prediction stability. RF has been used in various disciplines^27–29. In this paper, we used RF to assess nonlinear relationships and spatial patterns.
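The two ingredients named above, bootstrap sampling and random feature selection, can be shown with a deliberately simplified sketch. Each "tree" here is a single-split stump on one randomly chosen coordinate, whereas a real random forest grows deep trees and re-samples candidate features at every split. All names and values are hypothetical.

```python
import random
from statistics import mean

def fit_stump(X, y, feature):
    """Best single split on one feature, minimising squared error."""
    best = None
    for t in sorted({row[feature] for row in X})[1:]:
        left = [yi for row, yi in zip(X, y) if row[feature] < t]
        right = [yi for row, yi in zip(X, y) if row[feature] >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                      # no split possible: predict the mean
        m = mean(y)
        return feature, float("inf"), m, m
    return feature, best[1], best[2], best[3]

def fit_forest(X, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        feature = rng.randrange(len(X[0]))                     # random feature
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    return mean(ml if row[f] < t else mr for f, t, ml, mr in forest)

# Hypothetical stations (lon, lat) and rainfall values.
X = [(61.0, 25.0), (67.0, 30.0), (73.0, 34.0), (70.0, 26.0), (64.0, 28.0)]
y = [0.3, 0.8, 2.1, 0.5, 0.6]
forest = fit_forest(X, y)
p = predict_forest(forest, (68.0, 31.0))
```

Averaging over many randomized stumps is what smooths the individual step-like predictions into a more stable estimate.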

Polynomial regression

Alongside RF, we also used PR to capture nonlinear trends in rainfall data. PR extends linear regression by including higher-order polynomial terms to model nonlinear relationships³⁰, and it has been used in several earlier studies^31–33. PR can approximate complex relationships, and keeping the polynomial degree low limits overfitting. The model form of PR is as follows:

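$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon$$

where $y$ is the predicted response, $x$ the predictor, $\beta_0, \ldots, \beta_n$ coefficients, $n$ the polynomial degree, and $\varepsilon$ the error term.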
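As a small illustration of the polynomial model form, the sketch below expands a single predictor into its powers and evaluates the fitted part of the model for given coefficients. The function names and the coefficient values are hypothetical; estimating the coefficients themselves (for example by least squares) is not shown.

```python
def poly_features(x, degree):
    """Return [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]

def poly_predict(coeffs, x):
    """Evaluate beta_0 + beta_1*x + ... + beta_n*x^n for coeffs = [beta_0, ..., beta_n]."""
    return sum(b * f for b, f in zip(coeffs, poly_features(x, len(coeffs) - 1)))

# Hypothetical quadratic: 1 + 2x + 3x^2 evaluated at x = 2.
value = poly_predict([1.0, 2.0, 3.0], 2.0)
```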

Support vector machine

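The SVM constructs an optimal hyperplane to separate data points in feature space³⁴. The kernel technique allows handling nonlinear tasks. Mathematically:

$$f(\mathbf{x}) = \mathbf{w}^{\top} \phi(\mathbf{x}) + b$$

where $\phi(\cdot)$ is the feature map induced by the kernel, $\mathbf{w}$ is the weight vector, and $b$ is the bias.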

SVM was applied to model rainfall patterns, effectively handling nonlinear dependencies.

K-nearest neighbors

KNN predicts outcomes based on the majority or average of its ‘k’ nearest neighbors³⁵. For regression:

$$\hat{y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where $N_k(x)$ are the k-nearest neighbors. KNN was used to capture local rainfall variability, particularly useful for spatially clustered precipitation data.
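For spatial data with coordinates as the only predictors, KNN regression reduces to averaging the values of the closest stations. The sketch below illustrates this with Euclidean distance on hypothetical coordinates and values; the choice of distance metric and of k = 3 are assumptions.

```python
import math

def knn_predict(coords, z, target, k=3):
    """Average the values of the k stations closest to the target."""
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], target))
    nearest = order[:k]
    return sum(z[i] for i in nearest) / len(nearest)

# Hypothetical stations and values.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
z = [1.0, 2.0, 3.0, 10.0]
estimate = knn_predict(coords, z, (0.1, 0.1), k=3)
```

The distant station at (5, 5) does not enter the average, which is why KNN tends to reflect local conditions.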

Neural network

NNs learn complex nonlinear mappings between inputs and outputs. A feedforward NN with one hidden layer is:

$$\hat{y} = f_2\left(W_2 \, f_1(W_1 x + b_1) + b_2\right)$$

where $x$ is input, $W_1, W_2$ weights, $b_1, b_2$ biases, and $f_1, f_2$ activation functions. In this study, NN was applied to model complex rainfall patterns across different months.
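A forward pass through a one-hidden-layer network can be written directly from this description. The sketch assumes a tanh hidden activation and an identity output activation, neither of which is specified in the text, and uses hypothetical weights; training is not shown.

```python
import math

def forward(x, W1, b1, W2, b2):
    """One hidden layer: tanh(W1 x + b1), then a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Hypothetical input (lon, lat) and weights for two hidden units.
x = [67.0, 30.0]
W1 = [[0.01, -0.02], [0.00, 0.03]]
b1 = [0.0, -0.5]
W2 = [1.0, -1.0]
b2 = 0.5
y_hat = forward(x, W1, b1, W2, b2)
```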

Elastic net

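EN combines LASSO ($L_1$) and Ridge ($L_2$) penalties for variable selection and coefficient shrinkage³⁶:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2 \right\}$$

where $\lambda_1$ and $\lambda_2$ control penalty strengths. EN was used to explore linear trends in rainfall and serve as a comparison to nonlinear methods.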
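The effect of the two penalties can be seen by evaluating the elastic net objective for a given coefficient vector. The sketch below computes the residual sum of squares plus the two penalty terms; the names `lam1` and `lam2` and the toy data are hypothetical, and the minimization itself is not shown.

```python
def elastic_net_loss(X, y, beta, lam1, lam2):
    """Residual sum of squares + lam1 * L1 norm + lam2 * squared L2 norm."""
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return rss + lam1 * l1 + lam2 * l2

# Toy data where beta = [1, 2] fits exactly, so only the penalties remain.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
loss = elastic_net_loss(X, y, [1.0, 2.0], lam1=0.5, lam2=0.1)
```

With a perfect fit, the loss equals 0.5 × 3 + 0.1 × 5 = 2.0, which shows that larger coefficients are penalized even when the residuals are zero.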

Comparative assessment and measures

Leave-one-out cross validation

LOOCV is a special case of k-fold cross-validation, with k equal to the number of observations n, that has been widely adopted for model evaluation and performance assessment in statistical and machine learning applications. It was developed as an impartial method to estimate the generalization error: one observation is left out at a time, the model is trained on the remaining data, and it is then tested on the excluded point³⁷. This process is repeated for all n observations, and the resulting prediction errors are averaged to produce an overall estimate of the accuracy of the model. Its robustness and simplicity make it a reliable choice for validating both traditional statistical and modern machine learning models.
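The leave-one-out loop can be sketched generically: any predictor that takes training coordinates, training values, and a target location can be plugged in. The predictor used here (the mean of the remaining stations) is a placeholder chosen only to keep the example short, and all values are hypothetical.

```python
def loocv_errors(coords, z, predict):
    """Leave each station out in turn and record observed minus predicted."""
    errors = []
    for i in range(len(z)):
        train_coords = coords[:i] + coords[i + 1:]
        train_z = z[:i] + z[i + 1:]
        errors.append(z[i] - predict(train_coords, train_z, coords[i]))
    return errors

# Placeholder predictor: mean of the remaining stations.
mean_predictor = lambda train_coords, train_z, target: sum(train_z) / len(train_z)

coords = [(61.0, 25.0), (67.0, 30.0), (73.0, 34.0)]
z = [1.0, 2.0, 3.0]
errors = loocv_errors(coords, z, mean_predictor)
```

The resulting list of errors is what summary metrics such as RMSE and MAE are computed from.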

Evaluation metrics

Model performance is commonly assessed using statistical metrics such as the Coefficient of Determination ($R^2$), MAE, and RMSE^38,39. These metrics provide complementary information on the accuracy, bias, and variability of the predictions in regression and spatial interpolation models. RMSE and MAE are often employed to select optimal models during variogram fitting or other calibration steps, with lower values indicating better performance⁴⁰, while $R^2$ quantifies the proportion of variance in the observed data explained by the model. These metrics are widely used in hydrology, climatology, and environmental modeling to evaluate the predictive performance of interpolation methods such as ordinary kriging, regression kriging, and machine learning-based spatial predictions^41,42. The formulas are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $y_i$ and $\hat{y}_i$ represent the observed and predicted values, respectively, and $\bar{y}$ is the mean of the observed values.
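The three metrics are straightforward to compute from paired observed and predicted values. The following sketch uses the standard definitions of RMSE, MAE, and the coefficient of determination; the toy values are hypothetical.

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0]
pred = [1.0, 2.0, 4.0]
scores = (rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```

RMSE weights large errors more heavily than MAE because the errors are squared before averaging, so the two are usually reported together.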

Results

Table 1 summarizes the monthly precipitation statistics for meteorological stations in Pakistan over two decades (2001–2011 and 2011–2021). Decade 2 (2011–2021) exhibits higher mean monthly precipitation values than Decade 1 (2001–2011) for most months in the aggregated dataset. Differences between the two decades are more pronounced during March, April, and May, reflecting higher aggregated mean values in the second decade. Peak rainfall occurs in July and August for both decades and reflects the influence of the monsoon. Minimum values remain close to zero for several months, which highlights dry periods. The interquartile range widens in Decade 2, suggesting greater variability in monthly rainfall. These patterns provide a foundation for subsequent spatial modeling and interpolation analyses. The following subsections present specific components of the analysis. Section 3.1 assesses the variogram models to identify the spatial dependence structure of rainfall. Section 3.2 evaluates the machine learning-based regression kriging models to examine their predictive performance, and the accompanying figures and tables provide spatial visualization and comparative analysis of rainfall prediction across decades.

Assessment of appropriate variogram model

Variography was conducted to examine the spatial dependence structure of decadal mean monthly precipitation across Pakistan for the two study periods.

Spatial variability of rainfall across decades

Empirical variograms were computed separately for Decade 1 (2001–2011) and Decade 2 (2011–2021) to show how rainfall similarity decays with distance. The empirical variograms revealed distinct seasonal differences and decadal contrasts in spatial dependence patterns. In Decade 1 (2001–2011) (see Table 2), semivariance values generally increased with lag distance, indicating the presence of spatial autocorrelation in the precipitation field.

Table 2.

Empirical variogram analysis of monthly averaged precipitation for Decade-1 (2001–2011).

N(h)	h	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
8	0.31	0.05	0.28	0.11	0.19	0.03	0.11	1.39	0.84	0.12	0.01	0.03	0.05
10	0.66	0.20	1.06	1.81	7.09	25.04	41.59	45.52	40.84	27.50	13.42	2.66	0.36
15	0.95	0.10	0.74	1.12	3.29	8.63	13.31	14.01	12.70	8.49	4.51	1.05	0.20
29	1.37	0.20	1.27	0.95	2.24	4.81	7.96	11.98	9.88	5.43	2.50	0.58	0.14
58	1.83	0.18	1.01	1.28	3.73	10.74	16.51	18.01	16.37	10.97	5.73	1.28	0.22
54	2.21	0.21	1.10	1.33	4.03	11.59	18.84	21.80	18.80	12.40	6.15	1.32	0.24
37	2.59	0.21	1.21	1.07	2.83	6.72	10.30	12.48	10.97	6.90	3.63	0.85	0.18
42	3.02	0.14	0.76	0.64	1.56	3.22	5.29	9.73	7.43	3.65	1.72	0.40	0.11
30	3.40	0.21	1.14	2.59	8.19	21.50	33.74	38.18	33.65	22.28	11.49	2.57	0.45
48	3.83	0.16	0.95	1.06	2.87	7.82	13.40	18.58	14.86	8.97	4.25	0.89	0.17
41	4.21	0.14	0.71	0.33	0.40	0.07	0.47	3.61	1.82	0.31	0.03	0.05	0.05
22	4.63	0.24	1.41	1.13	2.61	5.66	10.13	15.46	12.17	6.79	3.12	0.69	0.18
33	5.00	0.15	0.76	0.78	1.89	4.20	6.93	9.83	8.06	4.64	2.19	0.53	0.12
24	5.41	0.19	1.02	1.08	2.64	5.65	9.26	12.84	10.31	6.26	2.97	0.70	0.14
18	5.82	0.11	0.49	0.28	0.30	0.12	0.36	2.31	1.21	0.27	0.03	0.03	0.02

The table presents lag distance (h), the number of observation pairs (N(h)), and corresponding semivariance values ($\gamma(h)$) for each month.
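The quantities reported in such a table (pair count, mean lag distance, and semivariance per lag bin) can be computed with the classical estimator, in which the semivariance of a bin is half the mean squared difference between all station pairs whose separation falls in that bin. The sketch below is a minimal illustration with hypothetical coordinates, values, and bin width; it is not the software used in the study.

```python
import math

def empirical_variogram(coords, z, lag_width, n_lags):
    """Return (N(h), mean distance h, semivariance) for each non-empty lag bin."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            h = math.dist(coords[i], coords[j])
            b = int(h // lag_width)
            if b < n_lags:
                sq_sums[b] += (z[i] - z[j]) ** 2
                dist_sums[b] += h
                counts[b] += 1
    return [(counts[b], dist_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]))
            for b in range(n_lags) if counts[b]]

# Three hypothetical stations on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
z = [1.0, 2.0, 4.0]
table = empirical_variogram(coords, z, lag_width=1.5, n_lags=2)
```

In this toy case the first bin holds the two pairs at distance 1 (semivariance 1.25) and the second holds the single pair at distance 2 (semivariance 4.5).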

Winter months (January–March) showed low semivariance at short distances (0.05 in January and 0.28 in February at a lag distance of 0.31) and a moderate increase with distance, showing localized rainfall patterns caused by western disturbances. During spring (April–May), semivariance increased sharply, with April reaching about 8.2 and May exceeding 25.0 at mid-range lags, showing transitional convective activity. The monsoon months (June–September) showed the highest semivariance levels, exceeding 40 in July and August, indicating strong spatial continuity and widespread rainfall typical of monsoon systems. In contrast, the post-monsoon months (October–December) showed low semivariance values, indicating weak spatial correlation under dry conditions.

For Decade 2 (2011–2021) (see Table S1 in Supplementary Material), semivariance values also generally increased with lag distance, indicating the presence of spatial autocorrelation in the precipitation field.

The semivariance in January increased from 0.11 to 0.5 at short lags, while in April and May it rose to 8.1 and 17.9 at mid-range distances, indicating stronger convective rainfall. During the monsoon months, semivariance values were the highest, exceeding 45 in July and 39 in August, thereby confirming a strong spatial dependence. However, in the post-monsoon period, the semivariance values dropped below 0.4, which is consistent with the prevailing dry conditions.

Four commonly used theoretical variogram models, namely Exponential (Exp), Exponentially Correlated Cauchy (Exc), Linear (Lin), and Circular (Cir), were fitted to the empirical monthly variograms (see Figures 2 and S1 in Supplementary Material) to represent the spatial structure. The fitted curves generally followed the empirical trends; however, their shapes varied across months and seasons. The linear and circular models captured the gradual increase in semivariance more effectively and showed better visual agreement with the empirical patterns in most months. The circular model exhibited smoother transitions during the monsoon months, such as July and August, while the linear model represented steady semivariance growth during transitional months like April, May, and October. The Exc model, on the other hand, appeared nearly flat at larger lags, suggesting that it captured only short-range variability. However, visual fitting alone cannot ensure predictive reliability. Therefore, LOOCV was applied to objectively evaluate and compare the performance of the fitted models.
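Three of the theoretical models named above can be written as simple functions of lag distance. The sketch uses common textbook parameterizations (nugget, partial sill, and range or slope); the exact parameterization used by the fitting software in the study is not stated, so these forms and all parameter values are assumptions. The Exc model is omitted because its functional form is not given in the text.

```python
import math

def exponential(h, nugget, psill, a):
    """Approaches nugget + psill asymptotically; a is the range parameter."""
    return 0.0 if h == 0 else nugget + psill * (1.0 - math.exp(-h / a))

def linear(h, nugget, slope):
    """Unbounded linear growth with lag distance."""
    return 0.0 if h == 0 else nugget + slope * h

def circular(h, nugget, psill, a):
    """Reaches the sill (nugget + psill) exactly at the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (2.0 / math.pi) * (r * math.sqrt(1.0 - r * r) + math.asin(r))

# Hypothetical parameters: nugget 0.1, partial sill 2.0, range 3.0.
curve = [circular(h, 0.1, 2.0, 3.0) for h in (0.5, 1.0, 2.0, 3.0, 4.0)]
```

The circular model flattens once the lag exceeds the range, while the linear model keeps increasing, which is the main qualitative difference between the two shapes.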

Fig. 2 — Monthly experimental (empirical) and fitted theoretical variogram models for averaged precipitation in Decade 1 (2001–2011). Each subfigure (January–December) displays the empirical variogram alongside four fitted theoretical models.

Comparative evaluation of variogram models via LOOCV

To complement the visual evaluation of the fitted variograms, Leave-One-Out Cross-Validation (LOOCV) was performed to quantify the predictive accuracy and stability of the models, thereby facilitating the objective selection of the most appropriate variogram model (Tables 3 and 4). In this approach, each station was successively omitted and its value predicted using the remaining stations, while the prediction error was quantified using the RMSE and MAE. Additionally, the frequency with which each model produced both the lowest RMSE and MAE for a given month was used as an indicator of performance consistency.

Table 3.

Performance comparison of four variogram models (Exponential, Exponentially Correlated Cauchy, Linear, and Circular) for monthly averaged precipitation in Decade 1 (2001–2011) using LOOCV.

Month	Variogram models
	Exponential			Exponentially corr. cauchy			Linear			Circular
	Freq	RMSE	MAE	Freq	RMSE	MAE	Freq	RMSE	MAE	Freq	RMSE	MAE
Jan	0	0.0381	0.0320	0	0.0489	0.0407	3	0.0370	0.0305	38	0.0361	0.0300
Feb	0	0.2908	0.2331	0	0.3578	0.2872	25	0.2614	0.2161	17	0.2666	0.2187
Mar	0	0.0989	0.0788	0	0.1299	0.1049	7	0.0900	0.0734	35	0.0893	0.0728
Apr	1	0.1397	0.1090	0	0.1698	0.1318	30	0.1254	0.1005	11	0.1272	0.1027
May	0	0.0293	0.0236	0	0.0349	0.0274	10	0.0276	0.0227	32	0.0275	0.0225
Jun	0	0.1249	0.0922	0	0.1541	0.1159	37	0.1065	0.0819	5	0.1085	0.0824
Jul	0	0.9793	0.7807	0	1.2328	0.9371	19	0.8924	0.7432	23	0.8920	0.7389
Aug	0	0.5969	0.4761	0	0.7446	0.5817	40	0.5248	0.4211	2	0.5333	0.4265
Sep	0	0.0827	0.0646	0	0.1036	0.0778	38	0.0757	0.0586	4	0.0766	0.0603
Oct	0	0.0105	0.0088	0	0.0121	0.0104	12	0.0091	0.0073	30	0.0091	0.0073
Nov	2	0.0119	0.0080	38	0.0113	0.0075	0	0.0124	0.0085	2	0.0124	0.0084
Dec	5	0.0153	0.0118	1	0.0157	0.0122	23	0.0151	0.0116	13	0.0151	0.0116

For each month, the table reports the frequency of model selection, RMSE, and MAE, allowing assessment of model accuracy and suitability across different months.
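As a small worked example of model selection by LOOCV error, the January RMSE values for Decade 1 reported in the table can be compared directly. The sketch below picks the model with the lowest RMSE; it ignores the frequency and MAE columns, so it is only a simplified view of the selection procedure.

```python
# January, Decade 1: LOOCV RMSE per variogram model (values from the table).
jan_rmse = {
    "Exponential": 0.0381,
    "Exponentially corr. Cauchy": 0.0489,
    "Linear": 0.0370,
    "Circular": 0.0361,
}

# The model with the smallest RMSE is preferred.
best_model = min(jan_rmse, key=jan_rmse.get)
```

For January this selects the circular model, in agreement with the frequency column, where the circular model was chosen 38 times.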

Table 4.

Monthly evaluation of four variogram models (Exponential, Exponentially Correlated Cauchy, Linear, and Circular) for averaged precipitation during Decade 2 (2011–2021) based on LOOCV.

Month	Variogram models
	Exponential			Exponentially corr. cauchy			Linear			Circular
	Freq	RMSE	MAE	Freq	RMSE	MAE	Freq	RMSE	MAE	Freq	RMSE	MAE
Jan	0	0.0953	0.0764	0	0.1239	0.1005	4	0.0851	0.0689	38	0.0817	0.0645
Feb	1	0.4709	0.3894	0	0.5961	0.5005	13	0.4579	0.3774	28	0.4373	0.3610
Mar	0	0.5308	0.4461	0	0.6963	0.5918	1	0.4657	0.3812	41	0.4506	0.3778
Apr	0	0.2919	0.2427	0	0.3952	0.3049	2	0.2510	0.2127	40	0.2471	0.2126
May	6	0.1271	0.1088	0	0.1747	0.1474	5	0.1234	0.1072	31	0.1210	0.1039
Jun	5	0.4538	0.3838	0	0.4849	0.4068	10	0.4603	0.3916	27	0.4388	0.3706
Jul	1	1.3978	1.1337	0	1.6575	1.3029	38	1.3176	1.0589	3	1.3337	1.0812
Aug	3	1.0356	0.8564	1	1.2225	0.9814	33	0.9976	0.7944	5	1.0035	0.8131
Sep	0	0.2916	0.2306	0	0.3532	0.2587	38	0.2751	0.2160	4	0.2782	0.2208
Oct	1	0.0808	0.0646	0	0.1034	0.0843	21	0.0726	0.0565	20	0.0724	0.0569
Nov	8	0.0318	0.0245	31	0.0304	0.0262	1	0.0338	0.0239	2	0.0335	0.0235
Dec	0	0.1173	0.1021	0	0.1356	0.1178	38	0.1024	0.0858	4	0.1052	0.0882

The table lists the number of times each model was selected (frequency), along with RMSE and MAE for each month, highlighting variations in model performance and suitability across seasons and years.

The LOOCV results clearly demonstrate that the circular and linear models provided the most accurate and consistent performance across both decades. In Decade 1 (2001–2011) (Table 3), the circular model achieved the highest frequency and lowest prediction errors during the winter and monsoon months, including January (Freq = 38; RMSE = 0.036) and July (Freq = 23; RMSE = 0.892), where rainfall exhibited strong spatial continuity. The linear model performed best during the transitional months, such as April (Freq = 30; RMSE = 0.125), effectively capturing the steady, moderate semivariance increase associated with convective rainfall. The Exc model showed its strength only in November (Freq = 38; RMSE = 0.011), reflecting its ability to model weak but localized spatial dependence under dry post-monsoon conditions. In contrast, the exponential model consistently produced higher RMSE and MAE values and was not identified as optimal in any month.

A similar pattern was observed in Decade 2 (2011–2021) (Table 4), where the circular model again dominated the early and spring months, achieving the lowest RMSE values in March (Freq = 41; RMSE = 0.451) and April (Freq = 40; RMSE = 0.247). Its smooth and stable fit aligns well with the broad spatial continuity characteristic of monsoon and pre-monsoon rainfall. The linear model once more performed best during the peak and late monsoon months, particularly in July (Freq = 38; RMSE = 1.318) and August (Freq = 33; RMSE = 0.998), indicating its effectiveness in representing the near-linear spatial growth of heavy precipitation events. The Exc model provided the best fit only in November (Freq = 31; RMSE = 0.030), while the Exponential model remained consistently weak.

Overall, these findings confirm the robustness of the circular and linear models in characterizing Pakistan’s spatial rainfall variability, thereby providing a reliable foundation for subsequent regression-kriging analysis. These results describe the spatial dependence characteristics of decadal mean precipitation derived from the NASA POWER dataset and are intended to support comparative spatial interpolation experiments rather than to infer physical rainfall-generating processes.

Assessment of machine learning-integrated regression kriging performance

RK was applied to combine machine learning–based regression estimates with geostatistical interpolation of residuals for spatial rainfall prediction. This hybrid approach combines the predictive power of machine learning algorithms with the spatial dependence modeling capability of kriging. Six regression algorithms were implemented to represent the regression component of the RK framework. Subsequently, OK was applied to their residuals using the best-fit variogram models identified earlier.

In the first stage, each machine learning model was trained independently using average precipitation as the dependent variable and spatial coordinates (longitude and latitude) as explanatory features. This step allowed the models to learn spatial relationships between precipitation and geographic coordinates. Consequently, the regression component generated predicted rainfall surfaces that reflected broad spatial gradients, though it left some localized spatial dependence unexplained.

In the second stage, the residuals, computed as the differences between observed and predicted rainfall, were subjected to geostatistical analysis. The residuals exhibited spatial autocorrelation not explained by the regression component. To address this, OK was performed on the residuals using the optimal variogram parameters derived from the variography analysis (Section 3.1). The kriging process estimated spatially dependent residuals at unsampled locations based on neighboring data points. The final rainfall prediction was obtained by summing the kriged residuals with the initial regression predictions. This two-step procedure ensured that both large-scale deterministic relationships and localized spatial variability were incorporated, thereby improving spatial prediction accuracy.

The performance of the six RK models was rigorously evaluated using k-fold cross-validation. Model accuracy was assessed through three complementary metrics: RMSE, MAE, and $R^2$ (see Supplementary Table S2). Among all models, the RF-based RK model consistently yielded the lowest RMSE and MAE and the highest $R^2$ across both decades. For example, in Decade 1, RF reduced RMSE by 38–60% compared to Polynomial Regression (PR) during high-precipitation months (July–August) and achieved the highest $R^2$ values, up to 0.88, demonstrating superior predictive accuracy and stability.

During most months of Decade 1 (2001–2011), RF achieved the lowest RMSE and MAE and the highest $R^2$, indicating superior accuracy and stability. For example, in June, RF recorded RMSE = 0.20 and MAE = 0.15, which, together with its $R^2$, were the best values among all models. Similarly strong performance was observed in July (RMSE = 0.66, MAE = 0.51) and September (RMSE = 0.21, MAE = 0.16). RF was therefore stable during high-precipitation months, with RMSE reductions of 38% in July and 35% in August compared to SVM and $R^2$ values exceeding 0.88, highlighting its ability to capture monsoon rainfall variability. The SVM model ranked second, yielding moderate prediction errors (e.g., April RMSE = 0.61, MAE = 0.38) and showing good nonlinear fitting capacity, though less consistency across months. NN and EN achieved comparable accuracy during several dry months, such as November (EN RMSE = 0.21), but their performance declined during high-variability monsoon periods. KNN tended to over-smooth rainfall variations, achieving moderate $R^2$ values but higher RMSE. In contrast, PR consistently performed the worst, with large RMSE values (e.g., July = 1.55) and low $R^2$, indicating limited suitability for capturing nonlinear spatial rainfall relationships. The corresponding $R^2$ values for each model and month are reported in Supplementary Table S2.
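The relative RMSE reduction quoted in this section can be checked from the values given in the text. As a worked example, the sketch below compares the July RMSE of RF (0.66) with that of PR (1.55) for Decade 1; the function name is hypothetical.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline

# July, Decade 1: PR RMSE = 1.55, RF RMSE = 0.66 (values from the text).
july_reduction = percent_reduction(1.55, 0.66)
```

This gives a reduction of about 57%, which lies within the 38–60% range reported for RF relative to PR in the high-precipitation months.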

Model performance patterns during Decade 2 (2011–2021) reinforced the dominance of RF. Despite larger ranges in aggregated monthly precipitation values during this decade, RF continued to yield the lowest RMSE (0.16–0.75) and the highest $R^2$ values (0.70–0.88) in nearly all months. For instance, December recorded RMSE = 0.16 and MAE = 0.13, while July and August, the core monsoon months, achieved $R^2$ values of approximately 0.88 and 0.86 with moderate RMSE of around 0.68–0.75. These results indicate that RF–RK maintains comparatively stable performance under higher precipitation magnitudes and spatial autocorrelation. The SVM model again ranked second, performing relatively well during early and mid-year months such as June (RMSE = 0.67) and September (RMSE = 0.58), though its accuracy declined in low-rainfall periods. NN and KNN produced reasonable predictions (e.g., KNN in June) but were less stable under highly variable monsoon conditions. EN again underperformed, with higher RMSEs (> 0.90) and low $R^2$ (< 0.30), reflecting its linear limitations, while PR remained the weakest method in all months (e.g., RMSE > 1.2). The corresponding $R^2$ values for each model and month are reported in Supplementary Table S2.

The rainfall prediction and variance maps (see Fig. 3 and Supplementary Figs. S2, S3, S4, S5 and S6) illustrate the spatial distribution of RK-based rainfall estimates for both decades (2001–2010 and 2011–2021). During the first decade (Fig. 3 and Supplementary Fig. S2), rainfall prediction maps indicated lower rainfall during May–June and increased rainfall in July–August, consistent with the typical monsoon pattern. Among all models, the RF-based RK produced the lowest cross-validation errors, achieving the lowest RMSE and MAE and the highest R² across all months. Its maps displayed smooth rainfall gradients and gradual spatial transitions between wet and dry regions. NN and EN also captured rainfall variability reasonably well, though NN occasionally produced slightly over-smoothed patterns, and EN tended to underestimate high-rainfall areas. PR and KNN generated more fragmented and localized rainfall patterns, while SVM achieved moderate spatial consistency.
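The cross-validation errors referred to here come from a leave-one-out scheme, in which each station is withheld in turn and predicted from the remaining stations. The sketch below illustrates that loop only. As a loudly labelled assumption, it uses inverse distance weighting (IDW) as a simple stand-in predictor in place of any ML–RK model, and the station coordinates and rainfall values are hypothetical.

```python
import math

def idw_predict(train_xy, train_z, target_xy, power=2.0):
    """Inverse-distance-weighted prediction (stand-in for an ML-RK model)."""
    num = den = 0.0
    for (x, y), z in zip(train_xy, train_z):
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return z  # target coincides with a training station
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def loocv(xy, z, predict):
    """Hold out each station once and predict it from all the others."""
    preds = []
    for i in range(len(z)):
        train_xy = xy[:i] + xy[i + 1:]
        train_z = z[:i] + z[i + 1:]
        preds.append(predict(train_xy, train_z, xy[i]))
    return preds

# Hypothetical station coordinates (lon, lat) and rainfall values.
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
z = [1.0, 2.0, 2.0, 3.0]

preds = loocv(xy, z, idw_predict)
errors = [p - o for p, o in zip(preds, z)]
loocv_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(preds, loocv_rmse)
```

Any predictor with the same call signature can be passed to `loocv`, so the same loop can score each candidate model on identical held-out stations.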

Fig. 3 — ML-based regression kriging outputs for January–April during Decade 1 (2001–2011). Each month is shown with predicted precipitation (left panel) and associated kriging variance (right panel). The maps depict early-year spatial rainfall distribution and highlight areas with higher prediction uncertainty.

In the second decade (see Supplementary Figs. S4, S5, and S6), rainfall intensity and spatial continuity increased. RF again outperformed all other models, producing the most realistic rainfall distribution with the lowest uncertainty. The nearly uniform variance maps for RF indicate that most spatial variability was captured during the regression stage, resulting in relatively lower residual kriging variance. Conversely, other models, particularly PR and KNN, exhibited higher uncertainty and less coherent rainfall structures.

Overall, the regression kriging framework effectively combined deterministic modeling with spatial interpolation, and we infer that it can improve predictive performance relative to standalone regression models. The RF–RK model consistently emerged as the most robust and reliable approach across all months and both decades in Pakistan. Its superior performance highlights its ability to represent nonlinear interactions and spatial autocorrelation simultaneously, making it a powerful tool for spatiotemporal rainfall modeling. The RF–RK model was therefore identified as the most suitable approach for subsequent spatial mapping and environmental analysis because of its high precision, low uncertainty, and stable performance under varying climatic conditions.

Discussion

Assessing spatial patterns of rainfall is essential for evaluating the impact of global warming and climate change. This paper investigated the spatial structure and prediction of rainfall in Pakistan by first establishing a geostatistical foundation through variogram analysis and then applying a Random Forest–based regression kriging framework for comparative spatial rainfall interpolation. Variogram analysis provided critical insights into the spatial dependence of rainfall, forming the basis for kriging interpolation and hybrid modeling. The RF–RK approach integrated the deterministic power of Random Forest with the stochastic capability of kriging, representing both broad spatial gradients and localized residual variability in rainfall across the two decades.

The empirical variograms revealed distinct seasonal and decadal variations in the spatial structure of rainfall. Semivariance values were higher in Decade 2 (2011–2021) than in Decade 1 (2001–2011), particularly during spring and monsoon months, indicating wider spatial dispersion in the aggregated precipitation fields during the more recent decade. Among the tested variogram models, the circular and linear forms consistently provided the best fit and predictive performance when evaluated through leave-one-out cross-validation. These findings suggest that rainfall in Pakistan exhibits broad and continuous spatial dependence rather than abrupt spatial discontinuities, a pattern commonly reported in monsoon-influenced regions. Variogram model selection has also been examined in other regions. For instance, Jalili Pirani and Modarres⁴³ found that the Gaussian model best represented seasonal rainfall variability in the Zayandeh Rud Basin, Iran, while Darwich et al.⁴⁴ reported that the spherical variogram provided the most accurate interpolation in the Brazilian Amazon. Such regional differences suggest that the optimal variogram model depends on regional climate characteristics and data resolution. The predominance of the circular and linear models in this study thus indicates smooth spatial transitions and directional rainfall gradients commonly observed in monsoon-driven regimes.
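For readers less familiar with the four theoretical models compared here, the sketch below writes them out alongside a basic empirical semivariogram. It is an illustration under stated assumptions, not the authors' implementation: the parameterization (nugget, partial sill, range parameter a, and an unbounded linear model with a slope) follows a common textbook convention and may differ from the one used in the study, and the transect data are hypothetical.

```python
import math

# Theoretical semivariogram models for lag h > 0 (assumed parameterization).
def exponential(h, nugget, psill, a):
    return nugget + psill * (1.0 - math.exp(-h / a))

def spherical(h, nugget, psill, a):
    if h >= a:
        return nugget + psill
    x = h / a
    return nugget + psill * (1.5 * x - 0.5 * x ** 3)

def circular(h, nugget, psill, a):
    if h >= a:
        return nugget + psill
    x = h / a
    shape = (2.0 / math.pi) * (x * math.sqrt(1.0 - x * x) + math.asin(x))
    return nugget + psill * shape

def linear(h, nugget, slope):
    # Unbounded linear model: semivariance keeps growing with distance.
    return nugget + slope * h

def empirical_semivariogram(xy, z, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag bin."""
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            d = math.hypot(xy[i][0] - xy[j][0], xy[i][1] - xy[j][1])
            k = int(d // lag_width)
            if k < n_lags:
                sums[k] += 0.5 * (z[i] - z[j]) ** 2
                dists[k] += d
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical stations along a transect with increasing rainfall.
transect_xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
transect_z = [1.0, 2.0, 4.0, 7.0]
bins = empirical_semivariogram(transect_xy, transect_z, lag_width=1.0, n_lags=4)
for mean_dist, semivar, n_pairs in bins:
    print(mean_dist, semivar, n_pairs)
```

The bounded models (exponential, spherical, circular) level off at the sill, whereas the linear model does not, which is one reason a well-fitting linear form is read as evidence of a broad regional gradient rather than a short correlation range.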

Building upon this variographic foundation, the RF–RK model was employed to integrate machine learning with spatial interpolation. The Random Forest component captured large-scale deterministic trends, and kriging, guided by the fitted variogram parameters, was then applied to the residuals to account for spatially correlated variability not explained by the regression stage. The RF–RK model consistently achieved the lowest RMSE and MAE and the highest R² values among all ML–RK combinations. This outcome suggests that ensemble learning–based kriging is particularly effective in modeling nonlinear spatial relationships based on geographic coordinates while maintaining spatial coherence derived from the variogram structure. Comparable findings have been reported in previous studies where RF-based geostatistical models exhibited strong reliability for meteorological variable prediction⁴⁵,⁴⁶, and more recent studies have applied machine learning for rainfall mapping in arid and semi-arid regions⁴⁷–⁴⁹.
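The two-stage logic of regression kriging (fit a trend, krige the residuals, add the two) can be sketched in a few dozen lines. Several assumptions are made and should be read as such: a least-squares plane in the two coordinates stands in for the Random Forest trend, the residual variogram is an exponential model with arbitrary parameters rather than a fitted one, and the station data are hypothetical. The function names are illustrative and are not taken from the authors' scripts.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gamma(h, nugget=0.0, psill=1.0, a=2.0):
    """Exponential semivariogram with assumed parameters; gamma(0) = 0."""
    return 0.0 if h == 0.0 else nugget + psill * (1.0 - math.exp(-h / a))

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fit_trend(xy, z):
    """Stand-in trend model: least-squares plane z ~ b0 + b1*x + b2*y."""
    X = [[1.0, x, y] for x, y in xy]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * v for r, v in zip(X, z)) for i in range(3)]
    b = solve(XtX, Xtz)
    return lambda p: b[0] + b[1] * p[0] + b[2] * p[1]

def ordinary_kriging(xy, values, target):
    """Return (prediction, kriging variance) at `target`."""
    n = len(values)
    # Kriging system in semivariogram form, with a Lagrange multiplier row.
    A = [[gamma(distance(xy[i], xy[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    rhs = [gamma(distance(p, target)) for p in xy] + [1.0]
    sol = solve(A, rhs)
    weights, mu = sol[:n], sol[n]
    pred = sum(w * v for w, v in zip(weights, values))
    var = sum(w * g for w, g in zip(weights, rhs[:n])) + mu
    return pred, var

def regression_kriging(xy, z, target):
    """Trend prediction plus kriged residual; variance is that of the residual kriging."""
    trend = fit_trend(xy, z)
    residuals = [v - trend(p) for p, v in zip(xy, z)]
    res_pred, res_var = ordinary_kriging(xy, residuals, target)
    return trend(target) + res_pred, res_var

# Hypothetical stations (lon, lat) and rainfall values.
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
z = [1.0, 2.0, 2.5, 4.0, 2.0]
print(regression_kriging(xy, z, (0.25, 0.75)))
```

With a zero nugget the residual kriging is an exact interpolator, so the combined prediction reproduces the observed value at each station and the kriging variance there is zero; away from stations the variance grows with distance, which is what the variance maps visualize.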

The enhanced performance of the RF–RK model arises from the ensemble nature of Random Forest, which aggregates multiple decision trees to model nonlinear relationships between rainfall and spatial predictors. When combined with kriging-based residual correction, the RF–RK framework effectively integrates deterministic and stochastic components, yielding improved performance relative to the other ML–RK combinations evaluated in this study. This is consistent with previous research: Wolfensberger et al.⁵⁰ demonstrated that Random Forest substantially improved precipitation estimation in complex terrain, while Chen et al.⁵¹ showed that a spatial Random Forest downscaling calibration model produced more accurate precipitation estimates in heterogeneous landscapes. These results suggest that ensemble geostatistical hybrid models can outperform purely statistical or parametric methods, particularly in regions characterized by strong spatial rainfall gradients.
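The ensemble idea described above, many trees grown on bootstrap resamples and then averaged, can be illustrated with a deliberately small sketch. This is a simplified toy under stated assumptions, not the Random Forest implementation used in the study: trees are shallow, one of the two coordinates is chosen at random for each split, and the gridded rainfall field is hypothetical (a sharp east-west step that a single plane could not represent).

```python
import random

def sse(rows):
    """Sum of squared deviations of rainfall (index 2) from the node mean."""
    m = sum(r[2] for r in rows) / len(rows)
    return sum((r[2] - m) ** 2 for r in rows)

def build_tree(rows, depth, min_leaf, rng):
    """Grow a regression tree; a leaf is a float, a split is a 4-tuple."""
    mean = sum(r[2] for r in rows) / len(rows)
    if depth == 0 or len(rows) < 2 * min_leaf:
        return mean
    f = rng.randrange(2)  # random feature per split: 0 = x (lon), 1 = y (lat)
    cuts = sorted({r[f] for r in rows})
    best = None
    for lo, hi in zip(cuts, cuts[1:]):
        t = 0.5 * (lo + hi)
        left = [r for r in rows if r[f] <= t]
        right = [r for r in rows if r[f] > t]
        if min(len(left), len(right)) < min_leaf:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:
        return mean
    _, t, left, right = best
    return (f, t,
            build_tree(left, depth - 1, min_leaf, rng),
            build_tree(right, depth - 1, min_leaf, rng))

def predict_tree(node, point):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if point[f] <= t else right
    return node

def fit_forest(rows, n_trees=25, depth=3, min_leaf=2, seed=0):
    """Fit each tree on a bootstrap resample of the stations."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, depth, min_leaf, rng))
    return forest

def predict_forest(forest, point):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, point) for tree in forest) / len(forest)

# Hypothetical 5 x 5 grid: rainfall is 0 in the west and 10 in the east.
coords = [0.1, 0.3, 0.5, 0.7, 0.9]
rows = [(x, y, 10.0 if x > 0.5 else 0.0) for x in coords for y in coords]
forest = fit_forest(rows)
west = predict_forest(forest, (0.1, 0.5))
east = predict_forest(forest, (0.9, 0.5))
print(west, east)
```

Because each leaf value is a mean of observed values, the averaged prediction always stays within the observed range; this bounded, piecewise-constant behavior is also why a residual kriging step is a natural complement to a tree ensemble.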

The decade-wise comparison further highlights the dynamic nature of rainfall variability in Pakistan. The increase in mean rainfall and the wider interquartile ranges in Decade 2 (2011–2021) reflect higher aggregated precipitation values and greater spatial heterogeneity. The RF–RK model maintained high predictive accuracy and stable performance in both decades. Such adaptability is crucial for long-term hydrological and climate studies, where both rainfall processes and spatial patterns evolve over time.

From an applied perspective, these findings have significant implications for the management of water resources, flood mitigation, and agricultural planning in Pakistan. Spatial rainfall predictions from the RF–RK framework may provide useful inputs for hydrological and water-resources analyses, especially in data-scarce regions. The integration of machine learning into geostatistical interpolation illustrates how modern data-driven approaches can enhance classical methods and improve prediction accuracy where observation networks are sparse or unevenly distributed. In addition, the observed decade-long intensification in rainfall variability underscores the need for adaptive modeling frameworks that can incorporate climate change signals into operational forecasting systems. However, the present evaluation focused on statistical accuracy metrics; application-oriented assessments (e.g., basin-scale runoff sensitivity) were beyond the scope of this study.

Although this study focused on Pakistan, the comparative ML–RK evaluation framework adopted here can be applied to other regions, subject to data availability and appropriate covariate selection. Future research could explore advanced hybrid approaches such as deep learning–assisted kriging or spatio-temporal ensemble frameworks to further enhance predictive accuracy and capture evolving rainfall dynamics in response to climate change.

In summary, this paper provides a geostatistical foundation through variogram analysis and demonstrates the efficacy of the RF–RK hybrid framework for spatial rainfall modeling. Variogram analysis provided the structural understanding needed to derive the spatial weights, and the hybrid model leveraged this spatial dependence to generate comparatively improved spatial rainfall estimates. Collectively, these results emphasize the importance of integrating traditional geostatistics with modern ensemble learning to address the challenges of rainfall estimation in complex and data-limited environments such as Pakistan. The observed intensification of rainfall variability over recent decades further highlights the value of ensemble–kriging approaches as tools for spatial rainfall analysis in data-limited settings.

Despite these findings, several limitations should be acknowledged. First, the regression component relied solely on geographic coordinates as predictors. Second, the analysis was based on aggregated precipitation from the NASA POWER product rather than on in-situ gauge observations. Future work should therefore incorporate additional physical covariates and independent validation data to further assess the robustness and hydrological relevance of ML–RK rainfall mapping.

Conclusion

This research evaluated the performance of multiple combinations of machine-learning-assisted regression kriging (ML–RK) for spatial precipitation interpolation, using monthly and decadal precipitation data from 42 meteorological stations (2001–2021) in Pakistan. Spatial dependence was first examined using variogram modeling, with circular and linear models consistently providing the best fit based on LOOCV. Six regression algorithms (Random Forest, Support Vector Machine, K-Nearest Neighbors, Neural Network, Elastic Net, and Polynomial Regression) were compared within a regression kriging framework. The RF–RK model generally achieved the lowest RMSE and MAE and the highest R², and showed comparatively better predictive performance and spatial coherence among the methods tested. However, the study focused on assessing spatial interpolation using limited predictors (latitude and longitude) rather than developing a new modeling framework, which emphasizes the comparative nature of the analysis and its relevance for data-scarce regions. The decade-wise comparison revealed an increase in rainfall variability in 2011–2021, highlighting the importance of flexible interpolation methods that capture evolving spatial patterns. Although the results demonstrate statistical improvements in rainfall estimation, the study did not assess hydrological applicability, such as extreme rainfall, basin-scale runoff, or drought indices. Future work should incorporate additional physical covariates and independent gauge validation to evaluate the practical utility of ML–RK rainfall maps. Overall, the results of this research support exploratory hydrological analysis and water-resource planning.

Supplementary Information

Supplementary Information 1 (663.6 KB, pdf)

Supplementary Information 2 (1.6 MB, zip)

Abbreviations

EN: Elastic net
GWR: Geographically weighted regression
IDW: Inverse distance weighting
KNN: K-nearest neighbours
ML: Machine learning
MLP: Multi-layer perceptron
MGWR: Multi-scale geographically weighted regression
NN: Neural network
OK: Ordinary kriging
PR: Polynomial regression
RF: Random forest
RK: Regression kriging

Author contributions

Conceptualization, methodology, software, Marwa Manaf and Miklas Scholz; validation, Marwa Manaf, Zulfiqar Ali, and Miklas Scholz; formal analysis, Marwa Manaf; data collection, Marwa Manaf and Zulfiqar Ali; writing (original draft preparation), Marwa Manaf; writing (review and editing), Marwa Manaf; supervision, Zulfiqar Ali. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no funding for the preparation of this manuscript.

Data availability

The precipitation data are publicly available and were directly downloaded from NASA POWER (https://power.larc.nasa.gov/data-access-viewer/). Users can reproduce the analysis by obtaining the same monthly precipitation data for Pakistan for the period 1981–2021 from the NASA POWER portal and running the provided R scripts.

Code availability

The R scripts used for variogram analysis and spatial interpolation in this study are publicly available in a Zenodo repository (https://zenodo.org/records/18143696). The code is written for general-purpose use and can be applied to any georeferenced dataset containing spatial coordinates and a response variable. All scripts are documented and designed to ensure transparency, reproducibility, and ease of reuse. Users may need to adjust file paths and data-loading sections according to their local directory structure.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Zandalinas, S. I., Fritschi, F. B. & Mittler, R. Global warming, climate change, and environmental pollution: recipe for a multifactorial stress combination disaster. Trends Plant Sci. 26(6), 588–599 (2021).
2. Bhardwaj, D. A. Impact of global warming. Int. J. Recent Trends Multidiscip. Res. 10–14 (2024).
3. Janni, M., Maestri, E., Gullì, M., Marmiroli, M. & Marmiroli, N. Plant responses to climate change, how global warming may impact on food security: a critical review. Front. Plant Sci. 14, 1297569 (2024).
4. Madakumbura, G. D., Thackeray, C. W., Norris, J., Goldenson, N. & Hall, A. Anthropogenic influence on extreme precipitation over global land areas seen in multiple observational datasets. Nat. Commun. 12(1), 3944 (2021).
5. Kara, E. & Diken, A. Climatic change: The effect of rainfall on economic growth. Süleyman Demirel Üniversitesi Vizyoner Dergisi 11(28), 665–679 (2020).
6. Guan, X. et al. Study on spatiotemporal distribution characteristics of flood and drought disaster impacts on agriculture in China. Int. J. Disaster Risk Reduct. 64, 102504 (2021).
7. Schneider, U. et al. The new portfolio of global precipitation data products of the global precipitation climatology centre suitable to assess and quantify the global water cycle and resources. Proc. Int. Assoc. Hydrol. Sci. 374, 29–34 (2016).
8. Fung, K. F. et al. Evaluation of spatial interpolation methods and spatiotemporal modeling of rainfall distribution in peninsular Malaysia. Ain Shams Eng. J. 13(2), 101571 (2022).
9. Liu, Y., Zhuo, L., Pregnolato, M. & Han, D. An assessment of statistical interpolation methods suited for gridded rainfall datasets. Int. J. Climatol. 42(5), 2754–2772 (2022).
10. Kumari, M., Singh, C. K., Bakimchandra, O. & Basistha, A. DEM-based delineation for improving geostatistical interpolation of rainfall in mountainous region of central Himalayas, India. Theor. Appl. Climatol. 130(1), 51–58 (2017).
11. Li, X., Jia, H. & Wang, L. Remote sensing monitoring of drought in Southwest China using random forest and extreme gradient boosting methods. Remote Sens. 15(19), 4840 (2023).
12. Liu, C.-C., Lin, T.-C., Yuan, K.-Y. & Chiueh, P.-T. Spatio-temporal prediction and factor identification of urban air quality using support vector machine. Urban Clim. 41, 101055 (2022).
13. Gu, Z., Zhu, T., Jiao, X., Xu, J. & Qi, Z. Neural network soil moisture model for irrigation scheduling. Comput. Electron. Agric. 180, 105801 (2021).
14. Belgiu, M. & Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 114, 24–31 (2016).
15. Desai, S. & Ouarda, T. B. Regional hydrological frequency analysis at ungauged sites with random forest regression. J. Hydrol. 594, 125861 (2021).
16. Uddin, M. J., Li, Y., Tamim, M. Y., Miah, M. B. & Ahmed, S. S. Extreme rainfall indices prediction with atmospheric parameters and ocean-atmospheric teleconnections using a random forest model. J. Appl. Meteorol. Climatol. 61(6), 651–667 (2022).
17. Montenegro, M., Célleri, R., Orellana-Alvear, J., Munoz, P. & Cordova, M. Precipitation forecasting using random forest over an Ecuadorian Andes basin. Meteorol. Atmos. Phys. 137(1), 5 (2025).
18. Koch, J. et al. Modeling depth of the redox interface at high resolution at national scale using random forest and residual Gaussian simulation. Water Resour. Res. 55(2), 1451–1469 (2019).
19. Wang, L., Wu, W. & Liu, H.-B. Digital mapping of topsoil pH by random forest with residual kriging (RFRK) in a hilly region. Soil Res. 57(4), 387–396 (2019).
20. Ho, V. H., Morita, H., Bachofer, F. & Ho, T. H. Random forest regression kriging modeling for soil organic carbon density estimation using multi-source environmental data in central Vietnamese forests. Model. Earth Syst. Environ. 1–22 (2024).
21. Odeha, I., McBratney, A. & Chittleborough, D. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63(3–4), 197–214 (1994).
22. Odeh, I. O., McBratney, A. & Chittleborough, D. Further results on prediction of soil properties from terrain attributes: Heterotopic cokriging and regression-kriging. Geoderma 67(3–4), 215–226 (1995).
23. Hengl, T., Heuvelink, G. B. & Rossiter, D. G. About regression-kriging: From equations to case studies. Comput. Geosci. 33(10), 1301–1315 (2007).
24. Otaviano, J. C. R. & de Almeida, C. F. P. Accessing the spatial distribution of aboveground biomass in tropical mountain forests using regression kriging simulation: a geostatistical approach for local-scale estimates. Ecol. Process. 14(1), 44 (2025).
25. Gia Pham, T., Kappas, M., Van Huynh, C. & Hoang Khanh Nguyen, L. Application of ordinary kriging and regression kriging method for soil properties mapping in hilly region of central Vietnam. ISPRS Int. J. Geo-Inf. 8(3), 147 (2019).
26. Breiman, L. Random forests. Mach. Learn. 45(1), 5–32 (2001).
27. Wang, M., Fan, C., Gao, B., Ren, Z. & Li, F. A spatial random forest interpolation method with semi-variogram. Chin. J. Eco-Agric. 30(3), 451–457 (2022).
28. Baratto, P. F. B., Cecílio, R. A., de Sousa Teixeira, D. B., Zanetti, S. S. & Xavier, A. C. Random forest for spatialization of daily evapotranspiration (ET0) in watersheds in the Atlantic Forest. Environ. Monit. Assess. 194(6), 449 (2022).
29. Nwaila, G. T., Zhang, S. E., Bourdeau, J. E., Frimmel, H. E. & Ghorbani, Y. Spatial interpolation using machine learning: From patterns and regularities to block models. Nat. Resour. Res. 33(1), 129–161 (2024).
30. Peckov, A. A machine learning approach to polynomial regression. Ljubljana Slovenia (2012). http://kt.ijs.si/theses/phd_aleksandar_peckov.pdf
31. Dikbaş, F. Forecasting extreme precipitations by using polynomial regression. IDŐJÁRÁS Q. J. Hungarian Meteorol. Serv. 128(3), 379–398 (2024).
32. Kim, S., Kim, S., Green, C. H. & Jeong, J. Multivariate polynomial regression modeling of total dissolved-solids in rangeland stormwater runoff in the Colorado River basin. Environ. Model. Softw. 157, 105523 (2022).
33. Alrefaie, A. M., Abdul-Samad, Z., Salleh, H., Alashwal, A. M. & Amos, D. Modelling labour productivity of reinforcement bar using polynomial regression: A study on a tropical country’s weather. Int. J. Constr. Manag. 23(10), 1633–1641 (2023).
34. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20(3), 273–297 (1995).
35. Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967).
36. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67(2), 301–320 (2005).
37. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodol.) 36(2), 111–133 (1974).
38. Achite, M., Tsangaratos, P., Pellicone, G., Mohammadi, B. & Caloiero, T. Application of multiple spatial interpolation approaches to annual rainfall data in the Wadi Cheliff basin (North Algeria). Ain Shams Eng. J. 15(3), 102578 (2024).
39. Helmi, A. M., Elgamal, M., Farouk, M. I., Abdelhamed, M. S. & Essawy, B. T. Evaluation of geospatial interpolation techniques for enhancing spatiotemporal rainfall distribution and filling data gaps in Asir region, Saudi Arabia. Sustainability 15(18), 14028 (2023).
40. Ahmad, S., Batool, A. & Ali, Z. Spatial predictive analysis of drought duration in relation to climate change using interpolation techniques. Stoch. Environ. Res. Risk Assess. 39(2), 639–656 (2025).
41. Ahmed, W. et al. Novel MLR-RF-based geospatial techniques: A comparison with OK. ISPRS Int. J. Geo-Inf. 11(7), 371 (2022).
42. Li, A. et al. Spatial interpolation of cropland soil bulk density by increasing soil samples with filled missing values. Sci. Rep. 15(1), 8008 (2025).
43. Jalili Pirani, F. & Modarres, R. Geostatistical and deterministic methods for rainfall interpolation in the Zayandeh Rud basin, Iran. Hydrol. Sci. J. 65(16), 2678–2692 (2020).
44. Darwich, A., Aprile, F. & Siqueira, G. Rainfall distribution in the Brazilian Amazon: Application of the variogram function to time series. J. Geograp. Environ. Earth Sci. Int. 28(6), 47–66 (2024).
45. Sekulić, A., Kilibarda, M., Heuvelink, G. B., Nikolić, M. & Bajat, B. Random forest spatial interpolation. Remote Sens. 12(10), 1687 (2020).
46. Sekulić, A., Kilibarda, M., Protić, D. & Bajat, B. A high-resolution daily gridded meteorological dataset for Serbia made by random forest spatial interpolation. Sci. Data 8(1), 123 (2021).
47. Ma, C. et al. Prediction of summer precipitation via machine learning with key climate variables: A case study in Xinjiang, China. J. Hydrol. Regional Stud. 56, 101964 (2024).
48. Baig, F., Ali, L., Faiz, M. A., Chen, H. & Sherif, M. How accurate are the machine learning models in improving monthly rainfall prediction in hyper arid environment? J. Hydrol. 633, 131040 (2024).
49. Chaka, L., Abd Elbasit, M. A. & Jombo, S. Predicting precipitation using dynamic distributed lag models in arid and sub-humid regions of South Africa. Sci. Afr. e02924 (2025).
50. Wolfensberger, D., Gabella, M., Boscacci, M., Germann, U. & Berne, A. Rainforest: A random forest algorithm for quantitative precipitation estimation over Switzerland. Atmos. Meas. Tech. Discuss. 2020, 1–35 (2020).
51. Chen, C., Hu, B. & Li, Y. Easy-to-use spatial random-forest-based downscaling-calibration method for producing precipitation data with high resolution and high accuracy. Hydrol. Earth Syst. Sci. 25(11), 5667–5682 (2021).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information 1.^{(663.6KB, pdf)}

Supplementary Information 2.^{(1.6MB, zip)}

Data Availability Statement

[CR1] 1.Zandalinas, S. I., Fritschi, F. B. & Mittler, R. Global warming, climate change, and environmental pollution: recipe for a multifactorial stress combination disaster. Trends Plant Sci.26(6), 588–599 (2021). [DOI] [PubMed] [Google Scholar]

[CR2] 2.Bhardwaj, D. A. Impact of global warming. Int. J. Recent Trends Multidiscip. Res. 10–14 (2024).

[CR3] 3.Janni, M., Maestri, E., Gullì, M., Marmiroli, M. & Marmiroli, N. Plant responses to climate change, how global warming may impact on food security: a critical review. Front. Plant Sci.14, 1297569 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Madakumbura, G. D., Thackeray, C. W., Norris, J., Goldenson, N. & Hall, A. Anthropogenic influence on extreme precipitation over global land areas seen in multiple observational datasets. Nat. Commun.12(1), 3944 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Kara, E. & Diken, A. Climatic change: The effect of rainfall on economic growth. Süleyman Demirel Üniversitesi Vizyoner Dergisi11(28), 665–679 (2020). [Google Scholar]

[CR6] 6.Guan, X. et al. Study on spatiotemporal distribution characteristics of flood and drought disaster impacts on agriculture in China. Int. J. Disaster Risk Reduct.64, 102504 (2021). [Google Scholar]

[CR7] 7.Schneider, U. et al. The new portfolio of global precipitation data products of the global precipitation climatology centre suitable to assess and quantify the global water cycle and resources. Proc. Int. Assoc. Hydrol. Sci.374, 29–34 (2016). [Google Scholar]

[CR8] 8.Fung, K. F. et al. Evaluation of spatial interpolation methods and spatiotemporal modeling of rainfall distribution in peninsular Malaysia. Ain Shams Eng. J.13(2), 101571 (2022). [Google Scholar]

[CR9] 9.Liu, Y., Zhuo, L., Pregnolato, M. & Han, D. An assessment of statistical interpolation methods suited for gridded rainfall datasets. Int. J. Climatol.42(5), 2754–2772 (2022). [Google Scholar]

[CR10] 10.Kumari, M., Singh, C. K., Bakimchandra, O. & Basistha, A. Dem-based delineation for improving geostatistical interpolation of rainfall in mountainous region of central Himalayas, India. Theor. Appl. Climatol.130(1), 51–58 (2017). [Google Scholar]

[CR11] 11.Li, X., Jia, H. & Wang, L. Remote sensing monitoring of drought in Southwest China using random forest and extreme gradient boosting methods. Remote Sens.15(19), 4840 (2023). [Google Scholar]

[CR12] 12.Liu, C.-C., Lin, T.-C., Yuan, K.-Y. & Chiueh, P.-T. Spatio-temporal prediction and factor identification of urban air quality using support vector machine. Urban Clim.41, 101055 (2022). [Google Scholar]

[CR13] 13.Gu, Z., Zhu, T., Jiao, X., Xu, J. & Qi, Z. Neural network soil moisture model for irrigation scheduling. Comput. Electron. Agric.180, 105801 (2021). [Google Scholar]

[CR14] 14.Belgiu, M. & Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote. Sens.114, 24–31 (2016). [Google Scholar]

[CR15] 15.Desai, S. & Ouarda, T. B. Regional hydrological frequency analysis at ungauged sites with random forest regression. J. Hydrol.594, 125861 (2021). [Google Scholar]

[CR16] 16.Uddin, M. J., Li, Y., Tamim, M. Y., Miah, M. B. & Ahmed, S. S. Extreme rainfall indices prediction with atmospheric parameters and ocean-atmospheric teleconnections using a random forest model. J. Appl. Meteorol. Climatol.61(6), 651–667 (2022). [Google Scholar]

[CR17] 17.Montenegro, M., Célleri, R., Orellana-Alvear, J., Munoz, P. & Cordova, M. Precipitation forecasting using random forest over an ecuadorian andes basin. Meteorol. Atmos. Phys.137(1), 5 (2025). [Google Scholar]

[CR18] 18.Koch, J. et al. Modeling depth of the redox interface at high resolution at national scale using random forest and residual gaussian simulation. Water Resour. Res.55(2), 1451–1469 (2019). [Google Scholar]

[CR19] 19.Wang, L., Wu, W. & Liu, H.-B. Digital mapping of topsoil ph by random forest with residual kriging (rfrk) in a hilly region. Soil Res.57(4), 387–396 (2019). [Google Scholar]

[CR20] 20.Ho, V. H., Morita, H., Bachofer, F., & Ho, T. H. Random forest regression kriging modeling for soil organic carbon density estimation using multi-source environmental data in central Vietnamese forests. Model. Earth Syst. Environ. 1–22 (2024).

[CR21] 21.Odeha, I., McBratney, A. & Chittleborough, D. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma63(3–4), 197–214 (1994). [Google Scholar]

[CR22] 22.Odeh, I. O., McBratney, A. & Chittleborough, D. Further results on prediction of soil properties from terrain attributes: Heterotopic cokriging and regression-kriging. Geoderma67(3–4), 215–226 (1995). [Google Scholar]

[CR23] 23.Hengl, T., Heuvelink, G. B. & Rossiter, D. G. About regression-kriging: From equations to case studies. Comput. Geosci.33(10), 1301–1315 (2007). [Google Scholar]

[CR24] 24.Otaviano, J. C. R. & de Almeida, C. F. P. Accessing the spatial distribution of aboveground biomass in tropical mountain forests using regression kriging simulation: a geostatistical approach for local-scale estimates. Ecol. Process.14(1), 44 (2025). [Google Scholar]

[CR25] 25.Gia Pham, T., Kappas, M., Van Huynh, C. & Hoang Khanh Nguyen, L. Application of ordinary kriging and regression kriging method for soil properties mapping in hilly region of central Vietnam. ISPRS Int. J. Geo-Inform.8(3), 147 (2019). [Google Scholar]

[CR26] 26.Breiman, L. Random forests. Mach. Learn.45(1), 5–32 (2001). [Google Scholar]

[CR27] 27.Wang, M., Fan, C., Gao, B., Ren, Z. & Li, F. A spatial random forest interpolation method with semi-variogram. Chin. J. Eco-Agric.30(3), 451–457 (2022). [Google Scholar]

[CR28] 28.Baratto, P. F. B., Cecílio, R. A., de Sousa Teixeira, D. B., Zanetti, S. S. & Xavier, A. C. Random forest for spatialization of daily evapotranspiration (et0) in watersheds in the atlantic forest. Environ. Monit. Assess.194(6), 449 (2022). [DOI] [PubMed] [Google Scholar]

[CR29] 29.Nwaila, G. T., Zhang, S. E., Bourdeau, J. E., Frimmel, H. E. & Ghorbani, Y. Spatial interpolation using machine learning: From patterns and regularities to block models. Nat. Resour. Res.33(1), 129–161 (2024). [Google Scholar]

[CR30] 30.Peckov, A. A machine learning approach to polynomial regression. Ljubljana Slovenia (2012). http://kt.ijs.si/theses/phd_aleksandar_peckov.pdf

[CR31] 31.Dikbaş, F. Forecasting extreme precipitations by using polynomial regression. IDŐJÁRÁS Q. J. Hungarian Meteorol. Serv.128(3), 379–398 (2024). [Google Scholar]

[CR32] 32.Kim, S., Kim, S., Green, C. H. & Jeong, J. Multivariate polynomial regression modeling of total dissolved-solids in rangeland stormwater runoff in the Colorado river basin. Environ. Model. Softw.157, 105523 (2022). [Google Scholar]

[CR33] 33.Alrefaie, A. M., Abdul-Samad, Z., Salleh, H., Alashwal, A. M. & Amos, D. Modelling labour productivity of reinforcement bar using polynomial regression: A study on a tropical country’s weather. Int. J. Constr. Manag.23(10), 1633–1641 (2023). [Google Scholar]

[CR34] 34.Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn.20(3), 273–297 (1995). [Google Scholar]

[CR35] 35. Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967).

[CR36] 36. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67(2), 301–320 (2005).

[CR37] 37. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodol.) 36(2), 111–133 (1974).

[CR38] 38. Achite, M., Tsangaratos, P., Pellicone, G., Mohammadi, B. & Caloiero, T. Application of multiple spatial interpolation approaches to annual rainfall data in the Wadi Cheliff basin (North Algeria). Ain Shams Eng. J. 15(3), 102578 (2024).

[CR39] 39. Helmi, A. M., Elgamal, M., Farouk, M. I., Abdelhamed, M. S. & Essawy, B. T. Evaluation of geospatial interpolation techniques for enhancing spatiotemporal rainfall distribution and filling data gaps in Asir region, Saudi Arabia. Sustainability 15(18), 14028 (2023).

[CR40] 40. Ahmad, S., Batool, A. & Ali, Z. Spatial predictive analysis of drought duration in relation to climate change using interpolation techniques. Stoch. Environ. Res. Risk Assess. 39(2), 639–656 (2025).

[CR41] 41. Ahmed, W. et al. Novel MLR-RF-based geospatial techniques: A comparison with OK. ISPRS Int. J. Geo-Inf. 11(7), 371 (2022).

[CR42] 42. Li, A. et al. Spatial interpolation of cropland soil bulk density by increasing soil samples with filled missing values. Sci. Rep. 15(1), 8008 (2025).

[CR43] 43. Jalili Pirani, F. & Modarres, R. Geostatistical and deterministic methods for rainfall interpolation in the Zayandeh Rud basin, Iran. Hydrol. Sci. J. 65(16), 2678–2692 (2020).

[CR44] 44. Darwich, A., Aprile, F. & Siqueira, G. Rainfall distribution in the Brazilian Amazon: Application of the variogram function to time series. J. Geograp. Environ. Earth Sci. Int. 28(6), 47–66 (2024).

[CR45] 45. Sekulić, A., Kilibarda, M., Heuvelink, G. B., Nikolić, M. & Bajat, B. Random forest spatial interpolation. Remote Sens. 12(10), 1687 (2020).

[CR46] 46. Sekulić, A., Kilibarda, M., Protić, D. & Bajat, B. A high-resolution daily gridded meteorological dataset for Serbia made by random forest spatial interpolation. Sci. Data 8(1), 123 (2021).

[CR47] 47. Ma, C. et al. Prediction of summer precipitation via machine learning with key climate variables: A case study in Xinjiang, China. J. Hydrol. Regional Stud. 56, 101964 (2024).

[CR48] 48. Baig, F., Ali, L., Faiz, M. A., Chen, H. & Sherif, M. How accurate are the machine learning models in improving monthly rainfall prediction in hyper arid environment? J. Hydrol. 633, 131040 (2024).

[CR49] 49. Chaka, L., Abd Elbasit, M. A. & Jombo, S. Predicting precipitation using dynamic distributed lag models in arid and sub-humid regions of South Africa. Sci. Afr. e02924 (2025).

[CR50] 50. Wolfensberger, D., Gabella, M., Boscacci, M., Germann, U. & Berne, A. Rainforest: A random forest algorithm for quantitative precipitation estimation over Switzerland. Atmos. Meas. Tech. Discuss. 2020, 1–35 (2020).

[CR51] 51. Chen, C., Hu, B. & Li, Y. Easy-to-use spatial random-forest-based downscaling-calibration method for producing precipitation data with high resolution and high accuracy. Hydrol. Earth Syst. Sci. 25(11), 5667–5682 (2021).
