Abstract
Soil moisture (SM) is a critical variable influencing various environmental processes, but traditional microwave sensors often lack the spatial resolution needed for local-scale studies. This study develops a novel stacking ensemble learning framework to enhance the spatial resolution of satellite-derived SM data to 1 km in the Urmia basin, a region facing significant water scarcity. We integrated in-situ SM measurements (obtained using time-domain reflectometry [TDR]), Soil Moisture Active Passive (SMAP) and Advanced Microwave Scanning Radiometer 2 (AMSR2) SM products, Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature and vegetation indices, precipitation records, and topography data. Ten base machine-learning models were evaluated using the Complex Proportional Assessment (COPRAS) method, and the top-performing models were selected as base learners for the stacking ensemble. The ensemble model, incorporating Random Forest, Gradient Boosting, and XGBoost, significantly improved SM estimation accuracy and resolution compared to individual models. The XGBoost and Gradient Boosting meta-models achieved the highest accuracy, with an unbiased root mean square error (ubRMSE) of 1.23% (m³/m³) and a coefficient of determination (R²) of 0.97 during testing, demonstrating the exceptional predictive capabilities of our approach. SHapley Additive exPlanations (SHAP) analysis revealed the influence of each base model on the ensemble’s predictions, highlighting the synergistic benefits of combining diverse models. This study establishes new benchmarks for soil moisture monitoring by showcasing the potential of ensemble learning to improve the spatial resolution and accuracy of satellite-derived SM data, providing crucial insights for environmental science and agricultural planning, particularly in water-stressed regions.
Keywords: Soil moisture, Remote sensing, Ensemble learning, Time domain reflectometry (TDR)
Subject terms: Environmental sciences, Hydrology
Introduction
Soil moisture (SM), representing the water content within the unsaturated soil layer1,2, plays a crucial role in mediating interactions between the land and atmosphere. It acts as a driving force behind terrestrial ecosystems’ hydrological and energy cycles3,4, influencing critical processes like evapotranspiration, infiltration, and runoff, which in turn affect weather patterns, plant growth, and soil chemistry5–7. The significance of SM extends to its impact on agricultural productivity and the effective management of water resources, underscoring its importance across diverse ecological and agricultural landscapes3,4. Furthermore, SM’s control over infiltration and runoff distribution directly influences weather prediction, drought monitoring, and water resource management strategies, highlighting its crucial role in ecological, agricultural, and hydrological studies, as well as in mitigating the impacts of climate change8–11. Accurate and comprehensive spatiotemporal SM data is vital for advancing our understanding of terrestrial systems and ensuring the sustainable management of water resources12,13. However, effectively capturing SM’s spatial and temporal variations across diverse landscapes remains a significant challenge due to limitations in current measurement technologies and methodologies14,15. Therefore, advancements in measurement techniques are essential to deepening our understanding of SM dynamics and improving our capacity to manage natural resources and adapt to changing environmental conditions.
The transition from traditional in-situ techniques like time-domain reflectometry (TDR)15 to advanced remote sensing technologies has significantly broadened our understanding of SM dynamics on a global scale16,17. While in-situ methods offer precision, their limitations in cost, labor, and spatial coverage5,14 have paved the way for the adoption of microwave remote sensing. Passive microwave missions such as SMOS and SMAP, which operate at L-band (1–2 GHz), and AMSR-E, which operates at higher frequencies (C-band and above), have emerged as powerful tools for SM monitoring because microwaves penetrate cloud cover and, particularly at L-band, moderate vegetation cover, enabling consistent observations under various environmental conditions18–20. However, these satellite-based products typically have coarse spatial resolutions (9–50 km), limiting their applicability for local-scale hydrological and agricultural studies21. For instance, the SMAP mission, designed to provide global SM maps, offers a spatial resolution of 9 km for its L4 soil moisture products, which, while valuable for large-scale monitoring, may not capture the fine-scale variability crucial for many applications18,22,23. This inherent limitation in spatial resolution underscores the need for downscaling techniques to bridge the gap between the broad coverage of satellite observations and the detailed information required for local-level decision-making in fields like agriculture, hydrology, and ecology24,25.
Researchers have therefore explored various downscaling techniques to bridge this gap26. These methods aim to enhance the spatial resolution of satellite-derived SM data by integrating high-resolution ancillary information, such as vegetation indices, land surface temperature, and topography6. Several approaches have been employed, including empirical, semi-empirical, and physics-based methods27. Among these, machine learning (ML) techniques have gained prominence due to their ability to capture complex non-linear relationships between SM and environmental factors28. ML algorithms, particularly the Random Forest (RF) model, have shown significant promise in downscaling SM data due to their flexibility and robustness in handling high-dimensional datasets with potential imbalances or missing features29,30. RF models, composed of multiple decision trees, excel in establishing relationships between SM and surface parameters even with limited data availability, leading to improved downscaling accuracy31,32. For example, a recent study by Ghafari et al. demonstrated the successful application of a random forest model for downscaling SMAP soil moisture to 1 km resolution using a combination of radar and vegetation data, highlighting the effectiveness of ML approaches in addressing the challenges of scale mismatch33. Other ML approaches, such as K-Nearest Neighbors and Support Vector Regression, have also been explored, with varying degrees of success depending on the specific study area and data characteristics34,35. Furthermore, optical and thermal observations and geomorphological data have been widely used as covariates in ML-based downscaling models to improve prediction accuracy36,37. A comprehensive review by Senanayake et al. further emphasizes the advantages of ML-based downscaling methods in capturing complex relationships and improving the spatial resolution of SM data, while also noting the challenges in generalizing these models across diverse environments38. Despite these advancements, challenges remain in fully harnessing the potential of these methods, particularly in addressing spatial heterogeneity and temporal variability39. Further research is needed to develop robust and generalizable downscaling models that can be applied across diverse landscapes and environmental conditions. One step in this direction is the method of Zhong et al., which uses optical remote sensing data to improve the spatial resolution of passive microwave-derived SM products and, when validated against high-resolution airborne data, showed significant improvements in capturing fine-scale SM variations40. This method, along with others that integrate novel data sources or advanced machine learning techniques, holds promise for overcoming current limitations in SM downscaling.
While individual machine learning models have demonstrated their effectiveness in downscaling SM data, ensemble learning offers a promising avenue for further enhancing prediction accuracy and generalizability. Ensemble methods leverage the combined strengths of multiple models, mitigating individual algorithms’ limitations and achieving superior performance41. This approach has proven successful in various applications, including runoff forecasting42, SM retrieval43,44, earthquake casualty prediction45, wind power generation forecasting46, and flood susceptibility assessment47,48. These studies consistently demonstrate the superior performance of ensemble models compared to single-model approaches, highlighting their ability to capture complex relationships and improve prediction accuracy under diverse conditions. Among various ensemble techniques, stacking has emerged as a powerful method for integrating predictions from heterogeneous base models49. Stacking utilizes a meta-learner to combine the outputs of multiple base models, effectively leveraging their strengths and leading to more robust and accurate predictions50. For example, stacking frameworks have been successfully employed for downscaling SMAP SM data and retrieving soil moisture content in arid zones, showcasing improved accuracy and stability over individual machine learning algorithms43,51.
Building upon the advancements in downscaling techniques and the demonstrated potential of ensemble learning, this study aims to develop a robust and innovative framework for enhancing the spatial resolution and predictive accuracy of satellite-derived soil moisture (SM) data. By integrating diverse and high-performing machine learning models within a stacking ensemble approach, we seek to address the limitations of existing methods and achieve superior performance in SM downscaling. The importance of this work lies in its potential to bridge the gap between the broad coverage of satellite observations and the detailed information required for local-level decision-making, particularly in water-stressed regions like the Urmia sub-basin. This study’s findings will provide critical insights for improving water resource management and agricultural planning, contributing to sustainable environmental practices in regions facing significant ecological and climatic challenges.
Specifically, this study has the following objectives:
Develop and evaluate a stacking ensemble learning framework: We will implement a stacking ensemble model that combines the predictions of multiple base models, including Random Forest, Gradient Boosting, and XGBoost, to generate high-resolution SM maps with enhanced accuracy.
Assess the performance of individual and ensemble models: We will rigorously evaluate the performance of each base model and the stacking ensemble using a comprehensive set of metrics, including R-squared, RMSE, ubRMSE, and bias. This will allow us to identify the most effective model combination for SM prediction and quantify the improvement achieved through ensemble learning.
Analyze the contribution of base models using SHAP: We will employ SHapley Additive exPlanations (SHAP) values to analyze the contribution of each base model to the final prediction of the meta-models within the stacking ensemble framework. This will provide insights into the interaction and relative importance of different base models in enhancing the ensemble’s predictive power.
Demonstrate applicability in the Urmia sub-basin: We will apply the developed framework to the Urmia sub-basin, a region facing significant water scarcity and environmental challenges. This case study will showcase the practical application of our methodology and its potential for improving water resource management and agricultural planning in water-stressed regions.
Study area and datasets
Study area
This study focuses on the Miandoab region within the Urmia Lake sub-basin in northwest Iran (Fig. 1). Covering approximately 52,000 km² between 36°N–38°N and 44°E–46°E, the area features diverse topography that influences its hydrology and soil moisture dynamics. The region has a semi-arid climate with hot summers, cold winters, and annual precipitation ranging from 300 to 500 mm52,53. The Miandoab region features diverse land use, including agricultural lands (mainly wheat and barley), natural vegetation, and urban settlements. This complex landscape, together with its climate and topography, produces significant variability in soil moisture, which is crucial for agricultural planning and water management. Given the limited ground-based measurements, SMAP satellite data are invaluable, and accurately downscaling these data improves our understanding of soil moisture distribution across different land cover types in Miandoab54. The Urmia Lake sub-basin, particularly the Miandoab region, has been the subject of numerous studies related to water resources and agricultural practices55,56. Previous studies in the basin have focused on hydrological aspects such as evapotranspiration patterns and agricultural water use. For instance, Jalilvand et al.54 compared evapotranspiration estimates from METRIC and WaPOR products. Soil moisture, by contrast, has received less attention, and its estimation remains challenging because of complex interactions between land surface characteristics and atmospheric conditions. This study aims to fill this gap by leveraging satellite soil moisture data and downscaling techniques to produce high-resolution soil moisture maps for the Urmia Lake sub-basin and the Miandoab region. In doing so, we contribute to the development of robust downscaling methods and offer insights for sustainable water resource management and agricultural planning in this water-scarce area.
Fig. 1.
The study area (ArcGIS, 10.8.1).
Datasets
In situ soil moisture
Despite the critical importance of soil moisture, there are no dedicated measurement stations for this parameter in Iran’s Urmia Lake basin, a region of significant national importance. Consequently, this study used a Time Domain Reflectometry (TDR) device to measure soil moisture. TDR is a widely used technique that measures the travel time of an electromagnetic pulse along a waveguide inserted into the soil. The travel time depends on the soil’s dielectric constant, which is directly related to its moisture content57,58. This method offers several advantages, including high accuracy, ease of use, and the ability to measure soil moisture at various depths59. In selecting sites for TDR-based moisture measurement, we considered multiple factors, including soil composition, topographic variation, vegetation cover, and land use. Sampling locations within the study area were chosen as follows. First, the Urmia basin was gridded according to the 9 km SMAP L4 grid. From this grid, we selected the pixels that were most accessible and could be covered efficiently along a practical route, which determined how many pixels could realistically be sampled for in-situ data collection within a single day.
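The travel-time principle can be illustrated with a short sketch that converts a two-way TDR travel time t along probe rods of length L into an apparent dielectric constant, Ka = (c·t / 2L)², and then into volumetric water content using the generic Topp et al. (1980) calibration polynomial. The probe length and travel time are hypothetical, and the TDR 350 applies its own internal calibration, so this illustrates the principle rather than the instrument's actual processing.

```python
# Sketch: TDR two-way travel time -> apparent permittivity -> volumetric water content.
# Assumptions: hypothetical probe length and travel time; generic Topp et al. (1980)
# calibration, not the TDR 350's internal calibration.

C = 299_792_458.0  # speed of light in vacuum (m/s)


def apparent_permittivity(two_way_time_s: float, probe_length_m: float) -> float:
    """Apparent dielectric constant Ka = (c * t / (2 * L)) ** 2."""
    # The pulse travels down the rods and back, so the path length is 2 * L.
    return (C * two_way_time_s / (2.0 * probe_length_m)) ** 2


def topp_theta(ka: float) -> float:
    """Volumetric water content (m3/m3) from Ka with the Topp et al. (1980) polynomial."""
    return -5.3e-2 + 2.92e-2 * ka - 5.5e-4 * ka**2 + 4.3e-6 * ka**3


if __name__ == "__main__":
    # Hypothetical reading: 12 cm rods and a 3 ns two-way travel time.
    ka = apparent_permittivity(3e-9, 0.12)
    print(f"Ka = {ka:.1f}, theta = {topp_theta(ka):.3f} m3/m3")
```

Wetter soil slows the pulse, which raises Ka and therefore the estimated water content; the polynomial is monotonically increasing over the range of Ka found in mineral soils.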
We selected 12 pixels covering various land cover categories to obtain a representative sample. In each pixel, soil moisture was measured with the TDR device in dry, semi-arid, and agricultural areas. Measurements were taken at a depth of 5 cm using the TDR 350 instrument. Multiple measurements were collected for each of the three land cover classes in every pixel to improve accuracy and to capture the within-pixel variability of soil moisture, so that the data reflect the varied terrain of the study area, as depicted in Fig. 2. Data were collected daily between September 20 and October 26, 2020, in two sessions, one in the morning and one in the afternoon. The selected pixels were visited on each sampling day, with adjustments made to account for day-to-day changes. To improve the accuracy of the data, we averaged the repeated measurements for each land cover class. This approach allows us to represent soil moisture conditions across the Urmia basin, including its arid, semi-arid, and agricultural areas.
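The averaging of repeated readings can be sketched as a two-level grouping: readings are first averaged per date, pixel, and land cover class, and the class means can then be combined per pixel. The record layout and values below are hypothetical, and the equal-weight pixel mean is an illustrative assumption; the text only states that measurements were averaged per land cover class.

```python
# Sketch: averaging repeated TDR readings, first per land cover class and then per pixel.
# Assumptions: the record layout and values are hypothetical, and the equal-weight
# pixel mean is illustrative only (an area-weighted mean is an alternative).
from collections import defaultdict
from statistics import mean


def class_means(records):
    """Mean soil moisture per (date, pixel, land cover class) over all readings and sessions."""
    groups = defaultdict(list)
    for date, pixel, cover, _session, theta in records:
        groups[(date, pixel, cover)].append(theta)
    return {key: mean(values) for key, values in groups.items()}


def pixel_means(per_class):
    """Equal-weight mean of the class means for each (date, pixel)."""
    groups = defaultdict(list)
    for (date, pixel, _cover), value in per_class.items():
        groups[(date, pixel)].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical readings: (date, pixel_id, land_cover, session, soil moisture in vol%)
readings = [
    ("2020-09-20", "P01", "agricultural", "am", 24.0),
    ("2020-09-20", "P01", "agricultural", "pm", 22.0),
    ("2020-09-20", "P01", "arid", "am", 6.0),
    ("2020-09-20", "P01", "arid", "pm", 8.0),
]
per_class = class_means(readings)
per_pixel = pixel_means(per_class)
```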
Fig. 2.
Soil moisture sampling points; (a) Agricultural; (b) Rainfed Agriculture; and (c) Arid (ArcGIS 10.8.1).
SMAP radiometer soil moisture
This study used the Soil Moisture Active Passive (SMAP) Level-4 (L4) soil moisture data, specifically the SMAP L4 Global 3-hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data product. This product is widely used in soil moisture studies and provides global SM estimates at 3-hourly intervals, while the SMAP observations it assimilates revisit a given location every 2–3 days18,60. The dataset merges brightness temperatures from SMAP’s L-band radiometer, acquired during both descending and ascending half-orbit passes, with a land surface model. The data are mapped onto the 9 km Equal-Area Scalable Earth Grid, Version 2.0 (EASE-Grid 2.0)61. Despite its advantages, the 9 km resolution limits localized applications. This study therefore used the SMAP L4 data as the basis for downscaling, with the aim of providing the more detailed SM information needed for water resource management and agricultural planning62–64. The data sources are listed in Table 1.
Table 1.
Data materials used in the study.
| Parameter | Data | Spatial/temporal resolution | Source |
|---|---|---|---|
| Elevation / Aspect | DEM (Alaska Satellite Facility) | 12.5 m | https://search.asf.alaska.edu/#/ |
| Clay_to_Sand_ratio | FAO | 1 km | https://www.fao.org/ |
| LST | MODIS (MOD11A1) | 1 km / daily | https://earthexplorer.usgs.gov/ |
| NDVI | MODIS (MOD13Q1) | 250 m | https://earthexplorer.usgs.gov/ |
| Soil moisture | SMAP L4 | 9 km / 3-hourly | https://earthexplorer.usgs.gov/ |
| Soil moisture | AMSR2 (LPRM L3) | 10 km / daily | https://earthexplorer.usgs.gov/ |
| Soil moisture | TDR soil moisture meter | Point | In situ measurements |
| Precipitation | Synoptic stations | Point / daily | https://www.irimo.ir |
NDVI (Normalized Difference Vegetation Index), LST (Land Surface Temperature), DEM (Digital Elevation Model).
AMSR2 soil moisture data
The Advanced Microwave Scanning Radiometer 2 (AMSR2) on board the GCOM-W1 satellite, launched in May 2012 by the Japan Aerospace Exploration Agency (JAXA), provides additional soil moisture data for this study. We used the AMSR2/GCOM-W1 surface soil moisture (LPRM) L3 product at 10 km × 10 km resolution, from both ascending and descending passes65,66. These data sets include land surface parameters such as surface soil moisture, land surface temperature, and vegetation water content, derived with the Land Parameter Retrieval Model (LPRM), a widely used algorithm for passive microwave data. LPRM uses a forward radiative transfer model to retrieve surface soil moisture and vegetation optical depth, while land surface temperature is derived separately from AMSR2’s Ka-band (36.5 GHz) observations. A key attribute of LPRM is that it can be applied across microwave frequencies, which makes it suitable for passive microwave data from multiple satellite sensors. This enables detailed analysis of surface soil moisture, a critical variable for understanding and managing climate and environmental change67,68. Specifically, we used the "AMSR2/GCOM-W1 LPRM L3 1 day 10 km × 10 km Ascending V001" and "Descending V001" products and averaged the ascending and descending retrievals of each day as input to the models; Table 1 lists the data source.
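A minimal sketch of the daily ascending/descending averaging, assuming the two LPRM grids have already been read into NumPy arrays with missing retrievals set to NaN. How pixels observed by only one pass were treated is not stated in the text; here they keep the single available value, which is an assumption.

```python
# Sketch: daily AMSR2 LPRM soil moisture as the mean of the ascending and descending grids.
# Assumption: missing retrievals are already encoded as NaN (convert the product's
# fill value first); a pixel seen by only one pass keeps that pass's value.
import numpy as np


def daily_mean_asc_desc(asc, desc):
    stack = np.stack([np.asarray(asc, dtype=float), np.asarray(desc, dtype=float)])
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)                        # number of valid passes per pixel
    total = np.where(valid, stack, 0.0).sum(axis=0)  # sum over valid passes only
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


asc = np.array([[0.20, np.nan], [np.nan, 0.10]])
desc = np.array([[0.30, 0.25], [np.nan, 0.30]])
daily = daily_mean_asc_desc(asc, desc)
```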
MODIS products
MODIS (Moderate Resolution Imaging Spectroradiometer) data, specifically the Normalized Difference Vegetation Index (NDVI) at 250 m resolution and Land Surface Temperature (LST), were central to this study. This choice aligns with numerous studies that have emphasized the importance of these parameters for downscaling satellite soil moisture data28. MODIS, aboard NASA’s Terra and Aqua satellites, covers a broad spectral range used in a wide array of land, ocean, and atmospheric studies. Data acquisition times were matched to the Terra (10:30 am, descending) and Aqua (1:30 pm, ascending) overpasses to complement the SMAP data used in the analysis.
The study used version-5 MODIS-Terra products, namely MOD13Q1 for NDVI at 250 m resolution and MOD11A1 for daily LST at 1 km resolution, obtained from the NASA Land Processes Distributed Active Archive Center at the USGS Earth Resources Observation and Science Center (http://e4ftl01.cr.usgs.gov/MOLT/). MOD11A1 provides daytime and nighttime LST measurements and MOD13Q1 provides atmospherically corrected NDVI, both of which were needed for accurate analysis. Because MODIS cannot observe the land surface through clouds, its high temporal resolution is important: frequent overpasses increase the chance of obtaining cloud-free observations. Using NDVI and LST data from MODIS improved the precision of the soil moisture downscaling, consistent with established methods in the field.
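Raw MODIS layers are stored as scaled integers, so they must be converted to physical units before resampling. The sketch below applies the scale factors and fill values documented for the MOD11A1 daytime LST and MOD13Q1 NDVI layers (worth checking against the user guide of the collection actually used) and also shows the NDVI definition for reflectance inputs. Function names are illustrative. Cloud-covered pixels carry the LST fill value, which is why cloud-free observations matter.

```python
# Sketch: converting raw MODIS integers to physical values before resampling.
# Scale factors and fill values below are those documented for MOD11A1
# (LST_Day_1km) and MOD13Q1 (250 m NDVI); check them against the user guide
# of the collection actually used.
import numpy as np

LST_SCALE, LST_FILL = 0.02, 0          # kelvin per digital number; 0 = no retrieval (e.g. cloud)
NDVI_SCALE, NDVI_FILL = 0.0001, -3000  # multiply by 0.0001 (i.e. divide by 10000); -3000 = fill


def lst_kelvin(dn):
    dn = np.asarray(dn, dtype=float)
    return np.where(dn == LST_FILL, np.nan, dn * LST_SCALE)


def ndvi_from_dn(dn):
    dn = np.asarray(dn, dtype=float)
    return np.where(dn == NDVI_FILL, np.nan, dn * NDVI_SCALE)


def ndvi_from_reflectance(nir, red):
    """NDVI definition, (NIR - Red) / (NIR + Red), for reflectance inputs."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)
```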
Precipitation
Precipitation is a crucial driver of soil moisture variability. To account for its influence, this study incorporated daily precipitation data obtained from 22 synoptic ground stations across the Urmia Lake Basin (https://data.irimo.ir/). These stations provided comprehensive coverage of the study area (Fig. 1). To generate a spatially continuous precipitation surface, we employed ordinary kriging interpolation. Kriging was chosen because it estimates values at unsampled locations while accounting for the spatial autocorrelation of the data, which generally yields more reliable spatial predictions than deterministic interpolation methods. Using ArcGIS software, we then extracted the interpolated daily precipitation rasters for the Miandoab and Maragheh sub-basins, so that precipitation could be included as a feature in the downscaling process.
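The kriging step was performed in ArcGIS; to show what ordinary kriging computes, the sketch below solves the ordinary kriging system with a spherical semivariogram. The variogram parameters and station coordinates are hypothetical (ArcGIS fits the variogram from the data), and coordinates are assumed to be in a projected system in metres. With a zero nugget the estimator reproduces station values exactly, and its weights always sum to one.

```python
# Sketch of ordinary kriging for one day of station precipitation.
# Assumptions: projected coordinates in metres and a spherical variogram with
# hypothetical parameters (ArcGIS fits these from the data).
import numpy as np


def spherical_variogram(h, nugget, sill, range_m):
    """Spherical semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_m, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h == 0.0, 0.0, gamma)


def ordinary_kriging(xy, z, targets, nugget=0.0, sill=1.0, range_m=80_000.0):
    """Return kriged estimates at `targets` and the (n_stations, n_targets) weights."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(z)

    # Left-hand side: station-to-station semivariances bordered by the
    # unbiasedness constraint (weights sum to 1, via a Lagrange multiplier).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = spherical_variogram(d, nugget, sill, range_m)
    lhs[n, n] = 0.0

    # Right-hand side: station-to-target semivariances, one column per target.
    d0 = np.linalg.norm(targets[:, None, :] - xy[None, :, :], axis=-1)
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n, :] = spherical_variogram(d0, nugget, sill, range_m).T

    weights = np.linalg.solve(lhs, rhs)[:n, :]
    return weights.T @ z, weights
```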
Topography
Topographical factors such as slope and aspect significantly influence the distribution of soil moisture, particularly in the topsoil layer, by affecting surface runoff, infiltration, and solar radiation exposure69. Numerous studies have highlighted the role of elevation data in improving the accuracy of large-scale soil moisture mapping70. Therefore, our downscaling approach incorporated elevation and aspect as crucial components. We derived these variables from a 12.5 m resolution Digital Elevation Model (DEM) obtained from the Alaska Satellite Facility (https://search.asf.alaska.edu/#/) (Table 1). Using ArcGIS Spatial Analyst tools, we calculated the slope and aspect rasters from the DEM. These rasters were then resampled to a 1 km resolution to match the resolution of other datasets used in the study.
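Slope and aspect were computed with ArcGIS Spatial Analyst; the NumPy sketch below shows a comparable central-difference calculation for a north-up DEM (ArcGIS uses a 3 × 3 neighbourhood estimate, so values can differ slightly) and a simple block-average aggregation from 12.5 m to 1 km, a factor of 80. Block averaging is an assumption, since the text does not state the resampling method, and aspect is averaged through its sine and cosine because it is a circular variable.

```python
# Sketch: slope and aspect from a north-up DEM, then aggregation from 12.5 m to 1 km.
# Assumptions: simple block averaging (factor 80) stands in for the resampling
# method, which the text does not specify; aspect is averaged as a circular quantity.
import warnings

import numpy as np


def slope_aspect(dem, cell_size):
    """Slope in degrees and aspect in degrees clockwise from north (downslope direction)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size, cell_size)
    dz_dx = dz_dcol    # change towards the east
    dz_dy = -dz_drow   # change towards the north (row index increases southwards)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, np.where(slope == 0.0, np.nan, aspect)  # flat cells have no aspect


def block_mean(arr, factor):
    """Average non-overlapping factor x factor blocks, ignoring NaN cells."""
    rows = arr.shape[0] // factor * factor
    cols = arr.shape[1] // factor * factor
    blocks = arr[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN block -> NaN
        return np.nanmean(blocks, axis=(1, 3))


def block_mean_aspect(aspect_deg, factor):
    """Circular block mean, so that 359 and 1 degrees average to about 0, not 180."""
    rad = np.radians(aspect_deg)
    s = block_mean(np.sin(rad), factor)
    c = block_mean(np.cos(rad), factor)
    return np.degrees(np.arctan2(s, c)) % 360.0


# 12.5 m cells -> 1 km cells is a factor of 80.
FACTOR = 80
```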
Soil property
Soil texture plays a critical role in determining soil moisture characteristics. We obtained soil texture data, specifically the percentages of sand, silt, and clay, from the SoilGrids250m product provided by the International Soil Reference and Information Centre (ISRIC)—World Soil Information (https://www.isric.org/). The presence of clay and sand significantly influences soil properties such as water-holding capacity, permeability, and compaction, which directly correlate with soil moisture content. Given its importance, we calculated the clay/sand ratio from the soil texture data and included it as a valuable feature in our downscaling models. The processed soil property data was resampled to a 1 km resolution to align with the other datasets used in the study (Table 1).
Methodology
Figure 3 outlines the downscaling workflow used in this research, which consists of four main steps: data preparation, model training, model selection, and model ensemble. The data preparation step involves collecting and preprocessing various spatial layers related to terrain, climate, and satellite data. The model training step develops ten primary machine learning models using the preprocessed spatial layers as input features. The model selection step employs the COPRAS method to evaluate and rank the models based on their performance on the training set. The best-performing models are refined through hyperparameter tuning and validated using a fivefold cross-validation approach. The model ensemble step uses a stacking ensemble learning method to combine the outputs of the selected models through meta-models, with the aim of improving the predictive accuracy and robustness of the individual models. The ensemble’s performance is then assessed against ground-based measurements using a set of error metrics. The outcome of this multi-step process is a high-resolution 1 km soil moisture map.
Fig. 3.
The flowchart of downscaling method proposed in this study.
Data preprocessing
The data preparation for soil moisture downscaling involved the collection and preprocessing of various spatial layers related to terrain, climate, and satellite data, such as digital elevation model (DEM), aspect, clay-to-sand ratio, SMAP, AMSR2, precipitation, NDVI, and LST. These datasets were resampled to a consistent 1 km resolution, subjected to a stringent quality control process, and split into training and testing sets with a 70:30 ratio. The input variables were then normalized using the MinMaxScaler to improve the performance of the machine learning algorithms. Ten primary machine learning models were trained using the normalized data and fine-tuned using Bayesian Optimization. The models were validated using a fivefold cross-validation approach and a set of error metrics. The outputs of the selected models were then combined using a stacking ensemble learning method to produce a high-resolution 1 km soil moisture map.
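The split-and-scale step can be sketched with scikit-learn as follows. The feature order, the random (rather than spatial or temporal) split, and the seed are assumptions, as is fitting the MinMaxScaler on the training portion only, which keeps information from the test set out of training.

```python
# Sketch of the split-and-scale step. Assumptions: a random 70:30 split with a
# hypothetical seed, and the MinMaxScaler fitted on the training portion only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column order of the feature matrix.
FEATURES = ["dem", "aspect", "clay_sand", "smap", "amsr2", "precip", "ndvi", "lst"]


def prepare(X, y, seed=42):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=seed)
    scaler = MinMaxScaler()               # maps each feature to [0, 1] using training min/max
    X_tr_s = scaler.fit_transform(X_tr)
    X_te_s = scaler.transform(X_te)       # test values may fall slightly outside [0, 1]
    return X_tr_s, X_te_s, y_tr, y_te, scaler
```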
Machine learning algorithms
Downscaling, the process of enhancing the spatial resolution of satellite soil moisture data, can be tackled with various regression techniques21,41. We used the Python library scikit-learn, together with the scikit-learn-compatible interface of the XGBoost library, to develop and evaluate ten primary machine learning models known for their effectiveness in regression tasks. Each model was trained to learn the complex relationships between soil moisture and the auxiliary data at different resolutions. The ten models are Random Forest71, XGBoost72, Gradient Boosting26, K-Nearest Neighbors (KNN), Support Vector Machine (SVM) regression, Multi-Layer Perceptron (MLP), Ridge Regression, Kernel Ridge Regression, Bayesian Ridge Regression, and AdaBoost Regressor71. Apart from the linear Ridge and Bayesian Ridge regressions, these models can capture non-linear relationships and interactions between the input features and the target soil moisture values.
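The following sketch instantiates the ten candidate regressors and scores them on a held-out set. Hyperparameters are illustrative defaults, not the tuned values used in the study, and XGBoost is loaded only if the separate xgboost package is installed.

```python
# Sketch: the ten candidate regressors with illustrative (untuned) hyperparameters.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def candidate_models(seed=0):
    models = {
        "RandomForest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "SVR": SVR(kernel="rbf"),
        "MLP": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=seed),
        "Ridge": Ridge(alpha=1.0),
        "KernelRidge": KernelRidge(kernel="rbf"),
        "BayesianRidge": BayesianRidge(),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
    }
    try:
        from xgboost import XGBRegressor  # separate package, optional in this sketch
        models["XGBoost"] = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=seed)
    except ImportError:
        pass
    return models


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit every model and report test RMSE and R2."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = {
            "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
            "R2": float(r2_score(y_test, pred)),
        }
    return scores
```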
Given the diverse strengths and potential limitations of these machine learning algorithms, a systematic approach was required to select the most suitable models as base learners for our stacking ensemble. Therefore, we employed the COPRAS (Complex Proportional Assessment) method to evaluate and rank the models based on their performance across multiple criteria, ensuring a balanced and effective ensemble for soil moisture downscaling.
COPRAS model for model selection
To select the most suitable machine learning algorithms as base learners for our stacking ensemble, we employed the COPRAS (Complex Proportional Assessment) method. COPRAS is a multi-criteria decision-making technique that allows for the consideration of multiple performance metrics and their relative importance through assigned weights73. This approach provides a structured and transparent framework for evaluating and ranking different algorithms based on their performance across various criteria74, ensuring a balanced and effective ensemble for soil moisture downscaling.
In this study, we considered the following key performance metrics as selection criteria:
Root mean square error (RMSE) and unbiased RMSE (ubRMSE): These metrics quantify the prediction accuracy of the models, with lower values indicating better performance.
Coefficient of determination (R²): R² measures the proportion of variance in the observed data explained by the model, with higher values indicating better explanatory power.
We assigned weights to each criterion to reflect its relative importance in the context of soil moisture downscaling. The two error metrics, RMSE and ubRMSE, each received a weight of 25% because they directly quantify prediction errors. R², representing the model’s explanatory power, was assigned a weight of 50%, emphasizing the importance of capturing the variance in the observed soil moisture data.
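The metric definitions and the COPRAS ranking with the 25/25/50 weights can be sketched as follows. The sketch assumes the usual ubRMSE definition, sqrt(RMSE² − bias²), R² computed as 1 − SS_res/SS_tot, and strictly positive criterion values (COPRAS normalizes each criterion by its column sum, so a negative R² would have to be shifted first). The model scores in the test are made up.

```python
# Sketch: RMSE, ubRMSE and R2, then a COPRAS ranking with the 25/25/50 weights.
# Assumptions: ubRMSE = sqrt(RMSE^2 - bias^2), R2 = 1 - SS_res/SS_tot, and
# all criterion values positive (COPRAS normalises by column sums).
import numpy as np


def sm_metrics(obs, pred):
    obs = np.asarray(obs, dtype=float)
    err = np.asarray(pred, dtype=float) - obs
    bias = err.mean()
    rmse = np.sqrt(np.mean(err**2))
    ubrmse = np.sqrt(max(rmse**2 - bias**2, 0.0))  # clip tiny negative round-off
    r2 = 1.0 - np.sum(err**2) / np.sum((obs - obs.mean()) ** 2)
    return rmse, ubrmse, r2


def copras(matrix, weights, beneficial):
    """Utility degree (%) of each alternative (row); 100 marks the best one."""
    x = np.asarray(matrix, dtype=float)
    ben = np.asarray(beneficial, dtype=bool)
    d = np.asarray(weights, dtype=float) * x / x.sum(axis=0)  # weighted, sum-normalised
    s_plus = d[:, ben].sum(axis=1)     # benefit criteria (larger is better)
    s_minus = d[:, ~ben].sum(axis=1)   # cost criteria (smaller is better)
    # Relative significance: Q_i = S+_i + sum(S-) / (S-_i * sum(1 / S-))
    q = s_plus + s_minus.sum() / (s_minus * np.sum(1.0 / s_minus))
    return 100.0 * q / q.max()


# Criteria order: RMSE, ubRMSE, R2 (weights 25%, 25%, 50%).
WEIGHTS = [0.25, 0.25, 0.50]
BENEFICIAL = [False, False, True]
```

Models would then be ranked by descending utility degree, and the top-ranked ones passed on as base learners.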
Hyperparameter tuning
Hyperparameter tuning is essential for achieving optimal model performance. We used Bayesian optimization to search the hyperparameter space efficiently75. The method builds a probabilistic model of how different hyperparameter configurations are likely to perform and prioritizes the settings most likely to improve model accuracy, which makes the search more focused than exhaustive grid or random search.
A critical aspect of our tuning process is fivefold cross-validation. Evaluating each candidate configuration across five folds helps ensure that the selected hyperparameters not only fit the training data but also generalize to unseen data, reducing the risk of overfitting. The goal of this optimization is to obtain models that produce reliable and accurate soil moisture predictions. Bayesian optimization is widely used for hyperparameter tuning because it converges on good hyperparameter values in a more directed and often faster way, which contributes to the precision and reliability of the predictive models developed in this study.
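The text does not name the Bayesian optimization library, so the sketch below is a hand-rolled version of the idea: a Gaussian-process surrogate from scikit-learn, an expected-improvement acquisition function, and fivefold cross-validated RMSE as the objective. The model (Gradient Boosting), the two-dimensional search space, and the iteration counts are hypothetical choices for illustration.

```python
# Sketch: Bayesian optimisation (Gaussian-process surrogate + expected improvement)
# of two Gradient Boosting hyperparameters, scored by fivefold cross-validated RMSE.
# The search space and iteration counts are hypothetical.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score

# Search space: log10(learning_rate) and max_depth, searched on the unit square.
BOUNDS = np.array([[-3.0, -0.5], [2.0, 6.0]])


def to_params(u):
    """Map a point of the unit square to hyperparameter values."""
    v = BOUNDS[:, 0] + u * (BOUNDS[:, 1] - BOUNDS[:, 0])
    return {"learning_rate": float(10.0 ** v[0]), "max_depth": int(round(v[1]))}


def cv_score(params, X, y, seed):
    """Mean fivefold CV score as negative RMSE (larger is better)."""
    model = GradientBoostingRegressor(n_estimators=100, random_state=seed, **params)
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="neg_root_mean_squared_error").mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best score found so far (maximisation)."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(X, y, n_init=4, n_iter=8, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.uniform(size=(n_init, 2))                        # random initial design
    scores = np.array([cv_score(to_params(u), X, y, seed) for u in U])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(1e-3),
                                      normalize_y=True, random_state=seed)
        gp.fit(U, scores)                                    # surrogate of the CV score
        candidates = rng.uniform(size=(1000, 2))
        mu, sigma = gp.predict(candidates, return_std=True)
        u_next = candidates[np.argmax(expected_improvement(mu, sigma, scores.max()))]
        U = np.vstack([U, u_next])
        scores = np.append(scores, cv_score(to_params(u_next), X, y, seed))
    best = int(np.argmax(scores))
    return to_params(U[best]), float(scores[best])
```

Dedicated packages implement the same loop with more careful acquisition optimization; the point here is only that each new configuration is chosen where the surrogate predicts the largest expected gain in cross-validated score.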
Stacking ensemble learning and Downscaling Process
Stacking ensemble learning uses a meta-learner to integrate the predictions of heterogeneous base learners, with the aim of reducing generalization error and improving model robustness. It improves forecast accuracy by leveraging the predictive capabilities of multiple machine learning models, building on the foundational work of Wolpert76 and Breiman77.
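A minimal stacking sketch with scikit-learn's StackingRegressor, using Random Forest, Gradient Boosting, and (if installed) XGBoost as base learners and Gradient Boosting as the meta-learner, mirroring one of the meta-models named in the abstract. Hyperparameters are illustrative. With cv=5, the meta-learner is trained on out-of-fold base-model predictions, so it does not learn from predictions the base models made on their own training data; passthrough=False means the meta-learner sees only base-model outputs. Both settings are assumptions about the configuration.

```python
# Sketch of a stacking ensemble; hyperparameters are illustrative, not the study's tuned values.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor


def build_stack(seed=0):
    base = [
        ("rf", RandomForestRegressor(n_estimators=200, random_state=seed)),
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]
    try:
        from xgboost import XGBRegressor  # optional third base learner
        base.append(("xgb", XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=seed)))
    except ImportError:
        pass
    # Meta-learner fitted on 5-fold out-of-fold predictions of the base learners.
    meta = GradientBoostingRegressor(n_estimators=100, max_depth=2, random_state=seed)
    return StackingRegressor(estimators=base, final_estimator=meta, cv=5, passthrough=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(150, 4))
    y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
    stack = build_stack().fit(X, y)
    print(stack.transform(X[:3]))  # base-model predictions fed to the meta-learner
```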
The downscaling process in this study follows an organized, multi-step structure that increases the spatial resolution of satellite soil moisture data from the coarse 9 km SMAP and 10 km AMSR2 products to 1 km. The steps are:
Data preparation: All input datasets, including SMAP, AMSR2, MODIS products (NDVI, LST), precipitation data, topography, and soil properties, are resampled to a spatial resolution of 1 km. This step is done to ensure that all auxiliary variables are aligned with the downscaling target resolution.
Model training: Ten primary machine-learning models encompassing Random Forest, XGBoost, and Gradient Boosting are trained using high-resolution auxiliary datasets to learn the relationships between soil moisture and environmental covariates (vegetation, temperature, precipitation, topography) at a 1 km resolution.
COPRAS model selection: The COPRAS technique is used for evaluating ten machine learning models for the identification of the most suitable base models for the downscaling process. Among those, the top six models are selected based on their performance on main key indicators like RMSE, R2, and ubRMSE.
Ensemble learning: Stacking ensemble learning is used subsequently to merge predictions from the best models built using such features. The meta-model uses the strengths of each model used and comes up with a more accurate and spatially refined 1 km soil moisture map.
Final downscaled prediction: The output of the base models within the stacking ensemble is combined to provide the final 1 km resolution soil moisture product. A better spatial detail is derived from the downscaled product compared with the original satellite products. A product with improved resolution and improved prediction compared and validated against in situ measurements.
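The data-preparation and prediction steps can be illustrated on a toy grid. A 9 km field is replicated onto a 1 km grid, stacked with 1 km covariates, and passed through a fitted model pixel by pixel. Several details are assumptions because the resampling method is not specified here. The sketch uses nearest-neighbour replication via `np.kron`, random layers, and a synthetic training target. A real workflow would operate on georeferenced rasters in GIS software.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def upsample_nearest(coarse, factor):
    """Replicate each coarse cell into a factor x factor block (nearest neighbour)."""
    return np.kron(coarse, np.ones((factor, factor)))

# Hypothetical inputs: a 4 x 4 grid of 9 km SMAP cells and 1 km NDVI / LST layers.
smap_9km = rng.uniform(0.05, 0.40, size=(4, 4))
ndvi_1km = rng.uniform(0.0, 0.8, size=(36, 36))
lst_1km = rng.uniform(280.0, 320.0, size=(36, 36))
smap_1km = upsample_nearest(smap_9km, 9)  # 9 km -> 1 km

# Stack covariates into a (pixels, features) table.
covariates = np.stack([smap_1km, ndvi_1km, lst_1km], axis=-1)
X_grid = covariates.reshape(-1, covariates.shape[-1])

# Stand-in training set (in practice: covariates collocated with TDR stations).
X_train = X_grid[rng.choice(X_grid.shape[0], size=200, replace=False)]
y_train = X_train[:, 0] + 0.05 * X_train[:, 1] + rng.normal(0, 0.01, 200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Predict every 1 km pixel and restore the grid shape.
sm_1km = model.predict(X_grid).reshape(smap_1km.shape)
print(sm_1km.shape)
```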
This structured approach was evaluated using R2, RMSE, and ubRMSE, and the ensemble outperformed both the individual base models and conventional single-model strategies. Its more accurate and robust predictions underscore the advantage of integrating multiple learning algorithms for complex tasks such as soil moisture downscaling. Through this methodology, our study aims to advance the precision and reliability of soil moisture monitoring.
SHapley Additive exPlanations (SHAP)
To address the limited interpretability of feature importance in conventional machine learning (ML) models, especially for soil moisture downscaling, our study integrates SHapley Additive exPlanations (SHAP). This method provides insight into the influence of input variables on soil moisture predictions. Originating from the foundational work by Lundberg and Lee78, SHAP offers a robust framework for assessing the contribution of each input variable to the predictions of a trained model, enabling an interpretative analysis that extends beyond conventional importance metrics79.
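Given a set of input variables $x = [x_1, x_2, \ldots, x_M]$, SHAP employs an auxiliary (explanation) model $g$ to determine the impact of each input variable on the model $f$. This auxiliary model $g$ is mathematically defined as:

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i \tag{1}$$

where $M$ represents the number of input variables, $z'_i \in \{0, 1\}$ indicates the binary state of an input variable ($z'_i = 1$ for features used in prediction and $z'_i = 0$ for unused features), $\phi_0$ is the base value (the expected model output when no feature is known), and $\phi_i$ represents the contribution of feature $i$ to the model. The formula for computing $\phi_i$ for each feature is:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right] \tag{2}$$

where $F$ is the set of all $M$ input variables, $S$ is a subset of $F$ that excludes feature $i$, and $f_S(x_S)$ is the model output when only the features in $S$ are known.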
Equation (2) computes the additive contribution of each feature as a weighted average of its marginal effect over all possible feature combinations, which ensures an equitable attribution of influence among features. SHAP therefore reveals both the importance of individual features and the direction of their impact on the soil moisture downscaling model. This enables a thorough exploration of the models' decision-making process and of the relationships between input features and predicted soil moisture, enhancing the interpretability and applicability of ML models for soil moisture downscaling. SHAP values were computed in this study with the "shap" package for Python 3.7.
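Equation (2) can be checked on a toy function by enumerating every subset $S$ directly. The sketch below makes one assumption about $f_S(x_S)$: it evaluates $f$ with the features outside $S$ replaced by a fixed baseline vector. This is one common convention, not necessarily the one the shap package uses for a given explainer. The toy function is hypothetical. Brute-force enumeration is feasible only for a handful of inputs, such as the six base-model outputs. The shap package relies on faster algorithms, for example for tree ensembles.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values per Eq. (2), enumerating every subset S of the other features.

    Assumption: f_S(x_S) = f(z), where z takes x on S and the baseline elsewhere.
    """
    M = len(x)

    def f_S(S):
        z = np.array(baseline, dtype=float)
        for j in S:
            z[j] = x[j]
        return f(z)

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(M - size - 1) / factorial(M)
                phi[i] += weight * (f_S(S + (i,)) - f_S(S))
    return phi, f_S(())  # phi_i and the base value phi_0

# Hypothetical model with an interaction term between inputs 1 and 2.
f = lambda z: 2.0 * z[0] + z[1] * z[2]
x, base = np.array([1.0, 2.0, 3.0]), np.zeros(3)
phi, phi0 = shapley_values(f, x, base)
print(phi, phi0, phi0 + phi.sum(), f(x))
```

The linear input receives exactly its own effect (2). The interaction term (2 × 3 = 6) is split equally between the two interacting inputs. The base value plus all contributions reproduces the model output, which is the additivity property expressed by Eq. (1).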
Performance evaluation
To evaluate the effectiveness of our soil moisture downscaling model, we compared in-situ observations with satellite-derived soil moisture products. The assessment was based on four traditional statistical metrics, chosen for their relevance and reliability in environmental model evaluation: the coefficient of determination (R2), RMSE, ubRMSE, and bias18. These metrics capture the model's predictive accuracy, error magnitude, and systematic deviation from observed values. They are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2} \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (P_i - O_i)^2} \tag{4}$$

$$\mathrm{ubRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left[(P_i - \bar{P}) - (O_i - \bar{O})\right]^2} \tag{5}$$

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N} (P_i - O_i) \tag{6}$$

In this study, actual (observed) SM values are denoted as $O_i$ and predicted SM values as $P_i$. The mean of the observed values is $\bar{O}$, the mean of the predicted values is $\bar{P}$, and $N$ is the total number of observations. Because in-situ points do not coincide spatially with satellite pixels, the RMSE and bias metrics may be affected by this spatial mismatch, whereas R2 and ubRMSE are believed to be less sensitive to it. Consequently, the subsequent analysis emphasizes these two measures. To further mitigate the effects of spatial mismatch, observations from stations located within the same satellite cell were averaged daily prior to analysis.
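The four metrics can be computed in a few lines of NumPy. The sketch assumes that R2 is calculated as 1 − SSE/SST, the definition used by scikit-learn's `r2_score`, and that Bias is predicted minus observed. The observation and prediction values are hypothetical. The final check shows why ubRMSE is less affected by a constant offset: it equals RMSE with the squared mean bias removed.

```python
import numpy as np

def evaluate(obs, pred):
    """R2, RMSE, ubRMSE and Bias for paired observed / predicted soil moisture."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r2 = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    ubrmse = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    bias = np.mean(pred - obs)
    return {"R2": r2, "RMSE": rmse, "ubRMSE": ubrmse, "Bias": bias}

obs = np.array([18.0, 22.5, 25.1, 30.2, 27.4])      # hypothetical TDR values (%)
pred = obs + np.array([1.0, -0.5, 0.8, 1.2, 0.5])   # hypothetical model output
m = evaluate(obs, pred)

# ubRMSE removes the mean offset: ubRMSE^2 = RMSE^2 - Bias^2.
print(m, np.isclose(m["ubRMSE"] ** 2, m["RMSE"] ** 2 - m["Bias"] ** 2))
```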
This evaluation methodology aims to provide a comprehensive assessment of the model’s performance, emphasizing its accuracy and the extent to which it produces unbiased soil moisture predictions. By carefully selecting and applying these metrics, we ensure a detailed examination of the model’s capabilities in the context of downscaling soil moisture content, acknowledging and addressing the challenges posed by spatial mismatches.
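The daily aggregation of stations that share a satellite cell can be sketched with pandas. The text states only that such observations were averaged daily. The two-step scheme below is an assumption: it first averages each station per day, then averages stations within a cell, so that a station with more readings does not dominate. All records are hypothetical.

```python
import pandas as pd

# Hypothetical in-situ records: stations s1 and s2 share satellite cell "c1".
records = pd.DataFrame({
    "station": ["s1", "s2", "s1", "s1", "s2", "s3"],
    "cell": ["c1", "c1", "c1", "c1", "c1", "c2"],
    "time": pd.to_datetime(["2021-10-10 06:00", "2021-10-10 07:00",
                            "2021-10-11 06:00", "2021-10-11 18:00",
                            "2021-10-11 07:00", "2021-10-10 06:00"]),
    "sm": [20.0, 24.0, 19.0, 21.0, 23.0, 15.0],
})
records["date"] = records["time"].dt.normalize()

# Step 1: one daily mean per station.
station_daily = records.groupby(["cell", "station", "date"], as_index=False)["sm"].mean()
# Step 2: average the stations that fall inside the same satellite cell.
cell_daily = station_daily.groupby(["cell", "date"], as_index=False)["sm"].mean()
print(cell_daily)
```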
Results
Correlations between selected features
Before evaluating the performance of the downscaling models, we analyzed the correlations between the selected features to gain insight into their relationships and potential influence on soil moisture prediction. Figure 4 presents a heatmap of the correlation coefficients between each pair of features, including the target variable (TDR soil moisture). Several relationships stand out. A strong positive correlation (r = 0.74) exists between SMAP soil moisture and the clay-to-sand ratio, indicating that areas with a higher proportion of clay tend to exhibit higher soil moisture. This aligns with the known water-retention properties of clay-rich soils. A positive correlation (r = 0.69) is also observed between precipitation and land surface temperature (LST), suggesting a possible link between warmer conditions and precipitation events in the study area. The positive correlation (r = 0.60) between slope and the Digital Elevation Model (DEM) shows that the topographic attributes are interrelated. Because elevation gradients influence runoff and infiltration, these attributes are likely to shape soil moisture patterns. Conversely, the Normalized Difference Vegetation Index (NDVI) is negatively correlated with both DEM (r = -0.37) and the clay-to-sand ratio (r = -0.30). This implies that denser vegetation tends to occur at lower elevations and on soils with lower clay-to-sand ratios (i.e., relatively sandier soils), which may affect moisture retention dynamics. These correlations shed light on the complex relationships between environmental factors and soil moisture. Understanding them is important both for interpreting the model results and for designing downscaling strategies that account for these factors.
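The coefficients shown in a heatmap like Fig. 4 are a pairwise correlation matrix. The minimal pandas sketch below assumes Pearson correlation, since the coefficient type is not stated. It uses synthetic columns that stand in for the study's features, and it omits plotting (which would typically use a library such as seaborn or matplotlib).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
# Hypothetical covariates; SMAP is made to depend on the clay-to-sand ratio.
clay_sand = rng.uniform(0.2, 2.0, n)
df = pd.DataFrame({
    "clay_sand_ratio": clay_sand,
    "SMAP": 0.1 + 0.1 * clay_sand + rng.normal(0, 0.03, n),
    "NDVI": rng.uniform(0.1, 0.7, n),
    "DEM": rng.uniform(1200, 2500, n),
})

corr = df.corr(method="pearson")  # symmetric matrix of pairwise Pearson r
print(corr.round(2))
```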
Fig. 4.
Heatmap showing the correlation coefficients among selected features and TDR soil moisture in the study area.
Evaluation of machine learning models using the COPRAS method
This study employed the Complex Proportional Assessment (COPRAS) method to systematically evaluate and rank multiple machine learning models based on their performance in downscaling satellite soil moisture data. The root mean square error (RMSE), unbiased root mean square error (ubRMSE), and coefficient of determination (R2) were selected to assess the accuracy and predictive power of the models in capturing soil moisture variability at finer spatial resolutions. Soil moisture is influenced by many land surface processes and spatial heterogeneities. The COPRAS weighting scheme therefore rewards both low prediction error (RMSE and ubRMSE, for which lower values are better) and the ability to explain variance in soil moisture (R2, for which higher values are better). The weights were set to 25% each for RMSE and ubRMSE and 50% for R2. The outcomes of the COPRAS evaluation are summarized in Table 2, which ranks the machine learning models by their normalized COPRAS scores.
Table 2.
Rankings of machine learning models for soil moisture downscaling using normalized COPRAS scores.
| Model | Normalized score | Model | Normalized score |
|---|---|---|---|
| XGBoost | 1.00 | SVR | 0.52 |
| GradientBoosting | 0.93 | MLP | 0.48 |
| RandomForest | 0.92 | AdaBoost | 0.46 |
| KNeighbors | 0.66 | BayesianRidge | 0.44 |
| KernelRidge | 0.53 | Ridge | 0.44 |
The COPRAS method identified XGBoost as the top-performing model with a normalized score of 1.00, indicating superior predictive performance and explanatory power compared with the other models evaluated. The GradientBoosting and RandomForest models also performed very well, with normalized scores of 0.93 and 0.92, respectively, underscoring their effectiveness in handling complex datasets. Models such as KNeighbors and KernelRidge did not match the top performers but demonstrated respectable capabilities. Conversely, linear models such as Ridge and BayesianRidge ranked lowest, suggesting that they may be less suited to the complexity of the dataset used in this study. Based on these evaluations, the six highest-scoring models (XGBoost, GradientBoosting, RandomForest, KNeighbors, KernelRidge, and SVR) were selected as base models for further analysis. Their selection reflects their potential to provide accurate and reliable downscaling of satellite soil moisture data, given their varying strengths and the diverse nature of the data. Including both tree-based models (XGBoost, GradientBoosting, and RandomForest) and non-tree-based models (KNeighbors, KernelRidge, and SVR) ensures a broad representation of modeling approaches for the multifaceted challenges of satellite soil moisture downscaling.
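The ranking logic can be sketched with the standard COPRAS steps. First, each criterion column is divided by its sum and multiplied by its weight. Second, the weighted benefit criteria (R2) and cost criteria (RMSE, ubRMSE) are summed separately. Third, these sums are combined into a relative significance Q, which is divided by its maximum so that the best model scores 1.00. The input here is three rows of validation metrics taken from Table 4 with the 25/25/50 weights. These scores will not reproduce Table 2, because COPRAS normalizes relative to whichever alternatives are included, and the study ranked ten models.

```python
import numpy as np

def copras(matrix, weights, benefit):
    """COPRAS utility scores, normalized so the best alternative scores 1.0.

    matrix:  (alternatives x criteria) performance values
    weights: criterion weights summing to 1
    benefit: True for criteria to maximize (R2), False to minimize (RMSE, ubRMSE)
    """
    X = np.asarray(matrix, float)
    w = np.asarray(weights, float)
    b = np.asarray(benefit, bool)
    D = X / X.sum(axis=0) * w              # sum-normalized, weighted decision matrix
    s_plus = D[:, b].sum(axis=1)           # weighted sum of benefit criteria
    s_minus = D[:, ~b].sum(axis=1)         # weighted sum of cost criteria
    Q = s_plus + s_minus.sum() / (s_minus * np.sum(1.0 / s_minus))
    return Q / Q.max()

# Criteria order: RMSE, ubRMSE, R2 (validation values for three base models).
models = ["XGBoost", "KNeighbors", "SVR"]
scores = copras([[1.92, 1.92, 0.94],
                 [3.76, 3.76, 0.77],
                 [4.93, 4.87, 0.60]],
                weights=[0.25, 0.25, 0.50],
                benefit=[False, False, True])
print(dict(zip(models, scores.round(2))))
```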
Hyperparameter tuning for base models
To optimize the performance of the selected base models (XGBoost, GradientBoosting, RandomForest, KNeighbors, KernelRidge, and SVR) for downscaling satellite-derived soil moisture data, we employed Bayesian optimization through the BayesSearchCV class of the scikit-optimize library. This approach allowed us to explore a wide range of hyperparameter configurations systematically, with the aim of maximizing each model's predictive accuracy and robustness.
Table 3 summarizes the optimal hyperparameters identified for each model, along with the corresponding best validation scores, which match the validation R2 values reported in Table 4. XGBoost, Gradient Boosting, and Random Forest emerged as the top performers, with scores of 0.94, 0.93, and 0.92, respectively, highlighting their superior accuracy in soil moisture downscaling. In comparison, K-Nearest Neighbors (KNeighbors), Kernel Ridge, and Support Vector Regression (SVR) performed worse, with validation scores of 0.77, 0.64, and 0.60, respectively.
Table 3.
Optimal hyperparameters and corresponding performance scores for selected base models.
| Model | Best parameters | Best score |
|---|---|---|
| XGBoost | colsample_bytree: 1.0, gamma: 0.076, learning_rate: 0.011, max_depth: 23, n_estimators: 436, subsample: 0.711 | 0.94 |
| GradientBoosting | learning_rate: 0.181, max_depth: 17, max_features: ‘log2’, min_samples_leaf: 1, min_samples_split: 0.01, n_estimators: 500, subsample: 1.0 | 0.93 |
| RandomForest | max_depth: 27, min_samples_leaf: 1, min_samples_split: 2, n_estimators: 500 | 0.92 |
| KNeighbors | algorithm: ‘ball_tree’, leaf_size: 47, n_neighbors: 4, p: 1, weights: ‘distance’ | 0.77 |
| KernelRidge | alpha: 1.82e-05, coef0: 1.57, degree: 5, gamma: 0.083, kernel: ‘poly’ | 0.64 |
| SVR | C: 32.52, degree: 3, epsilon: 0.037, gamma: ‘scale’, kernel: ‘rbf’ | 0.60 |
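As a usage illustration, the Table 3 settings can be transcribed into scikit-learn constructors and fitted on synthetic data. XGBoost is listed only as a comment to keep the sketch free of the xgboost dependency. The synthetic dataset is an assumption. Note that `min_samples_split: 0.01` is interpreted by scikit-learn as a fraction of the training samples.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# XGBoost (not built here): XGBRegressor(colsample_bytree=1.0, gamma=0.076,
#   learning_rate=0.011, max_depth=23, n_estimators=436, subsample=0.711)
tuned_base_models = {
    "GradientBoosting": GradientBoostingRegressor(
        learning_rate=0.181, max_depth=17, max_features="log2", min_samples_leaf=1,
        min_samples_split=0.01, n_estimators=500, subsample=1.0),
    "RandomForest": RandomForestRegressor(
        max_depth=27, min_samples_leaf=1, min_samples_split=2, n_estimators=500),
    "KNeighbors": KNeighborsRegressor(
        algorithm="ball_tree", leaf_size=47, n_neighbors=4, p=1, weights="distance"),
    "KernelRidge": KernelRidge(
        alpha=1.82e-05, coef0=1.57, degree=5, gamma=0.083, kernel="poly"),
    "SVR": SVR(C=32.52, degree=3, epsilon=0.037, gamma="scale", kernel="rbf"),
}

# Fit each tuned model on hypothetical data and predict three rows.
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
preds = {name: model.fit(X, y).predict(X[:3]) for name, model in tuned_base_models.items()}
for name, p in preds.items():
    print(name, p.shape)
```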
Performance of selected base models
Following the COPRAS evaluation, six machine learning models were selected as base models for their performance in downscaling satellite soil moisture data: Support Vector Regression (SVR), Kernel Ridge, K-Nearest Neighbors (KNeighbors), Random Forest, Gradient Boosting, and XGBoost. Table 4 summarizes their RMSE, ubRMSE, R2, and Bias on the validation and test sets. XGBoost outperforms the other models, with the lowest RMSE and ubRMSE values and the highest R2 scores on both sets, indicating high predictive accuracy and consistency. GradientBoosting and RandomForest follow closely and show robust capability in modeling the complex dynamics of soil moisture. SVR and KernelRidge rank lowest on these criteria (R2 of 0.60 to 0.65). Even so, they provide usable predictions, which suggests that they may be applicable in downscaling scenarios where interpretability and computational efficiency are paramount.
Table 4.
Comparative performance evaluation of base machine learning models against in-situ soil moisture measurements in validation and test sets.
| Base model | Validation RMSE | Validation ubRMSE | Validation R2 | Validation Bias | Test RMSE | Test ubRMSE | Test R2 | Test Bias |
|---|---|---|---|---|---|---|---|---|
| SVR | 4.93 | 4.87 | 0.60 | -0.70 | 4.64 | 4.60 | 0.64 | -0.61 |
| KernelRidge | 4.68 | 4.67 | 0.64 | -0.01 | 4.55 | 4.55 | 0.65 | 0.05 |
| KNeighbors | 3.76 | 3.76 | 0.77 | 0.13 | 3.47 | 3.47 | 0.80 | 0.16 |
| RandomForest | 2.22 | 2.22 | 0.92 | 0.04 | 1.78 | 1.78 | 0.95 | -0.01 |
| GradientBoosting | 2.11 | 2.10 | 0.93 | 0.01 | 1.70 | 1.70 | 0.95 | 0.01 |
| XGBoost | 1.92 | 1.92 | 0.94 | 0.00 | 1.48 | 1.48 | 0.96 | 0.00 |
Significant values are in bold.
As detailed in Table 4, this evaluation underlines the significance of leveraging diverse models to tackle the intricate challenges associated with downscaling satellite soil moisture data. This research aims to significantly improve the precision and reliability of soil moisture estimations by adopting these selected base models, each with unique strengths and modeling capabilities. Such enhancements are critical for advancing hydrological modeling and optimizing water resource management strategies, contributing to a more nuanced understanding of terrestrial water cycles.
Figure 5 presents violin plots for the performance metrics (a) R2, (b) ubRMSE, (c) RMSE, and (d) Bias, offering a visual comparison of the base models' effectiveness in downscaling satellite soil moisture data against in-situ measurements. These plots show the distribution and density of each metric, revealing the predictive power (R2), accuracy (RMSE and ubRMSE), and systematic error (Bias) of each model. The R2 plots identify models with superior explanatory capability, the RMSE and ubRMSE plots identify those with higher accuracy, and the Bias plots flag potential systematic deviations. XGBoost, GradientBoosting, and RandomForest exhibit high R2 values with minimal errors and Bias, making them robust candidates for precise soil moisture estimation. In contrast, the broader Bias distributions for SVR and KernelRidge indicate that careful calibration is needed to mitigate systematic errors, reaffirming the importance of model selection based on specific performance criteria.
Fig. 5.
Violin plots for performance metrics comparing base models against in-situ soil moisture measurements; (a) R2; (b) ubRMSE; (c) RMSE; and (d) Bias.
We evaluated the six base models by comparing their downscaled soil moisture predictions with in-situ measurements obtained using Time Domain Reflectometry (TDR). Figure 6 presents scatter plots of each model's predicted soil moisture percentages against the measured values, together with linear fit lines and the R2, ubRMSE, and Bias values. These visualizations highlight each model's predictive accuracy and systematic error. The closer the data points cluster around the identity line, the stronger the agreement between predictions and measurements.
Fig. 6.
Scatter plots comparing base model predictions with in-situ soil moisture measurements; (a) SVR; (b) KNN; (c) KR; (d) RF; (e) GB; and (f) XGB.
Among the models, XGBoost, GradientBoosting, and RandomForest achieved the highest R2 values and the lowest ubRMSE, with Bias close to zero. This suggests a superior capability to estimate soil moisture accurately from satellite observations. Conversely, SVR and KernelRidge exhibited the lowest R2 values and the highest ubRMSE, and SVR also showed the largest (negative) Bias, indicating greater variability and less consistency in their predictions. These scatter plots complement the numerical results in Table 4, and together they provide a comprehensive assessment of each model's downscaling performance.
Stacking ensemble learning
Building on these results, we adopted the stacking ensemble learning technique. Random Forest, Gradient Boosting, and XGBoost were selected as meta-models because of their standout performance as base models (Table 4), which indicates robust predictive ability for downscaling satellite-derived soil moisture. The ensemble framework passes the predictions of all base models to each meta-model, which combines them into an improved overall prediction. Hyperparameter tuning of the meta-models was carefully conducted to identify optimal parameters and enhance the ensemble's predictive fidelity, and the resulting configurations are summarized in Table 5. This ensemble approach is expected to improve precision in hydrological modeling and resource management and to advance satellite soil moisture downscaling.
Table 5.
Optimal hyperparameters and best performance scores for meta models in the stacking ensemble framework.
| Meta model | Best Parameters | Best score |
|---|---|---|
| XGB Meta | colsample_bytree: 0.86, gamma: 0.3, learning_rate: 0.032, max_depth: 7, min_child_weight: 1, n_estimators: 332, subsample: 0.946 | 0.95 |
| GB Meta | learning_rate: 0.042, max_depth: 10, max_features: 0.325, min_samples_leaf: 2, min_samples_split: 2, n_estimators: 199, subsample: 0.842 | 0.95 |
| RF Meta | max_depth: 16, min_samples_leaf: 1, min_samples_split: 2, n_estimators: 100 | 0.95 |
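One way to assemble such an ensemble is scikit-learn's `StackingRegressor`. In the sketch below, the Table 5 RF meta-model settings are supplied as the `final_estimator`, and `cv=5` means the meta-model is trained on out-of-fold base-model predictions. It is not stated here whether the study used `StackingRegressor` or built its meta-features by hand, so this is an assumption. The base learners use default or simplified settings for speed rather than the tuned values in Table 3, and the data are synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Reduced, untuned base learners (illustrative only).
base = [
    ("knn", KNeighborsRegressor(n_neighbors=4, weights="distance")),
    ("kr", KernelRidge(kernel="rbf")),
    ("svr", SVR(C=32.52)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
]

# RF meta-model with the Table 5 hyperparameters.
rf_meta = RandomForestRegressor(max_depth=16, min_samples_leaf=1,
                                min_samples_split=2, n_estimators=100, random_state=0)
stack = StackingRegressor(estimators=base, final_estimator=rf_meta, cv=5)
stack.fit(X_tr, y_tr)

# transform() returns the base-model predictions the meta-model sees.
meta_features = stack.transform(X_te)
print(meta_features.shape, "stacked test R2:", round(stack.score(X_te, y_te), 3))
```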
We then compared the performance of the meta-models (Random Forest, Gradient Boosting, and XGBoost) with that of the base models reported in Table 4 to quantify the improvement conferred by the stacking ensemble. The results in Table 6 show a clear advantage for the meta-models. All three achieve lower RMSE and ubRMSE values and higher R2 scores than the best base model on both the validation and test sets. Bias remains close to zero for all three meta-models (|Bias| ≤ 0.03), comparable to the best-performing base models. This improvement supports the ensemble learning approach, in which integrating multiple predictive models yields a more precise combined estimate of soil moisture that can improve hydrological models and inform water resource management.
Table 6.
Performance evaluation of meta machine learning models against in-situ soil moisture measurements.
| Meta model | Validation RMSE | Validation ubRMSE | Validation R2 | Validation Bias | Test RMSE | Test ubRMSE | Test R2 | Test Bias |
|---|---|---|---|---|---|---|---|---|
| RandomForest | 1.69 | 1.69 | 0.95 | 0.01 | 1.27 | 1.27 | 0.97 | 0.00 |
| GradientBoosting | 1.65 | 1.64 | 0.95 | 0.00 | 1.22 | 1.22 | 0.97 | 0.02 |
| XGBoost | 1.66 | 1.63 | 0.96 | 0.00 | 1.23 | 1.23 | 0.97 | 0.03 |
Extending the analysis to the ensemble level, Fig. 7 presents violin plots of the validation-set performance metrics (bias, RMSE, ubRMSE, and R2) for the Random Forest, Gradient Boosting, and XGBoost meta-models. The plots show the density and distribution of scores, offering a visual comparison of the meta-models' precision and their tendency towards over- or underestimation (bias). The narrow Random Forest and Gradient Boosting distributions around the median indicate strong consistency in predictions, whereas the slightly wider XGBoost distribution suggests a marginally broader range of responses. All three meta-models nonetheless achieve high R2 values, indicating high explanatory power and close alignment with the observed data. The RMSE and ubRMSE plots confirm the improved prediction accuracy of the ensemble approach, with each meta-model concentrated at lower error scores than the individual base models. Collectively, these violin plots support the efficacy of the stacking ensemble, with the meta-models combining accuracy and consistency for downscaling satellite soil moisture data.
Fig. 7.
Violin plots comparing meta model performance metrics against in-situ soil moisture measurements; (a) R2; (b) ubRMSE; (c) RMSE; and (d) Bias.
Building on these insights, Fig. 8 presents scatter plots for the three meta-models (Random Forest, Gradient Boosting, and XGBoost), illustrating the relationship between predicted soil moisture values and the corresponding in-situ TDR measurements. The fitted regression lines, shown in red, highlight the strong predictive capability of each model, and the tight clustering of data points around these lines further emphasizes their accuracy. Although some minor deviations from the fitted lines exist, they do not show a systematic pattern of over- or underestimation for any model. This visual analysis supports the performance metrics reported in Table 6 and underscores the potential of these meta-models for practical soil moisture prediction.
Fig. 8.

Scatter plots comparing meta model predictions with in-situ soil moisture measurements; (a) RF meta, (b) GB meta, and (c) XGB meta.
High-resolution soil moisture mapping with meta-models
Figure 9 shows the 1 km soil moisture maps produced by the three meta-models (Random Forest, Gradient Boosting, and XGBoost) for 10 October 2021. The color scale represents soil water content from low to high. The maps reveal complex spatial patterns of soil moisture learned by the meta-models, with each model producing its own characteristic spatial rendering. These results illustrate how ensemble learning can improve soil moisture mapping in support of water management and hydrological applications.
Fig. 9.
The 1 km downscaled soil moisture produced by all meta-models on 10 October 2021; (a) RF meta; (b) GB meta; and (c) XGB meta (ArcGIS 10.8.1).
Interpretation of meta-model predictive influence
In an ensemble learning structure, the predictive influence of each base model shapes the performance of the meta-models. We report SHAP (SHapley Additive exPlanations) values as measures of the importance and influence of the individual base models on the final predictions. Quantifying SHAP values allows us to assess how each base model contributes to the ensemble output and thus to identify the key contributors to accuracy in soil moisture prediction.
The results indicate that the KNeighbors and KernelRidge models exert substantial influence across the three meta-models (Random Forest, Gradient Boosting, and XGBoost). Their pronounced impact suggests that they capture important, nuanced patterns that enhance the accuracy of soil moisture predictions. This influence is most evident in the GradientBoostingMeta and RandomForestMeta models, where KNeighbors and KernelRidge emerge as the most important base models. In contrast, RandomForest and SVR contribute minimally to the final predicted values, highlighting the varying importance of base learners within an ensemble. This interplay among models is the source of an ensemble's value, because it allows the strengths of different algorithms to be combined.
Figure 10 presents the average SHAP values of the base models, representing their overall contribution to the predictions of each meta-model. These averages indicate how large each base model's contribution to the meta-model output is on average. The results, particularly for KNeighbors and KernelRidge, reaffirm that ensemble learning boosts predictive performance by combining diverse models.
Fig. 10.

SHAP summary plots displaying the average impact of each base model’s output on the meta-models; (a) RF meta; (b) GB meta; and (c) XGB meta.
Figure 11 provides a more detailed view of the individual base-model contributions through violin plots of the SHAP value distributions across the dataset. These plots reveal how the importance of each base model varies from sample to sample and how the predictions of different base models contribute to the final output. For example, higher KernelRidge predictions are often associated with larger positive contributions to the final soil moisture estimates, whereas RandomForest predictions tend to have smaller and more variable influences.
Fig. 11.

Violin plots showing the distribution of SHAP values for base models across meta-models; (a) RF meta; (b) GB meta; and (c) XGB meta.
The analysis of SHAP values, viewed from both summary and distributional perspectives, reveals the complex dynamics of the ensemble learning process. While certain models, such as KNeighbors and KernelRidge, provide crucial inputs to the meta-models, the distribution of SHAP values indicates that their significance does not extend uniformly across all predictions. The flexibility and power of this ensemble framework stem from its ability to combine different models for more accurate soil moisture predictions across diverse landscapes.
SHAP values further illuminate the key contributors to ensemble performance and elucidate the interactions among base models. These findings are critical for obtaining deeper insights into the model’s internal functionality, which is essential for future optimization and fine-tuning of downscaling techniques in satellite soil moisture prediction.
Analysis of SHAP value variability
The variability of SHAP values across meta-models is important for understanding the consistency of feature importance within the ensemble framework. Here, the features are the predictions of the six base models (Random Forest, Gradient Boosting, XGBoost, KNeighbors, Kernel Ridge, and SVR), and we calculated the standard deviation of the SHAP values for each of them. The results, illustrated in Fig. 12, show that most features have relatively low standard deviations, indicating strong agreement among the meta-models regarding their importance. KNeighbors and KernelRidge, however, show higher variability, suggesting that their contributions to the ensemble's predictions are less consistent. This variability underscores the value of assessing individual feature contributions, as it informs the robustness of the ensemble model and its predictive accuracy. While most features contribute consistently, the less stable ones warrant further investigation. Overall, this examination strengthens the interpretability of the ensemble and clarifies the role each feature plays in its predictions.
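Summaries like those in Figs. 10 and 12 can be derived from per-meta-model SHAP arrays of shape (samples, base-model features). Such arrays could come, for example, from `shap.TreeExplainer(meta_model).shap_values(Z)`. The text does not specify over which dimension the standard deviation in Fig. 12 is taken. This sketch assumes that importance is the mean absolute SHAP value per meta-model and that variability is the standard deviation of that importance across the three meta-models. The SHAP arrays are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
base_models = ["RandomForest", "GradientBoosting", "XGBoost",
               "KNeighbors", "KernelRidge", "SVR"]

# Hypothetical SHAP arrays (samples x base-model features), one per meta-model.
shap_by_meta = {name: rng.normal(0, 1, size=(200, len(base_models)))
                for name in ["RF meta", "GB meta", "XGB meta"]}

# Importance per meta-model: mean |SHAP| of each base-model feature.
mean_abs = np.array([np.abs(s).mean(axis=0) for s in shap_by_meta.values()])

# Variability: sample standard deviation of that importance across meta-models.
std_across_meta = mean_abs.std(axis=0, ddof=1)

for name, imp, sd in zip(base_models, mean_abs.mean(axis=0), std_across_meta):
    print(f"{name:16s} mean|SHAP|={imp:.3f}  sd across meta-models={sd:.3f}")
```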
Fig. 12.
Standard deviations of SHAP values across meta-models for each feature, illustrating the variability of feature importance in the ensemble predictions.
Discussion
Performances of different downscaling approaches
The evaluation of both base and meta machine learning models revealed a significant improvement in predictive accuracy through ensemble learning techniques. Among the base models, Extreme Gradient Boosting (XGBoost) and Gradient Boosting Regression (GB) demonstrated strong performance, achieving the lowest ubRMSE values (1.92 and 2.10 for validation, and 1.48 and 1.70 for test sets, respectively). This strong performance was mirrored in their R-squared values, reaching 0.94 and 0.93 for validation, and 0.96 and 0.95 for test sets, demonstrating their ability to explain a significant proportion of the variance in observed soil moisture. This aligns with studies like Yang et al.27 who found LightGBM, another gradient boosting algorithm, to outperform other machine learning models for soil moisture retrieval.
While Random Forest also performed well as a base model (ubRMSE 2.22, R2 0.92 on the validation set), consistent with findings from Abbaszadeh22, the Support Vector Regression (SVR) model yielded less accurate predictions, with higher ubRMSE values (4.87 and 4.60) and lower R-squared scores (0.60 and 0.64) on the validation and test sets, respectively. This highlights the potential advantages of tree-based ensemble methods for soil moisture downscaling, even as base models.
Moving to meta-models through stacking led to a notable improvement in performance across all algorithms. Both the XGBoost and GB meta-models achieved low ubRMSE values (1.23 and 1.22, respectively) and R-squared values of 0.97 on the test set, a clear improvement over their base-model counterparts (ubRMSE of 1.48 and 1.70; R-squared of 0.96 and 0.95). The Random Forest meta-model also performed strongly, with ubRMSE and R-squared values comparable to those of XGBoost and GB. This finding supports the work of Wang et al.43 and Xu et al.49, who demonstrated the improved accuracy and stability of stacking ensemble models for soil moisture retrieval.
These findings demonstrate that ensemble learning through meta-models improves prediction accuracy. This is evidenced by the reductions in RMSE and ubRMSE (test-set ubRMSE fell from 1.48 for the best base model to between 1.22 and 1.27 for the meta-models) and by the increases in R-squared. This observation supports the effectiveness of the approach for downscaling soil moisture data and aligns with similar conclusions in the broader ensemble learning literature51. The superior performance of ensemble techniques can be attributed to their ability to combine the strengths of multiple models, mitigating the limitations of individual algorithms and enhancing prediction accuracy, as highlighted by Zhang et al.44 and Han et al.78.
Bias and consistency of SMAP L4 and AMSR2 datasets compared to in-situ observations
Figure 13 shows density plots of bias and ubRMSE that compare the soil moisture estimates from SMAP L4 and AMSR2 across land use classes. SMAP L4 consistently underestimates soil moisture, especially in agricultural areas (Fig. 13a). This agrees with Fan et al.80, who found that SMAP underestimates soil moisture in vegetation-disturbed areas. They attributed this primarily to biased surface temperature data from the NASA Global Modeling and Assimilation Office (GMAO). In contrast, AMSR2 tends to overestimate soil moisture, especially in semi-arid and arid areas81. Figure 13b shows the lower variability of SMAP L4 errors, which cluster around 4. SMAP L4 estimates are therefore more consistent than those of AMSR2, whose errors span a broader range. This is consistent with Brust et al.82, who found that constraining the MOD16 evapotranspiration model with SMAP L4 soil moisture reduced the root mean square error (RMSE) of evapotranspiration estimates, particularly in water-limited regions.
Fig. 13.
Density plots for bias (a) and unbiased root mean square error (ubRMSE) (b) comparing SMAP L4 and AMSR2 datasets across agricultural, semi-arid, and arid land use classes.
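Density curves like those in Fig. 13 are typically drawn with a kernel density estimate. The sketch below is an assumption about how such curves can be computed, not the plotting code used in this study. It places a Gaussian kernel on each error value and uses the normal-reference rule of thumb (h = 1.06 σ n^(−1/5)) for the bandwidth. The ubRMSE values are hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Gaussian kernel density estimate; returns a function x -> density."""
    n = len(samples)
    # Normal-reference rule of thumb when no bandwidth is supplied.
    h = bandwidth if bandwidth is not None else 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, scaled so the curve integrates to 1.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return density


# Hypothetical per-station ubRMSE values for one product.
errors = [3.2, 3.7, 3.9, 4.0, 4.1, 4.3, 4.8, 6.5]
density = gaussian_kde(errors)
grid = [i / 100 for i in range(0, 1001)]
peak = max(grid, key=density)
print(f"density peaks near {peak:.2f}")
```

A narrow, tall curve (errors clustered around one value) indicates consistent estimates. A wide, flat curve indicates a broader spread of errors.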
Figure 14 shows radar plots of bias and ubRMSE for each land use class. AMSR2 systematically overestimates soil moisture relative to in-situ observations. The overestimation is strongest in agricultural areas, where the bias reaches 23.80. SMAP L4 underestimates soil moisture in all land use classes, again most strongly in agricultural areas (bias = −19.31). This supports Tavakol et al.83, who reported that SMAP L4 is more accurate in regions with sparser vegetation. Figure 14b further shows that SMAP L4 has the lowest ubRMSE (2.19), in the semi-arid class. This is consistent with Cai et al.84, who emphasized the usefulness of SMAP L4 for regional hydrological modeling. Overall, SMAP L4 performs consistently across environmental conditions, making it the preferable product for monitoring soil moisture across the land use classes considered here85.
Fig. 14.
Radar plots comparing (a) bias and (b) ubRMSE of SMAP L4 and AMSR2 datasets across agricultural, semi-arid, and arid land use classes.
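The per-class values in Fig. 14 come from grouping paired satellite and in-situ samples by land use class and computing the metrics within each group. The sketch below assumes each record is a hypothetical (land_use, satellite, in_situ) tuple with both values in the same units. It obtains ubRMSE from the identity ubRMSE² = RMSE² − bias². The toy data mimic a systematic dry bias, which produces a large negative bias but a small ubRMSE.

```python
import math
from collections import defaultdict
from statistics import fmean


def metrics_by_class(records):
    """Group (land_use, satellite, in_situ) records; return {class: (bias, ubRMSE)}."""
    groups = defaultdict(list)
    for land_use, sat, obs in records:
        groups[land_use].append(sat - obs)
    out = {}
    for land_use, diffs in groups.items():
        b = fmean(diffs)                       # bias: mean(satellite - in-situ)
        mse = fmean(d * d for d in diffs)      # RMSE squared
        ub = math.sqrt(max(mse - b * b, 0.0))  # ubRMSE^2 = RMSE^2 - bias^2
        out[land_use] = (b, ub)
    return out


# Hypothetical records showing a systematic dry bias in both classes.
records = [
    ("agricultural", 12.0, 30.0), ("agricultural", 14.0, 33.0),
    ("semi-arid", 10.0, 12.0), ("semi-arid", 9.0, 12.0),
]
for cls, (b, ub) in metrics_by_class(records).items():
    print(f"{cls}: bias={b:.2f}, ubRMSE={ub:.2f}")
# agricultural: bias=-18.50, ubRMSE=0.50
# semi-arid: bias=-2.50, ubRMSE=0.50
```

The `max(..., 0.0)` guard only protects against tiny negative values from floating-point rounding. Mathematically, RMSE² is never smaller than bias².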
Limitations and future research
While this study demonstrates the effectiveness of stacking ensemble learning for downscaling satellite soil moisture data, some limitations warrant further consideration.
Data Availability and Quality: The accuracy of the downscaled soil moisture estimates depends on the quality and spatial coverage of the input data. In regions with sparse ground-based measurements or larger uncertainties in the satellite products, model performance may degrade.
Model Generalizability: The models developed in this study were specifically trained and tested for the Urmia sub-basin. Applying these models to other regions with different climatic and topographical characteristics may require additional data or model adjustments to ensure reliable performance.
Computational Cost: Training and running ensemble models, especially with large datasets, can require significant computational resources. This could be a constraint for real-time applications or in situations with limited computing power.
Uncertainty Quantification: Further research is needed to quantify the uncertainties associated with the downscaled soil moisture estimates. This is crucial for understanding the reliability of the predictions and their potential impact on decision-making processes.
Several avenues for future research can further enhance the capabilities of soil moisture downscaling using ensemble learning:
Exploring other ensemble techniques: Comparing stacking with alternative ensemble strategies, such as standalone bagging or boosting ensembles, could identify the most suitable approach for different scenarios and data characteristics.
Incorporating additional data sources: Adding inputs such as radar observations, land cover maps, or detailed soil property data could improve prediction accuracy and capture finer-scale variations in soil moisture.
Developing operational downscaling systems: Operational systems are needed that deliver real-time or near-real-time high-resolution soil moisture data for practical applications in water resource management and agricultural planning.
Conclusions
This study advances soil moisture monitoring by combining ensemble learning, specifically a stacking approach, with remote sensing and machine learning (ML) techniques. Using random forest, gradient boosting, and XGBoost as base learners substantially improved the accuracy of SM downscaling. These models were selected from ten candidates using a COPRAS ranking of their performance. Together, they captured the complex dynamics of soil moisture variability more effectively than any single model.
Using SHapley Additive exPlanations (SHAP) values for model interpretability provided insight into how the various inputs influence SM predictions. It also quantified the contribution of each base model within the ensemble. This improves our understanding of soil moisture dynamics and offers useful perspectives for future research, water resource management, and agricultural planning.
Focusing on the Urmia sub-basin, the study demonstrated the effectiveness of these methods across the basin's varied climatic and topographical settings, setting a new benchmark for the precision and reliability of SM monitoring in the region. By combining Random Forest, Gradient Boosting, and XGBoost within a stacked ensemble framework, this research provides a robust tool for advancing water resource management. It can also strengthen ecosystem and societal resilience, especially in basins such as that of Lake Urmia, where the drying of the lake has severe environmental, economic, and social consequences. The tool can support more accurate basin-scale decision-making.
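SHAP values are Shapley values from cooperative game theory applied to a model's inputs. The sketch below computes exact Shapley values for a meta-model whose inputs are the base learners' predictions. It enumerates every coalition of inputs and fills absent inputs from a single baseline vector. This is a simplification: the SHAP framework averages over a background dataset and, for tree ensembles, uses dedicated fast algorithms. The toy meta-model, weights, and numbers here are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent inputs set to baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight = share of input orderings in which exactly this coalition precedes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def meta(z):
    """Hypothetical linear meta-model over (RF, GB, XGBoost) predictions."""
    return 0.2 * z[0] + 0.3 * z[1] + 0.5 * z[2]


x = [24.0, 26.0, 25.0]     # base-model predictions for one pixel (hypothetical)
base = [20.0, 20.0, 20.0]  # baseline, e.g. mean base-model predictions (hypothetical)
phi = shapley_values(meta, x, base)
print([round(p, 2) for p in phi])  # [0.8, 1.8, 2.5]
```

The values always sum to f(x) − f(baseline), so each base model's share of the departure from the baseline prediction is accounted for exactly. Exact enumeration costs 2ⁿ model calls, which is affordable for a three-learner meta-model but not for large feature sets.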
Author contributions
M.S.T.: Conceptualization, Software, Validation, Investigation, Resources, Data Curation, Writing—Original Draft, Visualization; M.H.N.: Supervision, Methodology, Investigation, Writing—Review & Editing; A.H.G.: Supervision, Methodology, Conceptualization, Writing—Review & Editing.
Data availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Dorigo, W. et al. ESA CCI soil moisture for improved earth system understanding: State-of-the art and future directions. Remote Sens. Environ. 203, 185–215 (2017).
2. Peng, J., Loew, A., Zhang, S., Wang, J. & Niesel, J. Spatial downscaling of satellite soil moisture data using a vegetation temperature condition index. IEEE Trans. Geosci. Remote Sens. 54, 558–566 (2016).
3. Seneviratne, S. I. et al. Investigating soil moisture–climate interactions in a changing climate: A review. Earth-Sci. Rev. 99, 125–161 (2010).
4. McColl, K. A. et al. The global distribution and dynamics of surface soil moisture. Nat. Geosci. 10, 100–104 (2017).
5. Dobriyal, P., Qureshi, A., Badola, R. & Hussain, S. A. A review of the methods available for estimating soil moisture and its implications for water resource management. J. Hydrol. 458–459, 110–117 (2012).
6. Peng, J., Loew, A., Merlin, O. & Verhoest, N. E. C. A review of spatial downscaling of satellite remotely sensed soil moisture. Rev. Geophys. 55, 341–366 (2017).
7. Scipal, K., Holmes, T., de Jeu, R., Naeimi, V. & Wagner, W. A possible solution for the problem of estimating the error structure of global soil moisture data sets. Geophys. Res. Lett. 35 (2008).
8. Hirschi, M. et al. Observational evidence for soil-moisture impact on hot extremes in southeastern Europe. Nat. Geosci. 4, 17–21 (2010).
9. Huntington, T. G. Evidence for intensification of the global water cycle: Review and synthesis. J. Hydrol. 319, 83–95 (2006).
10. Wang, A., Lettenmaier, D. P. & Sheffield, J. Soil moisture drought in China, 1950–2006. J. Clim. 24, 3257–3271 (2011).
11. Zhang, Y. et al. Multi-decadal trends in global terrestrial evapotranspiration and its components. Sci. Rep. 6 (2016).
12. Sabaghy, S. et al. Comprehensive analysis of alternative downscaled soil moisture products. Remote Sens. Environ. 239, 111586 (2020).
13. Yao, P. et al. A global daily soil moisture dataset derived from Chinese FengYun Microwave Radiation Imager (MWRI) (2010–2019). Sci. Data 10 (2023).
14. Robock, A. et al. The global soil moisture data bank. Bull. Am. Meteorol. Soc. 81, 1281–1299 (2000).
15. Topp, G. C., Davis, J. L. & Annan, A. P. Electromagnetic determination of soil water content: Measurements in coaxial transmission lines. Water Resour. Res. 16, 574–582 (1980).
16. Ebrahimi-Khusfi, M. et al. Comparison of soil moisture retrieval algorithms based on the synergy between SMAP and SMOS-IC. Int. J. Appl. Earth Obs. Geoinformation 67, 148–160 (2018).
17. Ma, H. et al. Satellite surface soil moisture from SMAP, SMOS, AMSR2 and ESA CCI: A comprehensive assessment using global ground-based observations. Remote Sens. Environ. 231, 111215 (2019).
18. Entekhabi, D. et al. The soil moisture active passive (SMAP) mission. Proc. IEEE 98, 704–716 (2010).
19. Kerr, Y. H. et al. Soil moisture retrieval from space: the Soil Moisture and Ocean Salinity (SMOS) mission. IEEE Trans. Geosci. Remote Sens. 39, 1729–1735 (2001).
20. Jackson, T. J. et al. Validation of advanced microwave scanning radiometer soil moisture products. IEEE Trans. Geosci. Remote Sens. 48, 4256–4272 (2010).
21. Zhao, W., Sánchez, N., Lu, H. & Li, A. A spatial downscaling approach for the SMAP passive surface soil moisture product using random forest regression. J. Hydrol. 563, 1009–1024 (2018).
22. Abbaszadeh, P., Moradkhani, H. & Zhan, X. Downscaling SMAP radiometer soil moisture over the CONUS using an ensemble learning method. Water Resour. Res. 55, 324–344 (2019).
23. Liu, J., Rahmani, F., Lawson, K. & Shen, C. A multiscale deep learning model for soil moisture integrating satellite and in situ data. Geophys. Res. Lett. 49, e2021GL096847 (2022).
24. Guevara, M. & Vargas, R. Downscaling satellite soil moisture using geomorphometry and machine learning. PLOS ONE 14, e0219639 (2019).
25. Vergopolan, N. et al. Field-scale soil moisture bridges the spatial-scale gap between drought monitoring and agricultural yields. Hydrol. Earth Syst. Sci. 25, 1827–1847 (2021).
26. Das, B. et al. Comparison of bagging, boosting and stacking algorithms for surface soil moisture mapping using optical-thermal-microwave remote sensing synergies. CATENA 217, 106485 (2022).
27. Yang, H., Wang, Q., Zhao, W., Tong, X. & Atkinson, P. M. Reconstruction of a global 9 km, 8-day SMAP surface soil moisture dataset during 2015–2020 by spatiotemporal fusion. J. Remote Sens. 2022 (2022).
28. Srivastava, A., Sahoo, B., Raghuwanshi, N. S. & Singh, R. Evaluation of variable-infiltration capacity model and MODIS-terra satellite-derived grid-scale evapotranspiration estimates in a river basin with tropical monsoon-type climatology. J. Irrig. Drain. Eng. 143 (2017).
29. Abowarda, A. S. et al. Generating surface soil moisture at 30 m spatial resolution using both data fusion and machine learning toward better water resources management at the field scale. Remote Sens. Environ. 255, 112301 (2021).
30. Hutengs, C. & Vohland, M. Downscaling land surface temperatures at regional scales with random forest regression. Remote Sens. Environ. 178, 127–141 (2016).
31. Liu, Y., Jing, W., Wang, Q. & Xia, X. Generating high-resolution daily soil moisture by using spatial downscaling techniques: A comparison of six machine learning algorithms. Adv. Water Resour. 141, 103601 (2020).
32. Bai. Estimation of Surface Soil Moisture With Downscaled Land Surface Temperatures Using a Data Fusion Approach for Heterogeneous Agricultural Land. Water Resour. Res. (2019). https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2018WR024162.
33. Ghafari, E., Walker, J. P., Zhu, L., Colliander, A. & Faridhosseini, A. Spatial downscaling of SMAP radiometer soil moisture using radar data: Application of machine learning to the SMAPEx and SMAPVEX campaigns. Sci. Remote Sens. 9, 100122 (2024).
34. Liu, Y., Yang, Y., Jing, W. & Yue, X. Comparison of different machine learning approaches for monthly satellite-based soil moisture downscaling over Northeast China. Remote Sens. 10, 31 (2017).
35. Fang, K. & Shen, C. Near-real-time forecast of satellite-based soil moisture using long short-term memory with an adaptive data integration kernel. J. Hydrometeorol. 21, 399–413 (2020).
36. Karthikeyan, L. & Mishra, A. K. Multi-layer high-resolution soil moisture estimation using machine learning over the United States. Remote Sens. Environ. 266, 112706 (2021).
37. Wu, T., Zhang, W., Jiao, X., Guo, W. & Alhaj Hamoud, Y. Evaluation of stacking and blending ensemble learning methods for estimating daily reference evapotranspiration. Comput. Electron. Agric. 184, 106039 (2021).
38. Senanayake, I. P. et al. Spatial downscaling of satellite-based soil moisture products using machine learning techniques: A review. Remote Sens. 16, 2067 (2024).
39. Mao, Y., Crow, W. T. & Nijssen, B. Dual state/rainfall correction via soil moisture assimilation for improved streamflow simulation: evaluation of a large-scale implementation with Soil Moisture Active Passive (SMAP) satellite data. Hydrol. Earth Syst. Sci. 24, 615–631 (2020).
40. Zhong, Y. et al. Downscaling passive microwave soil moisture estimates using stand-alone optical remote sensing data. IEEE Trans. Geosci. Remote Sens. 62, 1–19 (2024).
41. Zhu, Z., Bo, Y. & Sun, T. Spatial downscaling of satellite soil moisture products based on apparent thermal inertia: Considering the effect of vegetation condition. J. Hydrol. 616, 128824 (2023).
42. Lu, M. et al. A Stacking Ensemble Model of Various Machine Learning Models for Daily Runoff Forecasting. Water 15, 1265 (2023).
43. Wang, S., Wu, Y., Li, R. & Wang, X. Remote sensing-based retrieval of soil moisture content using stacking ensemble learning models. Land Degrad. Dev. 34, 911–925 (2022).
44. Zhang, Y. et al. Generation of global 1 km daily soil moisture product from 2000 to 2020 using ensemble learning. Earth Syst. Sci. Data 15, 2055–2079 (2023).
45. Cui, S., Yin, Y., Wang, D., Li, Z. & Wang, Y. A stacking-based ensemble learning method for earthquake casualty prediction. Appl. Soft Comput. 101, 107038 (2021).
46. Ribeiro, M. H. D. M., da Silva, R. G., Moreno, S. R., Mariani, V. C. & Coelho, L. dos S. Efficient bootstrap stacking ensemble learning model applied to wind power generation forecasting. Int. J. Electr. Power Energy Syst. 136, 107712 (2022).
47. Abu et al. Improvement of flood susceptibility mapping by introducing hybrid ensemble learning algorithms and high-resolution satellite imageries. Nat. Hazards 119, 1–37 (2023).
48. Yao, J., Zhang, X., Luo, W., Liu, C. & Ren, L. Applications of Stacking/Blending ensemble learning approaches for evaluating flash flood susceptibility. Int. J. Appl. Earth Obs. Geoinformation 112, 102932 (2022).
49. A Spatial Downscaling Framework for SMAP Soil Moisture Based on Stacking Strategy. https://www.mdpi.com/2072-4292/16/1/200.
50. Ensemble of optimised machine learning algorithms for predicting surface soil moisture content at a global scale. Geosci. Model Dev.
51. Tao, S. et al. Retrieving soil moisture from grape growing areas using multi-feature and stacking-based ensemble learning modeling. Comput. Electron. Agric. 204, 107537 (2023).
52. Ghajarnia, N., Liaghat, A. & Daneshkar Arasteh, P. Comparison and evaluation of high resolution precipitation estimation products in Urmia Basin-Iran. Atmospheric Res. 158–159, 50–65 (2015).
53. Identification of trends in hydrological and climatic variables in Urmia Lake basin, Iran. Theor. Appl. Climatol. https://link.springer.com/article/10.1007/s00704-014-1120-4.
54. Jalilvand, E., Tajrishy, M., Ghazi Zadeh Hashemi, S. A. & Brocca, L. Quantification of irrigation water using remote sensing of soil moisture in a semi-arid region. Remote Sens. Environ. 231, 111226 (2019).
55. Groundwater quality assessment using random forest method based on groundwater quality indices (case study: Miandoab plain aquifer, NW of Iran). Arab. J. Geosci. https://link.springer.com/article/10.1007/s12517-020-05904-8.
56. Taheri, M. et al. Investigating the temporal and spatial variations of water consumption in Urmia Lake River Basin considering the climate and anthropogenic effects on the agriculture in the basin. Agric. Water Manag. 213, 782–791 (2019).
57. Application of the TDR Soil Moisture Sensor for Terramechanical Research. https://www.mdpi.com/1424-8220/19/9/2116.
58. Walker, J. P., Willgoose, G. R. & Kalma, J. D. In situ measurement of soil moisture: A comparison of techniques. J. Hydrol. 293, 85–99 (2004).
59. Calamita, G. et al. Electrical resistivity and TDR methods for soil moisture estimation in central Italy test-sites. J. Hydrol. 454–455, 101–112 (2012).
60. O’Neill, P. et al. Soil Moisture Active Passive (SMAP) Project: Calibration and Validation for the L2/3_SM_P Version 7 and L2/3_SM_P_E Version 4 Data Products. https://m.88jbb188.net/sites/default/files/l2_sm_p_ar_r18_final_oct2021.pdf (2020).
61. Colliander, A. et al. Validation of SMAP surface soil moisture products with core validation sites. Remote Sens. Environ. 191, 215–231 (2017).
62. Validation of Soil Moisture Data Products From the NASA SMAP Mission. https://ieeexplore.ieee.org/abstract/document/9599364.
63. Assessment of the SMAP Level-4 Surface and Root-Zone Soil Moisture Product Using In Situ Measurements. J. Hydrometeorol. 18 (2017). https://journals.ametsoc.org/view/journals/hydr/18/10/jhm-d-17-0063_1.xml?tab_body=abstract-display.
64. Jones, L. A. et al. The SMAP Level 4 Carbon Product for Monitoring Ecosystem Land-Atmosphere CO2 Exchange. IEEE Trans. Geosci. Remote Sens. 55, 6517–6532 (2017).
65. Yao, P. et al. A long term global daily soil moisture dataset derived from AMSR-E and AMSR2 (2002–2019). Sci. Data 8, 143 (2021).
66. Draper, C. S., Walker, J. P., Steinle, P. J., de Jeu, R. A. M. & Holmes, T. R. H. An evaluation of AMSR–E derived soil moisture over Australia. Remote Sens. Environ. 113, 703–710 (2009).
67. Imaoka, K. et al. Status of AMSR2 instrument on GCOM-W1. In Earth Observing Missions and Sensors: Development, Implementation, and Characterization II, vol. 8528, 201–206 (SPIE, 2012).
68. Chan, S. K. et al. Development and assessment of the SMAP enhanced passive soil moisture product. Remote Sens. Environ. 204, 931–941 (2018).
69. Crow, W. T. et al. Upscaling sparse ground-based soil moisture observations for the validation of coarse-resolution satellite soil moisture products. Rev. Geophys. 50 (2012).
70. A method to downscale soil moisture to fine resolutions using topographic, vegetation, and soil data. https://www.sciencedirect.com/science/article/abs/pii/S030917081400236X.
71. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
72. Niazkar, M. et al. Applications of XGBoost in water resources engineering: A systematic literature review (Dec 2018–May 2023). Environ. Model. Softw. 174, 105971 (2024).
73. The new method of multicriteria complex proportional assessment of projects. https://etalpykla.vilniustech.lt/handle/123456789/111916.
74. Roozbahani, A., Ghased, H. & Hashemy Shahedany, M. Inter-basin water transfer planning with grey COPRAS and fuzzy COPRAS techniques: A case study in Iranian Central Plateau. Sci. Total Environ. 726, 138499 (2020).
75. Victoria, A. H. & Maragatham, G. Automatic tuning of hyperparameters using Bayesian optimization. Evol. Syst. 12, 217–223 (2021).
76. Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259 (1992).
77. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
78. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
79. Aldrees, A., Khan, M., Taha, A. T. B. & Ali, M. Evaluation of water quality indexes with novel machine learning and SHapley Additive ExPlanation (SHAP) approaches. J. Water Process Eng. 58, 104789 (2024).
80. SMAP underestimates soil moisture in vegetation-disturbed areas primarily as a result of biased surface temperature data. https://www.sciencedirect.com/science/article/abs/pii/S0034425720302844?via%3Dihub.
81. Validation Analysis of SMAP and AMSR2 Soil Moisture Products over the United States Using Ground-Based Measurements. Remote Sens. https://www.mdpi.com/2072-4292/9/2/104.
82. Using SMAP Level-4 soil moisture to constrain MOD16 evapotranspiration over the contiguous USA. https://www.sciencedirect.com/science/article/abs/pii/S0034425720306507.
83. Evaluation analysis of NASA SMAP L3 and L4 and SPoRT-LIS soil moisture data in the United States. https://www.sciencedirect.com/science/article/abs/pii/S0034425719301919.
84. Downscaling of SMAP Soil Moisture Data by Using a Deep Belief Network. Remote Sens. https://www.mdpi.com/2072-4292/14/22/5681.
85. Multi-Scale Assessment of SMAP Level 3 and Level 4 Soil Moisture Products over the Soil Moisture Network within the ShanDian River (SMN-SDR) Basin, China. Remote Sens. https://www.mdpi.com/2072-4292/14/4/982.