Abstract
Accurate groundwater level prediction is essential for sustainable water management in arid and semi-arid regions. This study evaluated three machine learning models—Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM)—to forecast groundwater levels across five hydrogeological zones of the Najafabad Plain, Iran. Input variables included climatic (precipitation, temperature), hydrological (previous groundwater level), and anthropogenic (irrigation and groundwater abstraction) factors. Model performance was assessed using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), Willmott’s index (WI), and percent bias (PBIAS). Among the algorithms, XGBoost showed the best predictive skill, with mean testing results of R² = 0.8480, RMSE = 1.5540 m, MAE = 0.8800 m, WI = 0.9660, and PBIAS = +0.0400%. The near-zero PBIAS, ranging from −1.7000% to +2.4000% across zones, indicates minimal bias and high robustness under heterogeneous hydrogeological conditions. In comparison, RF achieved moderate accuracy (R² = 0.7480), while SVM attained strong training performance (R² = 0.9180) but weaker generalization in testing (R² = 0.8220), reflecting overfitting. Overall, the results confirm the effectiveness of ensemble methods—particularly XGBoost—in groundwater prediction and highlight the importance of integrating climatic and anthropogenic drivers for sustainable aquifer management.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-32376-1.
Keywords: Groundwater level prediction, Machine learning, Najafabad plain, Random forest, Extreme gradient boosting, Support vector machine
Subject terms: Environmental sciences, Hydrology
Introduction
Groundwater plays a vital role in sustaining agricultural productivity, urban development, and ecosystem services, particularly in arid and semi-arid regions such as central Iran1. In these environments, where surface water resources are limited or highly variable, aquifers serve as the primary source of freshwater for both irrigation and domestic use. However, overexploitation of groundwater due to increasing population, agricultural expansion, and declining precipitation under climate change has led to alarming consequences, including groundwater depletion, land subsidence, reduced water quality, and diminished long-term water security2–5. This escalating stress highlights the urgent need for robust groundwater monitoring and predictive tools to support sustainable resource management.
Traditional physically based models such as MODFLOW have long been used for simulating groundwater flow and levels6. While these models offer valuable process-based insights, their application is often constrained by the requirement for extensive input data (e.g., aquifer geometry, hydraulic conductivity, recharge rates), which are frequently sparse or uncertain—particularly in data-scarce regions like Iran7. Additionally, these models can be computationally intensive and difficult to calibrate under dynamically changing human-water interactions.
In response to these challenges, data-driven models—especially machine learning (ML) algorithms—have emerged as powerful alternatives for groundwater prediction. Algorithms such as Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) have shown promising results in capturing nonlinear relationships between groundwater levels and influencing factors such as precipitation, temperature, and human activities8–10. These models offer greater flexibility, faster computation, and reduced dependency on physical assumptions, making them suitable for complex or data-limited environments11–13.
Several studies have successfully applied ML methods to groundwater level forecasting in various regions of Iran and beyond14–18. For example, Vafadar et al.19 applied RF, SVM, and XGBoost for groundwater potential mapping in the Tehran and Karaj plains and found that XGBoost outperformed the other models. Similarly, Zarafshan et al.20 evaluated various ML models for groundwater prediction in the Najafabad region and confirmed the potential of support vector-based approaches, and Ibrahem Ahmed Osman et al.21 applied XGBoost to predict groundwater depth in Selangor, Malaysia, reporting high predictive accuracy. However, many of these studies focused on either a single model or an aggregated spatial scale, often overlooking the spatial heterogeneity of hydrogeological conditions and the importance of local calibration.
This study addresses this gap by conducting a comprehensive comparative analysis of RF, SVM, and XGBoost for predicting monthly groundwater levels in the Najafabad plain, Isfahan province. A novel contribution of this work is the spatial disaggregation of the study area into five distinct hydrogeological zones, enabling more detailed insight into model performance under varying aquifer conditions. This zonal framework enhances the interpretability and applicability of the models for groundwater management in heterogeneous settings.
The primary objective of this study is to develop and evaluate robust machine learning models for accurate groundwater level prediction across five hydrogeological zones of the Najafabad Plain, Iran, using multi-source climatic, hydrological, and anthropogenic data. Specifically, the study aims to:
Integrate multi-domain datasets—including meteorological variables (precipitation, temperature), surface water indicators (river discharge), and human-induced factors (irrigation volume, groundwater abstraction)—for zone-specific groundwater forecasting.
Compare the performance of three machine learning algorithms—Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM)—in both training and testing phases across heterogeneous aquifer conditions.
Identify the most suitable algorithm for spatially diverse groundwater systems under limited-resolution anthropogenic data conditions.
The novelty of the study lies in:
A zone-specific modeling framework that accounts for hydrogeological heterogeneity, rather than treating the plain as a homogeneous system.
Incorporation of multi-source, mixed-frequency data (climatic, hydrological, anthropogenic) in an arid/semi-arid context with limited data availability.
A comparative evaluation of ensemble and kernel-based methods in such heterogeneous settings, highlighting the superior scalability and generalization of XGBoost in groundwater level prediction.
The outcomes of this research are expected to support targeted groundwater management strategies, inform adaptive policy-making, and contribute to the growing body of literature on machine learning applications in groundwater science. However, the study is not without limitations. The models rely solely on historical time series data, which may not fully capture future changes driven by unexpected anthropogenic or climatic shifts. Additionally, the spatial resolution of input data and the availability of high-quality monitoring records may influence the reliability of predictions, particularly in areas with sparse observational coverage. Finally, while machine learning models are powerful in capturing correlations, they do not inherently provide mechanistic insights into subsurface hydrological processes, and thus should be complemented by physical understanding for integrated groundwater management.
Methods
Study area
The Najafabad Plain, located in Isfahan Province in central Iran, is a key agricultural region within the Zayandeh Rud Basin. It spans approximately 1,712 km², comprising a central alluvial aquifer of about 940.9 km², with the remaining 772.5 km² consisting of surrounding highlands. Geographically, the plain lies between longitudes 50°57′ and 51°44′26″ E and latitudes 32°20′13″ and 32°49′21″ N (Fig. 1). The aquifer is hydrologically isolated from adjacent groundwater systems and functions as the principal reservoir for subsurface water in the area.
Fig. 1.
The study area, including the location of Isfahan Province in Iran, the location of the Najafabad Plain within Isfahan Province, the zoning of the Najafabad Plain, the locations of the utilized stations, and the positions of existing wells. The map was generated using ArcMap 10.8 (ESRI, Redlands, CA, USA; https://www.esri.com).
The elevation of the plain ranges from roughly 2,906 m above sea level in the northern highlands to approximately 1,538 m in the southern lowlands, near the Zayandeh Rud River, which provides intermittent surface flow. The region experiences a semi-arid climate, with mean annual precipitation ranging from 158 to 172 mm, while the potential evapotranspiration exceeds 1,500 mm. The average annual temperature is approximately 15 °C. Water usage in the region is dominated by agriculture, accounting for about 93.3% of the total water consumption. Domestic and industrial uses represent around 4.5% and 2.2%, respectively. Groundwater supplies nearly 72% of the total water demand, primarily through wells. In total, 15,933 groundwater extraction points have been identified, including 15,753 wells, 151 qanats, and 29 springs. Among them, 15,673 wells are located within the alluvial aquifer, collectively extracting approximately 881 million cubic meters of water annually22.
In recent years, declining precipitation and increasing water demand have exerted considerable pressure on groundwater resources. Although the development of new wells is legally restricted, unauthorized wells—frequently reported by the Isfahan Regional Water Authority—pose a significant challenge to sustainable groundwater management in the plain.
Data collection and preprocessing
Meteorological, surface water, and groundwater data
To investigate groundwater level fluctuations in the Najafabad Plain, comprehensive hydrological and meteorological datasets were collected and preprocessed. Meteorological variables, including precipitation and temperature, were obtained from the Najafabad synoptic station, which served as the primary source for representing regional climatic conditions23. Surface water data were acquired from three active hydrometric stations—Lenj, Mousian, and Zefreh—located along the Zayandeh Rud River, the principal surface water source in the region24. These stations provide valuable insights into river flow dynamics and their interactions with the underlying groundwater system. The total annual surface inflow into the plain, including contributions from the Lenjanat, Mahyar-e-Shomali, and Karon sub-basins, was estimated at approximately 557.4 million cubic meters.
These representative stations (see Table 1; Fig. 1) were selected based on their strategic locations and the availability of long-term, reliable data, ensuring adequate spatial and temporal coverage of the key hydrological and climatic variables influencing groundwater levels across the study area. During the study period (2011–2021), the average annual precipitation was approximately 195 mm in the upland regions and 153.7 mm in the plain. March was identified as the wettest month, with average rainfall values of 31.6 mm in the uplands and 29.3 mm in the plain. Mean annual temperatures were estimated at around 13.8 °C in the uplands and 15.4 °C in the plain.
Table 1.
Selected stations used in this study.
| Station Name | Station Type | Longitude (°E) | Latitude (°N) | Elevation (m) | Start Year | End Year |
|---|---|---|---|---|---|---|
| Najafabad | Synoptic | 51.3891 | 32.6042 | 1634 | 1970 | 2022 |
| Zefreh | Hydrometric | 51.4986 | 32.5019 | 1623 | 1965 | 2022 |
| Mousian | Hydrometric | 51.5261 | 32.5772 | 1600 | 1995 | 2022 |
| Lenej | Hydrometric | 51.5575 | 39.3236 | 1646 | 1980 | 2022 |
Groundwater level data were collected from 53 observation wells distributed across the Najafabad Plain, corresponding to a monitoring density of approximately 1.4 wells per 25 km² (see Fig. 1)24. These wells provided continuous records of groundwater level fluctuations throughout the aquifer system, which extends beneath approximately 87% of the study area and constitutes the primary source of groundwater extraction.
For spatial analysis, the study area was divided into two major zones: the uplands and the plain. The uplands, primarily located in the northern and western parts of the region, cover an area of about 679.2 km² and are predominantly composed of Jurassic and Cretaceous limestone formations. In contrast, the plain—spanning approximately 932.2 km²—is characterized by Quaternary alluvial deposits with variable thicknesses and hydraulic conductivities. This zoning approach enabled differentiation of hydroclimatic and geological conditions across the region, thereby enhancing the accuracy of groundwater level modeling in response to meteorological and surface water drivers.
Outlier detection and treatment
To enhance the quality and reliability of the dataset, outlier detection was performed using the boxplot method—a widely adopted graphical technique in hydrological studies for identifying anomalous values. This approach effectively visualizes the distribution and dispersion of data, highlighting values that deviate significantly from the typical range. Given that the primary objective of this study is not the analysis of extreme events, the identified outliers—presumed to be due to measurement or recording errors—were excluded from further analysis. Subsequently, the resulting gaps in the time series were filled using interpolation techniques, ensuring data continuity and temporal consistency across all variables.
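The boxplot screening and gap filling described above can be sketched in pandas as follows. The whisker multiplier (Tukey's 1.5 × IQR fences) and linear interpolation are assumptions, because the exact rule and interpolation method are not specified; `remove_boxplot_outliers` and `clean_series` are hypothetical helper names.

```python
import pandas as pd


def remove_boxplot_outliers(series: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Mask values outside the boxplot whiskers (Tukey's fences) as NaN."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    # Values beyond the whiskers are treated as measurement/recording errors.
    return series.where(series.between(lower, upper))


def clean_series(series: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Drop boxplot outliers, then fill the resulting gaps by linear interpolation."""
    masked = remove_boxplot_outliers(series, whisker)
    # limit_direction="both" also fills gaps at the start or end of the record.
    return masked.interpolate(method="linear", limit_direction="both")


levels = pd.Series([1603.2, 1603.4, 1650.0, 1603.9, 1604.1])
cleaned = clean_series(levels)
print(cleaned.round(2).tolist())
```

In this short record the 1650.0 spike lies above the upper whisker (1605.15 m) and is replaced by the interpolated value 1603.65 m; all other values are kept unchanged.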
Regional and aquifer zoning
The Najafabad aquifer, like many complex groundwater systems, exhibits significant spatial heterogeneity due to the combined influence of natural and anthropogenic factors such as groundwater abstraction, precipitation, riverbed infiltration, and temperature variability. Owing to the physiographic diversity of the plain, these factors impact different regions of the aquifer to varying degrees. For example, in the western parts of the aquifer—located at a considerable distance from the Zayandeh-Rud River—the influence of river discharge on groundwater levels is minimal to negligible. To improve the spatial resolution and accuracy of groundwater level modeling, the aquifer was subdivided into five distinct zones. This zoning was based on a conceptual understanding of the system, as well as the physical and geographical characteristics of the area, particularly in relation to the presence of the river and irrigation canals. The spatial layout and boundaries of these zones are depicted in Fig. 1, and their key characteristics are summarized in Table 2.
Table 2.
Description of groundwater zones based on dominant influencing factors in the Najafabad aquifer.
| Zone | Name | Main Influential Factor | Description |
|---|---|---|---|
| Z1 | River Corridor | River discharge | Influenced by proximity to river |
| Z2 | Right Side of Nekouabad Canal | Irrigation canal (Nekouabad) | Affected by controlled irrigation system |
| Z3 | Left Side of Nekouabad Canal | Irrigation canal (Nekouabad) | Similar to Z2 but spatially distinct |
| Z4 | Khamiran Irrigation Area | Groundwater abstraction | Affected by Khamiran irrigation scheme |
| Z5 | Western Highlands | Precipitation and natural recharge | Mostly natural influence, limited access |
Zone 1 – River Corridor Zone: This zone includes areas adjacent to the Zayandeh-Rud River, where groundwater levels are directly influenced by river discharge. Piezometric data from this region reflect the dynamic interaction between surface water and groundwater.
Zones 2 and 3 – Nekouabad Irrigation Zones: These zones encompass agricultural areas affected by the Nekouabad irrigation network. Based on their geographical orientation, they are divided into: Zone 2: Right side of the Nekouabad irrigation system, and Zone 3: Left side of the Nekouabad irrigation system.
Zone 4 – Khamiran Irrigation Zone: Located in the western part of the aquifer, this zone is mainly influenced by groundwater abstraction for agriculture under the Khamiran irrigation scheme.
Zone 5 – Western Highland Zone: This zone includes piezometers situated in the elevated western highlands. Here, groundwater dynamics are primarily governed by natural recharge from precipitation, with limited anthropogenic impact.
Descriptive statistics of input variables
To provide an overview of the characteristics of the modeling inputs, descriptive statistics were calculated for all variables in each hydrogeological zone during the training and testing phases. The analyzed variables included precipitation (mm/month), temperature (°C), river discharge (m³/s), groundwater abstraction (m), irrigation volume (MCM/month), and groundwater level (m). For each variable, the minimum (Min), maximum (Max), mean (Mean), kurtosis, skewness, and standard deviation (SD) were computed separately for the training, testing, and total datasets.
These statistics help to identify the distribution patterns, variability, and potential anomalies in the input data, which may influence model performance. For instance, high kurtosis and skewness values in precipitation indicate occasional extreme rainfall events, while the relatively low SD in groundwater levels reflects gradual changes over time. The detailed descriptive statistics for each zone are presented in Table 3.
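A minimal pandas sketch of how such per-split statistics could be computed follows. It assumes a chronologically ordered DataFrame with one column per variable and a simple 80/20 chronological train/test split. That split rule is an assumption, although it would reproduce the 91/23 division of the 114 Z1 records. pandas reports excess (Fisher) kurtosis, where a normal distribution scores 0, which is the convention under which the negative kurtosis values in Table 3 are possible.

```python
import pandas as pd


def describe_split(df: pd.DataFrame, train_frac: float = 0.8) -> dict:
    """Min, Max, Mean, Kurtosis, Skewness and SD for train/test/total subsets.

    Assumes rows are in chronological order and a chronological split.
    """
    n_train = int(len(df) * train_frac)
    subsets = {
        "Training": df.iloc[:n_train],
        "Testing": df.iloc[n_train:],
        "Total": df,
    }
    stats = {}
    for name, sub in subsets.items():
        stats[name] = pd.DataFrame({
            "Min": sub.min(),
            "Max": sub.max(),
            "Mean": sub.mean(),
            "Kurtosis": sub.kurt(),  # bias-corrected excess (Fisher) kurtosis
            "Skewness": sub.skew(),  # bias-corrected sample skewness
            "SD": sub.std(),         # sample standard deviation (ddof=1)
        })
    return stats


# Hypothetical 114-month record with a single column "P"
demo = pd.DataFrame({"P": range(114)})
tables = describe_split(demo)
print(tables["Testing"])
```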
Table 3.
Descriptive statistics of input variables for training, testing, and total datasets in the five hydrogeological zones.
| Zone Z1: Variable | Statistic | Training set (n = 91) | Testing set (n = 23) | Total data set (n = 114) | Zone Z2: Variable | Statistic | Training set (n = …) | Testing set (n = …) | Total data set (n = …) |
|---|---|---|---|---|---|---|---|---|---|
| Precipitation (mm/month) | Min | 0.00 | 0.00 | 0.00 | Precipitation (mm/month) | Min | 0.00 | 0.00 | 0.00 |
| | Max | 72.75 | 47.75 | 72.75 | | Max | 71.50 | 61.00 | 71.50 |
| | Mean | 10.41 | 7.16 | 9.76 | | Mean | 12.59 | 10.20 | 12.11 |
| | Kurtosis | 4.72 | 4.32 | 4.68 | | Kurtosis | 2.56 | 5.67 | 2.87 |
| | Skewness | 2.00 | 2.12 | 2.02 | | Skewness | 1.70 | 2.24 | 1.77 |
| | SD | 14.54 | 12.42 | 14.14 | | SD | 17.21 | 14.54 | 16.69 |
| Temperature (°C) | Min | −0.10 | 4.60 | −0.10 | Temperature (°C) | Min | −1.50 | −0.10 | −1.50 |
| | Max | 41.20 | 36.60 | 41.20 | | Max | 28.00 | 29.00 | 29.00 |
| | Mean | 17.00 | 19.33 | 17.47 | | Mean | 15.58 | 14.24 | 15.31 |
| | Kurtosis | −0.20 | −1.07 | −0.50 | | Kurtosis | −1.14 | −1.70 | −1.28 |
| | Skewness | 0.37 | −0.15 | 0.26 | | Skewness | −0.24 | 0.08 | −0.17 |
| | SD | 9.10 | 9.83 | 9.25 | | SD | 8.55 | 10.03 | 8.83 |
| River Discharge (m³/s) | Min | 0.00 | 0.00 | 0.00 | Irrigation Volume (MCM/month) | Min | 0.00 | 0.00 | 0.00 |
| | Max | 49.08 | 53.07 | 53.07 | | Max | 12.09 | 11.33 | 12.09 |
| | Mean | 12.50 | 11.83 | 12.36 | | Mean | 2.07 | 2.15 | 2.09 |
| | Kurtosis | −0.65 | −0.03 | −0.58 | | Kurtosis | 1.66 | 2.05 | 1.62 |
| | Skewness | 0.94 | 1.08 | 0.96 | | Skewness | 1.45 | 1.53 | 1.45 |
| | SD | 16.23 | 16.35 | 16.18 | | SD | 2.87 | 3.04 | 2.89 |
| Groundwater Abstraction (m) | Min | −3.53 | −2.33 | −3.53 | Groundwater Abstraction (m) | Min | −8.99 | −8.76 | −8.99 |
| | Max | 3.40 | 3.49 | 3.49 | | Max | 7.91 | 4.20 | 7.91 |
| | Mean | −0.09 | −0.02 | −0.08 | | Mean | −0.25 | −0.27 | −0.25 |
| | Kurtosis | 1.84 | 1.90 | 1.76 | | Kurtosis | 0.38 | 1.31 | 0.51 |
| | Skewness | 0.09 | 1.05 | 0.31 | | Skewness | −0.11 | −1.06 | −0.32 |
| | SD | 1.17 | 1.29 | 1.19 | | SD | 2.86 | 3.09 | 2.89 |
| Groundwater Level (m) | Min | 1600.08 | 1600.28 | 1600.08 | Groundwater Level (m) | Min | 1543.93 | 1547.62 | 1543.93 |
| | Max | 1610.56 | 1610.29 | 1610.56 | | Max | 1578.31 | 1575.30 | 1578.31 |
| | Mean | 1603.91 | 1603.27 | 1603.78 | | Mean | 1557.89 | 1559.36 | 1558.18 |
| | Kurtosis | −0.27 | 0.60 | −0.20 | | Kurtosis | −0.10 | −0.60 | −0.23 |
| | Skewness | 0.47 | 1.06 | 0.57 | | Skewness | 0.54 | 0.16 | 0.46 |
| | SD | 2.55 | 2.98 | 2.64 | | SD | 7.84 | 7.17 | 7.70 |
| Zone Z3 | | | | | Zone Z4 | | | | |
| Precipitation (mm/month) | Min | 2.99 | 6.07 | 2.99 | Precipitation (mm/month) | Min | 4.00 | 2.99 | 2.99 |
| | Max | 32.83 | 33.24 | 33.24 | | Max | 33.24 | 32.27 | 33.24 |
| | Mean | 18.20 | 22.24 | 19.01 | | Mean | 19.27 | 17.92 | 19.01 |
| | Kurtosis | −1.47 | −1.44 | −1.46 | | Kurtosis | −1.49 | −1.24 | −1.46 |
| | Skewness | 0.00 | −0.35 | −0.05 | | Skewness | −0.11 | 0.17 | −0.05 |
| | SD | 9.27 | 9.38 | 9.40 | | SD | 9.53 | 8.95 | 9.40 |
| Temperature (°C) | Min | 0.00 | 0.00 | 0.00 | Temperature (°C) | Min | 0.00 | 0.00 | 0.00 |
| | Max | 66.50 | 74.00 | 74.00 | | Max | 74.00 | 66.50 | 74.00 |
| | Mean | 10.73 | 8.99 | 10.39 | | Mean | 9.13 | 15.48 | 10.39 |
| | Kurtosis | 2.58 | 7.17 | 3.56 | | Kurtosis | 5.20 | 0.82 | 3.56 |
| | Skewness | 1.71 | 2.64 | 1.92 | | Skewness | 2.18 | 1.26 | 1.92 |
| | SD | 15.05 | 17.91 | 15.60 | | SD | 14.42 | 19.17 | 15.60 |
| Irrigation Volume (MCM/month) | Min | 0.00 | 0.00 | 0.00 | Irrigation Volume (MCM/month) | Min | 0.00 | 0.00 | 0.00 |
| | Max | 40.23 | 26.28 | 40.23 | | Max | 6.81 | 9.97 | 9.97 |
| | Mean | 7.77 | 7.12 | 7.64 | | Mean | 1.02 | 1.39 | 1.09 |
| | Kurtosis | 0.93 | −0.10 | 0.97 | | Kurtosis | 4.01 | 9.03 | 9.42 |
| | Skewness | 1.32 | 0.91 | 1.29 | | Skewness | 1.69 | 2.62 | 2.41 |
| | SD | 10.57 | 8.32 | 10.13 | | SD | 1.32 | 2.20 | 1.53 |
| Groundwater Abstraction (m) | Min | −5.15 | −6.91 | −6.91 | Groundwater Abstraction (m) | Min | −0.95 | −0.28 | −0.95 |
| | Max | 6.35 | 6.30 | 6.35 | | Max | 0.44 | 0.48 | 0.48 |
| | Mean | 0.05 | −0.21 | 0.00 | | Mean | −0.01 | 0.00 | −0.01 |
| | Kurtosis | 2.53 | 3.33 | 2.98 | | Kurtosis | 6.42 | 2.60 | 6.10 |
| | Skewness | 0.45 | −0.03 | 0.23 | | Skewness | −1.66 | 1.28 | −1.27 |
| | SD | 1.76 | 2.39 | 1.89 | | SD | 0.18 | 0.16 | 0.18 |
| Groundwater Level (m) | Min | 1570.70 | 1571.14 | 1570.70 | Groundwater Level (m) | Min | 1732.14 | 1732.17 | 1732.14 |
| | Max | 1580.99 | 1579.47 | 1580.99 | | Max | 1733.42 | 1733.62 | 1733.62 |
| | Mean | 1573.60 | 1573.78 | 1573.64 | | Mean | 1732.83 | 1732.90 | 1732.84 |
| | Kurtosis | −0.20 | 0.01 | −0.20 | | Kurtosis | 0.13 | −0.24 | 0.29 |
| | Skewness | 0.92 | 0.85 | 0.90 | | Skewness | −0.15 | −0.08 | −0.01 |
| | SD | 2.60 | 2.33 | 2.54 | | SD | 0.23 | 0.36 | 0.26 |
| Zone Z5 | | | | | | | | | |
| Precipitation (mm/month) | Min | 0.00 | 0.00 | 0.00 | Temperature (°C) | Min | 1.06 | 1.10 | 1.06 |
| | Max | 77.70 | 67.00 | 77.70 | | Max | 29.54 | 29.94 | 29.94 |
| | Mean | 10.81 | 12.23 | 11.10 | | Mean | 15.53 | 17.38 | 15.89 |
| | Kurtosis | 3.95 | 3.09 | 3.57 | | Kurtosis | −1.51 | −1.18 | −1.46 |
| | Skewness | 2.01 | 1.67 | 1.92 | | Skewness | 0.04 | −0.31 | −0.01 |
| | SD | 16.48 | 16.94 | 16.51 | | SD | 9.20 | 9.58 | 9.27 |
| Groundwater Level (m) | Min | 1838.19 | 1838.26 | 1838.19 | | | | | |
| | Max | 1840.18 | 1840.24 | 1840.24 | | | | | |
| | Mean | 1838.92 | 1839.09 | 1838.95 | | | | | |
| | Kurtosis | −1.01 | −1.23 | −0.92 | | | | | |
| | Skewness | 0.23 | 0.13 | 0.28 | | | | | |
| | SD | 0.50 | 0.65 | 0.53 | | | | | |
Exploratory data analysis of groundwater level fluctuations
To gain deeper insights into the temporal behavior of groundwater levels across the Najafabad Plain, an exploratory data analysis (EDA) was conducted using violin plots. These plots effectively combine boxplots and kernel density estimates to depict the distribution and variability of groundwater levels on both monthly and seasonal scales, across the five delineated observation zones (Z1 to Z5). The monthly and seasonal distributions are presented in Figs. 2 and 3, respectively.
Fig. 2.
Monthly distribution of groundwater level fluctuations (m) across five observation zones (Z1 to Z5) in the Najafabad Plain.
Fig. 3.
Seasonal violin plots showing the distribution of groundwater levels (m) across five hydrogeological zones (Z1–Z5) in the Najafabad Plain.
The monthly violin plots (Fig. 2) reveal distinct fluctuation patterns among the zones. Zone 1 (Z1), located near the river corridor, exhibits pronounced variability during both summer and winter months, as indicated by the wider spread of values. Zone 2 (Z2) shows a gradual rise in groundwater levels from April to December, along with considerable dispersion during the transitional months, potentially due to irrigation cycles. Zone 3 (Z3) maintains relatively stable levels, with the most notable fluctuations occurring between June and October—likely reflecting seasonal agricultural demand. In Zone 4 (Z4), groundwater levels remain consistent throughout the year except for December, which displays a sharp increase in variability. Zone 5 (Z5), located in the western highlands, demonstrates a gradual rise in groundwater levels beginning in June and peaking around October, possibly due to delayed recharge from precipitation or irrigation return flows.
Seasonal violin plots (Fig. 3) provide a comprehensive visualization of the temporal dynamics of groundwater levels across the five hydrogeological zones. Zones Z1, Z2, and Z3 exhibit wider distributions during spring and summer, indicating higher variability likely associated with intensified evapotranspiration, peak agricultural water demand, and potential seasonal recharge events. In contrast, Z4 and Z5 display comparatively narrower and more stable distributions across most seasons, with only a modest increase in variability observed during fall and winter.
Median groundwater levels tend to be elevated in spring and summer, particularly in Z1 and Z2, suggesting the combined effects of irrigation inputs and seasonal recharge from precipitation or upstream inflows. These patterns highlight the spatial heterogeneity in seasonal groundwater behavior, underscoring the greater sensitivity of some zones to environmental and anthropogenic drivers. Such insights are valuable for guiding zone-specific feature engineering, optimizing model calibration, and improving the robustness of predictive groundwater models.
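The seasonal grouping behind violin plots like those in Fig. 3 can be sketched as follows. The month-to-season mapping (December to February as winter, March to May as spring, and so on) is an assumed Northern Hemisphere meteorological convention, not necessarily the one used for the figure. The function names are illustrative, and `plot_seasonal_violins` assumes matplotlib 3.5 or later.

```python
import pandas as pd

# Assumed mapping; the study does not state its month-to-season convention.
SEASON_OF_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}
SEASONS = ["Winter", "Spring", "Summer", "Fall"]


def seasonal_groups(levels: pd.Series) -> dict:
    """Split a monthly groundwater-level series (DatetimeIndex) by season."""
    seasons = levels.index.month.map(SEASON_OF_MONTH)
    return {s: levels[seasons == s].dropna().to_numpy() for s in SEASONS}


def plot_seasonal_violins(levels: pd.Series, ax=None):
    """One violin per season: a kernel density outline plus the median line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    groups = seasonal_groups(levels)
    if ax is None:
        _, ax = plt.subplots()
    ax.violinplot([groups[s] for s in SEASONS], showmedians=True)
    ax.set_xticks(range(1, len(SEASONS) + 1), labels=SEASONS)
    ax.set_ylabel("Groundwater level (m)")
    return ax
```

Running this per zone (Z1 to Z5) on the cleaned monthly series would give one panel per zone. The kernel bandwidth is left at matplotlib's default.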
Groundwater level modeling approach
Machine learning algorithms
In this study, three state-of-the-art supervised machine learning algorithms—Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM)—were implemented to model and forecast groundwater level variations across the Najafabad aquifer. These algorithms are widely recognized for their robustness in handling nonlinear relationships, high-dimensional input spaces, and noisy data, making them well-suited for hydrological applications16,25. They were specifically selected to represent three distinct modeling paradigms—ensemble bagging (RF), ensemble boosting (XGBoost), and kernel-based learning (SVM)—allowing a robust comparative assessment. This selection also reflects their proven capability in previous groundwater studies to perform effectively under heterogeneous aquifer conditions and limited-resolution anthropogenic datasets.
Random Forest (RF):
This ensemble learning method is based on the bagging (bootstrap aggregating) technique. Multiple regression trees are constructed using bootstrapped samples of the training data. Each tree $h_i(x)$ makes a prediction, and the final RF output is the average of the predictions of all $N$ trees:

$$\hat{y}_{\mathrm{RF}}(x) = \frac{1}{N}\sum_{i=1}^{N} h_i(x) \qquad (1)$$
Each tree is grown by selecting splits that minimize the mean squared error (MSE), and only a random subset of features is considered at each split to reduce the correlation among trees. Averaging many decorrelated trees mainly reduces variance while keeping bias low. Feature importance is computed from the mean decrease in impurity or in prediction accuracy across all trees. In this study, the RF model was implemented using the scikit-learn library, with hyperparameters (e.g., number of trees, maximum depth) tuned via grid search26,27.
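A scikit-learn sketch of grid-searched RF regression follows. The grid values and RMSE-based scoring are placeholders, not the study's settings, and the data are synthetic. The final check mirrors Eq. (1): a fitted `RandomForestRegressor` predicts the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def fit_random_forest(X_train, y_train, cv=5, seed=0):
    """Tune the number of trees and maximum depth by grid search + k-fold CV."""
    param_grid = {                       # illustrative grid, not the study's
        "n_estimators": [50, 100],
        "max_depth": [None, 5, 10],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X_train, y_train)         # refit=True retrains the best model on all data
    return search.best_estimator_


# Synthetic check of Eq. (1): the forest output is the mean over its trees.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)
rf = fit_random_forest(X, y)
tree_mean = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
print(np.allclose(rf.predict(X), tree_mean))
print(rf.feature_importances_.round(2))  # impurity-based importance per input
```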
Gradient Boosting (GB):
Gradient Boosting builds models sequentially, fitting each new model to the residual errors of the ensemble prediction so far. The objective function combines a loss function $l(y_i, \hat{y}_i)$ (e.g., squared error loss) and a regularization term $\Omega(f_k)$ that penalizes model complexity:

$$\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right) \qquad (2)$$

In this study, the eXtreme Gradient Boosting (XGBoost) implementation was used, which improves efficiency and regularization. XGBoost uses a second-order Taylor approximation of the loss function for optimization and supports shrinkage (learning rate) and column subsampling to reduce overfitting. The additive model is updated as:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(x_i) \qquad (3)$$

where $\eta$ is the learning rate and $f_t$ is the tree added at iteration $t$. The model was implemented using the XGBoost library, with hyperparameters such as the number of estimators, learning rate, and maximum tree depth tuned28,29.
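To make the additive update in Eq. (3) concrete, the sketch below implements plain gradient boosting with squared loss using shallow scikit-learn regression trees. It is deliberately not XGBoost: it omits the second-order approximation, the Ω regularization term, and column subsampling, and keeps only residual fitting with shrinkage by the learning rate η. `boost_squared_loss` is a hypothetical name.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost_squared_loss(X, y, n_rounds=100, eta=0.1, max_depth=3):
    """Plain gradient boosting for squared loss; returns a predict function."""
    y = np.asarray(y, dtype=float)
    base = float(np.mean(y))                  # y_hat^(0): constant initial model
    y_hat = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residual = y - y_hat                  # negative gradient of 0.5 * (y - y_hat)^2
        f_t = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        f_t.fit(X, residual)
        y_hat = y_hat + eta * f_t.predict(X)  # Eq. (3): shrunken additive update
        trees.append(f_t)

    def predict(X_new):
        return base + eta * sum(tree.predict(X_new) for tree in trees)

    return predict


rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
predict = boost_squared_loss(X, y)
mse0 = np.mean((y - y.mean()) ** 2)          # error of the constant model
mse = np.mean((y - predict(X)) ** 2)         # training error after boosting
print(round(mse0, 3), round(mse, 4))
```

For squared loss the negative gradient equals the residual, so fitting residuals and fitting gradients coincide here. XGBoost generalizes this step by also using the second derivative of the loss.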
Support Vector Machine (SVM):
Support Vector Regression (SVR) aims to find a function that approximates the underlying relationship between input features and target values while balancing model complexity and prediction accuracy. The SVR function is expressed as:
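$$f(x) = w^{\top}\phi(x) + b \qquad (4)$$

where $\phi(x)$ maps the input vector $x$ into a high-dimensional feature space, and $w$ and $b$ are the model parameters.
The SVR formulation attempts to minimize model complexity (by keeping $\lVert w \rVert$ small) while ensuring that the predictions fall within an ε-insensitive margin. The optimization problem is defined as:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \qquad (5)$$

Subject to:

$$\begin{cases} y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i \\ w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases} \qquad (6)$$

where $C$ is the regularization parameter that controls the trade-off between the flatness of the function and the extent to which deviations larger than ε are tolerated, and $\xi_i$ and $\xi_i^{*}$ are slack variables that allow violations of the ε margin.
To model nonlinear relationships, a kernel function $K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$ is employed. In this study, the Radial Basis Function (RBF) kernel was used:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \qquad (7)$$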
The RBF kernel enables the model to learn complex, nonlinear patterns in the data.
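Eq. (7) can be evaluated for all pairs of samples at once with NumPy. The helper below is an illustrative sketch, not scikit-learn's internal implementation.

```python
import numpy as np


def rbf_kernel_matrix(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2), as in Eq. (7)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # clip tiny negative round-off
    return np.exp(-gamma * sq_dist)


K = rbf_kernel_matrix([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]], gamma=0.5)
# K[0, 0] = exp(0), K[0, 1] = exp(-0.5 * 4), K[1, 1] = exp(-0.5 * 5)
```

A larger γ makes the kernel decay faster with distance, so each training point influences a smaller neighborhood. This is why γ is tuned jointly with C and ε.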
The SVR model was implemented using the scikit-learn library in Python. The key hyperparameters, $C$ (penalty), $\varepsilon$ (margin width), and $\gamma$ (kernel coefficient), were optimized through a grid search combined with cross-validation to ensure robust performance30,31.
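A scikit-learn sketch of this tuning step follows, under stated assumptions. The grid values are placeholders. The `StandardScaler` step is also an assumption: RBF distances are sensitive to feature scale, and the inputs (precipitation, temperature, irrigation volume) span very different ranges.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_svr(X_train, y_train, cv=5):
    """RBF-kernel SVR with C, epsilon and gamma chosen by grid search + k-fold CV."""
    # Assumed preprocessing: standardize inputs so no feature dominates ||x_i - x_j||.
    pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    param_grid = {                           # placeholder ranges, not the study's
        "svr__C": [1.0, 10.0, 100.0],
        "svr__epsilon": [0.01, 0.1],
        "svr__gamma": ["scale", 0.1, 1.0],
    }
    search = GridSearchCV(pipe, param_grid, cv=cv,
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    return search


rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = np.tanh(X[:, 0]) + 0.05 * rng.normal(size=80)
search = fit_svr(X, y)
print(search.best_params_, -search.best_score_)   # best mean CV RMSE
```

With an integer `cv`, scikit-learn uses unshuffled contiguous folds for regression. Comparing the cross-validated RMSE with the error on a held-out test period is one way to spot the kind of training/testing gap reported for SVM.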
All models were trained using the historical time series of meteorological variables (e.g., precipitation, temperature), surface water inflow, and groundwater levels across five hydrogeological zones (Sect. "Input variables per zone"). A combination of grid search and five-fold cross-validation was applied to fine-tune model hyperparameters and ensure robust generalization. Model performance was evaluated using standard statistical metrics, as detailed in Sect. "Model evaluation metrics".
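The evaluation metrics listed in the Abstract (R², RMSE, MAE, WI, and PBIAS) can be computed as in the NumPy sketch below. The formulas follow common hydrological conventions and are assumptions where the study's exact definitions may differ. R² is taken as 1 − SSE/SST, although some studies instead square the Pearson correlation. WI is Willmott's index of agreement d. PBIAS uses the sign convention in which positive values indicate underestimation.

```python
import numpy as np


def evaluate(obs, sim):
    """R², RMSE, MAE, Willmott's index (d) and PBIAS for observed vs simulated."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    obs_mean = obs.mean()
    r2 = 1.0 - np.sum(err ** 2) / np.sum((obs - obs_mean) ** 2)
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    wi = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    # Positive PBIAS means the model underestimates on average.
    pbias = 100.0 * np.sum(obs - sim) / np.sum(obs)
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "WI": wi, "PBIAS": pbias}


scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

For observed values [1, 2, 3, 4] and simulated values [1, 2, 3, 5], this returns R² = 0.8, RMSE = 0.5, MAE = 0.25, WI ≈ 0.963, and PBIAS = −10%.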
The complete implementation codes, including data preprocessing, model training, and evaluation scripts, are provided in Supplementary Material (S2) for reference and reproducibility.
Input variables per zone
To account for the spatial heterogeneity of hydrological and anthropogenic influences across the Najafabad aquifer, groundwater level modeling was conducted separately for five distinct hydrogeological zones (Z1–Z5). This zonal approach allows for a more accurate representation of local dynamics by tailoring model inputs to the specific conditions of each area.
The selection of input variables in each zone was guided by both data availability and their relevance to groundwater level fluctuations. The monthly change in groundwater level (GwL) was considered the output variable across all zones, while the predictor variables included monthly precipitation (P), average temperature (T), river discharge (RD), irrigation volume (IV), and groundwater abstraction (GwA). Table 4 provides an overview of the input variables used in each zone. The inclusion or exclusion of specific variables was influenced by each zone’s proximity to surface water resources, irrigation infrastructure, and elevation. For example, zones located downstream of the Nekouabad canal included irrigation volume as a model input, while zones without reliable abstraction data excluded that variable.
Table 4.
Input variables for groundwater level change modeling in the defined Zones.
| Zone | Meteorological Station | Precipitation (P) | Temperature (T) | River Discharge (RD) | Irrigation Volume (IV) | Groundwater Abstraction (GwA) | Groundwater Level (GwL), output |
|---|---|---|---|---|---|---|---|
| Z1 | Zefreh | ✓ | ✓ | ✓ | – | ✓ | ✓ |
| Z2 | Zefreh | ✓ | ✓ | – | ✓ | ✓ | ✓ |
| Z3 | Zefreh | ✓ | ✓ | – | ✓ | ✓ | ✓ |
| Z4 | Najafabad | ✓ | ✓ | – | ✓ | ✓ | ✓ |
| Z5 | Najafabad | ✓ | ✓ | – | – | – | ✓ |
✓: Variable included as model input, –: Variable excluded due to unavailability or irrelevance.
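For readers reproducing the zonal setup, Table 4 can be encoded as a small configuration mapping. The sketch below assumes a monthly pandas DataFrame with hypothetical column names (`P`, `T`, `RD`, `IV`, `GwA`, `GwL`) and a hypothetical helper `select_zone_data`; the authors' scripts in Supplementary Material (S2) may organize the inputs differently.

```python
import pandas as pd

# Zone-specific predictor sets transcribed from Table 4.
# Column names are hypothetical labels, not the study's actual schema.
ZONE_INPUTS = {
    "Z1": ["P", "T", "RD", "GwA"],
    "Z2": ["P", "T", "IV", "GwA"],
    "Z3": ["P", "T", "IV", "GwA"],
    "Z4": ["P", "T", "IV", "GwA"],
    "Z5": ["P", "T"],
}
TARGET = "GwL"


def select_zone_data(df: pd.DataFrame, zone: str):
    """Return (X, y) for one zone, keeping only that zone's inputs."""
    features = ZONE_INPUTS[zone]
    missing = [c for c in features + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"{zone}: missing columns {missing}")
    return df[features], df[TARGET]
```

Keeping the mapping in one place makes the zone-to-input choices explicit and easy to audit against Table 4.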
To validate the selection of input variables and to explore the interdependencies among climatic, hydrological, and anthropogenic drivers, Pearson correlation heatmaps were generated for each hydrogeological zone (Fig. 4). These heatmaps quantitatively depict the strength and direction of linear associations between groundwater levels (and their decline) and the predictor variables.
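As an illustration of the diagnostic behind Fig. 4, the sketch below ranks the Pearson correlations between a target series and its candidate predictors using `pandas.DataFrame.corr`. The monthly series are synthetic and the helper name `zone_correlations` is hypothetical; a heatmap such as Fig. 4 would plot the full matrix returned by `df.corr(method="pearson")`.

```python
import numpy as np
import pandas as pd


def zone_correlations(df: pd.DataFrame, target: str = "GwL") -> pd.Series:
    """Pearson r between the target and every other column, sorted by |r|."""
    corr = df.corr(method="pearson")[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic monthly series for illustration only (not the study's data)
rng = np.random.default_rng(0)
n = 120
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 1, n)
precip = np.clip(30 - 2 * (temp - 15) + rng.normal(0, 5, n), 0, None)
gwl = -0.05 * temp + 0.01 * precip + rng.normal(0, 0.2, n)
df = pd.DataFrame({"P": precip, "T": temp, "GwL": gwl})
print(zone_correlations(df))
```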
Fig. 4.
Pearson correlation heatmaps between groundwater level, groundwater level decline and input variables across hydrogeological zones Z1 to Z5.
The results highlight substantial spatial variability across the zones. For instance, Zone Z1 displayed a moderate positive correlation between groundwater level decline and abstraction (r = 0.44), consistent with a notable influence of pumping activities. In Zone Z2, a distinct negative correlation between temperature and groundwater levels (r = −0.51) suggests that evapotranspiration plays a dominant role in intensifying depletion. Zone Z3 demonstrated a mixed influence of both climatic (precipitation and temperature) and anthropogenic (irrigation and abstraction) factors, reflecting its complex hydro-climatic setting. In comparison, Zones Z4 and Z5 exhibited weaker correlations overall, implying relatively stable conditions or the presence of additional drivers not captured in the available datasets.
This correlation-based diagnostic analysis is not only scientifically relevant but also essential for justifying the zonal modeling framework. By identifying zone-specific dominant drivers, the heatmaps strengthen the rationale for input selection and demonstrate the necessity of considering both climatic and human-induced factors when predicting groundwater dynamics in the Najafabad Plain.
Model training and validation strategy
To ensure robust and generalizable model performance, the available dataset for each hydrogeological zone was split into training and testing subsets: 70% of the data were used to train the machine learning models, and the remaining 30% were reserved for testing and evaluation. The time series were divided chronologically to preserve their temporal structure and prevent data leakage. To tune the model hyperparameters and enhance prediction accuracy, a five-fold cross-validation (CV) strategy was employed during the training phase. In this method, the training data were partitioned into five equal subsets, and each subset was used once for validation while the remaining four were used for training. The validation performance was then averaged across the five folds, providing a balanced estimate of bias and variance.
Hyperparameter optimization was performed for each algorithm using a grid search approach, in which multiple combinations of key parameters were systematically tested. The optimal hyperparameter set was selected by minimizing the validation error, specifically the Root Mean Square Error (RMSE). To further prevent overfitting, each model incorporated algorithm-specific regularization mechanisms. For example, the Random Forest model used constraints on maximum tree depth and minimum sample split size, the XGBoost (gradient boosting) model used learning-rate and subsampling adjustments, and the Support Vector Machine model optimized the kernel function and the regularization coefficient (C).
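The training protocol described above (chronological 70/30 split, grid search, and five-fold CV scored on RMSE) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the grid values are hypothetical, and `KFold(shuffle=False)` is assumed as one reading of "five equal subsets"; `TimeSeriesSplit` is a stricter alternative for temporally ordered data. For XGBoost or SVM, the same search would range over learning rate and subsampling, or over kernel and `C`, respectively.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold


def chronological_split(X, y, train_frac=0.7):
    """Split time-ordered arrays without shuffling: first 70% train, last 30% test."""
    cut = int(round(len(X) * train_frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Synthetic monthly data standing in for one zone (illustration only)
rng = np.random.default_rng(42)
n = 240
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = chronological_split(X, y)

# Hypothetical grid over the RF regularization settings named in the text
param_grid = {"max_depth": [3, 6, None], "min_samples_split": [2, 5, 10]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=False),    # five contiguous folds
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
)
search.fit(X_train, y_train)  # refits the best model on the full training set

best_cv_rmse = -search.best_score_  # mean validation RMSE over the five folds
test_rmse = np.sqrt(np.mean((y_test - search.predict(X_test)) ** 2))
print(search.best_params_, round(best_cv_rmse, 3), round(test_rmse, 3))
```

The held-out 30% is touched only once, after the search, so the test RMSE is not used for model selection.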
This training-validation strategy ensured that the models were both accurate and resilient to noise and overfitting, and it enabled fair comparisons between algorithms across different zones of the Najafabad Plain.
Model evaluation metrics
To evaluate the predictive accuracy of the developed machine learning models, five widely used performance metrics were employed: Coefficient of Determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Willmott’s Index of Agreement (WI), and Percent Bias (PBIAS). These metrics provide complementary insights into model performance in terms of accuracy, bias, agreement, and error distribution.
Coefficient of Determination (R²) assesses how well the predicted values replicate the observed values. It is defined as:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{8}$$
Root Mean Square Error (RMSE) measures the square root of the average squared differences between predicted and observed values:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \tag{9}$$
Mean Absolute Error (MAE) quantifies the average magnitude of the absolute differences between predictions and actual observations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert O_i - P_i \rvert \tag{10}$$
Willmott’s Index of Agreement (WI) evaluates the degree to which predictions match observations:
$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} \left( \lvert P_i - \bar{O} \rvert + \lvert O_i - \bar{O} \rvert \right)^2} \tag{11}$$
Values range from 0 (no agreement) to 1 (perfect agreement).
Percent Bias (PBIAS) measures the average tendency of predictions to overestimate or underestimate observations:
$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{n} (O_i - P_i)}{\sum_{i=1}^{n} O_i} \tag{12}$$
Positive values indicate that a model underestimates the observations, while negative values indicate overestimation.
Here, $O_i$ and $P_i$ are the observed and predicted values at time step $i$, respectively, $\bar{O}$ is the mean of the observed values, and $n$ is the number of observations. In summary, R² and WI values closer to 1, together with lower RMSE, MAE, and |PBIAS|, indicate better model performance. A |PBIAS| below 10–15% is generally considered satisfactory in hydrological applications.
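Equations (8)–(12) translate directly into NumPy. The function below is a minimal sketch (the name `evaluate` is hypothetical, not taken from the authors' code); it follows the PBIAS sign convention stated above, so a positive value means the predictions sum to less than the observations.

```python
import numpy as np


def evaluate(obs, pred):
    """Compute R², RMSE, MAE, Willmott's WI and PBIAS (Eqs. 8-12)."""
    o = np.asarray(obs, dtype=float)
    p = np.asarray(pred, dtype=float)
    err = o - p                      # O_i - P_i
    o_bar = o.mean()                 # mean of observations
    r2 = 1.0 - np.sum(err ** 2) / np.sum((o - o_bar) ** 2)
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    wi = 1.0 - np.sum(err ** 2) / np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    pbias = 100.0 * np.sum(err) / np.sum(o)  # > 0: model underestimates
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "WI": wi, "PBIAS": pbias}


m = evaluate([10.0, 12.0, 11.0, 13.0], [9.5, 12.5, 10.5, 12.0])
print({k: round(v, 3) for k, v in m.items()})
```

For this toy series, R² = 0.65 and PBIAS ≈ +3.26%, i.e., a slight underestimation. A useful sanity check is that MAE never exceeds RMSE for the same series.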
These metrics were computed for each model (RF, XGBoost, SVM) and each hydrogeological zone to enable a detailed comparison and to identify the best-performing algorithm in each context. The selected metrics have been widely applied in hydrological and groundwater modeling studies32–36.
For a detailed overview of the methodological framework including data preprocessing, zone delineation, model training, and evaluation steps, refer to the Methodology Flowchart provided in Supplementary Material (S1).
Results and discussion
This section presents the outcomes of the groundwater level modeling across the five defined zones of the Najafabad aquifer using three machine learning algorithms: Random Forest (RF), XGBoost (GB), and Support Vector Machine (SVM). The results are analyzed in terms of model accuracy, spatial variability, and the relative importance of input variables. Furthermore, the performance of the models is compared with findings from previous studies, and implications for regional groundwater management are discussed. The aim is to provide an in-depth understanding of how different factors affect groundwater level fluctuations and how data-driven models can support sustainable water resource planning in semi-arid regions.
Model performance per zone
The performance of the three machine learning models—XGBoost, Random Forest (RF), and Support Vector Machine (SVM)—was evaluated across five hydrogeological zones using RMSE, MAE, R², Willmott’s Index of Agreement (WI), and Percent Bias (PBIAS) during both training and testing phases (Table 5). The scatter plots of predicted versus observed groundwater levels for each model and zone are illustrated in Fig. 5, visually reinforcing the numerical results.
Table 5.
Performance metrics (RMSE, MAE, R², WI, and PBIAS) of the XGBoost, RF, and SVM models across hydrogeological zones Z1–Z5 during training and testing. RMSE and MAE are in meters.
| Model | Metric | Training Z1 | Training Z2 | Training Z3 | Training Z4 | Training Z5 | Testing Z1 | Testing Z2 | Testing Z3 | Testing Z4 | Testing Z5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | RMSE (m) | 0.82 | 0.81 | 1.64 | 1.09 | 0.35 | 0.85 | 2.37 | 2.87 | 1.25 | 0.43 |
| | R² | 0.93 | 0.95 | 0.94 | 0.90 | 0.95 | 0.91 | 0.80 | 0.85 | 0.76 | 0.92 |
| | MAE (m) | 0.21 | 1.01 | 1.5 | 0.38 | 0.15 | 0.32 | 1.8 | 2.4 | 0.67 | 0.21 |
| | WI | 0.98 | 0.99 | 0.97 | 0.96 | 0.99 | 0.97 | 0.92 | 0.94 | 0.93 | 0.98 |
| | PBIAS (%) | −0.8 | 1.2 | −1.5 | 0.6 | −0.5 | −1.1 | 2.4 | 1.8 | −1.7 | −1.2 |
| RF | RMSE (m) | 0.74 | 1.43 | 2.14 | 1.08 | 0.68 | 0.77 | 2.77 | 2.93 | 1.23 | 0.83 |
| | R² | 0.82 | 0.88 | 0.86 | 0.70 | 0.9 | 0.78 | 0.75 | 0.77 | 0.59 | 0.85 |
| | MAE (m) | 0.42 | 1 | 1.7 | 0.41 | 0.2 | 0.72 | 1.5 | 2.8 | 0.66 | 0.48 |
| | WI | 0.96 | 0.94 | 0.92 | 0.91 | 0.95 | 0.94 | 0.89 | 0.90 | 0.87 | 0.95 |
| | PBIAS (%) | 1.5 | 3 | 2.8 | −2.4 | 1.1 | 2.8 | 4.5 | 5.1 | −3.2 | 1.7 |
| SVM | RMSE (m) | 0.79 | 1.23 | 2.34 | 1.13 | 0.23 | 0.87 | 2.05 | 4.16 | 1.26 | 0.44 |
| | R² | 0.91 | 0.94 | 0.92 | 0.85 | 0.97 | 0.86 | 0.82 | 0.80 | 0.70 | 0.93 |
| | MAE (m) | 0.40 | 1.4 | 2 | 0.51 | 0.12 | 0.51 | 2.5 | 3.2 | 0.75 | 0.19 |
| | WI | 0.97 | 0.96 | 0.91 | 0.93 | 0.99 | 0.96 | 0.91 | 0.86 | 0.90 | 0.98 |
| | PBIAS (%) | −1.2 | 2.1 | 3.4 | −1.8 | 0.4 | −0.7 | 3.7 | 7.9 | −2.5 | 0.8 |
Fig. 5.

Comparison of observed vs. predicted groundwater levels for SVM, RF, and XGBoost models across zones Z1–Z5, with the 1:1 line indicating perfect agreement.
A critical analysis of model performance across hydrogeological zones Z1–Z5, as illustrated in Fig. 5, reveals notable differences in prediction accuracy, bias tendency, and generalization ability among the tested algorithms. Overall, XGBoost outperformed the other models in most zones, demonstrating both accuracy and stability. During testing, it achieved the highest R² and WI values in Z1 (R² = 0.91, WI = 0.97), Z3 (R² = 0.85, WI = 0.94), and Z4 (R² = 0.76, WI = 0.93), and nearly matched SVM in Z5 (R² = 0.92 versus 0.93, with WI = 0.98 for both), while keeping PBIAS within ±2% in these four zones, indicating minimal systematic bias. For instance, in Z5, XGBoost yielded a low RMSE of 0.43 m and MAE of 0.21 m, with PBIAS = −1.2%, indicating good generalization. These results agree with the findings of Ibrahem Ahmed Osman, et al21., who also reported strong performance of XGBoost in groundwater level forecasting.
The SVM model, although more variable across zones, performed well in Z5 (R² = 0.93, WI = 0.98, RMSE = 0.44 m, PBIAS = +0.8%) and showed competitive accuracy in Z1 and Z2. However, its performance dropped markedly in Z3, where it recorded the highest RMSE (4.16 m), lowest WI (0.86), and largest PBIAS (+7.9%) among all models and zones during testing; under the PBIAS sign convention adopted here, this positive value indicates a tendency toward systematic underestimation in that zone. Random Forest (RF) gave moderate results across the zones, with its best performance in Z5 (R² = 0.85, WI = 0.95, RMSE = 0.83 m, PBIAS = +1.7%). In Z2 and Z3, however, its testing RMSE exceeded 2.7 m, WI fell to 0.90 or below, and PBIAS reached +4.5% and +5.1%, respectively, indicating a more noticeable bias. These findings are consistent with those reported by Saleh and Rasel37, who identified SVM as a reliable model for groundwater prediction compared with ensemble methods such as Random Forest in certain hydrogeological settings.
A notable observation is that Zone 5 (Z5), despite its high elevation and potentially complex hydrogeological features, was modeled more successfully than Zones 2 and 3. All models achieved R² ≥ 0.85 and WI ≥ 0.95 in Z5 during testing, with particularly low MAEs and RMSEs, suggesting more consistent and learnable patterns in the groundwater data of this zone. In contrast, Z3 posed a challenge for all models, particularly SVM, likely due to higher variability or the presence of outliers. This is evident both in the statistical metrics and in the wider scatter of points around the 1:1 line in Fig. 5.
In summary, XGBoost demonstrated the most balanced and robust performance across all zones, with consistently high WI scores and low PBIAS, while SVM and RF showed strong localized performance, especially in zones with less variability. These findings emphasize the importance of considering local hydrogeological characteristics in selecting the optimal modeling approach for groundwater prediction.
Importance of input variables
To assess the influence of different input variables on groundwater level prediction, the relative importance of each predictor was evaluated using Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Figure 6 presents the comparative importance values across the five hydrogeological zones (Z1–Z5).
Fig. 6.
Relative Importance of Input Variables Across Zones (Z1–Z5).
Across all zones, precipitation (P) and temperature (T) consistently emerged as the dominant predictors, although their relative influence varied. In Zone 1, temperature slightly outweighed precipitation under RF and XGBoost, while SVM assigned nearly equal weight to river discharge (RD) and temperature. Zone 2 showed a strong dominance of temperature (> 65%) in SVM, whereas RF and XGBoost indicated a more balanced contribution from both climatic variables. Zone 3 followed a similar pattern, with temperature prevailing across all models, while groundwater abstraction (GwA) contributed minimally. In Zone 4, XGBoost emphasized temperature (~47%), followed by precipitation and river discharge (RD), whereas RF and SVM gave higher weight to precipitation. Zone 5, which included only precipitation and temperature, exhibited the clearest consensus, with temperature consistently ranked as the most influential predictor (close to or above 65%).
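Because an SVM with a nonlinear kernel has no built-in importance measure, relative importance has to be obtained in a model-agnostic way if all three algorithms are to be compared on equal terms. One such option, assumed here purely for illustration (the study's exact procedure may differ), is scikit-learn's `permutation_importance`, rescaled to percentages comparable to those in Fig. 6. The data, the column labels, and the helper `relative_importance` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def relative_importance(model, X, y, names, seed=0):
    """Permutation importance rescaled to percentages that sum to 100."""
    result = permutation_importance(
        model, X, y, n_repeats=20, random_state=seed,
        scoring="neg_root_mean_squared_error",  # importance = RMSE increase when shuffled
    )
    imp = np.clip(result.importances_mean, 0, None)  # negative values are noise; treat as 0
    return dict(zip(names, 100 * imp / imp.sum()))


# Synthetic data: columns stand in for P, T, GwA (illustration only)
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + 2.0 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 0.2, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:210], y[:210])
shares = relative_importance(model, X[210:], y[210:], ["P", "T", "GwA"])
print({k: round(v, 1) for k, v in shares.items()})
```

The same function accepts a fitted SVR or XGBoost regressor, since it only calls `predict`; computing it on held-out data avoids rewarding features the model merely memorized.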
The variation in variable importance was also reflected in model accuracy. Zones in which temperature and precipitation dominated the importance rankings (e.g., Z1 and Z5) generally achieved higher predictive performance. By contrast, zones where anthropogenic variables such as irrigation volume (IV) or groundwater abstraction (GwA) had weaker correlations or coarser resolution (e.g., Z3 and Z4) exhibited relatively lower accuracy, particularly for SVM. XGBoost, in particular, appeared more effective in capturing the combined effect of the dominant climatic variables, consistent with its higher testing accuracy than SVM in most zones, whereas RF trailed SVM in mean testing R² (0.748 versus 0.822). Overall, these results suggest that model performance is closely tied to the strength of the dominant predictors, with climatic variables (precipitation and temperature) providing the greatest predictive power in arid and semi-arid aquifer systems. This finding is consistent with previous studies (e.g., Nourani, et al38. and Costantini, et al39.), which reported that the accuracy of machine learning models in groundwater forecasting is largely determined by the dominance of hydro-climatic drivers relative to lower-resolution anthropogenic data.
To enhance the practical relevance of these findings, we translated them into zone-specific management implications. For instance, in Zone 2, where temperature emerged as the most critical predictor, promoting water-saving irrigation practices during peak summer months could reduce excessive drawdown and improve recharge efficiency. In Zone 1, where groundwater abstraction was positively correlated with groundwater level decline (r = 0.44), stricter pumping regulations and enhanced monitoring could help mitigate aquifer stress. These examples illustrate how feature importance and correlation analyses can directly inform adaptive groundwater management in semi-arid regions. Although this study relied on conventional feature importance rankings, advanced interpretability techniques such as SHAP and LIME can provide more transparent explanations of model behavior. Future research could incorporate these approaches to strengthen model explainability and support decision-making. Recent studies (Elzain, et al.13; Eldin Elzain, et al.40) have demonstrated the potential of explainable machine learning in hydrological applications.
In conclusion, this comparative analysis confirms the robustness of ensemble learning methods—particularly XGBoost—in predicting groundwater levels under heterogeneous hydrogeological conditions. It also underscores the importance of zone-based calibration and sensitivity analysis to account for local hydro-climatic variability and support the development of context-appropriate forecasting tools.
Model robustness and generalization
Robustness and generalization are critical attributes of any predictive model, particularly in groundwater studies where data quality, completeness, and variability can significantly influence model outcomes. In the present study, all three machine learning models—Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost)—were evaluated across five hydrogeological zones with varying characteristics to assess their stability and transferability (Table 6).
Table 6.
Comparison of model performance metrics in training and testing phases across all zones.
| Model | Phase | Mean R² | Mean RMSE (m) | ΔR² (Train − Test) | ΔRMSE (Test − Train, m) |
|---|---|---|---|---|---|
| XGBoost | Training | 0.934 | 0.942 | 0.086 | 0.612 |
| | Testing | 0.848 | 1.554 | | |
| RF | Training | 0.832 | 1.214 | 0.084 | 0.492 |
| | Testing | 0.748 | 1.706 | | |
| SVM | Training | 0.918 | 1.144 | 0.096 | 0.612 |
| | Testing | 0.822 | 1.756 | | |
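The summary statistics in Table 6 follow directly from the per-zone values in Table 5: each mean is the unweighted average over Z1–Z5, ΔR² is the training mean minus the testing mean, and ΔRMSE is the testing mean minus the training mean. The short script below reproduces that arithmetic using only values transcribed from Table 5; the names `TABLE5` and `generalization_gap` are illustrative.

```python
# Per-zone (Z1..Z5) R² and RMSE values transcribed from Table 5: (training, testing)
TABLE5 = {
    "XGBoost": {"R2": ([0.93, 0.95, 0.94, 0.90, 0.95], [0.91, 0.80, 0.85, 0.76, 0.92]),
                "RMSE": ([0.82, 0.81, 1.64, 1.09, 0.35], [0.85, 2.37, 2.87, 1.25, 0.43])},
    "RF": {"R2": ([0.82, 0.88, 0.86, 0.70, 0.90], [0.78, 0.75, 0.77, 0.59, 0.85]),
           "RMSE": ([0.74, 1.43, 2.14, 1.08, 0.68], [0.77, 2.77, 2.93, 1.23, 0.83])},
    "SVM": {"R2": ([0.91, 0.94, 0.92, 0.85, 0.97], [0.86, 0.82, 0.80, 0.70, 0.93]),
            "RMSE": ([0.79, 1.23, 2.34, 1.13, 0.23], [0.87, 2.05, 4.16, 1.26, 0.44])},
}


def mean(values):
    """Unweighted average over the five zones."""
    return sum(values) / len(values)


def generalization_gap(model):
    """Return (mean train R², mean test R², ΔR² = train - test, ΔRMSE = test - train)."""
    (r2_tr, r2_te), (rm_tr, rm_te) = TABLE5[model]["R2"], TABLE5[model]["RMSE"]
    return (round(mean(r2_tr), 3), round(mean(r2_te), 3),
            round(mean(r2_tr) - mean(r2_te), 3), round(mean(rm_te) - mean(rm_tr), 3))


for name in TABLE5:
    print(name, generalization_gap(name))
```

Running it yields values matching Table 6, for example (0.934, 0.848, 0.086, 0.612) for XGBoost.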
Among the models, XGBoost demonstrated the highest predictive performance during both training (mean R² = 0.934, RMSE = 0.942 m) and testing (mean R² = 0.848, RMSE = 1.554 m), with a moderate drop in performance (ΔR² = 0.086, ΔRMSE = 0.612 m). This relatively small gap indicates robust generalization, given that the model captured nonlinear groundwater dynamics in both stable zones (e.g., Z1, Z5) and hydrologically complex ones (e.g., Z3). This robustness may be attributed to its embedded regularization, its tolerance of multicollinearity, and its iterative bias-variance reduction during training. The Random Forest (RF) model achieved lower overall predictive accuracy (mean R² = 0.832 in training and 0.748 in testing) but exhibited the smallest train-test differences (ΔR² = 0.084, ΔRMSE = 0.492 m), indicating good stability and a low tendency to overfit. While RF may underperform in zones with sharp hydrological fluctuations, it is relatively resilient to noisy or incomplete data, making it useful for preliminary analyses or areas with limited observations.
Notably, although SVM achieved high performance during training (mean R² = 0.918), it exhibited the largest drop in testing accuracy (mean R² = 0.822; ΔR² = 0.096), suggesting susceptibility to overfitting. This limitation may stem from insufficient regularization or the choice of kernel function. Future studies could explore strategies such as stronger regularization, improved feature normalization, or alternative kernels to enhance generalization. Moreover, variants such as ν-Support Vector Regression (ν-SVR) have demonstrated improved stability and predictive performance in hydrological modeling41,42 and could be considered to further strengthen groundwater forecasting applications.
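To illustrate the suggested comparison, the sketch below fits scikit-learn's ε-SVR (`SVR`) and ν-SVR (`NuSVR`) inside a standardization pipeline and reports the train-test R² gap for each. The data and hyperparameter values are hypothetical, and the example is not evidence that ν-SVR generalizes better; it only shows how the gap reported in Table 6 could be monitored when alternative formulations are tested.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR, NuSVR


def train_test_gap(model, X_tr, y_tr, X_te, y_te):
    """Fit the model and return (train R², test R², train - test gap)."""
    model.fit(X_tr, y_tr)
    r2_tr = r2_score(y_tr, model.predict(X_tr))
    r2_te = r2_score(y_te, model.predict(X_te))
    return r2_tr, r2_te, r2_tr - r2_te


# Synthetic, time-ordered data (illustration only); first 70% for training
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 200)
X_tr, X_te, y_tr, y_te = X[:140], X[140:], y[:140], y[140:]

# Scaling inside the pipeline is fitted on training data only
candidates = {
    "epsilon-SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)),
    "nu-SVR": make_pipeline(StandardScaler(), NuSVR(kernel="rbf", C=10.0, nu=0.5)),
}
for name, model in candidates.items():
    print(name, [round(v, 3) for v in train_test_gap(model, X_tr, y_tr, X_te, y_te)])
```

In ν-SVR, `nu` bounds the fraction of support vectors and margin errors instead of fixing the ε-tube width directly, which is one reason it is sometimes easier to tune.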
These comparative results reinforce the importance of selecting models that balance accuracy with generalization, especially in regions such as the Najafabad plain where hydrogeological heterogeneity and climatic variability present significant modeling challenges. Among the three models, XGBoost emerged as the most robust and generalizable, offering both predictive strength and stability across diverse zones. Future studies should consider hybrid modeling strategies that integrate multiple machine learning models or external data sources (e.g., remote sensing, irrigation patterns, or socio-economic data) to enhance predictive robustness and transferability in similar complex groundwater systems.
Implications for groundwater management and future directions
The findings of this study have important implications for groundwater management in data-scarce and hydrogeologically diverse regions such as the Najafabad plain. By evaluating XGBoost, Random Forest (RF), and Support Vector Machine (SVM) across five distinct zones, the study demonstrates the potential of data-driven approaches to enhance decision-making under conditions of uncertainty and limited observations.
Among the models, XGBoost showed the strongest overall performance, particularly in terms of generalization and robustness, making it well suited to capturing complex, nonlinear groundwater dynamics. RF provided relatively stable predictions across most zones but underperformed in areas with highly variable groundwater behavior, possibly because averaging across many trees tends to smooth abrupt fluctuations. SVM achieved high training accuracy; however, it exhibited a larger generalization gap and higher sensitivity to data scaling and kernel selection, especially in hydrogeologically complex zones. These results align with previous studies in semi-arid regions (e.g., Elzain, et al.13; Eldin Elzain, et al.40), which also highlighted the advantages of ensemble learning methods over single models for groundwater forecasting under heterogeneous conditions.
The ability to quantify feature importance in XGBoost offers valuable insights into the key drivers of groundwater fluctuations, such as precipitation and temperature, supporting the design of adaptive climate-resilient management strategies. Zone-specific variability in model accuracy underscores the need for localized modeling frameworks. Complex zones like Z3 require tailored calibration and potentially the integration of additional hydrological or anthropogenic indicators (e.g., irrigation, pumping intensity) to improve predictive reliability.
Several limitations must be acknowledged. First, predictive accuracy is constrained by the availability and resolution of input data. Limited high-resolution spatiotemporal data on irrigation and pumping likely influenced performance in zones with intensive human activity. Second, while the models demonstrated strong predictive ability, they relied primarily on raw input variables. The exclusion of derived features, such as lagged or cumulative indicators, may have restricted the models’ capacity to fully capture delayed recharge and abstraction dynamics. Finally, machine learning models remain “black-box” systems, limiting causal interpretability and policy adoption. Future research should explore hybrid frameworks combining data-driven and physically based models, the integration of remotely sensed and socio-economic data, and dynamic modeling that incorporates land-use and climate change feedbacks. Overall, this study confirms the potential of ensemble machine learning models, particularly XGBoost, for operational groundwater management in semi-arid environments, and highlights pathways to further improve reliability and interpretability for strategic planning.
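The derived features mentioned above (lagged and cumulative indicators) were not part of this study's inputs, but they are straightforward to construct. The sketch below uses pandas `shift` and `rolling` with hypothetical column names, lags, and window length (the helper `add_lagged_features` is likewise hypothetical). Whether the current month may enter a cumulative window depends on what is known at forecast time, so the window used here, which includes the current month, is an assumption.

```python
import pandas as pd


def add_lagged_features(df, columns, lags=(1, 2, 3), window=3):
    """Append lagged copies and trailing rolling sums of selected monthly columns."""
    out = df.copy()
    for col in columns:
        for k in lags:
            out[f"{col}_lag{k}"] = out[col].shift(k)  # value k months earlier
        # trailing sum over `window` months, including the current month
        out[f"{col}_cum{window}"] = out[col].rolling(window).sum()
    return out.dropna()  # drop initial rows without a full history


# Hypothetical monthly precipitation and abstraction values
monthly = pd.DataFrame({"P": [10, 0, 5, 20, 15, 0], "GwA": [3, 4, 4, 5, 6, 6]})
features = add_lagged_features(monthly, ["P"], lags=(1, 2), window=3)
print(features)
```

Because the lags are built before any train/test split, the chronological split should still be applied afterwards so that no future values leak into the training rows.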
Conclusion
This study assessed the predictive performance of three machine learning algorithms—Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM)—for forecasting groundwater level fluctuations across five hydrogeological zones in the Najafabad plain, central Iran. The results revealed that XGBoost provided the highest overall accuracy, with testing performance (R² ≈ 0.85) surpassing RF and SVM, and demonstrated greater robustness in capturing the nonlinear and heterogeneous dynamics of groundwater levels. RF produced relatively stable predictions with moderate accuracy, while SVM showed high sensitivity to data characteristics and weaker generalization in more variable zones.
These findings emphasize that ensemble learning techniques, particularly XGBoost, offer a practical and scalable solution for groundwater monitoring in semi-arid environments. The application of such models can support early warning systems, resource allocation, and long-term groundwater management under increasing climatic and anthropogenic stress. Nonetheless, model performance remains constrained by limited availability of high-resolution anthropogenic data such as pumping and irrigation rates, and by the “black-box” nature of machine learning, which hinders interpretability and broader policy uptake. Future research should focus on hybrid approaches that combine data-driven and physically based models, integration of remote sensing and socio-economic datasets, and dynamic simulations that account for climate and land-use changes to enhance reliability and decision-making capacity.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
The authors thank the anonymous reviewers for their valuable comments and suggestions.
Author contributions
All authors contributed to the conceptualization and design of the study. Material preparation, data collection, and analysis were performed by S.D., S.E., and M.J. Code development was carried out by M.J. and S.D. The first draft of the manuscript was written by S.E. and M.J. Review and editing were conducted by S.E. and H.R.S. All authors read and approved the final manuscript and provided comments on previous versions.
Data availability
The dataset generated and/or analyzed during the current study is available from the corresponding author upon reasonable request. The Python codes used for model development and evaluation are publicly available at the following GitHub repository: https://github.com/mohammadj74/Groundwater-ML-Najafabad.
Declarations
Competing interests
The authors declare no competing interests.
Ethics approval
This study is original research and has not been previously published or submitted elsewhere. The research does not involve any experiments on animals or human subjects.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Jamali, M. & Eslamian, S. In Handbook of Climate Change Impacts on River Basin Management 225–236 (CRC, 2024).
- 2.Famiglietti, J. S. The global groundwater crisis. Nat. Clim. Change. 4, 945–948. 10.1038/nclimate2425 (2014). [Google Scholar]
- 3.Jamali, M., Yazdian, H., Bahman, G. & Eslamian, S. Water agriculture nexus a system dynamics approach for the next three decades. Sci. Rep.15, 5946. 10.1038/s41598-025-90728-3 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Wada, Y. et al. Global depletion of groundwater resources. Geophys. Res. Lett.3710.1029/2010GL044571 (2010).
- 5.Jamali, M. & Eslamian, S. in Handbook of Hydroinformatics (eds Saeid Eslamian & Faezeh Eslamian) 223–237 (Elsevier, 2023). 10.1016/B978-0-12-821961-4.00010-5
- 6.Harbaugh, A. W. MODFLOW-2005, the US Geological Survey Modular ground-water Model: the ground-water Flow Process Vol. 6 (US Department of the Interior, US Geological Survey Reston, VA, 2005).
- 7.Jahanshahi, A. et al. Dependence of rainfall-runoff model transferability on climate conditions in Iran. Hydrol. Sci. J.67, 564–587. 10.1080/02626667.2022.2030867 (2022). [Google Scholar]
- 8.Xu, T. & Liang, F. Machine learning for hydrologic sciences: an introductory overview. WIREs Water. 8, e1533. 10.1002/wat2.1533 (2021). [Google Scholar]
- 9.Lange, H. & Sippel, S. in Forest-Water Interactions (ed Delphis, F.) Levia 233–257 (Springer International Publishing, (2020).
- 10.Papacharalampous, G. & Tyralis, H. A review of machine learning concepts and methods for addressing challenges in probabilistic hydrological post-processing and forecasting. Front. Water. 4–2022. 10.3389/frwa.2022.961954 (2022).
- 11.Eslamian, S. & Eslamian, F. Handbook of Hydroinformatics: Volume I: Classic soft-computing Techniques (Elsevier, 2022).
- 12.Ali, A. M., Abdallah, M., Mohammadi, B. & Elzain, H. E. Three-stage hybrid modeling for real-time streamflow prediction in data-scarce regions. J. Hydrology: Reg. Stud.59, 102337. 10.1016/j.ejrh.2025.102337 (2025). [Google Scholar]
- 13.Elzain, H. E. et al. Innovative approach for predicting daily reference evapotranspiration using improved shallow and deep learning models in a coastal region: A comparative study. J. Environ. Manage.354, 120246. 10.1016/j.jenvman.2024.120246 (2024). [DOI] [PubMed] [Google Scholar]
- 14.Tao, H. et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing489, 271–308. 10.1016/j.neucom.2022.03.014 (2022). [Google Scholar]
- 15.Sahoo, S., Russo, T. A., Elliott, J. & Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. Water Resour. Res.53, 3878–3895. 10.1002/2016WR019933 (2017). [Google Scholar]
- 16.Pham, Q. B. et al. Groundwater level prediction using machine learning algorithms in a drought-prone area. Neural Comput. Appl.34, 10751–10773. 10.1007/s00521-022-07009-7 (2022). [Google Scholar]
- 17.Raturi, M., Khare, D. & Patidar, N. in In Applications of Machine Learning in Hydroclimatology. 57–71 (eds Srivastav, R., Purna, C. & Nayak) (Springer Nature Switzerland, 2025).
- 18.De La Noval, A. J., Upadhyay, H., Lagos, L., Soni, J. & Prabakar, N. Spatial-temporal analysis of groundwater well features from neural network prediction of hexavalent chromium concentration. Sci. Rep.14, 31070. 10.1038/s41598-024-82297-8 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Vafadar, S., Rahimzadegan, M. & Asadi, R. Evaluating the performance of machine learning methods and geographic information system (GIS) in identifying groundwater potential zones in Tehran-Karaj plain, Iran. J. Hydrol.624, 129952. 10.1016/j.jhydrol.2023.129952 (2023). [Google Scholar]
- 20.Zarafshan, P. et al. Comparison of machine learning models for predicting groundwater level, case study: Najafabad region. Acta Geophys.71, 1817–1830. 10.1007/s11600-022-00948-8 (2023). [Google Scholar]
- 21.Ibrahem Ahmed Osman, A., Najah Ahmed, A., Chow, M. F., Feng Huang, Y. & El-Shafie, A. Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Eng. J.12, 1545–1556. 10.1016/j.asej.2020.11.011 (2021). [Google Scholar]
- 22.Naderi, M. & Hajiketabi, M. Quantification of normal and sustainable management practices for groundwater resources: example of the arid Najafabad alluvial aquifer in Isfahan Province, Iran. Hydrogeol. J.31, 195–218. 10.1007/s10040-023-02596-8 (2023). [Google Scholar]
- 23.Organization, I. M. Iran Meteorological Organization, < (2022). https://www.irimo.ir/eng/index.php
- 24.Esfahan, R. W. C. o. (2022). https://www.esrw.ir/?l=EN
- 25.Karimi, H. et al. Enhancing groundwater quality prediction through ensemble machine learning techniques. Environ. Monit. Assess.19710.1007/s10661-024-13506-0 (2024). [DOI] [PubMed]
- 26.Breiman, L. & Random Forests Mach. Learn.45, 5–32, doi:10.1023/A:1010933404324 (2001). [Google Scholar]
- 27.Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res.12, 2825–2830 (2011). [Google Scholar]
- 28.Chen, T. & Guestrin, C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794Association for Computing Machinery, San Francisco, California, USA, (2016).
- 29.Kashani, A. & Safavi, H. R. Assessing groundwater drought in Iran using GRACE data and machine learning. Sci. Rep.15, 14671. 10.1038/s41598-025-99342-9 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Vapnik, V. The Nature of Statistical Learning Theory (Springer science & business media, 2013).
- 31.Smola, A. J. & Schölkopf, B. A tutorial on support vector regression. Stat. Comput.14, 199–222. 10.1023/B:STCO.0000035301.49549.88 (2004). [Google Scholar]
- 32.Yadav, A., Raj, A. & Yadav, B. Enhancing local-scale groundwater quality predictions using advanced machine learning approaches. J. Environ. Manage.370, 122903. 10.1016/j.jenvman.2024.122903 (2024). [DOI] [PubMed] [Google Scholar]
- 33.Thakur, S. & Karmakar, S. A. Comparative analysis of ANN, LSTM and hybrid PSO-LSTM algorithms for groundwater level prediction. Trans. Indian Natl. Acad. Eng.10, 101–108. 10.1007/s41403-024-00505-3 (2025). [Google Scholar]
- 34.Simsek, O., Citakoglu, H., Gumus, V. & Dere Çetin, S. Applying machine learning to understand Rainfall–Runoff interactions in the Tigris river basin of Turkey. Pure. appl. Geophys.10.1007/s00024-025-03749-4 (2025). [Google Scholar]
- 35.Samantaray, S. & Sahoo, A. Groundwater level prediction using an improved ELM model integrated with hybrid particle swarm optimisation and grey Wolf optimisation. Groundw. Sustainable Dev.26, 101178. 10.1016/j.gsd.2024.101178 (2024). [Google Scholar]
- 36. Ritushree, B., Panda, S., Sahoo, A., Samantaray, S. & Satapathy, D. P. Prediction of groundwater level and potential zone identification in Keonjhar, Odisha based on machine learning and GIS techniques. Frankl. Open 11, 100250. 10.1016/j.fraope.2025.100250 (2025).
- 37. Saleh, M. A. & Rasel, H. M. Machine learning for groundwater levels: uncovering the best predictors. Sustainable Water Resour. Manage. 10, 166. 10.1007/s40899-024-01146-8 (2024).
- 38. Nourani, V., Ghareh Tapeh, A. H., Khodkar, K. & Huang, J. J. Assessing long-term climate change impact on spatiotemporal changes of groundwater level using autoregressive-based and ensemble machine learning models. J. Environ. Manage. 336, 117653. 10.1016/j.jenvman.2023.117653 (2023).
- 39. Costantini, M., Colin, J. & Decharme, B. Projected climate-driven changes of water table depth in the world’s major groundwater basins. Earth’s Future 11, e2022EF003068 (2023).
- 40. Eldin Elzain, H., Abdalla, O., Al-Maktoumi, A., Kacimov, A. & Eltayeb, M. A novel approach to forecast water table rise in arid regions using stacked ensemble machine learning and deep artificial intelligence models. J. Hydrol. 640, 131668. 10.1016/j.jhydrol.2024.131668 (2024).
- 41. Elzain, H. E. et al. Comparative study of machine learning models for evaluating groundwater vulnerability to nitrate contamination. Ecotoxicol. Environ. Saf. 229, 113061. 10.1016/j.ecoenv.2021.113061 (2022).
- 42. Park, S. et al. In Groundwater Contamination in Coastal Aquifers (eds Senapathi, V., Sekar, S., Viswanathan, P. M. & Sabarathinam, C.) 55–70 (Elsevier, 2022).
Data Availability Statement
The dataset generated and/or analyzed during the current study is available from the corresponding author upon reasonable request. The Python codes used for model development and evaluation are publicly available at the following GitHub repository: https://github.com/mohammadj74/Groundwater-ML-Najafabad.