Abstract
This research employs conventional and optimized extreme gradient boosting (XGBoost) models to predict the end-bearing capacity of rock-socketed shafts. The arithmetic optimization (AOA), brainstorm optimization (BOA), and whale optimization (WOA) algorithms were used to optimize the XGBoost model. To conduct this research, a database of the end-bearing capacities of 151 rock-socketed shafts was compiled from the literature. This database (denoted O_Data) was preprocessed, yielding the end-bearing capacities of 136 rock-socketed shafts. The Gaussian-noise technique was then employed to create a synthetic database of 500 rock-socketed shafts from these 136 records, which was likewise preprocessed. The resulting 460 records (136 original + 324 preprocessed synthetic) formed a second database (denoted OS_Data). The XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models were trained and tested on both databases. The performance analysis revealed that the XGBoost model estimated the end-bearing capacity with a root mean square error (RMSE) of 0.9205, a mean absolute error (MAE) of 0.7024, and a performance (R) of 0.9295 using the O_Data. The performance of the XGBoost_AOA model was subsequently enhanced to 0.9894 using the OS_Data. It was also observed that the OS_Data improved generalizability and reduced overfitting in the XGBoost_AOA model. Moreover, the multicollinearity analysis revealed that the rock mass rating (RMR) and geological strength index (GSI) exhibit problematic multicollinearity. In addition, the sensitivity analysis demonstrated that the RMR and GSI features contribute 20.301% and 20.369%, respectively, to the estimation of the end-bearing capacity. For the first time, this research maps a relationship between feature multicollinearity and sensitivity to analyze the overfitting of soft computing models. Moreover, SHapley Additive exPlanations (SHAP) analysis identified compressive strength and rock mass rating as dominant predictors (0.65–1.36), while the geological strength index showed minimal influence (< 0.10). Finally, this research provides a graphical user interface application to help geotechnical engineers and designers estimate the end-bearing capacity of rock-socketed shafts.
Keywords: Brainstorm optimization algorithm, Data augmentation, End-bearing capacity, Extreme gradient boosting, Rock-socketed shaft
Subject terms: Engineering, Mathematics and computing, Solid Earth sciences
Introduction
A rock-socketed shaft is a type of deep foundation used when the upper soil layers are too weak to support structural loads and stronger rock is located at greater depths. In practice, rock-socketed shafts are commonly referred to as drilled piers in bridge construction and bored piles in high-rise building projects1. In modern design practice, the end-bearing resistance of rock-socketed shafts is often overlooked due to practical challenges such as slurry settlement at the base, limited access for bottom inspection, concerns about potential voids beneath the socket, and, most notably, the difficulty in accurately evaluating end-bearing capacity2. This conservative approach frequently results in unnecessarily long rock sockets, leading to overdesign and increased construction costs. Moreover, the end-bearing capacity of such a foundation depends on the procedure adopted to clean the borehole base before concreting, groundwater conditions, structural parameters of the socket base (e.g., roughness of the socket surface, size, and shape), and rock properties (e.g., deformation modulus, rock mass rating, and unconfined compressive strength). For the economic and safe design of a rock-socketed shaft, proper evaluation and control of these parameters are crucial to ensure optimal shaft performance. For that purpose, several empirical methods3–9 were introduced that use the field- and laboratory-determined unconfined compressive strength of rock to compute the end-bearing capacity of the shaft:
qb = a·(σc)^b    (1)

where qb is the end-bearing capacity of the shaft, σc is the unconfined compressive strength of rock, and a and b are constants10. The end-bearing capacity of the shaft is significantly affected by intact rock and discontinuities in the rock mass10–12. In this regard, Kulkarni and Dewaikar13 and Zhang14 derived empirical relations between the end-bearing capacity and the rock mass strength. However, these empirical relationships have limitations, as they do not account for the dimensional parameters of the shaft, such as its size, shape, and embedded depth. Conversely, the Hoek-Brown (HB) and Mohr-Coulomb (MC) failure criteria were used to model the rock mass and analyze the end-bearing capacity of the shaft theoretically. First, the characteristic-line method based on the HB criterion was used to analyze the end-bearing capacity of a pile embedded in rock mass15; it overestimated the capacity because it treats the rock mass as a rigid plastic material. This approach was later extended by incorporating the modified MC criterion16 and the HB criterion17. Furthermore, Gharsallaoui et al.18, Yasufuku and Hyde19, and Vesic20 introduced different empirical relations to estimate the end-bearing capacity of rock-socketed shafts. Overall, these empirical methods revealed significant shortcomings due to their limited design parameters and associated assumptions.
To overcome these shortcomings, artificial intelligence (AI) techniques have been applied in foundation engineering because of their capability to estimate bearing capacity from multiple design parameters and factors. Wang et al.21 estimated the uplift-bearing behavior of micropiles socketed in soft rock on slope topography. Seo et al.22 stated that k-nearest neighbour (kNN), extreme gradient boosting (XGBoost), and deep neural network (DNN) models can estimate the bearing capacity of prebored and precast piles; the DNN model estimated the bearing capacity of piles at the end of initial driving (EOID; R = 0.9798) and restrike (R = 0.9545) phases with an accuracy of over 95%. Kim et al.23 predicted the base resistance of drilled shafts socketed in a rock mass using a coupled parametric finite element limit analysis and a modified HB failure criterion. The study concluded that the base resistance factor for undisturbed weightless rock mass increased up to L/B = 10 (L and B are the pile length and diameter) and also increased significantly with the HB strength parameters, i.e., the geological strength index (GSI) and the material constant for rock. Conversely, Gutiérrez-Ch et al.24 used the distinct element method (DEM) to analyze the distribution of axial load and shaft resistance mobilized with depth, the stresses at the pile-rock interface (PRI) as a function of socket head settlement, the inter-particle force distributions, the load-settlement response, and the failure mechanism. Alzouba et al.25 employed a deep forest (DF) model to estimate the toe-bearing strength of rock-socketed piles using 138 data points and reported that the DF model outperformed the random forest (RF), AdaBoost (Ada), gradient boosting (GB), extra trees (ET), XGBoost, and bagging regressor (BR) models. Local Interpretable Model-Agnostic Explanations (LIME) demonstrated that the GSI is a significant feature in predicting the toe-bearing strength of the pile. Al-Atroush26 employed a deep-learning physics-informed neural network (DL_PINN) based on Meyerhof's bearing capacity equation and compared it with an artificial neural network (ANN). Zhang and Xue27 compared the Bayesian optimized XGBoost (BO_XGBoost), RF, group method of data handling (GMDH), backpropagation neural network (BPNN), and gene expression programming (GEP) models in predicting the bearing capacity of piles. The models were created using the compressive strength, GSI, pile length in soil (PLS), pile length in rock (PLR), and pile diameter (B) of 138 piles. The study also demonstrated that feature selection significantly affected the performance (R) of the BO_XGBoost (0.9788, using the compressive strength, GSI, PLR, and B), RF (0.9476), GMDH (0.7874), GEP (0.8479), and BPNN (0.8456) models, the latter four using the compressive strength, GSI, PLS, PLR, and B. You and Mao28 utilized standard penetration test (SPT) results, e.g., the ratio of the total length (L) to diameter (B), the ratio of the length in the soil layer (PLS) to the length in the rock layer (PLR), the number of blows (SPTN), and the thickness of the rock layers (Hr), of 172 events to estimate the bearing capacity of piles. The study concluded that optimization techniques, specifically the crystal structure algorithm (CSA) and the dandelion optimization algorithm (DOA), improved the performance of the support vector regressor (SVR) model: the DOA_SVR model (R = 0.9940) outperformed the CSA_SVR (R = 0.9914) and conventional SVR (R = 0.9706) models. Chen29 employed an Aquila optimizer (AO_SVR)-based and a Henry gas solubility optimizer (HGSO_SVR)-based SVR model. Furthermore, Yang30 employed least squares SVR (LSSVR) and radial basis function (RBF) models optimized using the African vultures optimization algorithm (AVOA) and the pelican optimization algorithm (POA) on the same database considered by You and Mao28. The comparison showed that the POA_LSSVR (R = 0.9955) model outperformed the LSSVR (R = 0.9772), AVOA_LSSVR (R = 0.9899), RBF (R = 0.9700), AVOA_RBF (R = 0.9803), POA_RBF (R = 0.9874), and DOA_SVR (utilized by You and Mao28) models. Nawaz et al.31 employed a GEP model using the compressive strength together with the GSI, material constant, B, PLS, and PLR features. Murali et al.32 demonstrated that the interaction between the pile, rock, and soft interface materials significantly affects the load-carrying capacity of rock-socketed piles. Mostafa33 estimated the bearing capacity of piles socketed in limestone strata using an ANN model, and a companion study34 analyzed the behaviour of a pile in this stratum with a karst cavity embedded beneath the pile tip. Liu et al.35 and Liang et al.36 predicted the bearing capacity of large-diameter rock-socketed piles considering internal micro-cracks. Zhao et al.37 applied a vertical load to eight excavated socketed piles and observed that the load-displacement curve varied slowly, i.e., the displacement remained below 11 mm. The load-transfer mechanism reveals that the applied load on the pile is transferred to the strata (improved by cement-soil reinforcement38) through end-bearing and socket friction, which is initially supported by the side shear39; the lateral frictional resistance of the pile and the shear displacement between rock and pile are closely related40. Millán et al.41 neglected shaft resistance and analyzed the failure modes of different pile configurations using the discontinuity layout optimization (DLO) method.
Chen et al.42 established a 3D semi-analytical solution for the rock-socketed shaft bearing capacity using the 3D HB criterion and a Runge-Kutta-based iterative approach. The study revealed that the end-bearing capacity factor increases with the compressive strength, GSI, PLS, and PLR, and decreases with an increasing rock material constant. Furthermore, Chen and Zhang1 employed support vector machine (SVM), decision tree (DT), RF, ensemble learning (EL), and Gaussian process regression (GPR) models using the compressive strength, GSI, PLS, PLR, and material constant parameters. It was observed that the EL model outperformed the SVM, DT, RF, and GPR models, with a root mean square error (RMSE) of 1.17 and an index of agreement (IOA) of 0.955, close to the ideal values. Still, the impact of the degree of socket roughness on resistance (as analyzed by a centrifuge test performed at 50 g) and the load-settlement behavior of the shaft43 was not analyzed. Later, Gutiérrez-Ch et al.44 estimated the side shear resistance considering socket roughness, and socket roughness was also accounted for in the design of rock-socketed piles using the DEM45. Finally, the researchers proposed a factor defined in terms of the socket head settlement to compute the average side shear resistance of rock sockets under a socket head settlement of 1% of the socket diameter. Jeong et al.46 derived an empirical formula for predicting pile tip bearing capacity using the field results of 211 prebored and precast steel piles. Moreover, Wang et al.47 performed field tests and derived a hyperbolic model to estimate the uplift bearing capacity of rock-socketed belled piles. Similarly, Barrett and Prendergast48 established an empirical relationship for shaft resistance using 42 steel H-pile load test results. Xing et al.49 analyzed the load-bearing characteristics of large-diameter rock-socketed bored piles (the socket diameter is inversely proportional to the peak side resistance50) using the self-balanced method. Mishra et al.51 estimated the end-bearing capacity of a pile using Vander Veen's, De Beer's, Chin's, Shen's, and Decourt's methods and reported that Vander Veen's method gave the least promising results. Overall, limited studies are available on the estimation of the end-bearing capacity of rock-socketed piles using artificial intelligence techniques, such as DNN, DF, XGBoost, SVR, LSSVR, GEP, ANN, and EL, as summarized in Table 1.
Table 1.
Summary of the available models in the literature.
| S. No. | References | Model: Test R | Features: Data | Model Limitations |
|---|---|---|---|---|
| 1 | Seo et al.22 | DNN; 0.9798 | B, L, DH, R, S, ET; 217 | • Requires large datasets for effective training. • Prone to overfitting, especially with small or noisy data. |
| 2 | Alzouba et al.25 | DF; 0.9127 | UCS, GSI, PLS, PLR, B; 138 | • Limited library support. • Requires effective feature preprocessing and scaling. |
| 3 | Zhang and Xue27 | BO_XGBoost; 0.9813 | UCS, GSI, PLS, PLR, B; 138 | • Overfits if not properly regularized or tuned. • Less efficient with extremely high-dimensional sparse data. • BO has a scalability issue and is sensitive to kernel selection. |
| 4 | You and Mao28 | DOA_SVR; 0.9940 | L/B, PLS/PLR, Hr, SPTN; 172 | • Not efficient for very large datasets due to the quadratic optimization problem. • Highly sensitive to the choice of kernel and its parameters. • Computationally expensive with non-linear kernels. • DOA can be trapped in local optima. |
| 5 | Yang30 | POA_LSSVR; 0.9955 | L/B, PLS/PLR, Hr, SPTN; 172 | • Less robust to noise due to the least squares loss function. • Requires careful tuning of regularization and kernel parameters. • Performance degrades with very large datasets. • POA shows premature convergence in complex search spaces. |
| 6 | Nawaz et al.31 | GEP; 0.8600 | UCS, GSI, mi, B, PLS, PLR; 151 | • The evolutionary process requires a high number of generations and evaluations. • Slower when applied to large-scale datasets or many features. |
| 7 | Mostafa33 | ANN; 0.9823 | L/B, W, DH; 82 | • Susceptible to local minima, overfitting, and vanishing gradients. • Requires a significant amount of labeled data. • Sensitive to architecture, learning rate, activation functions, etc. |
| 8 | Chen and Zhang1 | EL; 0.9418 | UCS, GSI, mi, B, PLS, PLR; 151 | • Aggregation of multiple models increases system complexity and training time. • Overfits with small datasets if weak learners are selected. |

Note: B is the pile diameter, L is the pile length, DH is the drop height (EOID, restrike), R is the ram (EOID, restrike), S is the set (EOID, restrike), ET is the elapsed time, PLR is the pile length in rock, PLS is the pile length in soil, GSI is the geological strength index, UCS is the unconfined compressive strength, mi is the rock material constant, and W is the weight of the hammer.
Gap Identification and Novelty of the Present Work – The published research reveals that soft computing approaches are more reliable than empirical approaches in estimating the end-bearing capacity of rock-socketed shafts/piles. These soft computing approaches were employed using the compressive strength, GSI, material constant, B, PLS, PLR, L/B, W, DH, and L features of different datasets. It is well established that soft computing approaches are black-box models whose prediction capabilities are significantly affected by (a) feature selection and (b) the quantity of the database. Moreover, the literature shows that no researcher has employed and analyzed soft computing models using the B, Hr, Hs, GSI, RMR, UCS, mi, and Qu features, and no researcher has attempted a data augmentation technique to analyze the effect of a synthetic database on soft computing models using such features. A few investigators, e.g., Zhang and Xue27, You and Mao28, and Yang30, have employed optimized models to estimate the end-bearing capacity of rock-socketed shafts/piles. Furthermore, no investigation has reported the impact of feature multicollinearity on conventional and optimized soft computing models. Considering the outlines mapped from the literature, the novelty of this investigation is as follows:
• This study analyses the capabilities of conventional and optimized (using the Arithmetic Optimization Algorithm, AOA; Brainstorm Optimization Algorithm, BOA; and Whale Optimization Algorithm, WOA) extreme gradient boosting (XGBoost) models to predict the end-bearing capacity of rock-socketed shafts using the B, Hr, Hs, GSI, RMR, UCS, mi, and Qu features.
• This research uses the Gaussian-noise augmentation technique to create a synthetic database and, for the first time, analyzes the impact of augmentation on the performance and overfitting of the conventional and optimized XGBoost models in estimating the end-bearing capacity of rock-socketed shafts.
• This investigation demonstrates the impact of feature multicollinearity on the overfitting of the conventional and optimized XGBoost models and, for the first time, draws a relationship between feature sensitivity and multicollinearity.
Research Significance – This work compares the conventional and optimized XGBoost models to identify an optimal-performance model for predicting the end-bearing capacity of rock-socketed shafts, and it develops a graphical user interface (GUI) application based on that model. The application will help geotechnical engineers and designers estimate the end-bearing capacity of rock-socketed shafts without performing cumbersome and time-consuming laboratory and field tests. In addition, this investigation will help data scientists working with data science and artificial intelligence (AI) applications to understand the effect of feature multicollinearity on the overfitting of soft computing models.
Research methodology
The present study estimates the end-bearing capacity of rock-socketed shafts using the Arithmetic Optimization Algorithm (AOA), Brainstorm Optimization Algorithm (BOA), and Whale Optimization Algorithm (WOA)-based XGBoost models. To conduct this investigation, the database published by Chen and Zhang1 has been utilized. The database consists of the end-bearing capacity, shaft diameter (B, in meters), length of shaft within the rock layer (Hr, in meters), length of shaft within the soil layer (Hs, in meters), geological strength index (GSI = RMR − 5), rock mass rating (RMR), unconfined compressive strength (UCS), and rock material constant (mi) of 151 test shafts. The ultimate bearing capacity (Qu) has been calculated by multiplying the recorded end-bearing capacity and the UCS. Thus, a combination of eight features, i.e., B, Hr, Hs, GSI, RMR, UCS, mi, and Qu, has been created to estimate the end-bearing capacity of rock-socketed shafts. The complete database has been preprocessed using z-score analysis and the detection of missing values. Finally, a database (denoted O_Data) of 136 test shafts has been obtained from the preprocessing. To analyze the reliability of the preprocessed database, the analysis of variance (ANOVA, to test the research hypothesis), the distance correlation coefficient method (to map the relationship between features and label), the variance inflation factor (VIF, to measure the multicollinearity level of each feature), and the cosine amplitude method (to calculate the sensitivity of each feature) have been applied. After analyzing the reliability of the O_Data, training (85% of 136) and testing (15% of 136) subsets were created to employ the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models.
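The modeling workflow described above can be sketched as follows. This is a minimal illustration on randomly generated stand-in data (the real O_Data features are not reproduced here), and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost; no AOA/BOA/WOA hyperparameter tuning is performed.

```python
# Illustrative sketch of the 85/15 train-test workflow on synthetic stand-in
# data; GradientBoostingRegressor stands in for XGBoost, and no metaheuristic
# (AOA/BOA/WOA) tuning is performed.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(136, 8))                      # eight stand-in features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=136)

# 85% training / 15% testing split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=42)

model = GradientBoostingRegressor(n_estimators=300, max_depth=3, random_state=42)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

rmse = mean_squared_error(y_te, pred) ** 0.5       # root mean square error
mae = mean_absolute_error(y_te, pred)              # mean absolute error
r = np.corrcoef(y_te, pred)[0, 1]                  # performance (R)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R={r:.3f}")
```

Replacing the regressor with `xgboost.XGBRegressor` and wrapping the fit in a metaheuristic search over `n_estimators`, `max_depth`, and `learning_rate` would approximate the optimized variants.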
Furthermore, the Gaussian augmentation technique has been implemented to generate synthetic end-bearing capacities for 500 shafts. The synthetic database has been preprocessed, and the test results for 324 shafts have been retained. Finally, a database (denoted OS_Data) of the end-bearing capacity of 460 (136 original + 324 synthetic) rock-socketed shafts has been created. Again, training (85% of 460) and testing (15% of 460) subsets have been created to develop the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models. The root mean square error (RMSE), mean absolute error (MAE), performance (R), variance accounted for (VAF), performance index (PI), bias factor (BF), root mean square error to observation's standard deviation ratio (RSR), and Legates and McCabe's index (LMI) metrics have been computed to analyze the prediction capabilities of each model on the O_Data and OS_Data. The visual interpretation of the prediction capabilities of each model has been carried out using Taylor plots, Anderson-Darling (AD) tests, and the Akaike information criterion (AIC) for both databases. The generalizability of each model has been analyzed for both databases.
Additionally, the training and testing performances of each model have been compared for both databases to assess the impact of the synthetic database. Moreover, the reliability (using a20, scatter, and agreement indices) and overfitting of each model have been analyzed using both the O_Data and OS_Data. For the first time, a relationship between the sensitivity and multicollinearity of features has been drawn to analyze the overfitting of each model. Finally, a comparison between the empirical and optimal-performance models is discussed. Figure 1 illustrates the methodology employed in the present research.
Fig. 1.
Illustration of the flow of research methodology.
Data analysis and augmentation
Data analysis
This study utilizes a database of the end-bearing capacity of 151 rock-socketed shafts, published by Chen and Zhang1. The database consists of the end-bearing capacity, B, Hr, Hs, GSI, RMR, UCS, and mi parameters of 151 rock-socketed shafts. In addition, the ultimate bearing capacity (Qu) has been calculated by multiplying the recorded end-bearing capacity and the UCS. Thus, the database has been prepared to predict the end-bearing capacity using a combination of eight features, i.e., B52, Hr53, Hs54, GSI55, RMR56, UCS57, mi58, and Qu59. The outliers (identified using the z-score method) and missing values have been removed from the database. Finally, a database of the end-bearing capacity of 136 rock-socketed shafts has been obtained. Table 2 presents the descriptive statistics of the preprocessed database. In addition, the behaviour of the database has been analyzed by drawing frequency distribution plots, as presented in Fig. 2. It can be noted that most parameters exhibit an approximately bell-shaped, normal-like distribution. Therefore, the database can be utilized to conduct this investigation.
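The z-score screening step can be sketched as follows; the |z| < 3 cut-off is an assumption for illustration, as the study does not report its threshold.

```python
# Minimal sketch of z-score outlier screening; the threshold of 3 standard
# deviations is an assumed value for illustration.
import numpy as np

def zscore_filter(X, threshold=3.0):
    """Keep rows whose every column lies within `threshold` standard deviations."""
    X = np.asarray(X, dtype=float)
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    return X[(z < threshold).all(axis=1)]

base = np.column_stack([np.linspace(-1, 1, 20), np.linspace(-1, 1, 20)])
data = np.vstack([base, [100.0, 0.0]])   # one gross outlier in the first column
clean = zscore_filter(data)
print(clean.shape)                        # the outlier row is dropped
```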
Table 2.
Statistics of end-bearing capacity of 136 rock-socketed shafts.
| Parameters | Category | Unit | Mean | Min | Max | SDev | Skew | Kurt | IQR1 | IQR3 | IQR2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mi | Input | - | 11.96 | 4.00 | 29.00 | 7.08 | 0.85 | -0.49 | 6.00 | 15.00 | 9.00 |
| UCS | Input | MPa | 9.22 | 0.62 | 70.50 | 13.43 | 2.47 | 6.35 | 1.42 | 11.27 | 9.86 |
| RMR | Input | - | 59.39 | 15.00 | 100.00 | 22.25 | -0.21 | -0.79 | 40.00 | 70.00 | 30.00 |
| GSI | Input | - | 54.71 | 10.00 | 95.00 | 22.48 | -0.22 | -0.84 | 35.00 | 71.50 | 36.50 |
| Hs | Input | m | 4.67 | 0.00 | 25.00 | 6.11 | 1.27 | 0.82 | 0.00 | 8.80 | 8.80 |
| Hr | Input | m | 4.30 | 0.00 | 17.00 | 4.26 | 1.39 | 1.43 | 1.18 | 7.10 | 5.93 |
| B | Input | m | 0.84 | 0.02 | 1.94 | 0.44 | -0.01 | -0.41 | 0.46 | 1.20 | 0.74 |
| Qu | Input | MPa | 14.44 | 0.50 | 367.31 | 32.63 | 9.52 | 99.31 | 5.43 | 14.73 | 9.29 |
| End-bearing capacity | Output | - | 3.08 | 0.15 | 11.80 | 2.64 | 1.21 | 1.01 | 0.88 | 4.62 | 3.74 |
Note: mi is the rock material constant, UCS is the unconfined compressive strength, Qu is the ultimate bearing capacity, SDev is the standard deviation, Skew is the skewness, Kurt is the kurtosis, IQR1 is the first quartile (= 25%), IQR3 is the third quartile (= 75%), and IQR2 is the interquartile range (= IQR3 − IQR1).
Fig. 2.
Illustration of the frequency distribution of (a) mi, (b) UCS, (c) RMR, (d) GSI, (e) Hs, (f) Hr, (g) B, (h) Qu, and (i) the end-bearing capacity.
After obtaining a good representation of the features and label, their relationships have been mapped using the distance correlation method. The distance correlation of two variables is calculated by dividing their distance covariance by the product of their distance standard deviations60. The mathematical formulation of the distance correlation is as follows:

dCor(X, Y) = dCov(X, Y) / √(dVar(X)·dVar(Y))    (2)

where dCov(X, Y) is the distance covariance between X and Y, and dVar is the distance variance. The correlation value varies from 0 to 1. The distance correlation (DC) indicates no (DC = 0), weak (0 < DC ≤ 0.2), good (0.2 < DC ≤ 0.4), moderate (0.4 < DC ≤ 0.6), strong (0.6 < DC ≤ 0.8), or very strong (0.8 < DC ≤ 1.0) relationships between a pair of variables61,62. Figure 3 demonstrates that (a) RMR has a very strong relationship with GSI (= 0.990), because GSI = RMR − 5; (b) RMR has good relationships with Hr (= 0.225) and Hs (= 0.398) and a moderate relationship with B (= 0.450); (c) GSI has a good relationship with Hr (= 0.232) and moderate relationships with Hs (= 0.410) and B (= 0.448); (d) Hs has a moderate relationship with B (= 0.505); (e) Hr has a moderate relationship with B (= 0.407); and (f) the remaining feature pairs show weak-to-moderate relationships (0.208–0.592). It has also been observed that each feature, i.e., mi (= 0.430; moderate), UCS (= 0.538; moderate), RMR (= 0.609; strong), GSI (= 0.612; strong), Hs (= 0.267; good), Hr (= 0.201; good), B (= 0.431; moderate), and Qu (= 0.187; weak), has a different strength of relationship with the end-bearing capacity. Based on these relationships, the following research hypothesis statements have been drawn.
Fig. 3.
Depiction of distance correlation matrix.
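The distance correlation used above can be computed directly from pairwise-distance matrices; a minimal sketch for two one-dimensional variables, following the standard (Székely-style) sample definition:

```python
# Sample distance correlation from double-centered pairwise-distance matrices.
import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])           # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center each distance matrix
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)              # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

x = np.linspace(0, 1, 50)
print(round(distance_correlation(x, 2 * x + 1), 3))   # perfect linear relation: 1.0
```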
• GSI and RMR significantly influence the estimation of the end-bearing capacity, and incorporating both leads to a more accurate estimation of shaft performance than using strength parameters alone.
• The combined effect of shaft geometry and rock properties significantly predicts variations in the end-bearing capacity of rock-socketed shafts, indicating a strong interaction between structural and material properties.
The ANOVA test has been performed to evaluate the research hypothesis; the results are summarized in Table 3. The mi, UCS, RMR, GSI, B, and Qu features demonstrate F-statistic values greater than their corresponding F-critical values, with p-values less than 0.05, indicating that these features have a statistically significant effect on the end-bearing capacity at the 95% confidence level63,64. Notably, Qu shows an infinite F-statistic due to zero within-group variation, further emphasizing its dominant role. On the other hand, Hs and Hr exhibit F-statistics below their F-critical values and high p-values (0.90 and 0.53, respectively), suggesting that they do not contribute significantly to the variation in the end-bearing capacity. Hence, the ANOVA results validate the first research hypothesis: most geomechanical and geometrical parameters, except Hs and Hr, significantly affect the estimation of the end-bearing capacity in rock-socketed shafts.
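The per-feature one-way ANOVA described above can be sketched as follows on synthetic data: the target values are grouped by the discrete levels of a feature, and the F statistic is compared with the 5% critical value.

```python
# One-way ANOVA sketch on synthetic data: group the target by feature levels
# and compare the F statistic with the 5% critical value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
feature = rng.integers(0, 4, size=120)             # four discrete feature levels
target = feature * 0.8 + rng.normal(size=120)      # level-dependent mean + noise

groups = [target[feature == g] for g in np.unique(feature)]
f_stat, p_value = stats.f_oneway(*groups)

df_between = len(groups) - 1
df_within = len(target) - len(groups)
f_crit = stats.f.ppf(0.95, df_between, df_within)  # 95% confidence level
print(f"F={f_stat:.2f}, F_crit={f_crit:.2f}, p={p_value:.4f}")
```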
Table 3.
Summary of ANOVA test results.
| Feature | SS between | SS within | df between | df within | MS between | MS within | F statistic | p value | F critical |
|---|---|---|---|---|---|---|---|---|---|
| mi | 326.91 | 615.02 | 13.00 | 122.00 | 25.15 | 5.04 | 4.99 | 0.00 | 1.80 |
| UCS | 849.26 | 92.67 | 110.00 | 25.00 | 7.72 | 3.71 | 2.08 | 0.02 | 1.77 |
| RMR | 542.78 | 399.15 | 36.00 | 99.00 | 15.08 | 4.03 | 3.74 | 0.00 | 1.54 |
| GSI | 543.73 | 398.20 | 36.00 | 99.00 | 15.10 | 4.02 | 3.76 | 0.00 | 1.54 |
| Hs | 286.31 | 655.62 | 51.00 | 84.00 | 5.61 | 7.80 | 0.72 | 0.90 | 1.50 |
| Hr | 498.27 | 443.66 | 72.00 | 63.00 | 6.92 | 7.04 | 0.98 | 0.53 | 1.50 |
| B | 566.88 | 375.05 | 44.00 | 91.00 | 12.88 | 4.12 | 3.13 | 0.00 | 1.51 |
| Qu | 941.93 | 0.00 | 134.00 | 1.00 | 7.03 | 0.00 | inf | 0.00 | 253.36 |
Note: SS is the sum of squares, df is the degree of freedom, and MS is the mean square.
The second research hypothesis states that the estimation of the end-bearing capacity is significantly affected by the shaft geometry, i.e., Hs, Hr, and B; therefore, it has been decided to include the shaft geometry parameters. Still, the effect of feature multicollinearity on the prediction of the end-bearing capacity cannot be ignored, especially for the shaft geometry. To analyze the feature multicollinearity, the variance inflation factor (VIF) has been utilized in this investigation. Khatti and Grover65 introduced five levels of feature multicollinearity: no (VIF = 0), weak (0 < VIF ≤ 2.5), considerable (2.5 < VIF ≤ 5.0), moderate (5.0 < VIF ≤ 10.0), and problematic (VIF > 10). The VIF of feature j is determined from the determination coefficient (Rj²) obtained by regressing that feature on the remaining features, i.e., VIFj = 1/(1 − Rj²), as depicted in Fig. 4. It has been noted that the mi (= 1.88), UCS (= 1.75), Hs (= 1.79), Hr (= 1.40), B (= 2.12), and Qu (= 1.68) features have weak multicollinearity in predicting the end-bearing capacity. Conversely, the RMR (= 67.75) and GSI (= 68.89) features exhibit problematic multicollinearity. Hence, the shaft geometry parameters have been retained as features to predict the end-bearing capacity.
Fig. 4.
Illustration of the computation of multicollinearity for each feature.
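The VIF computation can be sketched as follows: each feature is regressed on the remaining features and VIF_j = 1/(1 − R_j²). Mirroring the GSI = RMR − 5 situation, two nearly collinear columns (synthetic stand-ins here) produce problematic VIF values.

```python
# VIF sketch: regress each feature on the others (least squares with intercept)
# and compute VIF_j = 1 / (1 - R_j^2).
import numpy as np

def vif(X):
    X = np.asarray(X, float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
rmr = rng.uniform(15, 100, 200)                        # stand-in for RMR
gsi = rmr - 5 + rng.normal(scale=0.5, size=200)        # nearly GSI = RMR - 5
b = rng.uniform(0.2, 2.0, 200)                         # independent feature
vifs = vif(np.column_stack([rmr, gsi, b]))
print(np.round(vifs, 2))   # first two are problematic (> 10), third is weak
```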
The correlation and multicollinearity analyses have demonstrated that the features play a significant role in estimating the end-bearing capacity of rock-socketed shafts. To understand the contribution of each feature to the prediction, the cosine amplitude sensitivity analysis has been performed. The mathematical expression of the cosine amplitude is as follows66,67:

CA = Σ_k (x_k·y_k) / √(Σ_k x_k² · Σ_k y_k²)    (3)

where x_k and y_k are the input and target values, respectively. The cosine amplitude classifies the sensitivity of a feature as weak (0 < CA ≤ 0.25), moderate (0.25 < CA ≤ 0.50), high (0.5 < CA ≤ 0.75), or very high (0.75 < CA ≤ 1.00). Figure 5 demonstrates that the mi, UCS, RMR, GSI, Hs, Hr, B, and Qu features have sensitivities of 0.528, 0.219, 0.846, 0.849, 0.333, 0.486, 0.549, and 0.359, respectively, with corresponding contributions of 12.674%, 5.257%, 20.301%, 20.369%, 7.99%, 11.646%, 13.161%, and 8.601%. Finally, the mi, UCS, RMR, GSI, Hs, Hr, B, and Qu features have all been selected for predicting the end-bearing capacity of the rock-socketed shaft.
Fig. 5.
Illustration of the sensitivity and contribution of each feature in the end-bearing capacity prediction.
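The cosine amplitude, and the percentage contributions derived from it, can be sketched as follows on synthetic data (the feature names here are illustrative only):

```python
# Cosine amplitude sensitivity: CA = sum(x*y) / sqrt(sum(x^2) * sum(y^2)),
# with each feature's contribution taken as its share of the summed CA values.
import numpy as np

def cosine_amplitude(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

rng = np.random.default_rng(3)
target = rng.uniform(0.5, 5.0, 100)
features = {
    "proportional": 2.0 * target,                        # CA = 1.0 exactly
    "noisy": target + rng.normal(scale=2.0, size=100),   # CA below 1.0
}
ca = {name: cosine_amplitude(f, target) for name, f in features.items()}
total = sum(ca.values())
for name, value in ca.items():
    print(f"{name}: CA={value:.3f}, contribution={100 * value / total:.1f}%")
```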
Data augmentation
This investigation employs the Gaussian-noise technique to develop synthetic data from the original database. The technique adds small, random noise to the features, generating synthetic data points that are close to the original data points but slightly perturbed. Lopes et al.68, Qian et al.69, Santoni et al.70, and Wang et al.71 observed that synthetic data enhanced the performance and accuracy of soft computing models. Abbahaddou et al.72 enhanced the generalizability of graph neural network models by utilizing unseen or out-of-distribution (OOD) datasets. In addition, several researchers have utilized data augmentation to accurately assess slope stability73–75, blast-induced outbreaks76, and geo-mechanical properties77. Moreover, Zhao and Teng78, Demir and Şahin79, Li et al.80, Soomro et al.81, and Wen et al.82 utilized the Synthetic Minority Oversampling Technique (SMOTE) to enhance the capabilities of machine learning models. SMOTE is mostly employed to solve classification problems; it creates a database through an interpolation process using k-nearest neighbors83. The major advantage of the Gaussian technique over SMOTE is that it can be used for both regression and classification problems84. The Gaussian technique is based on perturbation, which requires the standard deviation of the noise. Therefore, the Gaussian-noise technique has been adopted in this investigation to create a synthetic database from the original database of the end-bearing capacity of 136 rock-socketed shafts. The technique has been applied to each of the d features68:
$$D = \{x_{ij}\}, \quad i = 1, \ldots, N, \; j = 1, \ldots, d \tag{4}$$
If $x_{ij}$ is a value of a continuous numeric feature $j$, a perturbed version $\tilde{x}_{ij}$ is generated as:

$$\tilde{x}_{ij} = x_{ij} + \epsilon_{ij}, \quad \epsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma_j^{2}\right) \tag{5}$$
Where $\epsilon_{ij}$ is the Gaussian noise, $\sigma_j$ is the standard deviation of the noise for feature $j$, $\mathcal{N}$ is the normal (Gaussian) probability distribution, and $\sigma_j$ is generally selected relative to the scale or range of the feature. Therefore, the augmented database $D_{aug}$ is given by:

$$D_{aug} = D \cup \{\tilde{x}_{ij}\} \tag{6}$$
Using the Gaussian noise technique on 136 data points, a database of the end-bearing capacity of 500 rock-socketed shafts has been created. Later, the synthetic database was preprocessed, and a database of the end-bearing capacity of 324 rock-socketed shafts was obtained.
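The augmentation step above (Eqs. 4–6) can be sketched in a few lines of Python; the function name, the 5% noise scale, and the choice of each $\sigma_j$ as a fraction of the column's standard deviation are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def gaussian_augment(X, y, n_target, noise_scale=0.05, seed=42):
    """Grow a dataset to n_target rows by perturbing randomly chosen
    original rows with zero-mean Gaussian noise; sigma_j for feature j
    is noise_scale times that column's standard deviation."""
    rng = np.random.default_rng(seed)
    n_orig, n_feat = X.shape
    n_new = n_target - n_orig
    sigma = noise_scale * X.std(axis=0)            # per-feature noise std
    idx = rng.integers(0, n_orig, size=n_new)      # base rows to perturb
    X_syn = X[idx] + rng.normal(0.0, 1.0, (n_new, n_feat)) * sigma
    y_syn = y[idx] + rng.normal(0.0, noise_scale * y.std(), n_new)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])
```

With the paper's counts, a call such as `gaussian_augment(X_136, y_136, 500)` would grow the 136 original records to 500 before the synthetic rows are preprocessed.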
Data preparation
This investigation analyzes the capabilities of hybrid machine learning approaches in predicting the end-bearing capacity of rock-socketed shafts. For the comparison, two databases, namely the original (O_Data) and original + synthetic (OS_Data) datasets, have been used. The min-max function ($x_{norm} = 2\,(x - x_{min})/(x_{max} - x_{min}) - 1$; where $x_{norm}$ is the normalized value, $x$ is the original value, $x_{max}$ is the maximum value in the column array, and $x_{min}$ is the minimum value in the column array) has been used to normalize each column in the database between −1 and +1. Figure 6 presents a comparison of the characteristics of the normalized variables before and after preprocessing for the O_Data and OS_Data. It can be noted that the characteristics of each variable of the O_Data have been improved after preprocessing (refer to Fig. 6a and b). Likewise, the characteristics of each variable have been significantly improved after adding the synthetic data (refer to Fig. 6c and d). Figure 6(d) shows that most variables exhibit a symmetric distribution around their median (Q2), though some skewness is observed. The medians for each variable are located near the center of the boxes, indicating a balanced distribution. Overall, all variables are well-normalized and fall within ±2 standard deviations, with a few outliers beyond this range.
Fig. 6.
Illustration of comparison of characteristics of normalized variables for (a) O_Data before, (b) O_Data after, (c) OS_Data before, and (d) OS_Data after preprocessing.
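The column-wise normalization described above can be sketched as follows (a minimal illustration; the function name is assumed, and the formula maps each column to [−1, +1]):

```python
import numpy as np

def min_max_scale(col):
    """Scale a 1-D array to [-1, +1]: 2*(x - xmin)/(xmax - xmin) - 1."""
    cmin, cmax = np.min(col), np.max(col)
    return 2.0 * (col - cmin) / (cmax - cmin) - 1.0
```

Applying the function column by column (e.g., `np.apply_along_axis(min_max_scale, 0, X)`) reproduces the normalization used for both O_Data and OS_Data.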
Machine learning model and optimization
Extreme gradient boosting (XGBoost)
The gradient boosting framework serves as the foundation for the potent and popular machine learning algorithm Extreme Gradient Boosting (XGBoost), which is built for speed, scalability, and excellent predictive performance85–89. By using gradient descent optimization to minimize a given loss function, it produces an ensemble of decision trees one after the other, with each new tree aiming to fix the mistakes of the ones that came before it. Compared to conventional gradient boosting approaches, XGBoost is more resilient since it uses regularization techniques including L1 (Lasso) and L2 (Ridge) penalties to lessen overfitting. It also allows distributed and parallel computing, effectively manages missing values, and can analyze high-dimensional, large-scale datasets. Regression, classification, and ranking tasks have grown to favor XGBoost because of its accuracy, effectiveness, and adaptability.
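The sequential residual-fitting principle behind XGBoost can be illustrated with a from-scratch sketch; note that this toy version uses depth-1 stumps and plain squared-error residuals, and omits XGBoost's regularization, column sampling, and second-order terms (all names are illustrative):

```python
import numpy as np

def fit_stump(X, r):
    """Find the single split (feature j, threshold t) and leaf values that
    minimize the squared error against the residuals r."""
    best = (np.inf, 0, 0.0, float(r.mean()), float(r.mean()))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # skip max: right side non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, float(t), float(lv), float(rv))
    return best[1:]

def boost_fit(X, y, n_trees=100, lr=0.1):
    """Each stump is fitted to the residuals of the ensemble built so far."""
    base, trees = float(y.mean()), []
    pred = np.full(len(y), base)
    for _ in range(n_trees):
        j, t, lv, rv = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= t, lv, rv)
        trees.append((j, t, lv, rv))
    return base, lr, trees

def boost_predict(model, X):
    base, lr, trees = model
    out = np.full(len(X), base)
    for j, t, lv, rv in trees:
        out += lr * np.where(X[:, j] <= t, lv, rv)
    return out
```

Each new stump corrects the mistakes of its predecessors, which is the core mechanism the full XGBoost library accelerates and regularizes.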
Optimization techniques
This investigation uses the Arithmetic Optimization (AOA), Brainstorm Optimization (BOA), and Whale Optimization (WOA) Algorithms to optimize the XGBoost model and to create the XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models. These algorithms ensure superior exploration and exploitation by efficiently handling the high-dimensional, non-convex search space of hyperparameter tuning, thereby avoiding local optima. They are suitable for black-box models, such as XGBoost, since they operate independently of gradient information. AOA and WOA strike a compromise between simplicity and computational efficiency, whereas BOA promotes collaboration and variety throughout the search process. The prediction accuracy and overall performance of XGBoost are greatly enhanced by these methods, which adjust hyperparameters such as the maximum tree depth, learning rate, and sampling ratios90,91. Their flexibility, robustness, and ability to adapt to dynamic problems make them ideal for machine learning optimization tasks. In addition, published research on different applications of the XGBoost model in civil engineering by Yuan et al.92, Lu et al.93, Kazemi et al.94, Geng et al.95, Shaik et al.96, Sun et al.97, Nabavi et al.98, Gu et al.99, Chandrahas et al.100, Qiu et al.101, and Zhang et al.102 revealed the robustness of the XGBoost approach. Therefore, it has been decided to employ XGBoost to predict the end-bearing capacity. In the present research, each model has been trained and tested using the selected input features to predict the end-bearing capacity of the rock-socketed shaft. The mathematical formulation of each optimization algorithm is as follows:
Arithmetic optimization algorithm (AOA)
The AOA is a nature-inspired metaheuristic optimization method that balances exploration and exploitation during the search process by simulating the distribution behavior of arithmetic operators, such as addition, subtraction, multiplication, and division103–105. AOA allows the algorithm to efficiently scan the whole global search space while progressively concentrating on attractive locations by updating potential solutions using mathematical operations that dynamically manage intensification and diversification. Preventing premature convergence and improving solution quality are two benefits of AOA’s adaptive mechanism, which modifies the impact of arithmetic operators based on iteration progress. Numerous engineering design, machine learning, feature selection, and real-world optimization challenges have benefited from the effective use of AOA due to its ease of use, adaptability, and competitive performance. Xiao and Du106, Wu et al.107, Esmaeili-Falak and Benemaran108, and Li and Mei109 reported that the AOA algorithm is highly capable of assessing the soil parameters with the least residuals.
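A compact sketch of the AOA described above, assuming the commonly published MOA/MOP schedules and the division/multiplication (exploration) and subtraction/addition (exploitation) operator forms; the control settings (`alpha`, `mu`) and function names are illustrative:

```python
import numpy as np

def aoa_minimize(obj, dim, lb, ub, n_agents=10, iters=200,
                 alpha=5.0, mu=0.499, seed=42):
    """Minimal AOA sketch: division/multiplication moves explore and
    subtraction/addition moves exploit, scheduled by MOA and MOP."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_agents, dim))
    fits = np.array([obj(x) for x in X])
    best, best_f = X[fits.argmin()].copy(), float(fits.min())
    eps = np.finfo(float).eps
    adj = (ub - lb) * mu + lb                      # operator step term
    for t in range(1, iters + 1):
        moa = 0.2 + t * (1.0 - 0.2) / iters        # rises: more exploitation late
        mop = 1.0 - (t / iters) ** (1.0 / alpha)   # shrinks the step over time
        for i in range(n_agents):
            r1 = rng.random()
            r2, r3 = rng.random(dim), rng.random(dim)
            if r1 > moa:                           # exploration phase
                cand = np.where(r2 < 0.5,
                                best / (mop + eps) * adj,
                                best * mop * adj)
            else:                                  # exploitation phase
                cand = np.where(r3 < 0.5, best - mop * adj, best + mop * adj)
            X[i] = np.clip(cand, lb, ub)
            f = obj(X[i])
            if f < best_f:
                best, best_f = X[i].copy(), f
    return best, best_f
```

For hyperparameter tuning, `obj` would train an XGBoost model from a candidate vector and return the validation error.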
Brainstorm optimization algorithm (BOA)
The human brainstorming process, in which people produce, exchange, and hone ideas to address challenging issues, served as the model for the Brainstorm Optimization Algorithm (BOA, also reported as BSO)110–113. In BOA, potential solutions are viewed as ideas within a population that are categorized into clusters to promote variety and information sharing. Global exploration, which recombines ideas from different clusters, and local exploitation, which refines ideas within the same cluster, are coupled to create new solutions. By striking a balance between exploration and exploitation, BOA is able to search the solution space effectively and prevent premature convergence. Its idea-evolution method and adaptive clustering technique make it especially useful for resolving high-dimensional, multimodal, and nonlinear optimization problems. BOA has been used for engineering optimization, machine learning parameter adjustment, and other real-world decision-making tasks because of its adaptability and resilience114,115.
Whale optimization algorithm (WOA)
The Whale Optimization Algorithm (WOA) is a nature-inspired optimization technique that simulates how humpback whales hunt using a bubble-net feeding approach. The procedure alternates between two primary actions: searching for prey (exploration) and attacking it (exploitation)116–119. Each whale in the algorithm represents a potential solution. Whales travel in a spiral fashion to mimic bubble-net hunting during exploitation, but move more haphazardly during exploration to cover a larger range of potential targets. Thanks to this combination of local refinement and global search, the algorithm is better able to identify the optimal result and avoid becoming trapped in subpar solutions. Due to its simplicity, effectiveness, and adaptability, Dolatshahi and Molladavoodi120, Li et al.121, Jiadong et al.122, Yao et al.123, Xue et al.124, Su et al.125, Ni et al.126, and Rabbani et al.127 implemented the WOA algorithm in predicting rock properties.
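The encircling, random-search, and spiral moves described above can be sketched as follows (a simplified illustration of the standard WOA updates; the per-agent branching condition and all names are assumptions of this sketch):

```python
import numpy as np

def woa_minimize(obj, dim, lb, ub, n_whales=10, iters=200, seed=42):
    """Minimal WOA sketch: shrinking encirclement and random search
    (exploration) alternate with a logarithmic spiral move (exploitation)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_whales, dim))
    fits = np.array([obj(x) for x in X])
    best, best_f = X[fits.argmin()].copy(), float(fits.min())
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                  # decreases linearly 2 -> 0
        for i in range(n_whales):
            r = rng.random(dim)
            A, C = 2.0 * a * r - a, 2.0 * rng.random(dim)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1.0):        # encircle the best solution
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                              # search around a random whale
                    Xr = X[rng.integers(n_whales)]
                    X[i] = Xr - A * np.abs(C * Xr - X[i])
            else:                                  # bubble-net spiral move
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            f = obj(X[i])
            if f < best_f:
                best, best_f = X[i].copy(), f
    return best, best_f
```

As with the other optimizers, the objective here would be the validation error of an XGBoost model decoded from the candidate vector.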
Hyperparameter tuning
The hyperparameter tuning of XGBoost models has been conducted separately for the O_Data and OS_Data using the AOA, BOA, and WOA optimization algorithms. For the O_Data, the base XGBoost model used default parameters with moderate depth and learning rate. In contrast, the XGBoost models tuned with AOA, BOA, and WOA algorithms have demonstrated significant variations in their parameters. AOA and BOA both have reduced tree depth and adjusted learning rates and sampling ratios. In contrast, WOA has used the lowest learning rate and the shallowest depth, thereby emphasizing finer granularity in training. When using OS_Data, the tuning resulted in deeper trees (especially in AOA), increased subsampling ratios, and adjusted gamma and min_child_weight values to enhance generalization. BOA has selected a higher learning rate with a simpler tree structure, while WOA has increased tree depth and gamma for better complexity control. Overall, the addition of synthetic data prompted the models to favor more complex structures and higher subsampling to capture richer data patterns. These tuned models reflect a balance between bias, variance, and computational efficiency tailored to each dataset. Table 4 presents the hyperparameter configurations of the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models, using the O_Data and OS_Data datasets. Table 4 demonstrates that the optimal hyperparameter configurations vary between O_Data and OS_Data, reflecting differences in their underlying data distributions and structural complexity. Hyperparameters governing model capacity (e.g., max_depth and min_child_weight), learning dynamics (learning_rate), and regularization (gamma, subsample, and colsample_bytree) are adaptively adjusted by the optimization algorithms to achieve an appropriate bias–variance trade-off for each dataset. 
For O_Data, the optimized models generally favor shallower trees and stronger regularization, resulting in more conservative learning behavior and improved generalization. In contrast, OS_Data yields configurations with increased tree depth and higher sampling ratios, indicating the need for greater model expressiveness to capture the enhanced data variability. Importantly, the optimization process is reproducible, as all experiments were conducted using fixed random seeds (random_state = 42), identical population sizes, and consistent iteration counts. Although minor variations are inherent to stochastic metaheuristic searches, the resulting hyperparameter sets consistently yield stable, robust predictive performance.
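In a metaheuristic tuning loop, each candidate is a continuous vector that must be decoded into an XGBoost parameter set before the objective (e.g., validation RMSE) is evaluated. A sketch of such a decoder, with bounds assumed so as to span the tuned values reported in Table 4 (the bounds and names are illustrative, not the paper's exact search space):

```python
import numpy as np

# Assumed search-space bounds spanning the tuned values reported in Table 4.
BOUNDS = [
    ("max_depth",        3.0, 8.0),    # decoded to an integer
    ("learning_rate",    0.01, 0.3),
    ("colsample_bytree", 0.5, 1.0),
    ("subsample",        0.5, 1.0),
    ("gamma",            0.0, 3.0),
    ("min_child_weight", 1.0, 10.0),
]

def decode(v):
    """Map a candidate vector v in [0, 1]^6 to an XGBoost parameter dict."""
    params = {}
    for x, (name, lo, hi) in zip(v, BOUNDS):
        val = lo + float(x) * (hi - lo)
        params[name] = int(round(val)) if name == "max_depth" else val
    return params
```

Each optimizer then minimizes `objective(v) = validation_rmse(train(decode(v)))`, which is how AOA, BOA, and WOA arrive at the distinct configurations in Table 4.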
Table 4.
Hyperparameter tuning of the developed models.
| Models | Hyperparameter configurations |
|---|---|
| Using O_Data | |
| XGBoost | objective = reg:squarederror, booster = gbtree, max_depth = 6, learning_rate = 0.1, colsample_bytree = 0.8, subsample = 0.8, gamma = 0.3, min_child_weight = 1, scale_pos_weight = 1, n_jobs = 1, random_state = 42 |
| XGBoost_AOA | objective = reg:squarederror, booster = gbtree, max_depth = 4, learning_rate = 0.276238629, colsample_bytree = 0.784670533, subsample = 0.538532058, gamma = 0.46701301, min_child_weight = 6.52900278, n_jobs = 1, random_state = 42, number of agents = 10, iterations = 500 |
| XGBoost_BOA | objective = reg:squarederror, booster = gbtree, max_depth = 4, learning_rate = 0.190043353, colsample_bytree = 0.794295262, subsample = 0.645343981, gamma = 2.403371896, min_child_weight = 7.50374444, n_jobs = 1, random_state = 42, number of agents = 10, iterations = 500 |
| XGBoost_WOA | objective = reg:squarederror, booster = gbtree, max_depth = 3, learning_rate = 0.013096349, colsample_bytree = 0.65481745, subsample = 0.65481745, gamma = 0.4, min_child_weight = 1.3096349, n_jobs = 1, random_state = 42, number of whales = 10, iterations = 500 |
| Using OS_Data | |
| XGBoost | objective = reg:squarederror, booster = gbtree, max_depth = 6, learning_rate = 0.1, colsample_bytree = 0.8, subsample = 0.8, gamma = 0.3, min_child_weight = 1, scale_pos_weight = 1, n_jobs = 1, random_state = 42 |
| XGBoost_AOA | objective = reg:squarederror, booster = gbtree, max_depth = 8, learning_rate = 0.184956961, colsample_bytree = 0.921573544, subsample = 0.908819802, gamma = 0.34245301, min_child_weight = 9.896031782, n_jobs = 1, random_state = 42, number of agents = 10, iterations = 500 |
| XGBoost_BOA | objective = reg:squarederror, booster = gbtree, max_depth = 3, learning_rate = 0.265600919, colsample_bytree = 0.864198446, subsample = 0.887591727, gamma = 0.014665179, min_child_weight = 3.9785428147, n_jobs = 1, random_state = 42, number of agents = 10, iterations = 500 |
| XGBoost_WOA | objective = reg:squarederror, booster = gbtree, max_depth = 4, learning_rate = 0.143756304, colsample_bytree = 0.5, subsample = 0.5, gamma = 2.395938393, min_child_weight = 4.791876787, n_jobs = 1, random_state = 42, number of whales = 10, iterations = 500 |
Additionally, the convergence characteristics of the mean squared error (MSE) for each model have been analyzed and are depicted in Fig. 7. It can be noted that the XGBoost model (refer to Fig. 7a) has predicted the end-bearing capacity with an MSE of 0.09 (obtained at 34 iterations using the O_Data) and 0.01 (obtained at 59 iterations using the OS_Data). Figure 7(b) demonstrates that the XGBoost_AOA model has converged to an MSE of 0.14 (obtained at 89 iterations using the O_Data) and 0.01 (achieved at 79 iterations using the OS_Data). On the other hand, the XGBoost_BOA model has estimated the end-bearing capacity with an MSE of 0.47 (gained at 54 iterations using the O_Data) and 0.03 (obtained at 63 iterations using the OS_Data), as presented in Fig. 7(c). Conversely, the XGBoost_WOA model has gained an MSE of 0.22 at 118 iterations using the OS_Data and an MSE of 0.06 at 500 iterations using the O_Data (see Fig. 7d).
Fig. 7.
Illustration of convergence comparison of the (a) XGBoost, (b) XGBoost_AOA, (c) XGBoost_BOA, and (d) XGBoost_WOA models in the training phase using the O_Data and OS_Data.
The difference in convergence speed among the optimization algorithms can be attributed to their distinct search mechanisms. AOA employs arithmetic operators (addition, subtraction, multiplication, and division) with adaptive control parameters, which provide an effective balance between global exploration and local exploitation, thereby accelerating convergence toward optimal hyperparameters. Similarly, BOA benefits from cluster-based idea sharing and local refinement, which enhances population diversity while maintaining efficient exploitation. In contrast, WOA updates candidate solutions using spiral encircling and stochastic position movements that emphasize gradual search behavior. Although robust, this mechanism typically requires a larger number of iterations to sufficiently explore the search space and stabilize convergence. Consequently, WOA required up to 500 iterations in the present study, whereas AOA and BOA achieved faster convergence with fewer iterations.
Performance evaluation
The present study develops the XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models to assess the end-bearing capacity of rock-socketed shafts. These models have been trained (115 datasets from the O_Data and 391 datasets from the OS_Data) and tested (21 datasets from the O_Data and 69 datasets from the OS_Data) using the original and original + synthetic databases. The RMSE, MAE, R, VAF, PI, BF, RSR, and LMI metrics have been used to measure the performance of each model in both phases. The reasons for implementing these metrics are that (a) the RMSE and MAE reveal the error magnitudes, (b) the performance coefficient (R) illustrates the agreement between actual and predicted values, (c) the VAF analyzes the variance, (d) the PI combines several performance factors to show robustness, (e) the BF metric demonstrates whether the model tends to overpredict or underpredict, (f) the RSR facilitates the comparison of model accuracy across datasets, and (g) the LMI provides a more stable error measure. Together, these metrics draw a thorough and precise picture of a model's performance, comprehensively evaluating accuracy, bias, and predictive reliability. The mathematical formulation of each metric is as follows128–130:
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(a_i-p_i\right)^{2}} \tag{7}$$

$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|a_i-p_i\right| \tag{8}$$

$$R=\frac{\sum_{i=1}^{N}\left(a_i-\bar{a}\right)\left(p_i-\bar{p}\right)}{\sqrt{\sum_{i=1}^{N}\left(a_i-\bar{a}\right)^{2}\sum_{i=1}^{N}\left(p_i-\bar{p}\right)^{2}}} \tag{9}$$

$$\mathrm{VAF}=\left(1-\frac{\operatorname{var}\left(a_i-p_i\right)}{\operatorname{var}\left(a_i\right)}\right)\times 100 \tag{10}$$

$$\mathrm{PI}=R^{2}+0.01\,\mathrm{VAF}-\mathrm{RMSE} \tag{11}$$

$$\mathrm{BF}=\frac{1}{N}\sum_{i=1}^{N}\frac{p_i}{a_i} \tag{12}$$

$$\mathrm{RSR}=\frac{\mathrm{RMSE}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(a_i-\bar{a}\right)^{2}}} \tag{13}$$

$$\mathrm{LMI}=\frac{\sum_{i=1}^{N}\left|a_i-p_i\right|}{\sum_{i=1}^{N}\left|a_i-\bar{a}\right|} \tag{14}$$
Where $a_i$, $p_i$, $\bar{a}$, $\bar{p}$, and $N$ are the actual value, predicted value, actual mean, and predicted mean of the end-bearing capacity of the rock-socketed shaft, and the number of data points, respectively. The ideal values for RMSE, MAE, LMI, and RSR are zero131. Conversely, the ideal BF, R, VAF, and PI values are 1, 1, 100, and 2, respectively. A robust and reliable soft computing model has a performance (R) between 0.8 and 1.0, which is acceptable132.
Results and discussion
Simulation of results
In this study, the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models have been developed using the O_Data and OS_Data to analyze the capabilities of each model in predicting the end-bearing capacity of rock-socketed shafts. Table 5 summarizes the performance of each model obtained using the O_Data. Table 5 reveals that the AOA (R = 99.67% and 92.95% in the training and testing phases, respectively), BOA (R = 96.72% and 87.82%), and WOA (R = 99.04% and 89.83%) algorithms have enhanced the prediction accuracy of the conventional XGBoost model (R = 99.67% and 82.42%). The algorithms have improved the conventional model through reconfiguration, known as optimization, by adjusting hyperparameters to minimize the prediction error. Moreover, these algorithms have enabled the model to escape local minima. The comparison demonstrates that the XGBoost_AOA model has estimated the end-bearing capacity of the rock-socketed shaft with higher performance. The XGBoost_AOA model has attained a VAF and PI of 86.36 and 1.65, respectively, comparatively higher than the other models, because (a) AOA balances both exploitation (local refinement) and exploration (global search), (b) AOA avoids local optima, (c) AOA uses arithmetic operators in a probabilistic manner105, (d) AOA enables diverse and adaptive search paths for hyperparameter tuning, (e) AOA converges faster due to its straightforward update mechanism, and (f) AOA maintains higher diversity in the population of candidate solutions, reducing the risk of premature convergence103,133. It can be noted that the XGBoost_AOA model has predicted the end-bearing capacity with a BF of 1.0323 and 1.3701 in the training and testing phases, respectively. The testing BF demonstrates that the model may not be fully generalized because of the limited training database (115 data points), although it is still better than the XGBoost, XGBoost_BOA, and XGBoost_WOA models. Moreover, the XGBoost_AOA model has predicted the end-bearing capacity with the least residuals in the training (RMSE = 0.2240, MAE = 0.1619) and testing (RMSE = 0.9205, MAE = 0.7024) phases. Additionally, the PI, LMI, and RSR metrics have demonstrated the robustness of the XGBoost_AOA model compared to the XGBoost, XGBoost_BOA, and XGBoost_WOA models. Figure 8(a-d) illustrates the comparison of the prediction capabilities of each model. Figure 8(b) shows that the actual end-bearing capacity of the rock-socketed shaft is in good agreement with the predicted values, close to the ideal line.
Table 5.
Summary of performance of model (case – O_Data).
| Model | Phase | RMSE | MAE | R | VAF | PI | BF | LMI | RSR |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | Train | 0.3040 | 0.1946 | 0.9967 | 98.69 | 1.95 | 1.1053 | 0.0918 | 0.1145 |
| Test | 1.5318 | 0.8141 | 0.8242 | 65.16 | 1.20 | 1.8724 | 0.4288 | 0.6390 | |
| XGBoost_AOA | Train | 0.2240 | 0.1619 | 0.9967 | 99.29 | 1.97 | 1.0323 | 0.0763 | 0.0843 |
| Test | 0.9205 | 0.7024 | 0.9295 | 86.36 | 1.65 | 1.3701 | 0.3700 | 0.3840 | |
| XGBoost_BOA | Train | 0.6861 | 0.4594 | 0.9672 | 93.33 | 1.81 | 1.1233 | 0.2166 | 0.2583 |
| Test | 1.2663 | 0.8316 | 0.8782 | 76.80 | 1.43 | 1.7516 | 0.4380 | 0.5283 | |
| XGBoost_WOA | Train | 0.3727 | 0.2588 | 0.9904 | 98.03 | 1.93 | 1.0384 | 0.1220 | 0.1403 |
| Test | 1.2015 | 0.7449 | 0.8983 | 79.88 | 1.50 | 1.6944 | 0.3924 | 0.5013 |
Note: Bold values correspond to the optimal performance model.
Fig. 8.
Illustration of comparison of prediction capabilities of model (a) XGBoost, (b) XGBoost_AOA, (c) XGBoost_BOA, and (d) XGBoost_WOA using O_Data.
The analysis demonstrates that the XGBoost_AOA model has outperformed the XGBoost, XGBoost_BOA, and XGBoost_WOA models in predicting the end-bearing capacity of the rock-socketed shaft. To analyze the impact of data augmentation, each model has been trained (391 data points) and tested (69 data points) using the OS_Data. The performance of each model has been evaluated and summarized in Table 6.
Table 6.
Summary of performance of model (case – OS_Data).
| Model | Phase | RMSE | MAE | R | VAF | PI | BF | LMI | RSR |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | Train | 0.1071 | 0.0549 | 0.9991 | 99.81 | 1.99 | 1.0023 | 0.0270 | 0.0438 |
| Test | 0.7582 | 0.3330 | 0.9437 | 89.04 | 1.72 | 1.0372 | 0.1904 | 0.3315 | |
| XGBoost_AOA | Train | 0.0879 | 0.0534 | 0.9994 | 99.87 | 1.99 | 1.0137 | 0.0262 | 0.0359 |
| Test | 0.3380 | 0.2133 | 0.9894 | 97.83 | 1.93 | 1.0098 | 0.1220 | 0.1478 | |
| XGBoost_BOA | Train | 0.1645 | 0.1131 | 0.9978 | 99.55 | 1.98 | 1.0091 | 0.0555 | 0.0672 |
| Test | 0.4803 | 0.3030 | 0.9795 | 95.95 | 1.88 | 1.0870 | 0.1733 | 0.2100 | |
| XGBoost_WOA | Train | 0.4740 | 0.3329 | 0.9820 | 96.25 | 1.89 | 1.0749 | 0.1634 | 0.1937 |
| Test | 0.7153 | 0.4569 | 0.9520 | 90.61 | 1.75 | 1.1181 | 0.2613 | 0.3127 |
Note: Bold values correspond to the optimal performance model.
Table 6 shows that the performance of each model improves when the synthetic data points are included. The accuracy of the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models has increased to 94.37% (from 82.42% using the O_Data), 98.94% (from 92.95%), 97.95% (from 96.72%), and 95.20% (from 89.83%), respectively, in the testing phase. It has also been observed that the bias factor of each model has improved in both phases. The XGBoost_AOA model has estimated the end-bearing capacity of the rock-socketed shaft with a BF of 1.0137 and 1.0098 in the training and testing phases, comparatively better than the other models and close to the ideal value, i.e., 1. The VAF (train = 99.87, test = 97.83) and PI (train = 1.99, test = 1.93) metrics have demonstrated the excellent performance of the XGBoost_AOA model. It has also been observed that the XGBoost_AOA model has estimated the end-bearing capacity with the least residuals in the training (RMSE = 0.0879, MAE = 0.0534) and testing (RMSE = 0.3380, MAE = 0.2133) phases, comparatively better than with the O_Data (RMSE = 0.2240 and MAE = 0.1619 in training, and RMSE = 0.9205 and MAE = 0.7024 in testing). Figure 9 shows the comparison of the prediction capabilities of the models using the OS_Data. Figure 9(b) presents that the actual and predicted end-bearing capacities have excellent agreement using the XGBoost_AOA model, comparatively better than in Fig. 8(b). Hence, it can be stated that the Gaussian augmentation technique is useful for enhancing the capabilities of both conventional and optimized XGBoost models.
Fig. 9.
Illustration of comparison of prediction capabilities of model (a) XGBoost, (b) XGBoost_AOA, (c) XGBoost_BOA, and (d) XGBoost_WOA using OS_Data.
Visual interpretation of results
Taylor plot
The Taylor plot is a visual comparison of multiple statistical metrics to evaluate the performance of machine learning models134. It compares actual and predicted values in terms of the correlation coefficient, standard deviation, and RMSE, as shown in Fig. 10. Figure 10(a-b) shows that the XGBoost_AOA model has estimated the end-bearing capacity of the rock-socketed shaft with standard deviations of 2.599 (training) and 2.438 (testing), close to the standard deviations of the training (2.668) and testing (2.456) datasets of the O_Data, followed by the XGBoost_WOA model (2.576 and 2.429 in the training and testing phases, respectively). Conversely, the XGBoost_AOA model has predicted the end-bearing capacity with standard deviations of 2.432 and 2.298 in the training (refer to Fig. 10(c)) and testing (refer to Fig. 10(d)) phases, respectively, close to the standard deviations of the OS_Data (2.450 for the training and 2.304 for the testing datasets). It can be observed that the synthetic datasets have improved the prediction capabilities of the XGBoost_AOA model in both the training phase (the deviation difference decreasing from 0.07 to 0.02) and the testing phase (from 0.02 to 0.01).
Fig. 10.
Illustration of Taylor plots for the model comparison in the training ((a) for O_Data and (c) OS_Data), and testing ((b) for O_Data and (d) OS_Data) phase.
Regression error characteristics curve
The REC curve is a generalization of the Receiver Operating Characteristic (ROC) curve. The REC curve is plotted between the error tolerance and the percentage of predictions that fall within that tolerance135. The REC curve presents the robustness of a soft computing model by computing the area over the curve (AOC = 1 − AUC, where AUC is the area under the curve). The least AOC value identifies a robust soft computing model136. Graphically, the steeper the REC curve (closer to the top-left), the more robust the model. Figure 11 presents the REC curve for each model using the O_Data (Fig. 11(a-b)) and OS_Data (Fig. 11(c-d)). Figure 11(a-b) demonstrates that the XGBoost_AOA model has predicted the end-bearing capacity of the rock-socketed shaft with an AOC of 0.0426 and 0.1620 in the training and testing phases, respectively, close to the ideal value. On the other hand, the XGBoost_AOA model developed using the OS_Data has estimated the end-bearing capacity with an AOC of 0.0173 and 0.0523 in the training (Fig. 11c) and testing (Fig. 11d) phases. Here, the interpretation of the AOC demonstrates that the synthetic database has improved the performance of the XGBoost_AOA model. The AOC values for XGBoost also reveal that the synthetic database not only enhances the performance of the optimized models but also improves the performance of the conventional model, i.e., XGBoost.
Fig. 11.
Illustration of REC plots for the model comparison using O_Data ((a) training and (b) testing) and OS_Data ((c) training and (d) testing).
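A sketch of the REC curve and its AOC as described above, assuming a trapezoidal AUC with the tolerance axis normalized to [0, 1] (the function name and grid resolution are illustrative):

```python
import numpy as np

def rec_curve(actual, pred, n_points=200):
    """REC curve: accuracy (fraction of |error| <= tolerance) versus the
    tolerance; AOC = 1 - AUC with the tolerance axis scaled to [0, 1]."""
    err = np.abs(np.asarray(actual, float) - np.asarray(pred, float))
    tol = np.linspace(0.0, err.max(), n_points)
    acc = np.array([(err <= t).mean() for t in tol])
    # Trapezoidal AUC, normalized by the maximum tolerance
    auc = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(tol)) / err.max()
    return tol, acc, 1.0 - auc
```

Plotting `acc` against `tol` for each model reproduces the comparison of Fig. 11, with the smallest AOC marking the most robust model.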
Anderson Darling (AD) test
The AD test is a statistical test that analyzes the normal distribution of the actual and predicted datasets137. It is an improvement on the Kolmogorov-Smirnov test that gives the distribution's tails more weight, increasing its sensitivity to deviations in those regions138. The test computes a statistic that measures the difference between the empirical distribution of the data and the expected cumulative distribution of the reference distribution. A lower value of the test statistic indicates a tighter fit, whereas a greater value implies that the data do not match the anticipated distribution. In this study, the AD test assesses the goodness-of-fit, quality assurance, and model validation. Figure 12 presents the AD test results, including a comparison of the normal distribution plots for each model. Figure 12(a) presents that the actual database has an AD value of 5.047, and the result is significant at the 95% level (p < 0.05). It can be noted that the XGBoost_AOA model has predicted the end-bearing capacity of the rock-socketed shaft with an AD value of 4.497, followed by the XGBoost (4.209), XGBoost_BOA (3.829), and XGBoost_WOA (3.761) models. Moreover, Fig. 12(b) demonstrates that the normal distribution of the end-bearing capacity predicted by the XGBoost_AOA model is close to the normal distribution of the actual values. On the other hand, Fig. 12(c) shows that the OS_Data has an AD value of 15.569 and also satisfies the statistical condition, i.e., p < 0.05. It can be noted that the XGBoost_AOA model has an AD value of 15.109, which is comparable to the AD value of the OS_Data. Furthermore, it can be stated that the synthetic database (created by the Gaussian technique) has enhanced the capabilities of the conventional and optimized XGBoost models (refer to Fig. 12(d)). Finally, it can be stated that the XGBoost_AOA is a robust model to predict the end-bearing capacity of the rock-socketed shaft.
Fig. 12.
Illustration of AD test with normal distribution plot for (a-b) O_Data and (c-d) OS_Data.
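The AD statistic can be sketched without external statistics packages using its standard computational formula, with the mean and standard deviation estimated from the sample (function name illustrative):

```python
import math
import numpy as np

def ad_statistic(x):
    """Anderson-Darling A^2 for normality; mean and standard deviation are
    estimated from the sample, and a larger A^2 means a poorer normal fit."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1.0 - 1e-12)          # guard log(0) in the tails
    i = np.arange(1, n + 1)
    return float(-n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1]))))
```

Comparing the statistic for actual and predicted series, as in Fig. 12, indicates how closely a model reproduces the distribution of the observed end-bearing capacities.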
Generalizability analysis
The performance comparison, Taylor plot, REC plot, and AD test have demonstrated that the XGBoost_AOA model predicts the end-bearing capacity of the rock-socketed shaft with excellent ability. Still, it is necessary to analyze the generalizability of each model. To address this, the generalizability, also called external validation, of each model has been calculated and analyzed using the following equations139:
$$k=\frac{\sum_{i=1}^{N}a_i\,p_i}{\sum_{i=1}^{N}a_i^{2}} \tag{15}$$

$$k'=\frac{\sum_{i=1}^{N}a_i\,p_i}{\sum_{i=1}^{N}p_i^{2}} \tag{16}$$

$$R_0^{2}=1-\frac{\sum_{i=1}^{N}\left(p_i-k\,a_i\right)^{2}}{\sum_{i=1}^{N}\left(p_i-\bar{p}\right)^{2}} \tag{17}$$

$$R_0'^{2}=1-\frac{\sum_{i=1}^{N}\left(a_i-k'\,p_i\right)^{2}}{\sum_{i=1}^{N}\left(a_i-\bar{a}\right)^{2}} \tag{18}$$

$$R_m=R^{2}\left(1-\sqrt{\left|R^{2}-R_0^{2}\right|}\right) \tag{19}$$

$$m=\frac{R^{2}-R_0^{2}}{R^{2}} \tag{20}$$

$$n=\frac{R^{2}-R_0'^{2}}{R^{2}} \tag{21}$$
Where $k$ is the slope of the regression line (through the origin) of the predicted versus actual values, $k'$ is the slope of the actual versus predicted values, $R_0^2$ and $R_0'^2$ denote the coefficients of determination of the predicted-versus-actual and actual-versus-predicted regressions through the origin, and $R_m$, $m$, and $n$ represent the factors for estimating the predictive power of the proposed models. The ideal values for these factors are (a) 0.85 to 1.15 for $k$ and $k'$, (b) close to 1 for $R_0^2$ and $R_0'^2$, (c) greater than 0.5 for $R_m$, and (d) less than 0.1 for $m$ and $n$. Table 7 presents the results of the generalizability analysis for each model in the training and testing phases using the O_Data and OS_Data. Table 7 demonstrates that the XGBoost_AOA model has predicted the end-bearing capacity of the rock-socketed shaft with good generalizability in the training ($k$ = 1.03, $k'$ = 0.96, $R_0^2$ = 1.00, $R_0'^2$ = 1.00, $R_m$ = 0.94, $m$ = 0.00, $n$ = 0.00) and testing ($k$ = 0.95, $k'$ = 0.98, $R_0^2$ = 0.99, $R_0'^2$ = 1.00, $R_m$ = 0.55, $m$ = 0.15, $n$ = 0.16) phases, comparatively better than the XGBoost, XGBoost_BOA, and XGBoost_WOA models. On the other hand, it can be noted that the synthetic database has enhanced the generalizability of both the conventional and optimized XGBoost models. Hence, the XGBoost_AOA model has demonstrated superiority over the XGBoost, XGBoost_BOA, and XGBoost_WOA models in the training ($k$ = 1.00, $k'$ = 1.00, $R_0^2$ = 1.00, $R_0'^2$ = 1.00, $R_m$ = 0.95, $m$ = 0.00, $n$ = 0.00) and testing ($k$ = 1.00, $k'$ = 0.99, $R_0^2$ = 1.00, $R_0'^2$ = 1.00, $R_m$ = 0.84, $m$ = 0.02, $n$ = 0.02) phases with the OS_Data.
Table 7.
Summary of generalizability analysis.
| Models | Phase | $k$ | $k'$ | $R_0^2$ | $R_0'^2$ | $R_m$ | $m$ | $n$ |
|---|---|---|---|---|---|---|---|---|
| Using O_Data | ||||||||
| XGBoost | Train | 1.01 | 0.99 | 1.00 | 1.00 | 0.91 | 0.01 | 0.01 |
| Test | 0.82 | 1.03 | 0.91 | 1.00 | 0.35 | 0.34 | 0.47 | |
| XGBoost_AOA | Train | 1.03 | 0.96 | 1.00 | 1.00 | 0.94 | 0.00 | 0.00 |
| Test | 0.95 | 0.98 | 0.99 | 1.00 | 0.55 | 0.15 | 0.16 | |
| XGBoost_BOA | Train | 1.02 | 0.96 | 1.00 | 1.00 | 0.70 | 0.07 | 0.06 |
| Test | 0.87 | 1.02 | 0.95 | 1.00 | 0.44 | 0.23 | 0.30 | |
| XGBoost_WOA | Train | 1.01 | 0.98 | 1.00 | 1.00 | 0.85 | 0.02 | 0.02 |
| Test | 0.85 | 1.06 | 0.95 | 0.99 | 0.51 | 0.17 | 0.23 | |
| Using OS_Data | ||||||||
| XGBoost | Train | 1.01 | 0.99 | 1.00 | 1.00 | 0.97 | 0.00 | 0.00 |
| Test | 1.00 | 0.96 | 1.00 | 1.00 | 0.60 | 0.12 | 0.12 | |
| XGBoost_AOA | Train | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 0.00 | 0.00 |
| Test | 1.00 | 0.99 | 1.00 | 1.00 | 0.84 | 0.02 | 0.02 | |
| XGBoost_BOA | Train | 1.00 | 1.00 | 1.00 | 1.00 | 0.93 | 0.00 | 0.00 |
| Test | 0.97 | 1.01 | 1.00 | 1.00 | 0.77 | 0.04 | 0.04 | |
| XGBoost_WOA | Train | 1.02 | 0.97 | 1.00 | 1.00 | 0.78 | 0.04 | 0.03 |
| Test | 0.96 | 1.00 | 1.00 | 1.00 | 0.63 | 0.10 | 0.10 | |
Note: Bold values correspond to the optimal performance model.
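The external-validation criteria of Eqs. (15)–(21) can be computed as follows (assumed Golbraikh-Tropsha style formulations; the function and key names are illustrative):

```python
import numpy as np

def external_validation(a, p):
    """Assumed Golbraikh-Tropsha style criteria matching Eqs. (15)-(21)."""
    a, p = np.asarray(a, float), np.asarray(p, float)
    r2 = np.corrcoef(a, p)[0, 1] ** 2
    k = np.sum(a * p) / np.sum(a ** 2)          # slope, predicted vs. actual
    kp = np.sum(a * p) / np.sum(p ** 2)         # slope, actual vs. predicted
    r0 = 1.0 - np.sum((p - k * a) ** 2) / np.sum((p - p.mean()) ** 2)
    r0p = 1.0 - np.sum((a - kp * p) ** 2) / np.sum((a - a.mean()) ** 2)
    rm = r2 * (1.0 - np.sqrt(abs(r2 - r0)))
    return {"k": k, "k'": kp, "R0^2": r0, "R0'^2": r0p, "Rm": rm,
            "m": (r2 - r0) / r2, "n": (r2 - r0p) / r2}
```

For an ideal model this returns k = k' = 1, R0^2 = R0'^2 = 1, Rm = 1, and m = n = 0, consistent with the ideal ranges listed above.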
Discussion on results
This investigation compares the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models to introduce an optimal model for estimating the end-bearing capacity of the rock-socketed shaft. These models have been developed, trained, tested, and analyzed using the O_Data and OS_Data. The comparison demonstrated that the performance of both the conventional and optimized XGBoost models has been improved by adding the synthetic datasets. To analyze the improvement in the capabilities of each model, the percentage variation has been calculated for the RMSE, MAE, R, VAF, PI, RSR, and LMI metrics (refer to Tables 5 and 6). Figure 13 presents the impact of the synthetic database on the capabilities of each model (+ corresponds to improvement and − shows degradation) in predicting the end-bearing capacity of the rock-socketed shaft.
Fig. 13.
Illustration of the comparison of performance enhancement using synthetic datasets.
The comparison of RMSE and MAE shows that the synthetic database reduced the residuals (both RMSE and MAE) by 27.15% to 76.03%. Moreover, the R, VAF, and PI metrics increased by 0.24% to 14.49%, 0.59% to 36.66%, and 1.17% to 43.11%, respectively, while the LMI and RSR metrics improved by 33.40% to 74.38% and 37.62% to 73.98%, respectively, after adding the 324 synthetic datasets. Still, the XGBoost_WOA model showed a decrease in R (0.85%), VAF (1.82%), and PI (2.22%) in the training phase when synthetic datasets were added. The XGBoost_WOA model achieved the least improvement in performance because of its weaker exploitation capabilities and slower convergence speed; it becomes less effective in high-dimensional search spaces and suffers from premature convergence, resulting in a suboptimal hyperparameter configuration. Overall, the XGBoost_AOA model attained a significant performance in predicting the end-bearing capacity of rock-socketed shafts using synthetic datasets.
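The Gaussian-noise augmentation step can be sketched as follows. This is a minimal illustration, assuming NumPy arrays `X` (features) and `y` (targets) and a noise scale proportional to each feature's standard deviation; the exact noise level and sampling scheme used in the study are not restated here.

```python
import numpy as np

def gaussian_augment(X, y, n_synthetic, noise_frac=0.05, seed=42):
    """Create synthetic samples by perturbing randomly chosen original
    rows with zero-mean Gaussian noise (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = rng.integers(0, len(X), size=n_synthetic)   # rows to perturb
    x_scale = noise_frac * X.std(axis=0)              # per-feature noise scale
    y_scale = noise_frac * y.std()
    X_syn = X[idx] + rng.normal(size=(n_synthetic, X.shape[1])) * x_scale
    y_syn = y[idx] + rng.normal(size=n_synthetic) * y_scale
    # Originals first, synthetics appended (O_Data -> OS_Data style)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])
```

In the study's workflow, the synthetic pool was preprocessed again, leaving 324 synthetic rows that were merged with the 136 original ones to form OS_Data.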
This investigation introduces an application of synthetic datasets for improving the capabilities of soft computing models. Still, it is essential to analyze the overfitting of the conventional and optimized XGBoost models when predicting the end-bearing capacity of rock-socketed shafts from an unseen database. To quantify model overfitting, the ratio between testing and training errors (test RMSE / training RMSE) has been examined. A ratio close to 1.0 indicates consistent learning and good generalization, whereas larger values signify variance-dominated learning and potential overfitting. In recent geotechnical machine learning studies, acceptable generalization is typically observed when this ratio remains within approximately 1.0–1.3, while ratios exceeding about 1.5 indicate noticeable overfitting behavior. Similar criteria have been adopted in foundation and rock engineering prediction problems using ANN, ensemble, and boosting models1,27. Figure 14 shows that the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models predicted the end-bearing capacity with overfitting ratios of 5.04, 4.11, 1.85, and 1.51, respectively, using O_Data, and of 7.08, 3.85, 2.92, and 1.51, respectively, using OS_Data. The comparison reveals that the overfitting of the XGBoost and XGBoost_BOA models increased by 40.53% and 58.23%, respectively, when using the synthetic datasets, whereas that of the XGBoost_AOA and XGBoost_WOA models decreased by 6.43% and 53.18%, respectively, after adding the 324 synthetic datasets.
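The overfitting ratio described above (test RMSE divided by training RMSE) is straightforward to compute; a small sketch, assuming predictions for both phases are available as arrays:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def overfitting_ratio(y_train, p_train, y_test, p_test):
    """Test-to-train RMSE ratio; ~1.0 suggests good generalization,
    values above ~1.5 indicate noticeable overfitting."""
    return rmse(y_test, p_test) / rmse(y_train, p_train)
```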
Fig. 14.
Illustration of overfitting of models and their comparison.
The performance and overfitting of a soft computing model are highly related to feature selection. This study utilized the B, Hr, Hs, GSI, RMR, σc, mi, and qu features to predict the end-bearing capacity of rock-socketed shafts. The sensitivity and multicollinearity analyses demonstrated the strength and role of each feature in this prediction. To understand the overfitting issue, a relationship has been established between multicollinearity and feature sensitivity, as illustrated in Fig. 15. It has been observed that sensitivity and multicollinearity are directly related to each other: the highly sensitive features, i.e., RMR and GSI, have problematic multicollinearity levels. Also, Fig. 5 demonstrated that the RMR and GSI features contributed 20.301% and 20.369%, respectively, in predicting the end-bearing capacity. Due to the significant contribution of the RMR and GSI features, the conventional XGBoost and XGBoost_BOA models achieved higher overfitting on OS_Data, whereas the XGBoost_AOA model predicted the end-bearing capacity with a slight decrease in overfitting using OS_Data.
Fig. 15.
Illustration of the relationship between feature sensitivity and multicollinearity.
Additionally, the Akaike information criterion (AIC) has been employed to assess the robustness of the conventional and optimized XGBoost models and to compare them in identifying the model with optimal performance. A soft computing model is recognized as optimal if its AIC value is the least in both phases. The mathematical formula for measuring AIC is as follows140:
AIC = N ln(SSE / N) + 2J, where SSE = Σᵢ (yᵢ − ŷᵢ)²    (22)
Where N is the number of datasets utilized in the training or testing phase, and J is the total number of input variables. Figure 16 shows the comparison of the AIC values for each model. It can be noted that the XGBoost_AOA model predicted the end-bearing capacity of rock-socketed shafts with AIC values of −1376.49 (training, O_Data), −13.91 (testing, O_Data), −1885.61 (training, OS_Data), and −133.69 (testing, OS_Data). It has also been observed that the synthetic database enhanced the robustness of the models. Finally, the XGBoost_AOA model has been recognized as the optimal performance model for predicting the end-bearing capacity of rock-socketed shafts.
Fig. 16.
Illustration of AIC results.
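Assuming the standard regression form of AIC, N·ln(SSE/N) + 2J (the paper's Eq. 22 may use a slightly different variant), the criterion can be computed as:

```python
import math

def aic(y_true, y_pred, n_features):
    """Akaike information criterion for a regression model:
    AIC = N * ln(SSE / N) + 2J, with N samples and J input variables.
    (The exact variant used in Eq. 22 is assumed here.)"""
    n = len(y_true)
    sse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return n * math.log(sse / n) + 2 * n_features
```

Lower (more negative) values indicate a better trade-off between fit and model complexity, which is why the least AIC in both phases marks the optimal model.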
Analysis of models
The performance comparison, Taylor plot, REC plot, AD test, generalizability analysis, overfitting analysis, and AIC show that the synthetic datasets enhance the capabilities of the conventional and optimized XGBoost models in predicting the end-bearing capacity of rock-socketed shafts. Still, the reliability of each model must be analyzed for both cases, i.e., O_Data and OS_Data. For that purpose, the a20-index (a20), index of agreement (IOA), and index of scatter (IOS) metrics have been implemented. The mathematical formulation of the a20, IOS, and IOA metrics is as follows:
a20 = (m20 / H) × 100    (23)

IOS = RMSE / ȳ, where ȳ is the mean of the actual values    (24)

IOA = 1 − [Σᵢ (yᵢ − ŷᵢ)²] / [Σᵢ (|ŷᵢ − ȳ| + |yᵢ − ȳ|)²]    (25)
Where m20 is the number of samples for which the ratio of actual to predicted values lies between 0.8 and 1.2, and H is the total number of data points. The ideal values of IOS, IOA, and a20 are 0, 1, and 100, respectively. These indices provide a comprehensive evaluation of model performance: the IOA metric demonstrates the degree of agreement between actual and predicted values, the a20-index indicates the percentage of predictions within ±20% of the actual values, and the IOS quantifies the relative scatter in the predictions. Table 8 presents the results of the reliability indices for each model using O_Data and OS_Data. The indices show that the synthetic datasets increased the reliability of the conventional and optimized XGBoost models. The comparison also shows that the XGBoost_AOA model predicted the end-bearing capacity with higher reliability (a20 = 97.95, IOA = 0.9869, and IOS = 0.0298 in the training phase; a20 = 86.96, IOA = 0.9390, and IOS = 0.1211 in the testing phase) using OS_Data than using O_Data.
Table 8.
Summary of the results of reliability indices.
| Models | Phase | a20 | IOA | IOS | a20 | IOA | IOS |
|---|---|---|---|---|---|---|---|
| | | Using O_Data | | | Using OS_Data | | |
| XGBoost | Train | 83.48 | 0.9541 | 0.0951 | 99.23 | 0.9865 | 0.0364 |
| | Test | 52.38 | 0.7856 | 0.6246 | 85.51 | 0.9048 | 0.2717 |
| XGBoost_AOA | Train | 89.57 | 0.9618 | 0.0701 | 97.95 | 0.9869 | 0.0298 |
| | Test | 63.33 | 0.8150 | 0.3754 | 86.96 | 0.9390 | 0.1211 |
| XGBoost_BOA | Train | 69.57 | 0.8917 | 0.2146 | 94.88 | 0.9722 | 0.0558 |
| | Test | 47.62 | 0.7810 | 0.5163 | 88.41 | 0.9134 | 0.1721 |
| XGBoost_WOA | Train | 80.00 | 0.9390 | 0.1166 | 69.57 | 0.9183 | 0.1609 |
| | Test | 52.38 | 0.8038 | 0.4899 | 57.97 | 0.8693 | 0.2563 |
Note: Bold values correspond to the optimal performance model.
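The three reliability indices can be implemented in a few lines; a sketch assuming the Willmott form of the index of agreement and IOS defined as RMSE normalized by the mean observed value (the paper's exact formulations appear in Eqs. 23–25):

```python
import numpy as np

def reliability_indices(y_true, y_pred):
    """a20 (% of samples with actual/predicted in [0.8, 1.2]),
    IOA (Willmott index of agreement), and IOS (RMSE / mean of actuals)."""
    y, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    ratio = y / p
    a20 = 100.0 * float(np.mean((ratio >= 0.8) & (ratio <= 1.2)))
    y_bar = y.mean()
    ioa = 1.0 - np.sum((y - p) ** 2) / np.sum((np.abs(p - y_bar) + np.abs(y - y_bar)) ** 2)
    ios = np.sqrt(np.mean((y - p) ** 2)) / y_bar
    return a20, float(ioa), float(ios)
```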
Now, the XGBoost_AOA model (trained on OS_Data) has been used to estimate the end-bearing capacity of the 21 rock-socketed shafts in the testing database of O_Data. The performance metrics reveal that the XGBoost_AOA model (trained on OS_Data) predicted the end-bearing capacity of these 21 rock-socketed shafts with an RMSE of 0.4702, MAE of 0.3633, R of 0.9841, PI of 1.89, VAF of 96.66, BF of 1.23, LMI of 0.1914, RSR of 0.1962, a20 of 87.14, IOA of 0.9043, and IOS of 0.1917, close to the ideal values and significantly improved over the test performance (RMSE = 0.9205, MAE = 0.7024, R = 0.9295, VAF = 86.36, PI = 1.65, BF = 1.37, LMI = 0.3700, RSR = 0.3840, a20 = 63.33, IOA = 0.8150, and IOS = 0.3754) of the XGBoost_AOA model using O_Data. Figure 17 presents a comparison of the confidence intervals (CI) of the XGBoost_AOA model in the testing phase using O_Data and OS_Data. It has been observed that the XGBoost_AOA model predicted the end-bearing capacity with CIs of ±2.5 and ±1.5 using O_Data and OS_Data, respectively, in the testing phase.
Fig. 17.
Illustration of (a) the comparison of the confidence interval of the XGBoost_AOA model and (b) the error with respect to the measured end-bearing capacity values for the XGBoost_AOA model in the testing phase using OS_Data.
Furthermore, the capabilities and robustness of the XGBoost_AOA model have been compared with the available models in the literature (refer to Table 9). The comparison shows that the XGBoost_AOA model achieved the highest performance among the published models. Finally, this work introduces the XGBoost_AOA model as the optimal performance and most reliable soft computing model for predicting the end-bearing capacity of rock-socketed shafts.
Table 9.
Comparison of the XGBoost_AOA and published models.
| S. No. | References | Features | Matrix | Model | Test R |
|---|---|---|---|---|---|
| 1 | Nawaz et al.31 | σc, GSI, mi, B, PLS, PLR | | GEP | 0.8600 |
| 2 | Chen and Zhang1 | σc, GSI, mi, B, PLS, PLR | | EL | 0.9418 |
| 3 | This study (using O_Data) | σc, RMR, GSI, mi, B, Hs, Hr, qu | | XGBoost_AOA | 0.9295 |
| 4 | This study (using OS_Data) | σc, RMR, GSI, mi, B, Hs, Hr, qu | | XGBoost_AOA | 0.9894 |
| 5 | This study (O to OS_Data) | σc, RMR, GSI, mi, B, Hs, Hr, qu | | XGBoost_AOA | 0.9841 |
Note: Bold values correspond to the optimal performance model.
SHAP analysis
SHAP analysis is a method for explaining machine learning model predictions by calculating how much each feature contributes to a particular prediction. Based on game-theoretic concepts, SHAP values provide a unified measure of feature importance that satisfies desirable properties such as consistency and local accuracy. The role of the features is demonstrated by (a) dependency, (b) summary, (c) feature importance, (d) waterfall, (e) force, and (f) heatmap plots. Dependency plots show the relationship between a specific feature and the model's predictions while accounting for interactions with other features. The x-axis represents the feature values, while the y-axis shows the SHAP values, revealing whether increasing the feature value increases or decreases the prediction, as shown in Fig. 18. Figure 18(a) shows a strong negative trend where higher mi values (above 5) are associated with negative or near-zero SHAP values, while lower mi values (below 5) contribute positively to predictions. This inverse relationship suggests that lower rock material constants increase the predicted outcome. The color gradient indicates interaction with σc (compressive strength), with higher σc values (red/pink points) appearing at lower mi values, suggesting these features interact in their influence on the model. Figure 18(b) demonstrates a clear negative exponential relationship: very low σc values (below 10) contribute strongly positive SHAP values (up to 2.5), while the effect rapidly diminishes and plateaus at negative values for σc above 20. The steep decline indicates that compressive strength is highly influential at lower values but has a diminishing negative impact at higher strengths. The color coding shows interaction with mi, with the relationship being consistent across different mi values. Figure 18(c) exhibits a strong positive relationship with SHAP values, showing the most pronounced upward trend among all features. Lower RMR values (20–50) contribute negatively to predictions, while higher values (above 70) contribute increasingly positive SHAP values (up to 3). This exponential-like increase indicates that RMR is a critical driver of the model's predictions, with better rock mass quality substantially increasing the predicted outcome. The interaction effect appears secondary to the dominant main effect.
Fig. 18.
Illustration of the dependency plots of the (a) mi, (b) σc, (c) RMR, (d) GSI, (e) Hs, (f) Hr, (g) B, and (h) qu features.
Figure 18(d) shows minimal influence at lower GSI values (below 60), with SHAP values near zero, but demonstrates a positive effect at higher values (above 70). The relatively small magnitude of the SHAP values (max ~0.011) compared to other features suggests that GSI has a limited direct impact on predictions, possibly because it is highly correlated with RMR (GSI = RMR − 5). The slight uptick at higher GSI values indicates marginal additional predictive value for characterizing better-quality rock masses. Figure 18(e) shows a complex non-linear relationship, with an initial positive contribution around Hs = 2–3 m, followed by negative contributions in the middle range (around 7–10 m), and a return toward positive or neutral effects at higher values. This oscillating pattern suggests interactions with other features or threshold effects, in which certain soil layer depths have non-monotonic impacts on bearing capacity. The scattered color distribution indicates complex interactions with other feature values. Figure 18(f) presents a generally positive but highly variable relationship: very low Hr values (near 0) show negative SHAP contributions, while the trend becomes positive and relatively stable for Hr above 2–3 m. The substantial scatter in the data points, particularly at lower values, suggests significant interaction effects with other features (indicated by the mi color gradient). The relationship stabilizes at moderate positive contributions for longer rock layer engagement. Figure 18(g) shows a strong negative relationship, particularly at very small diameters (below 0.2 m), where SHAP values exceed 2. The effect rapidly decreases and plateaus near zero for diameters above 0.75 m. This suggests that smaller-diameter shafts have disproportionately higher predicted values, possibly reflecting a different failure mechanism or a scale effect. The dramatic decrease indicates that diameter is highly influential primarily at the lower end of its range. Figure 18(h) illustrates a dramatic S-shaped relationship, with the steepest transition occurring between qu values of 0–20. Very low qu values (near 0) contribute large negative SHAP values (below −2), while the effect rapidly transitions to positive and plateaus around 0.5–0.6 for qu above 30. This indicates that qu may be both a feature and closely related to the target variable, showing that the model learns the strong physical relationship between bearing capacity and the prediction outcome. The color variation suggests feature interactions throughout the range.
Conversely, the SHAP summary plot (see Fig. 19a) combines feature importance with impact directionality, displaying each feature's distribution of SHAP values across all predictions. σc shows the widest spread, with predominantly blue (low-value) points on the positive side and pink (high-value) points on the negative side, indicating that lower compressive strength increases predictions while higher values decrease them. RMR demonstrates significant variability, with pink points clustered on the positive side, confirming that higher rock mass ratings substantially increase predictions. The vertical spread of points for each feature indicates the consistency of its impact, with σc, RMR, B, and qu showing the greatest variability and therefore the most influence on model predictions across different samples. Additionally, the feature importance bar plot (see Fig. 19b) ranks features by their mean absolute SHAP values, providing a clear hierarchy of predictive power. σc dominates with a value of 1.36, nearly 50% higher than the second-ranked RMR (0.95), establishing compressive strength as the single most important predictor. The top four features (σc, RMR, B, and qu) account for the vast majority of model influence, with values ranging from 0.65 to 1.36, while the bottom four features (Hs, Hr, mi, and GSI) contribute minimally, with values below 0.10. This drop-off suggests that the model relies primarily on rock strength properties and geometric factors, with soil layer characteristics and geological indices playing secondary roles. Figure 19(c) illustrates how individual features push a specific prediction from the base value (3.231) to the final prediction (10.53). RMR = 95 provides the largest positive contribution (+2.47), followed by B = 0.1 m (+2.23) and qu = 0.75 (+1.49), cumulatively adding over 6 points to the prediction. Smaller positive contributions come from σc = 7.575 (+0.56), Hr = 1 (+0.52), Hs = 0 (+0.02), and GSI = 90 (+0.01), while mi = 9 slightly reduces the prediction (−0.08). This sequential visualization demonstrates that this particular sample achieves a high predicted value primarily due to excellent rock mass quality (high RMR), small shaft diameter, and moderate compressive strength, with other factors playing supporting roles.
Fig. 19.
Illustration of (a) summary, (b) feature importance, (c) waterfall, (d) force, and (e) heatmap plots in predicting end-bearing capacity of rock-socketed shaft.
Figure 19(d) provides a compressed horizontal view of the same waterfall information, showing how features push the prediction higher (pink/red arrows pointing right) or lower (blue arrows pointing left) from the base value of 3.231 to reach 10.53. The dominant pink section for RMR = 95 visually emphasizes its overwhelming positive contribution, followed by substantial contributions from B = 0.1 and qu = 0.75. The compact format shows Hr = 1 and σc = 7.575 at the left edge with their positive impacts, while the thin blue sliver for mi = 9 indicates its minimal negative contribution. Figure 19(e) demonstrates SHAP values across multiple samples (rows) and features (columns), using color intensity to reveal patterns across the dataset. The RMR and B columns show the most dramatic color variations, with deep reds and blues, indicating that these features have the largest magnitudes and vary substantially across samples. σc shows a mixed pattern of blues and reds in the left portion, consistent with its non-linear inverse relationship, in which low values contribute positively. The GSI column, together with the other low-importance features, appears predominantly pale, reflecting their minimal overall contribution to predictions. Notably, samples in rows 5–8 show strong red coloring for RMR and B, suggesting these are cases with high predicted values driven by favorable rock mass ratings and geometric properties, while samples with more blue coloring (rows 0–3, 10–15, and 20–25) likely represent lower predictions.
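The mean-|SHAP| ranking behind a feature importance bar plot such as Fig. 19(b) can be reproduced from any SHAP value matrix; a sketch with a small hypothetical matrix (the numbers below are illustrative, not the study's values):

```python
import numpy as np

def mean_abs_shap_ranking(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (descending),
    as done in a SHAP feature importance bar plot."""
    importance = np.mean(np.abs(np.asarray(shap_values, float)), axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]

# Hypothetical SHAP matrix: 3 samples x 3 features
shap_matrix = [[ 1.0, -0.2, 0.05],
               [-1.5,  0.4, 0.00],
               [ 0.5, -0.3, 0.10]]
ranking = mean_abs_shap_ranking(shap_matrix, ["sigma_c", "RMR", "GSI"])
```

With a fitted tree model, the `shap` library's `TreeExplainer` would supply the SHAP matrix consumed by this ranking.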
Development of application
Based on several analyses, the present research introduces the XGBoost_AOA model as the optimal performance model for predicting the end-bearing capacity of rock-socketed shafts. Using the configuration of the XGBoost_AOA model, a Graphical User Interface (GUI) has been developed to help geotechnical engineers and designers estimate this capacity, as shown in Fig. 20. The GUI requires the B, Hr, Hs, GSI, RMR, σc, mi, and qu features to predict the end-bearing capacity of rock-socketed shafts.
Fig. 20.
Illustration of a GUI application for predicting the end-bearing capacity of rock-socketed shafts.
To examine this GUI application, two databases have been compiled from the published research of Leung and Ko141 and Abu-Hejleh et al.142. Leung and Ko141 performed centrifuge model tests on six rock-socketed shafts having Hr and B of 3.2 m and 1.06 m, respectively, with σc below 12 MPa. In the reported research, the pseudo-rocks were formed by mixing water, fine sand, and gypsum cement to obtain mi = 8, following Hoek and Brown143. Conversely, Abu-Hejleh et al.142 performed full-scale Osterberg Cell (O-Cell) load tests on two drilled shafts socketed into weak claystone rock. The compiled database is presented in Table 10, along with the values predicted using the GUI application. The RMSE, MAE, and R have been calculated for both databases. The GUI application predicted the end-bearing capacity with an RMSE of 0.2210, MAE of 0.2038, and R of 0.8737 for Leung and Ko's database141, and with an RMSE of 0.2957, MAE of 0.2882, and R of 1.0000 for Abu-Hejleh et al.'s database142. Chen and Zhang1 also utilized the same database to analyze the capabilities of an ensemble learning (EL) model by selecting different features, and reported that the EL model predicted the end-bearing capacity with an RMSE of 0.37–0.497 for Leung and Ko's database141 and of 0.146–0.675 for Abu-Hejleh et al.'s database142. Moreover, the Carter and Kulhawy144, Kulhawy and Goodman145, and Hoek-Brown146 empirical/semi-empirical methods have been used to predict the end-bearing capacity of the eight rock-socketed shafts. Carter and Kulhawy144 predicted it with an RMSE of 2.87 and MAE of 2.63, while Kulhawy and Goodman145 assessed the end-bearing capacity with higher performance and smaller residuals (RMSE = 2.81, MAE = 2.58, and R = 0.92) than the Hoek-Brown146 semi-empirical method (RMSE = 2.88, MAE = 2.67, and R = 0.87). Still, the GUI application predicted the end-bearing capacity of the eight rock-socketed shafts with an R of 0.9853, RMSE of 0.6360, and MAE of 0.2249, comparatively better than the EL model.
Table 10.
Details of the compiled database and predictions of the end-bearing capacity.
| Source | mi | σc | RMR | GSI | Hs | Hr | B | qu | Measured | Carter and Kulhawy144 | Kulhawy and Goodman145 | Hoek-Brown146 | GUI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Leung and Ko141 | 8.00 | 2.10 | 68.00 | 73.00 | 0.00 | 3.20 | 1.06 | 6.51 | 3.10 | 0.30 | 0.35 | 0.22 | 3.3511 |
| | 8.00 | 4.20 | 69.00 | 74.00 | 0.00 | 3.20 | 1.06 | 10.92 | 2.60 | 0.30 | 0.35 | 0.24 | 2.4517 |
| | 8.00 | 5.40 | 70.00 | 75.00 | 0.00 | 3.20 | 1.06 | 15.71 | 2.91 | 0.30 | 0.35 | 0.25 | 2.6688 |
| | 8.00 | 6.70 | 71.00 | 76.00 | 0.00 | 3.20 | 1.06 | 16.08 | 2.40 | 0.30 | 0.35 | 0.26 | 2.4374 |
| | 8.00 | 8.50 | 72.00 | 77.00 | 0.00 | 3.20 | 1.06 | 23.04 | 2.71 | 0.30 | 0.35 | 0.28 | 2.4293 |
| | 8.00 | 11.20 | 73.00 | 78.00 | 0.00 | 3.20 | 1.06 | 27.66 | 2.47 | 0.30 | 0.35 | 0.29 | 2.2062 |
| Abu-Hejleh et al.142 | 4.00 | 1.96 | 80.00 | 85.00 | 1.90 | 5.90 | 1.07 | 11.31 | 5.77 | 0.30 | 0.45 | 0.43 | 6.1245 |
| | 4.00 | 10.50 | 50.00 | 55.00 | 5.20 | 7.20 | 1.37 | 15.23 | 1.45 | 0.30 | 0.25 | 0.08 | 1.6719 |
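The reported error statistics for Leung and Ko's database can be checked directly from Table 10, assuming the tenth column holds the measured end-bearing capacity and the last column the GUI prediction:

```python
import numpy as np

# Measured and GUI-predicted end-bearing capacities for the six
# Leung and Ko centrifuge shafts (values taken from Table 10)
measured  = np.array([3.10, 2.60, 2.91, 2.40, 2.71, 2.47])
predicted = np.array([3.3511, 2.4517, 2.6688, 2.4374, 2.4293, 2.2062])

rmse = float(np.sqrt(np.mean((measured - predicted) ** 2)))
mae  = float(np.mean(np.abs(measured - predicted)))
r    = float(np.corrcoef(measured, predicted)[0, 1])
# rmse ~ 0.221, mae ~ 0.204, r ~ 0.874, matching the reported
# 0.2210 / 0.2038 / 0.8737 for this database
```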
The developed GUI has been validated using eight external case studies. These cases have been intentionally selected to represent diverse engineering conditions, including different rock mass qualities (weak to medium–strong rock formations), varying pile diameters, and different socket lengths, thereby covering a broad range of both geological and geometrical parameters. This diversity ensures that the proposed model is evaluated under realistic field variability rather than under a narrow set of conditions. In future work, additional real-world datasets from various lithologies, foundation dimensions, and construction scenarios will be collected to further verify and enhance the robustness, reliability, and generalizability of the GUI-based prediction system.
Moreover, the predicted ultimate end-bearing capacity can be converted to allowable or design values by applying code-specified global safety factors for practical design implementation. Common standards such as GB 50007 (Code for Design of Building Foundations) and the API recommendations for pile foundations typically adopt safety factors in the range of approximately 1.3–2.0, depending on geological uncertainty, construction quality, and reliability requirements. Therefore, the capacities estimated using the proposed XGBoost_AOA–GUI framework can be used in conjunction with these prescribed safety factors to ensure a safe and conservative foundation design.
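The conversion from ultimate to allowable capacity is a single division by the chosen global safety factor; a trivial sketch with an assumed default of 2.0 from the cited 1.3–2.0 range:

```python
def allowable_capacity(q_ultimate, safety_factor=2.0):
    """Allowable (design) end-bearing capacity obtained from the
    predicted ultimate value and a code-specified global safety factor."""
    if safety_factor < 1.0:
        raise ValueError("safety factor must be >= 1.0")
    return q_ultimate / safety_factor
```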
Summary and conclusions
This study employs the XGBoost, XGBoost_AOA, XGBoost_BOA, and XGBoost_WOA models and compares them to introduce an optimal performance model for predicting the end-bearing capacity of rock-socketed shafts. For that purpose, a database was compiled from the literature. The models were trained and tested (using 115 and 21 datasets, respectively) with the B, Hr, Hs, GSI, RMR, σc, mi, and qu features of 136 rock-socketed shafts (denoted O_Data). Furthermore, the Gaussian-noise technique was utilized to create a synthetic database, from which 324 rock-socketed shaft datasets were obtained after preprocessing. Finally, a large database (denoted OS_Data) of 460 rock-socketed shafts (136 + 324) was developed to analyze the effect of the synthetic database on the conventional and optimized XGBoost models. To analyze this effect, each model was evaluated using performance comparison metrics (RMSE, MAE, R, VAF, PI, BF, RSR, and LMI), Taylor plots, REC curves, AD tests, generalizability analysis, overfitting analysis, reliability analysis, and AIC. Based on the evaluation, the following conclusions are drawn:
Optimal Performance Model – This study introduces the XGBoost_AOA as the optimal performance model for estimating the end-bearing capacity of rock-socketed shafts using the B, Hr, Hs, GSI, RMR, σc, mi, and qu features. The XGBoost_AOA model predicted the end-bearing capacity with the highest performance (RMSE = 0.9205, MAE = 0.7024, and R = 0.9295 using O_Data; RMSE = 0.3380, MAE = 0.2133, and R = 0.9894 using OS_Data) and outstanding visualization in the Taylor, REC, and AD plots. The generalizability, overfitting, and reliability analyses also demonstrated the robustness of the XGBoost_AOA model in this investigation. Moreover, the convergence plots showed that the XGBoost_WOA model required more iterations for O_Data than the XGBoost_AOA and XGBoost_BOA models because its prolonged global exploration through encircling and spiral search mechanisms delays exploitation in the low-dimensional, smooth optimization landscapes typical of limited datasets.

Impact of Augmentation Technique – This research revealed that synthetic datasets developed using the Gaussian augmentation technique enhanced the performance of both the conventional and optimized XGBoost models. The performance of the XGBoost_AOA model was significantly improved in both the training (RMSE by 60.76%, MAE by 67.04%, LMI by 65.70%, VAF by 0.59%, and PI by 1.17%) and testing (RMSE by 63.28%, MAE by 69.64%, LMI by 67.04%, VAF by 13.28%, and PI by 17.00%) phases. Moreover, it was observed that the augmentation technique enhanced the generalizability and reduced the overfitting of the XGBoost_AOA model in estimating the end-bearing capacity of rock-socketed shafts.
Effect of Feature Multicollinearity – The multicollinearity analysis revealed that the RMR and GSI features have problematic multicollinearity in estimating the end-bearing capacity of rock-socketed shafts. In addition, it was observed that these features have contributions (in terms of sensitivity) of 20.301% (RMR) and 20.369% (GSI). Therefore, a relationship between feature multicollinearity and sensitivity was derived for the first time. It was observed that the XGBoost_AOA model overfitted in the prediction due to the RMR and GSI features; still, a slight decrease in the overfitting of the XGBoost_AOA model was observed using the synthetic datasets.

SHAP Analysis – The SHAP analysis revealed that σc (1.36), RMR (0.95), B (0.71), and qu (0.65) dominate predictions, while GSI, Hs, Hr, and mi contribute minimally. Dependency plots show complex non-linear relationships: negative exponential for σc, positive exponential for RMR, and inverse for small diameters B. Waterfall and force plots demonstrate additive feature combinations, with RMR = 95 (+2.47) and B = 0.1 m (+2.23) significantly elevating predictions. The heatmap confirms consistent patterns across samples, validating the model's reliance on physically meaningful geotechnical parameters for bearing capacity prediction.

GUI Application – This research provides a GUI application based on the XGBoost_AOA model. The GUI application was tested using published field datasets and predicted the end-bearing capacity of rock-socketed shafts with an R of 0.9853, RMSE of 0.6360, and MAE of 0.2249, which is useful for geotechnical engineers and designers.
To sum up, this research presents the XGBoost_AOA model as an optimal performance model for estimating the end-bearing capacity of rock-socketed shafts. This study created a synthetic database from only 136 real (measured) datasets, which is a limitation of the present investigation. The investigation may be extended by utilizing more real datasets to create synthetic datasets. Furthermore, the effect of synthetic datasets on the performance and overfitting of swarm-optimized XGBoost models may be examined.
Author contributions
Jitendra Khatti: Main author, conceptualization, literature review writing and analysis, model development, manuscript preparation, methodological development, statistical analysis, detailing, editing, and overall analysis; Yewuhalashet Fissha and N.Rao Cheepurupalli: Main authors, literature review writing, formal analysis, detailing, editing, and overall analysis.
Data availability
The details of the database are provided in the manuscript. The models and codes developed for this research are available from the corresponding author upon reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jitendra Khatti, Email: jitendrakhatti197@gmail.com.
N.Rao Cheepurupalli, Email: nraocheepurupalli@gmail.com.
References
- 1.Chen, H. & Zhang, L. A machine learning-based method for predicting end-bearing capacity of rock-socketed shafts. Rock Mech. Rock Eng.55 (3), 1743–1757. 10.1007/s00603-021-02757-9 (2022). [Google Scholar]
- 2.Crapps, D. K. & Schmertmann, J. H. Compression top load reaching shaft bottom—theory vs. tests. In Deep Foundations 2002: An International Perspective on Theory, Design, Construction, and Performance. 1533–1550 (2002). https://ascelibrary.org/doi/10.1061/40601%28256%29109
- 3.Coates, D. F. Rock mechanics principles (Monograph 874, Dept. of Energy, Mines, and Resources, Mines Branch). Ottawa, Canada. (1967).
- 4.Davis, E. H. & Poulos, H. G. Pile Foundation Analysis and Design (Wiley, 1980).
- 5.Rowe, R. K. & Armitage, H. H. A design method for drilled piers in soft rock. Can. Geotech. J.24 (1), 126–142. 10.1139/t87-011 (1987). [Google Scholar]
- 6.Mandal, A. K., Sailesh, S. & Biswas, P. Design and construction of rock socketed pile foundation for bridges—case study on road project in Madhya Pradesh, India. In Proceedings of the Indian Geotechnical Conference 2019. 1, 429–449 (2021).
- 7.Zhang, L. & Einstein, H. H. End bearing capacity of drilled shafts in rock. Journal of geotechnical and geoenvironmental engineering, 124(7), 574–584 (1998). https://ascelibrary.org/doi/10.1061/%28ASCE%291090-0241%281998%29124%3A7%28574%29
- 8.Vipulanandan, C., Hussain, A. & Usluogulari, O. Parametric study of open core-hole on the behavior of drilled shafts socketed in soft rock. In Contemporary issues in deep foundations 1–10 (2007). https://ascelibrary.org/doi/abs/10.1061/40902%28221%296
- 9.Boucheloukh, A., Gong, W. & Dai, G. Prediction of the base resistance for drilled shafts socketed into rock. In Rock Mechanics and Its Applications in Civil, Mining, and Petroleum Engineering 143–153 (2014). https://ascelibrary.org/doi/10.1061/9780784413395.017
- 10.Zhang, L. Drilled Shafts in Rock: Analysis and Design (CRC, 2004).
- 11.CGS. Canadian Foundation Engineering Manual 2nd edn (Canadian Geotechnical Society, 1985).
- 12.AASHTO. Standard Specifications for Highway Bridges 16th edn (American Association of State Highway and Transportation Officials, 1996).
- 13.Kulkarni, R. U. & Dewaikar, D. M. An empirical approach to assess socket friction and point resistance of axially loaded rock-socketed piles of Mumbai region. Int. J. Geotech. Eng.11 (5), 479–486. 10.1080/19386362.2016.1237607 (2017). [Google Scholar]
- 14.Zhang, L. Prediction of end-bearing capacity of rock-socketed shafts considering rock quality designation (RQD). Can. Geotech. J.47 (10), 1071–1084. 10.1139/T10-016 (2010). [Google Scholar]
- 15.Serrano, A. & Olalla, C. Ultimate bearing capacity at the tip of a pile in rock—part 1: theory. Int. J. Rock Mech. Min. Sci.39 (7), 833–846. 10.1016/S1365-1609(02)00052-7 (2002). [Google Scholar]
- 16.Galindo, R. A., Serrano, A. & Olalla, C. Ultimate bearing capacity of rock masses based on modified Mohr-Coulomb strength criterion. Int. J. Rock Mech. Min. Sci.93, 215–225. 10.1016/j.ijrmms.2016.12.017 (2017). [Google Scholar]
- 17.Serrano, A., Olalla, C. & Galindo, R. A. Ultimate bearing capacity at the tip of a pile in rock based on the modified Hoek–Brown criterion. Int. J. Rock Mech. Min. Sci.71, 83–90. 10.1016/j.ijrmms.2014.07.006 (2014). [Google Scholar]
- 18.Gharsallaoui, H., Jafari, M. & Holeyman, A. Pile end bearing capacity in rock mass using cavity expansion theory. J. Rock Mech. Geotech. Eng.12 (5), 1103–1111. 10.1016/j.jrmge.2020.03.004 (2020). [Google Scholar]
- 19.Yasufuku, N. & Hyde, A. F. L. Pile end-bearing capacity in crushable sands. Géotechnique45 (4), 663–676. 10.1680/geot.1995.45.4.663 (1995). [Google Scholar]
- 20.Vesic, A. S. On penetration resistance and bearing capacity of piles in sand. In Proceedings of the 8th international conference on soil mechanics and foundation engineering, Moscow 78–81 (1973).
- 21.Wang, Q. et al. Influence of slope topography on uplift bearing behavior of micropiles socketed in soft rocks. J. Building Eng. 112728. 10.1016/j.jobe.2025.112728 (2025).
- 22.Seo, S. et al. Leveraging data-driven machine learning techniques to enhance bearing capacity Estimation in prebored and precast piles. Expert Syst. Appl. 128070. 10.1016/j.eswa.2025.128070 (2025).
- 23.Kim, B. G., Lim, C. H. & Lee, J. K. A method for predicting base resistance of drilled shafts in disturbed rock mass. Geotech. Geol. Eng.43 (2), 72. 10.1007/s10706-024-03033-7 (2025). [Google Scholar]
- 24.Gutiérrez-Ch, J. G., Melentijevic, S., Senent, S. & Jimenez, R. DEM modelling of shaft load transfer behavior of rock-socketed piles. Comput. Geotech.181, 107149. 10.1016/j.compgeo.2025.107149 (2025). [Google Scholar]
- 25.Alzouba, A., Khoshnazar, S. & Zhang, Q. A multi-layered tree framework for predicting toe-bearing strength of rock-socketed piles. Model. Earth Syst. Environ.11 (5), 322. 10.1007/s40808-025-02517-6 (2025). [Google Scholar]
- 26.Al-Atroush, M. E. A deep learning Physics-Informed neural network (PINN) for predicting drilled shaft axial capacity. Appl. Comput. Geosci. 100246. 10.1016/j.acags.2025.100246 (2025).
- 27.Zhang, R. & Xue, X. A novel hybrid model for predicting the end–bearing capacity of rock–socketed piles. Rock Mech. Rock Eng.57 (11), 10099–10114. 10.1007/s00603-024-04094-z (2024). [Google Scholar]
- 28.You, R. & Mao, H. Assessment of ultimate bearing capacity of rock-socketed piles using hybrid approaches. Multiscale Multidisciplinary Model. Experiments Des.7 (4), 3673–3694. 10.1007/s41939-024-00425-3 (2024). [Google Scholar]
- 29.Chen, Q. Optimal regression analysis for estimating the settlement of the deep foundations socketed into rock. Multiscale Multidisciplinary Model. Experiments Des.7 (6), 5171–5186. 10.1007/s41939-024-00502-7 (2024). [Google Scholar]
- 30.Yang, X. The implementation of a least square support vector regression model for predicting the ultimate bearing capacity of rock-socketed piles. Multiscale Multidisciplinary Model. Experiments Des.7 (4), 4605–4618. 10.1007/s41939-024-00485-5 (2024). [Google Scholar]
- 31.Nawaz, M. N., Haseeb, M., Qamar, S. U., Hassan, W. & Shahzad, A. Gene expression programming-based multivariate model for Earth infrastructure: predicting ultimate bearing capacity of rock socketed shafts in layered soil-rock strata. Model. Earth Syst. Environ.10 (4), 5241–5256. 10.1007/s40808-024-02061-9 (2024). [Google Scholar]
- 32.Murali, A. K., Haque, A., Bui, H. H. & Tran, K. M. DEM modeling of the load-bearing mechanisms of rock-socketed piles with soft interface materials. J. Geotech. GeoEnviron. Eng.150 (7), 04024049. 10.1061/JGGEFK.GTENG-12279 (2024). [Google Scholar]
- 33.Mostafa, H. H. ANN model for estimating ultimate bearing capacity of piles socketed in Dubai limestone. In 4th Asia-Pacific Conference on Physical Modelling in Geotechnics, 11–13 December 2024, Abu Dhabi, UAE (2024).
- 34.Han, G. et al. Bearing behavior of rock socketed pile in limestone stratum embedded with a karst cavity beneath pile tip. Case Stud. Constr. Mater. 18, e02203 (2023).
- 35.Liu, J., Dai, G. & Gong, W. Ultimate bearing capacity of intact rocks at tip of large-diameter rock-socketed piles considering internal micro-cracks. Eur. J. Environ. Civil Eng.28 (5), 1064–1081. 10.1080/19648189.2023.2242902 (2024). [Google Scholar]
- 36.Liang, R., Yin, Z. Y., Yin, J. H., Wu, P. C. & Chen, Z. J. An enhanced micromechanical rock–pile interface model with application to rock-socketed pile modeling. Int. J. Numer. Anal. Meth. Geomech.48 (11), 2971–2995. 10.1002/nag.3759 (2024). [Google Scholar]
- 37.Zhao, X. et al. Vertical compressive bearing performance and optimization design method of large-diameter manually-excavated rock-socketed cast-in-place piles. Sci. Rep.13 (1), 14234. 10.1038/s41598-023-41483-w (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Zeng, B. et al. Numerical simulation of horizontal bearing characteristics of cement-soil reinforced rock-socketed steel pipe monopile considering cementation damage. Comput. Geotech.162, 105626. 10.1016/j.compgeo.2023.105626 (2023). [Google Scholar]
- 39.George, M. A. & Maji, V. B. Rock-socketed piles under axial compression: A review. Indian Geotech. J.54 (2), 657–682. 10.1007/s40098-023-00790-9 (2024). [Google Scholar]
- 40.Wang, K. Y., Shen, B. & Zhao, X. L. Analysis of the bearing capacity of Rock-Socketed piles in mountain area by the load transfer theory. J. Highway Transp. Res. Dev. (English Edition). 17 (2), 18–29. 10.1061/JHTRCQ.0000864 (2023). [Google Scholar]
- 41.Millán, M. A., Picardo, A. & Galindo, R. Two-dimensional analysis of the group interaction effects between End-bearing piles embedded in a rock mass. Rock Mech. Rock Eng.56 (7), 5335–5361. 10.1007/s00603-023-03330-2 (2023). [Google Scholar]
- 42.Chen, H., Zhu, H. & Zhang, L. A three-dimensional (3D) semi-analytical solution for the ultimate end-bearing capacity of rock-socketed shafts. Rock Mech. Rock Eng.55 (2), 611–627. 10.1007/s00603-021-02710-w (2022). [Google Scholar]
- 43.Gutiérrez-Ch, J. G., Song, G., Heron, C. M., Marshall, A. & Jimenez, R. Centrifuge tests on rock-socketed piles: effect of socket roughness on shaft resistance. J. Geotech. GeoEnviron. Eng.147 (11), 04021125. 10.1061/(ASCE)GT.1943-5606.0002665 (2021). [Google Scholar]
- 44.Gutiérrez-Ch, J. G., Melentijevic, S., Senent, S. & Jimenez, R. Distinct-element method simulations of rock-socketed piles: Estimation of side shear resistance considering socket roughness. J. Geotech. GeoEnviron. Eng.146 (12), 04020133. 10.1061/(ASCE)GT.1943-5606.0002394 (2020). [Google Scholar]
- 45.Gutiérrez-Ch, J. G., Senent, S., Melentijevic, S. & Jimenez, R. A DEM-based factor to design rock-socketed piles considering socket roughness. Rock Mech. Rock Eng.54 (7), 3409–3421. 10.1007/s00603-020-02347-1 (2021). [Google Scholar]
- 46.Jeong, S., Kim, D. & Park, J. Empirical bearing capacity formula for steel pipe prebored and precast piles based on field tests. Int. J. Geomech.21 (9), 04021165. 10.1061/(ASCE)GM.1943-5622.0002112 (2021). [Google Scholar]
- 47.Wang, Q., Ma, J., Xiao, Z., Chen, W. & Ji, Y. Field test on uplift bearing capacity of rock-socketed belled piles. KSCE J. Civ. Eng.24 (8), 2353–2363. 10.1007/s12205-020-2011-0 (2020). [Google Scholar]
- 48.Barrett, J. W. & Prendergast, L. J. Empirical shaft resistance of driven piles penetrating weak rock. Rock Mech. Rock Eng.53 (12), 5531–5543. 10.1007/s00603-020-02228-7 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Xing, H., Wu, J. & Luo, Y. Field tests of large-diameter rock-socketed bored piles based on the self-balanced method and their resulting load bearing characteristics. Eur. J. Environ. Civil Eng.23 (12), 1535–1549. 10.1080/19648189.2017.1359111 (2019). [Google Scholar]
- 50.Asem, P. & Gardoni, P. Evaluation of peak side resistance for rock socketed shafts in weak sedimentary rock from an extensive database of published field load tests: a limit state approach. Can. Geotech. J.56 (12), 1816–1831. 10.1139/cgj-2018-0590 (2019). [Google Scholar]
- 51.Mishra, A., Sawant, V. A. & Deshmukh, V. B. Prediction of pile capacity of socketed piles using different approaches. Geotech. Geol. Eng.37 (6), 5219–5230. 10.1007/s10706-019-00976-0 (2019). [Google Scholar]
- 52.Abi, E. et al. Calculation model of vertical bearing capacity of rock-embedded piles based on the softening of pile side friction resistance. J. Mar. Sci. Eng.11 (5), 939. 10.3390/jmse11050939 (2023). [Google Scholar]
- 53.Xu, F., Dai, G., Gong, W., Zhao, X. & Zhang, F. Lateral loading of a Rock–Socketed pile using the strain wedge model based on Hoek–Brown criterion. Appl. Sci.12 (7), 3495. 10.3390/app12073495 (2022). [Google Scholar]
- 54.Skejić, A., Gavrić, D., Jurišić, M. & Rahimić, Đ. Experimental and numerical analysis of axially loaded bored piles socketed in a conglomerate rock mass. Rock Mech. Rock Eng.55 (10), 6339–6365. 10.1007/s00603-022-02932-6 (2022). [Google Scholar]
- 55.Seol, H., Jeong, S., Cho, C. & You, K. Shear load transfer for rock-socketed drilled shafts based on borehole roughness and geological strength index (GSI). Int. J. Rock Mech. Min. Sci.45 (6), 848–861. 10.1016/j.ijrmms.2007.09.008 (2008). [Google Scholar]
- 56.Mujtaba, H., Javed, M. T., Farooq, K., Sivakugan, N. & Das, B. M. Axial capacity evaluation of rock-socketed cast-in-situ concrete piles. Soil Mech. Found. Eng.58 (2), 152–158. 10.1007/s11204-021-09720-4 (2021). [Google Scholar]
- 57.Maralapalle, V. C. & Hegde, R. Model studies on effect of pseudo-rock-socket strength on resistance of friction-only piles. Eng. Sci. Technol. Int. J.34, 101089. 10.1016/j.jestch.2021.101089 (2022). [Google Scholar]
- 58.Johnston, I. W. Revisiting methods for the design of rock socketed piles. J. Geotech. GeoEnviron. Eng.146 (12), 04020144. 10.1061/(ASCE)GT.1943-5606.0002414 (2020). [Google Scholar]
- 59.Mishra, A. & Sawant, V. A. Optimization of empirical methods in determining the load capacity of rock socketed piles. Indian Geotech. J.52 (4), 852–864. 10.1007/s40098-022-00629-9 (2022). [Google Scholar]
- 60.Edelmann, D., Móri, T. F. & Székely, G. J. On relationships between the pearson and the distance correlation coefficients. Stat. Probab. Lett.169, 108960. 10.1016/j.spl.2020.108960 (2021). [Google Scholar]
- 61.Hair, J. F., Ortinau, D. J. & Harrison, D. E. Essentials of Marketing Research Vol. 2 (McGraw-Hill/Irwin, 2010).
- 62.Smith, G. N. Probability and Statistics in Civil Engineering—An Introduction (Collins, 1986).
- 63.Cuevas, A., Febrero, M. & Fraiman, R. An Anova test for functional data. Comput. Stat. Data Anal.47 (1), 111–122. 10.1016/j.csda.2003.10.021 (2004). [Google Scholar]
- 64.Mrkvicka, T., Myllymaki, M., Jilek, M. & Hahn, U. A one-way ANOVA test for functional data with graphical interpretation. ArXiv Preprint. 10.48550/arXiv.1612.03608 (2016). arXiv:1612.03608. [Google Scholar]
- 65.Khatti, J. & Grover, K. S. Prediction of compaction parameters for fine-grained soil: critical comparison of the deep learning and standalone models. J. Rock Mech. Geotech. Eng.15 (11), 3010–3038. 10.1016/j.jrmge.2022.12.034 (2023). [Google Scholar]
- 66.Asteris, P. G., Skentou, A. D., Bardhan, A., Samui, P. & Pilakoutas, K. Predicting concrete compressive strength using hybrid ensembling of surrogate machine learning models. Cem. Concr. Res.145, 106449. 10.1016/j.cemconres.2021.106449 (2021). [Google Scholar]
- 67.Thangavel, P. & Samui, P. Determination of the size of rock fragments using RVM, GPR, and MPMR. Soils and Rocks, 45, e2022008122 (2022). 10.28927/SR.2022.008122
- 68.Lopes, R. G., Yin, D., Poole, B., Gilmer, J. & Cubuk, E. D. Improving robustness without sacrificing accuracy with patch Gaussian augmentation. ArXiv Preprint. 10.48550/arXiv.1906.02611 (2019). arXiv:1906.02611. [Google Scholar]
- 69.Qian, S. et al. An evolutionary deep learning model based on XGBoost feature selection and Gaussian data augmentation for AQI prediction. Process Saf. Environ. Prot., 191, 836–851. 10.1016/j.psep.2024.08.119 (2024).
- 70.Santoni, F., De Angelis, A., Moschitta, A. & Carbone, P. Training Gaussian process regression through data augmentation for battery SOC Estimation. J. Energy Storage. 98, 113073. 10.1016/j.est.2024.113073 (2024). [Google Scholar]
- 71.Wang, Z. et al. Enhancing multi-step short-term solar radiation forecasting based on optimized generalized regularized extreme learning machine and multi-scale Gaussian data augmentation technique. Appl. Energy. 377, 124708. 10.1016/j.apenergy.2024.124708 (2025). [Google Scholar]
- 72.Abbahaddou, Y., Malliaros, F. D., Lutzeyer, J. F., Aboussalah, A. M. & Vazirgiannis, M. Gaussian Mixture Models Based Augmentation Enhances GNN Generalization. arXiv preprint arXiv:2411.08638. (2024). 10.48550/arXiv.2411.08638
- 73.Zhang, Z. et al. High embankment slope stability prediction using data augmentation and explainable ensemble learning. Computer-Aided Civ. Infrastruct. Eng.10.1111/mice.13478 (2025). [Google Scholar]
- 74.Qiu, H. et al. Advancing predictive accuracy of shallow landslide using strategic data augmentation. J. Rock Mech. Geotech. Eng.10.1016/j.jrmge.2024.09.010 (2024). [Google Scholar]
- 75.Ge, Q., Li, J., Lacasse, S., Sun, H. & Liu, Z. Data-augmented landslide displacement prediction using generative adversarial network. J. Rock Mech. Geotech. Eng.16 (10), 4017–4033. 10.1016/j.jrmge.2024.01.003 (2024). [Google Scholar]
- 76.He, B., Armaghani, D. J., Lai, S. H., Samui, P. & Mohamad, E. T. Applying data augmentation technique on blast-induced overbreak prediction: Resolving the problem of data shortage and data imbalance. Expert Syst. Appl., 237, 121616. 10.1016/j.eswa.2023.121616 (2024).
- 77.Li, H., Chen, W. & Tan, X. Back analysis of geomechanical parameters based on a data augmentation algorithm and machine learning technique. Undergr. Space, 21, 215–231. 10.1016/j.undsp.2024.08.002 (2025).
- 78.Zhao, Y. & Teng, C. Classification of soil layers in deep cement mixing using optimized random forest integrated with AB-SMOTE for imbalance data. Comput. Geotech.179, 106976. 10.1016/j.compgeo.2024.106976 (2025). [Google Scholar]
- 79.Demir, S. & Şahin, E. K. Evaluation of oversampling methods (OVER, SMOTE, and ROSE) in classifying soil liquefaction dataset based on SVM, RF, and Naïve Bayes. Avrupa Bilim Ve Teknoloji Dergisi. (34), 142–147. 10.31590/ejosat.1077867 (2022).
- 80.Li, K. et al. A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification. Bull. Eng. Geol. Environ.81 (1), 39. 10.1007/s10064-021-02523-9 (2022). [Google Scholar]
- 81.Soomro, A. A. et al. Data augmentation using SMOTE technique: application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models. Results Eng.24, 103233. 10.1016/j.rineng.2024.103233 (2024). [Google Scholar]
- 82.Wen, H. et al. Hybrid optimized RF model of seismic resilience of buildings in mountainous region based on hyperparameter tuning and SMOTE. J. Building Eng.71, 106488. 10.1016/j.jobe.2023.106488 (2023). [Google Scholar]
- 83.Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.16, 321–357. 10.1613/jair.953 (2002). [Google Scholar]
- 84.Pan, T., Zhao, J., Wu, W. & Yang, J. Learning imbalanced datasets based on SMOTE and Gaussian distribution. Inf. Sci.512, 1214–1233. 10.1016/j.ins.2019.10.048 (2020). [Google Scholar]
- 85.Lee, S., Park, J., Kim, N., Lee, T. & Quagliato, L. Extreme gradient boosting-inspired process optimization algorithm for manufacturing engineering applications. Mater. Design. 226, 111625. 10.1016/j.matdes.2023.111625 (2023). [Google Scholar]
- 86.Dhieb, N., Ghazzai, H., Besbes, H. & Massoud, Y. Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES) 1–5 (IEEE, 2019). 10.1109/ICVES.2019.8906396
- 87.Cherif, I. L. & Kortebi, A. On using extreme gradient boosting (XGBoost) machine learning algorithm for home network traffic classification. In 2019 Wireless Days (WD) 1–6 (IEEE, 2019). 10.1109/WD.2019.8734193
- 88.Chang, Y. C., Chang, K. H. & Wu, G. J. Application of eXtreme gradient boosting trees in the construction of credit risk assessment models for financial institutions. Appl. Soft Comput.73, 914–920. 10.1016/j.asoc.2018.09.029 (2018). [Google Scholar]
- 89.Pradeep, T., Samui, P., Kardani, N. & Asteris, P. G. Ensemble unit and AI techniques for prediction of rock strain. Front. Struct. Civil Eng.16 (7), 858–870. 10.1007/s11709-022-0831-3 (2022). [Google Scholar]
- 90.Bischl, B. et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 13(2), e1484 (2023). 10.1002/widm.1484
- 91.Thuc, K. X., Kha, H. M., Van Cuong, N. & Van Luyen, T. A metaheuristics-based hyperparameter optimization approach to beamforming design. IEEE Access.11, 52250–52259. 10.1109/ACCESS.2023.3277625 (2023). [Google Scholar]
- 92.Yuan, H. et al. Classification forecasting research of rock burst intensity based on the BO-XGBoost-Cloud model. Earth Sci. Inf.18 (1), 95. 10.1007/s12145-024-01596-w (2025). [Google Scholar]
- 93.Lu, Z., Chen, S., Yang, J., Liu, C. & Zhao, H. Prediction of lower limb joint angles from surface electromyography using XGBoost. Expert Syst. Appl.264, 125930. 10.1016/j.eswa.2024.125930 (2025). [Google Scholar]
- 94.Kazemi, M. M. K., Nabavi, Z. & Armaghani, D. J. A novel hybrid XGBoost methodology in predicting penetration rate of rotary based on rock-mass and material properties. Arab. J. Sci. Eng.49 (4), 5225–5241. 10.1007/s13369-023-08360-0 (2024). [Google Scholar]
- 95.Geng, X. et al. An optimized Xgboost model for predicting tunneling-induced ground settlement. Geotech. Geol. Eng.42 (2), 1297–1311. 10.1007/s10706-023-02619-x (2024). [Google Scholar]
- 96.Shaik, N. B., Jongkittinarukorn, K. & Bingi, K. XGBoost based enhanced predictive model for handling missing input parameters: A case study on gas turbine. Case Studies in Chemical and Environmental Engineering, 100775 (2024). 10.1016/j.cscee.2024.100775
- 97.Sun, M. et al. Research on prediction of PPV in open-pit mine used RUN-XGBoost model. Heliyon10 (7). 10.1016/j.heliyon.2024.e28246 (2024). [DOI] [PMC free article] [PubMed]
- 98.Nabavi, Z., Mirzehi, M., Dehghani, H. & Ashtari, P. A hybrid model for back-break prediction using XGBoost machine learning and metaheuristic algorithms in Chadormalu iron mine. J. Min. Environ.14 (2), 689–712. 10.22044/jme.2023.12796.2323 (2023). [Google Scholar]
- 99.Gu, Z., Cao, M., Wang, C., Yu, N. & Qing, H. Research on mining maximum subsidence prediction based on genetic algorithm combined with XGBoost model. Sustainability14 (16), 10421. 10.3390/su141610421 (2022). [Google Scholar]
- 100.Chandrahas, N. S., Choudhary, B. S., Teja, M. V., Venkataramayya, M. S. & Prasad, N. K. XG boost algorithm to simultaneous prediction of rock fragmentation and induced ground vibration using unique blast data. Appl. Sci.12 (10), 5269. 10.3390/app12105269 (2022). [Google Scholar]
- 101.Qiu, Y. et al. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput.38 (Suppl 5), 4145–4162. 10.1007/s00366-021-01393-9 (2022). [Google Scholar]
- 102.Zhang, X. et al. Novel soft computing model for predicting blast-induced ground vibration in open-pit mines based on particle swarm optimization and XGBoost. Nat. Resour. Res.29 (2), 711–721. 10.1007/s11053-019-09492-7 (2020). [Google Scholar]
- 103.Dhal, K. G., Sasmal, B., Das, A., Ray, S. & Rai, R. A comprehensive survey on arithmetic optimization algorithm. Arch. Comput. Methods Eng.30 (5), 3379–3404. 10.1007/s11831-023-09902-3 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Hu, G., Zhong, J., Du, B. & Wei, G. An enhanced hybrid arithmetic optimization algorithm for engineering applications. Comput. Methods Appl. Mech. Eng.394, 114901. 10.1016/j.cma.2022.114901 (2022). [Google Scholar]
- 105.Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M. & Gandomi, A. H. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng.376, 113609. 10.1016/j.cma.2020.113609 (2021). [Google Scholar]
- 106.Xiao, L. & Du, K. Evaluation of driven piles’ load capacity by optimization-based prediction algorithms. Int. J. Interact. Des. Manuf. (IJIDeM) 1–12. 10.1007/s12008-024-01890-3 (2024).
- 107.Wu, X., Yang, F. & Huang, S. Predicting CBR values using gaussian process regression and meta-heuristic algorithms in geotechnical engineering. Multiscale and Multidisciplinary Modeling, Experiments and Design, 1–15 (2024). 10.1007/s41939-024-00428-0
- 108.Esmaeili-Falak, M. & Benemaran, R. S. Ensemble extreme gradient boosting based models to predict the bearing capacity of micropile group. Appl. Ocean Res.151, 104149. 10.1016/j.apor.2024.104149 (2024). [Google Scholar]
- 109.Li, C. & Mei, X. Application of SVR models built with AOA and chaos mapping for predicting tunnel crown displacement induced by blasting excavation. Appl. Soft Comput.147, 110808. 10.1016/j.asoc.2023.110808 (2023). [Google Scholar]
- 110.Bacanin, N., Zivkovic, M., Bezdan, T., Cvetnic, D. & Gajic, L. Dimensionality reduction using hybrid brainstorm optimization algorithm. In Proceedings of International Conference on Data Science and Applications: ICDSA 2021, 2, 679–692 (Springer Singapore, 2022). 10.1007/978-981-16-5348-3_54
- 111.Lyu, C., Shi, Y. & Sun, L. A novel multi-task optimization algorithm based on the brainstorming process. IEEE Access.8, 217134–217149. 10.1109/ACCESS.2020.3042004 (2020). [Google Scholar]
- 112.Liang, J. J. et al. Multi-objective brainstorm optimization algorithm for sparse optimization. In 2018 IEEE Congress on Evolutionary Computation (CEC) 1–8 (IEEE, 2018). 10.1109/CEC.2018.8477789
- 113.Cheng, S., Qin, Q., Chen, J. & Shi, Y. Brain storm optimization algorithm: a review. Artif. Intell. Rev. 46, 445–458. 10.1007/s10462-016-9471-0 (2016).
- 114.Rai, R. & Dhal, K. G. Recent developments in equilibrium optimizer algorithm: its variants and applications. Arch. Comput. Methods Eng.30 (6), 3791–3844. 10.1007/s11831-023-09923-y (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Darvishpoor, S., Darvishpour, A., Escarcega, M. & Hassanalian, M. Nature-inspired algorithms from oceans to space: A comprehensive review of heuristic and meta-heuristic optimization algorithms and their potential applications in drones. Drones 7 (7), 427. 10.3390/drones7070427 (2023). [Google Scholar]
- 116.Nadimi-Shahraki, M. H., Zamani, H., Asghari Varzaneh, Z. & Mirjalili, S. A systematic review of the Whale optimization algorithm: theoretical foundation, improvements, and hybridizations. Arch. Comput. Methods Eng.30 (7), 4113–4159. 10.1007/s11831-023-09928-7 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Gharehchopogh, F. S. & Gholizadeh, H. A comprehensive survey: Whale optimization algorithm and its applications. Swarm Evol. Comput.48, 1–24. 10.1016/j.swevo.2019.03.004 (2019). [Google Scholar]
- 118.Nasiri, J. & Khiyabani, F. M. A Whale optimization algorithm (WOA) approach for clustering. Cogent Math. Stat.5 (1), 1483565. 10.1080/25742558.2018.1483565 (2018). [Google Scholar]
- 119.Mirjalili, S. & Lewis, A. The Whale optimization algorithm. Adv. Eng. Softw.95, 51–67. 10.1016/j.advengsoft.2016.01.008 (2016). [Google Scholar]
- 120.Dolatshahi, A. & Molladavoodi, H. Prediction of rock tensile fracture toughness: hybrid ANN-WOA model approach. Rudarsko-geološko-naftni Zbornik. 39 (3), 1–12. 10.17794/rgn.2024.3.1 (2024). [Google Scholar]
- 121.Li, Z. et al. Predicting the depth of rock cutting by abrasive water jet using support vector machine optimized with Whale optimization algorithm. Phys. Fluids. 36 (12). 10.1063/5.0245419 (2024).
- 122.Jiadong, Q., Ohl, J. P. & Tran, T. T. Predicting clay compressibility for foundation design with high reliability and safety: A geotechnical engineering perspective using artificial neural network and five metaheuristic algorithms. Reliability Engineering & System Safety, 243, 109827 (2024). 10.1016/j.ress.2023.109827
- 123.Yao, J., Nie, J. & Li, C. Research on prediction of surrounding rock deformation and optimization of construction parameters of high ground stress tunnel based on WOA-LSTM. Sci. Rep.14 (1), 27396. 10.1038/s41598-024-79059-x (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124.Xue, Z., Yi, X., Feng, W., Kong, L. & Wu, M. Prediction and mapping of soil thickness in alpine canyon regions based on whale optimization algorithm optimized random forest: A case study of Baihetan reservoir area in China. Comput. Geosci. 191, 105667. 10.1016/j.cageo.2024.105667 (2024). [Google Scholar]
- 125.Su, K., Da, W., Li, M., Li, H. & Wei, J. Research on a drilling rate of penetration prediction model based on the improved chaos Whale optimization and back propagation algorithm. Geoenergy Sci. Eng.240, 213017. 10.1016/j.geoen.2024.213017 (2024). [Google Scholar]
- 126.Ni, B. et al. Debris flow volume prediction model based on back propagation neural network optimized by improved whale optimization algorithm. PLoS ONE 19 (4), e0297380. 10.1371/journal.pone.0297380 (2024). [DOI] [PMC free article] [PubMed]
- 127.Rabbani, A., Samui, P. & Kumari, S. A novel hybrid model of augmented grey Wolf optimizer and artificial neural network for predicting shear strength of soil. Model. Earth Syst. Environ.9 (2), 2327–2347. 10.1007/s40808-022-01610-4 (2023). [Google Scholar]
- 128.Zhang, Y. et al. Optimizing flyrock forecasting in open-pit blasting using hybrid machine learning models. Rock Mech. Rock Eng. 1–28. 10.1007/s00603-025-04730-2 (2025).
- 129.Pradeep, T. & Samui, P. Prediction of rock strain using hybrid approach of ANN and optimization algorithms. Geotech. Geol. Eng.40 (9), 4617–4643. 10.1007/s10706-022-02174-x (2022). [Google Scholar]
- 130.Pradeep, T., Bardhan, A. & Samui, P. Prediction of rock strain using soft computing framework. Innovative Infrastructure Solutions. 7 (1), 37. 10.1007/s41062-021-00631-9 (2022). [Google Scholar]
- 131.Ahmad, F., Alasskar, A., Samui, P. & Asteris, P. G. Machine learning-based graphical user interface for predicting high-performance concrete compressive strength: comparative analysis of gradient boosting Machine, random Forest, and deep neural network models. Frontiers of Structural and Civil Engineering, 1–16 (2025). 10.1007/s11709-025-1201-8
- 132.Kumar, M., Samui, P. & Armaghani, D. J. A novel approach to estimate rock deformation under uniaxial compression using a machine learning technique. Bull. Eng. Geol. Environ.83 (7), 1–14. 10.1007/s10064-024-03775-x (2024). [Google Scholar]
- 133.Abualigah, L. et al. Arithmetic optimization algorithm: a review and analysis. Metaheuristic optimization algorithms, 73–87 (2024). 10.1016/B978-0-443-13925-3.00012-1
- 134.Wadoux, A. M. C., Walvoort, D. J. & Brus, D. J. An integrated approach for the evaluation of quantitative soil maps through Taylor and solar diagrams. Geoderma405, 115332. 10.1016/j.geoderma.2021.115332 (2022). [Google Scholar]
- 135.Bi, J. & Bennett, K. P. Regression error characteristic curves. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) 43–50 (2003).
- 136.Mittas, N. & Angelis, L. Visual comparison of software cost Estimation models by regression error characteristic analysis. J. Syst. Softw.83 (4), 621–637. 10.1016/j.jss.2009.10.044 (2010). [Google Scholar]
- 137.Scholz, F. W. & Stephens, M. A. K-sample Anderson–Darling tests. J. Am. Stat. Assoc.82 (399), 918–924. 10.1080/01621459.1987.10478517 (1987). [Google Scholar]
- 138.Nelson, L. S. The Anderson-Darling test for normality. J. Qual. Technol.30 (3), 298–299. 10.1080/00224065.1998.11979858 (1998). [Google Scholar]
- 139.Golbraikh, A. & Tropsha, A. Beware of q2! J. Mol. Graph. Model.20 (4), 269–276. 10.1016/S1093-3263(01)00123-1 (2002). [DOI] [PubMed] [Google Scholar]
- 140.Kumar, D. R. et al. Bearing capacity of eccentrically loaded footings on rock masses using soft computing techniques. Eng. Sci.24 (2), 929. 10.30919/es929 (2023). [Google Scholar]
- 141.Leung, C. F. & Ko, H. Y. Centrifuge model study of piles socketed in soft rock. Soils Found.33 (3), 80–91. 10.3208/sandf1972.33.3_80 (1993). [Google Scholar]
- 142.Abu-Hejleh, N. M., O’Neill, M. W., Hanneman, D. & Attwooll, W. J. Improvement of the geotechnical axial design methodology for Colorado’s drilled shafts socketed in weak rocks. Transportation research record, 1936(1), 100–107 (2005). 10.1177/0361198105193600112
- 143.Hoek, E. & Brown, E. T. Practical estimates of rock mass strength. Int. J. Rock Mech. Min. Sci.34 (8), 1165–1186. 10.1016/S1365-1609(97)80069-X (1997). [Google Scholar]
- 144.Carter, J. P. & Kulhawy, F. H. Analysis and design of drilled shaft foundations socketed into rock (No. EPRI-EL-5918). Electric Power Research Inst., Palo Alto, CA (USA); Cornell Univ., Ithaca, NY (USA). Geotechnical Engineering Group. (1988). [Google Scholar]
- 145.Kulhawy, F. H. & Goodman, R. E. Foundations in rock. In Ground Engineer’s Reference Book Ch. 55 (1987).
- 146.Hoek, E., Carranza-Torres, C. & Corkum, B. Hoek-Brown failure criterion-2002 edition. Proceedings of NARMS-Tac, 1(1), 267–273 (2002).