PLOS ONE. 2024 Nov 27;19(11):e0312531. doi: 10.1371/journal.pone.0312531

The research explores the predictive capacity of the shear strength of reinforced concrete walls with different cross-sectional shapes using the XGBoost model

Hoa Thi Trinh 1, Tuan Anh Pham 1,*, Vu Dinh Tho 1, Duy Hung Nguyen 1
Editor: Afaq Ahmad2
PMCID: PMC11602087  PMID: 39602412

Abstract

Structurally, the lateral load-bearing capacity of a building depends mainly on its reinforced concrete (RC) walls. Determining flexural strength and shear strength is mandatory when designing RC walls. Typically, these strengths are determined through theoretical formulas and verified experimentally. However, theoretical formulas often have large errors, and testing is costly and time-consuming. Therefore, this study uses machine learning, specifically a hybrid XGBoost model combined with optimization algorithms, to predict the shear strength of RC walls by training on available experimental results. The study used the largest database of RC walls to date, consisting of 1057 samples with various cross-sectional shapes. Three hyperparameter search methods, Bayesian optimization (BO) with a Gaussian Process, BO with a Random Forest, and Random Search, were used to refine the XGBoost model architecture. Over the entire dataset, which includes all cross-sectional shapes of the RC wall, BO with a Gaussian Process emerged as the most efficient solution, providing the lowest Mean Square Error and achieving a prediction R² of 0.998 for the training set, 0.972 for the validation set and 0.984 for the test set. BO with a Random Forest and Random Search performed as well as the Gaussian Process on the training and test sets but lower on the validation set, with R² values of 0.970 and 0.969, respectively. The SHAP (Shapley Additive Explanations) technique was used to clarify the predictive ability of the model and the importance of input variables. Furthermore, the performance of the model was validated through comparative analysis with benchmark models and current standards. Notably, the coefficient of variation (COV) of the XGBoost model is 13.27%, while traditional models often have a COV exceeding 50%.

1. Introduction

Reinforced concrete walls are important structural components that resist horizontal loads in high-rise buildings thanks to their large horizontal stiffness. To ensure bearing capacity, RC walls are often designed following the principle of "strong shear and weak bending". Therefore, accurate prediction of damage modes, including lateral load capacity and deformation capacity of reinforced concrete walls, is necessary during the structural design process, especially in high-rise buildings, where seismic safety is of utmost importance. Various design codes use the plane-section assumption to calculate flexural strength, and the calculated values are generally accurate and consistent across different codes [1–3]. Specifically, the bending and shear resistance of load-bearing walls are determined according to current construction standards such as ACI 318–19 and EC-2. The mechanism of flexural capacity has been thoroughly explained by flexure theory [4], while the shear provisions in the ACI code are relatively straightforward [5]. Previous studies have shown that the provisions in ACI 318–19 have a low safety factor and do not take into account high-strength concrete shear walls, while the provisions in Eurocode are too conservative [6, 7].

In addition, some modern theories are also used to determine the shear strength of reinforced concrete walls, such as the Modified Compression Field Theory (MCFT) [8], the Cyclic Softened Membrane Model (CSMM) [9], and the Strut-and-Tie Model (STM) [2, 10]. Although these models provide reliable estimates, they often require building a Finite Element Method (FEM) model or carrying out extensive theoretical calculations, which makes them computationally expensive.

Recently, Machine Learning (ML) based models have demonstrated their effectiveness in forecasting the shear strength of various structural elements such as beams [11–14] and concrete columns [15]. The use of data-driven models is appealing due to their relative simplicity and ease of development compared to traditional models based on specific rules or hypotheses [16]. Furthermore, AI models relieve users of complex computational tasks. However, developing an accurate AI model poses its own challenges. These include optimizing the hyperparameters of the AI algorithm, accurately analyzing the role of input variables in predicting wall shear strength, ensuring the stability of the machine learning model, and validating the reliability of the collected data set. The purpose is to provide a simple and convenient parametric model for users to apply when designing detailed structural components.

Recent studies have focused on using ML models to predict the load-bearing capacity of RC walls, and significant findings have been published. According to Zhang et al. [17], the XGB and GB models are among the most effective models for identifying damage modes of RC walls, with an accuracy of 97% in predicting the type of damage. Feng et al. [18] developed an Extreme Gradient Boosting (XGBoost) algorithm to estimate the shear strength of squat RC walls. The results demonstrated that the XGBoost model has great potential for reliably predicting shear strength, with an average prediction-to-test ratio of 1.0. Barkhordari et al. [19] used several Deep Neural Network models to predict the failure mode of RC walls. Their results show that the weighted average ensemble deep neural network model predicts the failure mode most accurately, with an accuracy above 0.9. Gondia et al. [20] used genetic programming to predict the shear strength of flanged squat RC walls with a dataset of 254 samples. The study yielded an explicit formula for shear strength using several mechanically guided derivatives, achieving high accuracy and demonstrating good practical applicability. Keshtegar et al. [21] developed a new hybrid machine-learning model to predict the lateral strength of RC walls. Their hybrid model, an artificial neural network (ANN) optimized with an adaptive harmonic search (AHS) algorithm, achieved outstanding performance, with an average prediction-to-experiment ratio of 1.0. The combination of support vector regression (SVR) and response surface methodology (RSM) provided reasonable predictions of the lateral strength of RC walls, with an average prediction-to-experiment ratio of 0.98 [22, 23]. Mangalathu et al. [24] investigated safety margins in RC shear walls and the lack of models for rapid failure identification. They used data from 393 shear wall experiments with varied geometries to develop prediction models. Eight machine learning methods were employed: Naive Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost. Random Forest achieved 86% accuracy in failure mode prediction. Critical parameters influencing failure mode included wall aspect ratio, boundary element reinforcement, and wall length-to-thickness ratio. They proposed an open-source data-driven classification model for potential design applications. Barkhordari et al. [25] presented a new hybrid model based on ANN and state-of-the-art population-based algorithms to predict the shear strength of squat reinforced concrete (SRC) walls. That study used data from 434 experimental specimens, all of the SRC wall type, to train and test the ML model, and the results show that the hybrid ML model can achieve high performance in calculating the shear strength of SRC walls.

The studies reviewed above show that ML models can predict both the failure mode (a classification task) and the shear strength (a regression task) of RC walls. However, there are some limitations, especially in predicting shear strength. First, only a small number of models, usually one or two, are trained. Second, the collected data lack diversity and generalizability, for example by considering only one cross-sectional shape or by focusing on a specific type of RC wall such as low-rise (squat) walls. Third, only one parameter optimization algorithm is typically used, without any comparison to confirm the reliability of the selected optimal parameter set. Furthermore, the model architecture is often searched manually, which does not ensure that models with good parameters for RC wall research are identified. Finally, current studies mainly evaluate the sensitivity of model performance to the input variables, without specific analysis of how changing the values of input variables affects the shear strength of RC walls.

2. Research significance

To address the limitations of previous studies, this paper proposes the following approach to apply the ML model in predicting the shear strength of reinforced concrete walls:

  1. The large dataset of RC walls collected and processed includes 1057 samples with three different cross-sectional shapes.

  2. A detailed study was performed on the XGBoost model, with the parameter sets for the XGBoost model determined through three different optimization methods.

  3. The role of input variables is evaluated using SHAP values for the XGBoost model, providing an explanation of the model’s predictive ability.

  4. The prediction ability of the XGBoost model is compared with standard design codes and existing benchmark models.

By implementing this method, the study is expected to overcome the limitations of previous studies and provide a more effective method to evaluate the shear capacity of RC walls.

3. Methodology

3.1. Data description

3.1.1. Distribution of cross-section types

Reliable data is the most important prerequisite for machine learning models. In this study, a dataset of 1057 RC wall test samples was collected from previous literature. The data come from the following sources: 252 samples from Massone and Melo [26], 182 samples from Ning and Li [27], 22 samples from Sato et al. [28], 1 sample from Park et al. [29], 4 samples from Teng and Chandra [30], 1 sample from Wolschlag et al. [31], 21 samples from Wang et al. [32], 4 samples from Vallenas et al. [33], 129 samples from Hirosawa et al. [34], 7 samples from Barda et al. [35], 16 samples from Antebi et al. [36], and 418 samples collected and processed from an existing database [37]. The cross-section types are fairly evenly represented among the 1057 walls: 456 walls have rectangular cross-sections (Rectangular), 296 have barbell-shaped cross-sections (Barbell) and 305 have flanged cross-sections (Flanged). This information is shown in Fig 1.

Fig 1. Distribution of cross-section types.

3.1.2. Data statistics

Data statistics are important for two reasons: they define the scope of application of the research, and they show the general distribution of the data, which is needed to evaluate the balance and reliability of the training results. The RC wall tests in this database include four groups of input variables: geometric dimensions, reinforcement layout, material properties, and applied loads. Specifically, the input features are wall height (X1, mm), wall length (X2, mm), web thickness (X3, mm), flange thickness (X4, mm), flange length (X5, mm), concrete compressive strength (X6, MPa), web reinforcement content in the vertical direction (X7, %) and the yield strength of the vertical (longitudinal) web steel (X8, MPa), web reinforcement content in the horizontal direction (X9, %) and the yield strength of the horizontal web steel (X10, MPa), the longitudinal reinforcement content (X11, %) and its yield strength (X12, MPa), and finally the axial load (X13, kN). The output is the shear strength of the wall (Y, kN). Descriptions and statistical properties of the variables are given in Table 1 (Examples of S1 Table). All input data are normalized to the range [0, 1] so that all features are on a comparable scale in the machine learning model.
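The normalization step can be illustrated with a minimal sketch. The paper does not state which scaler was used, so min-max scaling is assumed here; the function name `min_max_scale` and the sample values are hypothetical, with only the end points taken from the X1 range in Table 1.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical wall heights in mm; 145 and 6401 are the X1 limits in Table 1.
wall_height_mm = [145.0, 1000.0, 3273.0, 6401.0]
scaled = min_max_scale(wall_height_mm)
print(scaled)
```

In practice the minimum and maximum would normally be taken from the training set only and then reused for the validation and test sets, so that no information leaks from held-out data.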

Table 1. Features of shear strength database for RC walls.
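| Feature | Unit | Min | Max | Mean | Standard deviation |
|---|---|---|---|---|---|
| X1 | mm | 145 | 6401 | 1274.79 | 1022.53 |
| X2 | mm | 254 | 3960 | 1319.46 | 755.07 |
| X3 | mm | 10 | 360 | 97.57 | 62.29 |
| X4 | mm | 0 | 360 | 59.78 | 73.65 |
| X5 | mm | 30 | 3045 | 253.69 | 334.05 |
| X6 | MPa | 10 | 130.8 | 31.29 | 17.04 |
| X7 | % | 0 | 6.24 | 0.8 | 0.72 |
| X8 | MPa | 0 | 792 | 395.79 | 115.53 |
| X9 | % | 0 | 3.67 | 0.67 | 0.51 |
| X10 | MPa | 0 | 792 | 396.57 | 117.57 |
| X11 | % | 0 | 10.58 | 3.17 | 2.01 |
| X12 | MPa | 208.9 | 980 | 443.48 | 140.63 |
| X13 | kN | 0 | 2429 | 293.86 | 464.31 |
| Y | kN | 15.35 | 3138.10 | 563.72 | 639.21 |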

Note that, in the input data, a rectangular cross-section has X4 = 0 and X3 = X5, a Barbell cross-section has a ratio X5/X4 ≤ 3, and a Flanged cross-section has a ratio X5/X4 > 3.
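The cross-section rule above can be written as a small helper. This is a sketch under the stated rule only; the function name `classify_cross_section` is hypothetical, and the rectangular case is detected from X4 = 0 alone (the accompanying condition X3 = X5 is not checked).

```python
def classify_cross_section(x4_flange_thickness, x5_flange_length):
    """Return the cross-section type implied by flange thickness X4 and flange length X5 (mm)."""
    if x4_flange_thickness == 0:
        # Rectangular walls are encoded with X4 = 0; testing this first
        # also avoids dividing by zero below.
        return "Rectangular"
    ratio = x5_flange_length / x4_flange_thickness
    return "Barbell" if ratio <= 3 else "Flanged"


# Hypothetical dimensions in mm, chosen only to exercise each branch.
print(classify_cross_section(0, 100))     # Rectangular
print(classify_cross_section(200, 400))   # Barbell (X5/X4 = 2)
print(classify_cross_section(100, 1000))  # Flanged (X5/X4 = 10)
```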

Fig 2 shows the correlation matrix of the data set, which includes 13 input variables and 1 output variable. The matrix displays the correlation coefficient between each pair of variables, where a value of 1 represents a perfect positive correlation, -1 represents a perfect negative correlation, and 0 represents no linear correlation. Initial analysis shows that there are both positive and negative correlations between variables and that pairs of highly correlated attributes are more interdependent. Specifically, the highest correlation coefficient is 0.89, between X8 and X10, demonstrating a close relationship between these two features. Additionally, the geometrical parameters and the load applied to the wall have the highest correlation with the output. The correlation matrix therefore helps to determine which features are important for the output and which are redundant, which is useful for further analysis and modeling.
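As an illustration of how each entry of such a matrix is obtained, the sketch below computes Pearson correlation coefficients in plain Python. It assumes that Fig 2 reports Pearson coefficients, which the text does not state explicitly; the function names and the four-sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict {a: {b: r_ab}} for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values: wall length X2 (mm) and shear strength Y (kN).
data = {"X2": [1000, 1500, 2000, 2500], "Y": [200, 310, 390, 520]}
matrix = correlation_matrix(data)
print(matrix["X2"]["Y"])
```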

Fig 2. Correlation matrix of the features for the 1057-sample dataset.

3.2. Machine learning approaches

3.2.1. Extreme gradient boosting machine learning model (XGBoost model)

In this study, a supervised machine learning model called eXtreme Gradient Boosting (XGB) was used to determine the shear strength of reinforced concrete walls. This is one of the most powerful and popular machine learning methods, especially in prediction and classification problems. XGBoost focuses on building a sequence of weighted decision trees, also known as boosted trees, in a gradient-boosting manner. It combines multiple single decision trees to create a powerful prediction model. However, like most other decision tree-based models, the XGB model does not have the ability to extrapolate predictions, meaning the model only predicts accurately within the range of input variables used to train the model. The general formula of the XGB model is written as follows [38]:

$$f(x)=\sum_{i=1}^{k}\gamma_i\,h_i(x) \tag{1}$$

where $f(x)$ is the model output, $\gamma_i$ is the learning rate, $h_i(x)$ is the simple tree built at the $i$-th iteration, and $k$ is the number of iterations.
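To make the additive form of Eq (1) concrete, the following is a minimal sketch of gradient boosting with one-split trees (stumps) on a single feature and squared-error loss. It is not XGBoost itself, which additionally uses second-order gradients, regularization and deeper trees; the function names and the toy data are hypothetical, and a constant learning rate is assumed for every iteration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    # Dropping the largest value guarantees a non-empty right branch.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, k=50, gamma=0.1):
    """Build f(x) = sum_i gamma * h_i(x), each h_i fitted to the current residuals."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(k):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + gamma * tree(x) for x, p in zip(xs, predictions)]
    return lambda x: sum(gamma * tree(x) for tree in trees)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-feature data set.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 15, 30, 33, 35]
model = boost(xs, ys)
baseline_mse = mse(lambda x: 0.0, xs, ys)
boosted_mse = mse(model, xs, ys)
print(baseline_mse, boosted_mse)
```

Each new tree is fitted to what the current sum still gets wrong, which is why the training error falls as k grows and why the learning rate and number of trees are tuned together.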

3.2.2. Optimization algorithms for machine learning models

To improve the performance of machine learning models, optimization algorithms are used to tune their hyperparameters. Many types of optimization algorithms exist, including gradient-based algorithms, grid search, stochastic search, and discrete optimization methods such as evolutionary algorithms or particle swarms. In this study, two typical optimization algorithms are used: the Random Search algorithm [39] and the Bayesian algorithm [40]. In the Random Search algorithm, the model's hyperparameter set is randomly selected within the search range in each iteration. As a result, the algorithm often finds better hyperparameter combinations than the similar Grid Search method. Bayesian optimization differs from Random Search and Grid Search in that it takes past evaluations into account, while the other two methods do not. The core idea of Bayesian optimization is to build a probabilistic model of the objective function and use this model to select the most promising points for evaluation. Bayesian optimization first evaluates the performance of a set of randomly chosen parameters. In each subsequent step, the probabilistic model is updated with the results observed so far and is used to propose the next hyperparameter set, whose performance is then compared with the best result found. This method is especially useful in problems where the objective function is discontinuous, has no derivative, or is noisy.
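A minimal sketch of Random Search is given below to make the sampling step concrete. The objective is a hypothetical stand-in for the cross-validated error of a model; only the two search ranges are taken from Table 2, and the function names are illustrative.

```python
import random


def random_search(objective, space, n_iter, seed=0):
    """Sample each hyperparameter uniformly from its range and keep the best set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical error surface; a real objective would train and
    # cross-validate a model for the given hyperparameters.
    return (params["learning_rate"] - 0.14) ** 2 + (params["subsample"] - 0.6) ** 2


SPACE = {"learning_rate": (0.05, 0.30), "subsample": (0.60, 0.90)}
best_params, best_score = random_search(toy_objective, SPACE, n_iter=200)
print(best_params, best_score)
```

Every draw here is independent of the previous ones. Bayesian optimization replaces the uniform draw with a proposal from a surrogate model fitted to the evaluations made so far.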

3.2.3. Performance indices of the model

To evaluate the performance of the established models, three statistical parameters were used: the coefficient of determination (R²) [41], the Root Mean Square Error (RMSE) [42], and the Mean Absolute Error (MAE) [42]. RMSE and MAE measure the average error between the model's predictions and the experimental results, so the smaller their values, the more accurate the prediction model. R² ranges from -∞ to 1 and indicates how well the predicted values agree with the actual values, so the higher the R² value, the better the model. The formulas of the parameters are presented below:

Coefficient of determination (R²):

$$R^2=1-\frac{\sum_{j=1}^{N}\left(y_j-y_{t,j}\right)^2}{\sum_{j=1}^{N}\left(y_j-\bar{y}\right)^2} \tag{2}$$

Root Mean Square Error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y_j-y_{t,j}\right)^2} \tag{3}$$

Mean Absolute Error (MAE):

$$\mathrm{MAE}=\frac{1}{N}\sum_{j=1}^{N}\left|y_j-y_{t,j}\right| \tag{4}$$

where $y_j$ is the actual shear strength of the $j$-th sample in the dataset; $y_{t,j}$ is the predicted shear strength of the $j$-th sample obtained from the ML model; $\bar{y}$ is the mean value of the actual shear strength of the data set; and $N$ is the total number of samples in the dataset.
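Eqs (2) to (4) translate directly into code. The sketch below is a plain Python version with hypothetical function names and hypothetical shear strengths in kN; libraries such as scikit-learn provide equivalent functions.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq (2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, Eq (3)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, Eq (4)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured and predicted shear strengths (kN).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

With every prediction off by 10 kN, RMSE and MAE both equal 10 kN, and R² is 1 - 400/50000 = 0.992.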

4. Model implementation and prediction performance

In this study, Machine Learning models are developed based on the Python Scikit-Learn library [43]. The entire data set is randomly divided into a training set that accounts for 80% of the data and a test set that accounts for 20% of the data. The training set is used to train and fine-tune the prediction models, while the test set is used to evaluate the performance of the models. Model hyperparameter tuning is performed using the K-fold cross-validation technique on the training set. This technique is intended to give highly generalizable results, because every training sample is used in turn for both training and validation. In this study, K = 10 is chosen, meaning that for each hyperparameter set the training data is divided into 10 subsets, of which 9 are used for training and the remaining 1 is used for validation. This is repeated 10 times, and the model's performance is averaged over the 10 repetitions.
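The data partitioning described above can be sketched with index lists only. This is an illustration in plain Python rather than the Scikit-Learn utilities used in the study; the function names, the seed and the rounding of the 20% share are assumptions.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=10):
    """Yield (training, validation) index lists for each of the k folds."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = train_test_split_indices(1057)
splits = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(splits))
```

For each candidate hyperparameter set, a model would be fitted on `training`, scored on `validation`, and the ten scores averaged. The test indices are not touched until the final evaluation.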

4.1. Hyperparameter optimization of the XGBoost model

To find the best machine learning model, three optimization solutions are used to automatically select the best set of hyperparameters for the XGBoost model: Bayesian optimization with a Gaussian Process (BO-GP), Bayesian optimization with a Random Forest (BO-RF), and Random Search (RS). Eight main hyperparameters of the model are selected for optimization according to [6]. The optimization process stops once the algorithm has performed at least 100 iterations without any change in the best result.
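One possible reading of this stopping rule is sketched below: the search ends when the best objective value has not improved over the last 100 iterations. The function name `should_stop` and the history values are hypothetical, and the objective is assumed to be minimized.

```python
def should_stop(best_so_far, patience=100):
    """Return True when the best value has not improved for `patience` iterations.

    `best_so_far` holds the best (lowest) objective value after each iteration,
    so it never increases from one entry to the next.
    """
    if len(best_so_far) <= patience:
        return False
    return best_so_far[-1] >= best_so_far[-1 - patience]


# Hypothetical history: an improvement to 4.0 followed by 100 flat iterations.
history = [5.0, 5.0, 5.0] + [4.0] * 101
print(should_stop(history))        # True
print(should_stop(history[:-1]))   # False, only 99 iterations without change
```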

The hyperparameters of the XGBoost model were optimized within the specified ranges using the BO-GP, BO-RF, and RS methods. The optimal hyperparameters, along with the allowable value ranges, are shown in Table 2 (S1 Data). To avoid overfitting during training and optimization, two techniques were applied: (1) subsampling and (2) K-fold cross-validation. Subsampling trains each tree on a random fraction of the training samples, which helps create simpler trees and avoid overfitting. K-fold cross-validation is applied to the training data set itself, so that the model is trained and validated during the optimization process: every fold of the training set is used in turn for training and for validation, which leads to training results that avoid overfitting. The hyperparameter optimization process is shown in Fig 3. The BO-GP and BO-RF algorithms converge in about 350 iterations. Therefore, the number of iterations of the RS algorithm is also set to 350 for an objective comparison.

Table 2. Hyperparameters for the XGBoost model.

| Hyperparameter | Meaning | Range of values | Optimal (BO-GP) | Optimal (BO-RF) | Optimal (RS) |
|---|---|---|---|---|---|
| 'n_estimators' | Number of trees | 100–1000 | 1000 | 912 | 823 |
| 'max_depth' | Maximum depth of each tree | 3–9 | 3 | 3 | 3 |
| 'learning_rate' | Learning rate of stages | 0.05–0.30 | 0.1399 | 0.1194 | 0.109 |
| 'booster' | Booster method | 'gbtree', 'dart' | 'dart' | 'gbtree' | 'dart' |
| 'gamma' | The minimum loss to create a tree's nodes | 0.01–0.50 | 0.5 | 0.485 | 0.3077 |
| 'subsample' | The subsampling ratio in the training set | 0.60–0.90 | 0.6 | 0.697 | 0.727 |
| 'colsample_bytree' | Specifies the proportion of columns to be subsampled | 0.60–0.90 | 0.9 | 0.747 | 0.799 |
| 'reg_lambda' | Weights used in L2 regularization | 1–50 | 22 | 3 | 8 |

The optimal results show that all methods select a fairly large number of trees (823 to 1000). The maximum depth of the trees is 3 for all three methods, while the remaining hyperparameters differ depending on the algorithm.
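For readers who want to reuse the BO-GP column of Table 2, the values can be collected in a parameter dictionary as sketched below. Passing them to `XGBRegressor` from the `xgboost` package is an assumption about how such a model could be built, not a description of the authors' code; the names `BO_GP_PARAMS` and `build_model` are hypothetical.

```python
# Optimal hyperparameters found by BO-GP, copied from Table 2.
BO_GP_PARAMS = {
    "n_estimators": 1000,
    "max_depth": 3,
    "learning_rate": 0.1399,
    "booster": "dart",
    "gamma": 0.5,
    "subsample": 0.6,
    "colsample_bytree": 0.9,
    "reg_lambda": 22,
}


def build_model(params=BO_GP_PARAMS):
    """Create an XGBoost regressor with the given hyperparameters."""
    # Imported here so the dictionary can be inspected even when the
    # xgboost package is not installed.
    from xgboost import XGBRegressor

    return XGBRegressor(**params)


print(sorted(BO_GP_PARAMS))
```

A model built this way would still need to be fitted on the normalized training data, for example with `build_model().fit(X_train, y_train)`, where `X_train` and `y_train` are placeholders for the training features and shear strengths.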

Fig 3. Optimization process of BO algorithm.


4.2. Evaluation of hyperparameter optimization performance

The impact of optimized hyperparameters on the performance of machine learning models is evaluated by comparing their performance with default parameters. The aggregated results for the XGBoost model are illustrated in Fig 4 (S2 Data). The results indicate that hyperparameter optimization has a significant impact on the training, validation, and testing performance of the XGBoost model.

Fig 4. Comparison of the learning results of the XGBoost model between the default parameters and the optimal parameters of the BO-GP, BO-RF and RS models according to the criteria: (a) R²; (b) RMSE; (c) MAE.

All three optimization methods give better overall performance than the model with default hyperparameters, as shown by the R², RMSE, and MAE metrics. For example, the validation R² values are 0.956/0.972/0.970/0.969 for the default parameters and the parameters optimized by BO-GP, BO-RF, and RS, respectively. Among the three optimization methods, BO-GP gives the highest model performance.

It should be noted that the difference between the model with default hyperparameters and the optimized models is negligible on the training set, and the default hyperparameters sometimes even yield slightly better training results than the optimized models. This highlights the strong learning capability of the XGBoost model and indicates that the main benefit of hyperparameter tuning is to limit overfitting once good training results are achieved.

In addition, the results of the XGBoost model for the individual wall cross-section types are presented in Fig 5. Fig 5 presents scatter plots with a regression line that compare the experimental shear strength of reinforced concrete walls with the values predicted by the XGBoost model optimized using the Gaussian Process method. Specifically, Fig 5a shows the regression results for all cross-section types, and Fig 5b–5d show the regression results for Rectangular, Barbell and Flanged cross-sections, respectively. Most of the points are close to the reference line, demonstrating the excellent performance of the XGB model. In addition, the regression results for Barbell cross-sections appear to be more accurate than those for the other cross-section types. Specifically, the R² values on the training and test sets were 0.999 and 0.975, respectively, while the RMSE values were 10.891 kN and 93.604 kN.

Fig 5. The shear strength prediction results of the XGBoost model optimized by the Gaussian Process method for the following types of cross-sections: (a) all sections; (b) Rectangular section; (c) Barbell section; (d) Flanged section.

5. The importance of variables to the model’s predictive ability

To evaluate the importance of input variables for the model's predictive ability, the SHAP (Shapley Additive Explanations) [44] technique was used. SHAP is a technique for interpreting the output of machine learning models. It uses a game-theoretic approach in which each input feature is treated as a player whose contribution to the outcome is measured. The XGBoost model optimized with the BO-GP solution is used as the main model to analyze the influence of input variables according to SHAP theory. The absolute SHAP value can be used to determine the influence of each input feature on the model output. Fig 6 illustrates the results of the SHAP analysis and provides valuable insights into the influence of each input feature in forecasting the shear strength of RC walls.
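The game-theoretic idea behind SHAP can be shown with an exact Shapley value computation on a toy model. This sketch is not the SHAP library, which uses efficient tree-specific algorithms and a background data set; here features that are absent from a coalition are simply held at a single baseline value. The two-feature model, its coefficients and the input values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for the prediction model(x)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    # Hypothetical response with an interaction between the two inputs.
    length, thickness = features
    return 2.0 * length + 3.0 * thickness + 0.5 * length * thickness


x = [4.0, 2.0]
baseline = [1.0, 1.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # contributions sum to toy_model(x) - toy_model(baseline)
```

The contributions add up to the difference between the prediction and the baseline prediction, which is the additive property that lets SHAP values be read in the units of the model output.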

Fig 6. SHAP summary plot and the relative importance of each feature of the XGBoost model.


Based on the results, it can be inferred that the flange length (X5) and wall length (X2) are the most important features affecting the shear strength of reinforced concrete walls. More specifically, when the flange length (X5) increases towards its maximum value (red points), the corresponding SHAP value increases in the positive direction to more than 500. This shows that the shear strength of the wall increases markedly with the flange length. Meanwhile, when the wall length (X2) increases, the maximum SHAP value of this variable reaches about 1200, showing that the impact of this variable on the shear strength of the wall is even greater: the longer the wall, the higher the shear strength. Additionally, the axial load (X13) is also significant for the shear strength of RC walls, but it is not considered in the prediction equations of ACI 318–19 (ACI 2019) and Wood (1990).

Interestingly, the influence of longitudinal reinforcement (X11) is greater than that of horizontal and vertical reinforcement (X9 and X7) in predicting shear strength. Notably, horizontal reinforcement primarily acts as ties, enhancing cohesion and preventing instability of the vertical bars. However, based on the results of the analysis in this study, other features, such as the vertical reinforcement content and the intrinsic shear capacity of the concrete, show a more significant contribution than the horizontal reinforcement in forming the shear resistance of RC walls. This observation does not diminish the essential role of transverse reinforcement in RC walls but highlights its relative influence compared to other factors, in this data set.

6. Compare the performance of the XGBoost model with current design codes

To evaluate the prediction accuracy of the XGBoost model, three semi-empirical shear strength models based on mechanical theory are used for comparison. These are the models provided in ACI 318–19 (Chapter 11) (ACI 2019) [45, p. 14], in ASCE/SEI 43–05 (ASCE 2005) [46], and by Wood (1990) [47], given as follows:

  • ACI 318–19:
    $$V_n=\left(\alpha_c\lambda\sqrt{f_{ck}}+\rho_h f_{yh}\right)A_{cv}\le 0.83\sqrt{f_{ck}}\,A_{cw} \tag{5}$$
    where αc is the aspect ratio coefficient, with αc = 0.25 when hw/lw ≤ 1.5, αc = 0.17 when hw/lw ≥ 2.0, and a linear variation between 0.25 and 0.17 for hw/lw between 1.5 and 2.0 (lw is the wall length and hw is the wall height); λ is a modification factor that reflects the properties of lightweight concrete and is equal to 1.0 for normalweight concrete; ρh and fyh are the ratio and the yield strength of the horizontal web reinforcement; Acv is the total area of the concrete limited by the thickness of the web and the length of the section in the direction of the considered shear force; and Acw is the total cross-sectional area of the wall;
  • ASCE/SEI 43–05:
    $$V_n=v_n\,d\,t_w \tag{6}$$
    $$v_n=0.69\sqrt{f_{ck}}-0.28\sqrt{f_{ck}}\left(\frac{h_w}{l_w}-0.5\right)+\frac{P}{4\,l_w t_w}+\rho_{se}f_{yh}\le 1.66\sqrt{f_{ck}} \tag{7}$$
    $$\rho_{se}=A\rho_v+B\rho_h \tag{8}$$
    where d = 0.6lw; ρse is the equivalent reinforcing ratio combining ρh and ρv with coefficients A = 1, B = 0 for hw/lw ≤ 0.5; A = −hw/lw + 1.5, B = hw/lw − 0.5 for 0.5 ≤ hw/lw ≤ 1.5; and A = 0, B = 1 for hw/lw ≥ 1.5.
  • Wood (1990):
    $$0.5\sqrt{f_{ck}}\,A_{cv}\le V_n=\frac{A_{vf}f_{yv}}{4}\le 0.83\sqrt{f_{ck}}\,A_{cv} \tag{9}$$

    where Avf is the total area of shear reinforcement arranged along the height of the wall to improve shear strength; fck is the compressive strength of concrete; fyv is the yield strength of the vertical reinforcement in the web; and Acw is the total cross-sectional area of the wall.
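The three code-based expressions, Eqs (5) to (9), can be sketched as functions for quick checks. The unit convention is an assumption made here: stresses in MPa, lengths in mm, areas in mm², forces in N, and reinforcement ratios as fractions rather than percentages (the database lists them in % and the axial load in kN). The function names and the example inputs are hypothetical.

```python
import math


def alpha_c(aspect_ratio):
    """Aspect ratio coefficient of Eq (5): 0.25 up to hw/lw = 1.5, 0.17 from 2.0."""
    if aspect_ratio <= 1.5:
        return 0.25
    if aspect_ratio >= 2.0:
        return 0.17
    return 0.25 - (aspect_ratio - 1.5) * (0.25 - 0.17) / 0.5


def vn_aci_318_19(f_ck, rho_h, f_yh, a_cv, a_cw, aspect_ratio, lam=1.0):
    """Nominal shear strength in N according to Eq (5)."""
    vn = (alpha_c(aspect_ratio) * lam * math.sqrt(f_ck) + rho_h * f_yh) * a_cv
    return min(vn, 0.83 * math.sqrt(f_ck) * a_cw)


def asce_coefficients(aspect_ratio):
    """Coefficients A and B of Eq (8)."""
    if aspect_ratio <= 0.5:
        return 1.0, 0.0
    if aspect_ratio >= 1.5:
        return 0.0, 1.0
    return 1.5 - aspect_ratio, aspect_ratio - 0.5


def vn_asce_43_05(f_ck, rho_v, rho_h, f_yh, h_w, l_w, t_w, axial_load):
    """Nominal shear strength in N according to Eqs (6) to (8)."""
    aspect_ratio = h_w / l_w
    a, b = asce_coefficients(aspect_ratio)
    rho_se = a * rho_v + b * rho_h
    v_n = (0.69 * math.sqrt(f_ck)
           - 0.28 * math.sqrt(f_ck) * (aspect_ratio - 0.5)
           + axial_load / (4.0 * l_w * t_w)
           + rho_se * f_yh)
    v_n = min(v_n, 1.66 * math.sqrt(f_ck))
    return v_n * 0.6 * l_w * t_w  # d = 0.6 lw


def vn_wood_1990(f_ck, a_vf, f_yv, a_cv):
    """Nominal shear strength in N according to Eq (9), with both bounds applied."""
    lower = 0.5 * math.sqrt(f_ck) * a_cv
    upper = 0.83 * math.sqrt(f_ck) * a_cv
    return min(max(a_vf * f_yv / 4.0, lower), upper)


# Hypothetical wall: 1000 mm long, 1000 mm high, 100 mm thick, fck = 25 MPa.
vn_aci = vn_aci_318_19(f_ck=25.0, rho_h=0.005, f_yh=400.0,
                       a_cv=100_000.0, a_cw=100_000.0, aspect_ratio=1.0)
vn_asce = vn_asce_43_05(f_ck=25.0, rho_v=0.005, rho_h=0.005, f_yh=400.0,
                        h_w=1000.0, l_w=1000.0, t_w=100.0, axial_load=0.0)
vn_wood = vn_wood_1990(f_ck=25.0, a_vf=2000.0, f_yv=400.0, a_cv=100_000.0)
print(vn_aci / 1000.0, vn_asce / 1000.0, vn_wood / 1000.0)  # kN
```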

The three standard models and the XGBoost model were used to predict the shear strength of the walls in the dataset of 1,057 samples, which includes three different cross-section types and aspect ratios (hw/lw) below, equal to, and above 1.5. The comparison between the XGBoost model and the code-based reference models is presented in Table 3 and Fig 7 (S3 Data).

Fig 7. Results of predicting shear resistance using the mechanical models and the XGBoost model.

The comparison is expressed as the ratio of predicted to experimentally obtained shear strength for the samples in the database. The model evaluations include the standard deviation (St.D), mean value (Mean), maximum value (Max.), minimum value (Min.), and coefficient of variation (COV%) of this ratio.

The results in Fig 7 and Table 3 indicate that, for the current standard models, the ratio of predicted to experimental shear strength shows high average errors and large variability. The XGBoost model optimized with the BO-GP solution performs best, with a mean predicted-to-experiment ratio close to 1 (1.013) and the lowest standard deviation (0.134). This demonstrates the accuracy and stability of the XGBoost model in predicting the shear strength of RC walls.

Table 3. Performance comparison between the ML model and the mechanics-based models in terms of the predicted-to-experiment ratio.

Predicted to experiment ratio (Vpredicted/Vexp):

| Models | Min. | Max. | Mean | St.D. | COV (%) |
|---|---|---|---|---|---|
| ACI 318–19 | 0.151 | 4.742 | 0.980 | 0.614 | 62.64 |
| ASCE/SEI 43–05 | 0.325 | 4.953 | 1.173 | 0.647 | 55.19 |
| Wood 1990 | 0.166 | 4.830 | 0.764 | 0.506 | 66.15 |
| XGBoost model | 0.244 | 2.022 | 1.013 | 0.134 | 13.27 |
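The statistics in Table 3 can be reproduced from any list of predicted and experimental strengths with a few lines of code. The sketch below assumes the sample standard deviation (divisor N - 1), which the paper does not specify; the function name and the three-sample data are hypothetical. The second part only checks that COV = 100 × St.D / Mean is consistent with the rounded values reported in Table 3.

```python
import math


def ratio_statistics(predicted, experimental):
    """Min, max, mean, standard deviation and COV (%) of predicted/experimental."""
    ratios = [p / e for p, e in zip(predicted, experimental)]
    n = len(ratios)
    mean = sum(ratios) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in ratios) / (n - 1))
    return {"min": min(ratios), "max": max(ratios), "mean": mean,
            "std": std, "cov_percent": 100.0 * std / mean}


# Hypothetical predicted and measured shear strengths (kN).
stats = ratio_statistics([110.0, 190.0, 300.0], [100.0, 200.0, 300.0])
print(stats)

# (Mean, St.D., COV %) as reported in Table 3.
TABLE_3 = {
    "ACI 318-19": (0.980, 0.614, 62.64),
    "ASCE/SEI 43-05": (1.173, 0.647, 55.19),
    "Wood 1990": (0.764, 0.506, 66.15),
    "XGBoost model": (1.013, 0.134, 13.27),
}
recomputed_cov = {name: 100.0 * std / mean for name, (mean, std, _) in TABLE_3.items()}
print(recomputed_cov)
```

The recomputed values agree with the reported COV to within rounding of the tabulated mean and standard deviation.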

Furthermore, Fig 8 (S4 Data) compares the ratio of predicted to experimental wall shear strength across specimens and shows the mean predicted value and the standard deviation (St.D) range of each model. Walls with lower hw/lw ratios (≤1.0) exhibit greater variation in prediction accuracy than walls with higher hw/lw ratios (from 1.0 to 3.5), indicating less consistent predictions for smaller aspect ratios. This variation may result from complex interactions associated with lower hw/lw ratios that the models capture less effectively, while higher ratios simplify predictions due to more uniform geometric and structural features. These findings emphasize the importance of taking the hw/lw ratio into account in shear wall design to improve prediction reliability and ensure safety. Notably, the XGBoost model outperforms the conventional semi-empirical models, demonstrating superior predictive performance on both low aspect ratio walls (hw/lw ≤ 1.5) and high aspect ratio walls (hw/lw > 1.5). The robustness and flexibility of the model make it an effective tool for predicting shear strength in a variety of wall configurations.

Fig 8. Shear strength predicted by mechanics-based models: a) ACI 318–19; b) ASCE/SEI 43–05; c) Wood 1990; and d) the XGBoost model, for different aspect ratios (hw/lw ratios >1.5, = 1.5, <1.5).


7. Conclusions

The study utilized the XGBoost model to analyze a dataset comprising 1057 RC wall samples with various cross-section types and aspect ratios (ratios >1.5, = 1.5, <1.5). Key findings from this research are summarized as follows:

Optimizing the XGBoost model with the BO-GP, BO-RF, and Random Search (RS) methods demonstrated the importance of hyperparameter tuning. All three optimization methods significantly improved performance over the default hyperparameters, with the GP method providing the best results. The XGBoost model optimized with BO-GP achieved stable prediction performance on each cross-section type and on the combined dataset of the three cross-section types, with test-set R2 scores of 0.984, 0.913, 0.975, and 0.964 for all sections, rectangular sections, barbell sections, and flanged sections, respectively.
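
Of the three tuning strategies, Random Search is the simplest to sketch. The example below is a minimal illustration, not the procedure used in the study: the parameter names follow common XGBoost hyperparameters, but the ranges are hypothetical, and `toy_validation_mse` is a stand-in for "train the model with these parameters and return the validation MSE".

```python
import math
import random

# Hypothetical search space; the ranges are illustrative, not the paper's.
SEARCH_SPACE = {
    "max_depth": lambda rng: rng.randint(2, 10),
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform
    "n_estimators": lambda rng: rng.randint(50, 1000),
}


def random_search(objective, space, n_trials=50, seed=0):
    """Sample n_trials random configurations and keep the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_mse(params):
    """Stand-in objective (assumption): a real run would fit a model here."""
    depth_term = (params["max_depth"] - 6) ** 2
    rate_term = (math.log10(params["learning_rate"]) + 1) ** 2
    return depth_term + rate_term


best_params, best_score = random_search(
    toy_validation_mse, SEARCH_SPACE, n_trials=200, seed=0
)
```

Bayesian optimization differs in that each new configuration is proposed by a surrogate model (a Gaussian Process or a Random Forest) fitted to the scores observed so far, rather than being drawn independently.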

The SHAP technique was used to explain the predictions of the XGBoost model and to analyze the role of the input variables for all types of RC wall cross-sections. The results show that, for this dataset, flange length (bf) and wall length (lw) are the two most important input variables affecting the shear strength of RC walls.
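
The idea behind SHAP values can be illustrated with an exact Shapley computation on a very small model. The sketch below averages each feature's marginal contribution over all feature orderings, relative to a single baseline input. It is not the tree-specific algorithm used by SHAP libraries, and `toy_model` with its three inputs is a hypothetical function with no physical meaning.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of one sample relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its value
            updated = model(current)
            phi[i] += updated - previous  # marginal contribution of feature i
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical model with an interaction between the first two inputs."""
    a, b, c = features
    return 2 * a + 3 * b + a * b + 0.5 * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes per-feature importance rankings of this kind interpretable. Enumerating all orderings is only feasible for a handful of features.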

The optimized XGBoost model was also compared with existing standards and codes. The results demonstrated that the XGBoost model significantly improved predictive performance compared with traditional design models such as ACI 318–19, ASCE/SEI 43–05, and Wood 1990. Furthermore, the study results showed that the XGBoost model was capable of effectively predicting shear strength within the range of aspect ratios hw/lw >1.5. These findings highlight the robustness of the XGBoost model in accurately predicting the shear strength of reinforced concrete walls, emphasizing the advantages of advanced machine learning techniques over traditional design methods. However, it should be noted that the XGBoost model cannot extrapolate, so its accuracy can only be relied on within the range of input variable values it was trained on. This limitation can be mitigated by using a more general training dataset and by using machine learning models that are not limited in their extrapolation capability.
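
The extrapolation limitation follows from how regression trees predict: each leaf returns a constant, so any input beyond the training range falls into an outermost leaf. The sketch below shows this with a single one-split tree (a stump) fitted to hypothetical data with a linear trend. A boosted ensemble with tree base learners is a sum of such piecewise-constant functions and behaves the same way outside the training range.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


# Hypothetical training data with a linear trend y = 2x on x in [1, 10].
train_x = list(range(1, 11))
train_y = [2.0 * x for x in train_x]
predict = fit_stump(train_x, train_y)

inside = predict(10)    # at the upper edge of the training range
outside = predict(100)  # far outside the training range
```

Here `outside` equals `inside` even though the underlying trend would give 200 at x = 100, which is why predictions for walls whose parameters lie outside the database should be treated with caution.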

Supporting information

S1 Data. Data from Fig 3.

(CSV)

pone.0312531.s001.csv (12.4KB, csv)
S2 Data. Data from Fig 4.

(CSV)

pone.0312531.s002.csv (470B, csv)
S3 Data. Data from Fig 7.

(CSV)

pone.0312531.s003.csv (50.1KB, csv)
S4 Data. Data from Fig 8.

(CSV)

pone.0312531.s004.csv (64.8KB, csv)
S1 Table. Example of data used in Table 1.

(CSV)

pone.0312531.s005.csv (30.9KB, csv)

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. CMC, Technical Specification for Concrete Structures of Tall Building (JGJ 3–2010), China Ministry of Construction, Beijing, 2010 (in Chinese).
  • 2. ACI (American Concrete Institute). 2014. Building code requirements for structural concrete. ACI Committee 318. Farmington Hills, MI: ACI.
  • 3. Eurocode 8, Design of Structures for Earthquake Resistance - Part 1: General Rules, Seismic Actions and Rules for Buildings, CEN, Brussels, 2004, 1998–1.
  • 4. J. Chandra, K. Chanthabouala, and S. Teng, “Truss model for shear strength of structural concrete walls,” PhD Thesis, Petra Christian University, 2018.
  • 5. Tran T., Motter C., Segura C., and Wallace J., “Strength and deformation capacity of shear walls,” in Proceedings, 2017.
  • 6. S. E. Oussadou, “Comparison of the Provisions of ACI318-19 Code and Eurocode on the Structural Design and Cost Analysis, of a High-Rise Concrete Building Subjected to Seismic & Wind Forces,” PhD Thesis, The British University in Dubai, 2021.
  • 7. Wahab S. et al., “Comparative Analysis of Shear Strength Prediction Models for Reinforced Concrete Slab-Column Connections,” Advances in Civil Engineering, vol. 2024, p. 1784088, Mar. 2024, doi: 10.1155/2024/1784088
  • 8. Vecchio F. J. and Collins M. P., “The modified compression-field theory for reinforced concrete elements subjected to shear,” ACI J., vol. 83, no. 2, pp. 219–231, 1986.
  • 9. Hsu T. T. and Zhu R. R., “Softened membrane model for reinforced concrete elements in shear,” Structural Journal, vol. 99, no. 4, pp. 460–469, 2002.
  • 10. Hwang S.-J. and Lee H.-J., “Strength prediction for discontinuity regions by softened strut-and-tie model,” Journal of Structural Engineering, vol. 128, no. 12, pp. 1519–1526, 2002.
  • 11. Chou J.-S., Ngo N.-T., and Pham A.-D., “Shear strength prediction in reinforced concrete deep beams using nature-inspired metaheuristic support vector regression,” Journal of Computing in Civil Engineering, vol. 30, no. 1, p. 04015002, 2016.
  • 12. Le Nguyen K., Thi Trinh H., Nguyen T.T., Nguyen H.D., Comparative study on the performance of different machine learning techniques to predict the shear strength of RC deep beams: Model selection and industry implications, Expert Systems with Applications 230 (2023) 120649. doi: 10.1016/j.eswa.2023.120649
  • 13. Cavaleri L., Barkhordari M. S., Repapis C. C., Armaghani D. J., Ulrikh D. V., and Asteris P. G., “Convolution-based ensemble learning algorithms to estimate the bond strength of the corroded reinforced concrete,” Construction and Building Materials, vol. 359, p. 129504, 2022.
  • 14. Asteris P. G., Armaghani D. J., Hatzigeorgiou G. D., Karayannis C. G., and Pilakoutas K., “Predicting the shear strength of reinforced concrete beams using Artificial Neural Networks,” Computers and Concrete, An International Journal, vol. 24, no. 5, pp. 469–488, 2019.
  • 15. Azadi Kakavand M. R., Sezen H., and Taciroglu E., “Data-driven models for predicting the shear strength of rectangular and circular reinforced concrete columns,” Journal of Structural Engineering, vol. 147, no. 1, p. 04020301, 2021.
  • 16. Halevy A., Norvig P., and Pereira F., “The unreasonable effectiveness of data,” IEEE intelligent systems, vol. 24, no. 2, pp. 8–12, 2009.
  • 17. Zhang H., Cheng X., Li Y., and Du X., “Prediction of failure modes, strength, and deformation capacity of RC shear walls through machine learning,” Journal of Building Engineering, vol. 50, p. 104145, 2022.
  • 18. Feng D.-C., Wang W.-J., Mangalathu S., and Taciroglu E., “Interpretable XGBoost-SHAP machine-learning model for shear strength prediction of squat RC walls,” Journal of Structural Engineering, vol. 147, no. 11, p. 04021173, 2021.
  • 19. Barkhordari M. S. and Massone L. M., “Failure mode detection of reinforced concrete shear walls using ensemble deep neural networks,” International Journal of Concrete Structures and Materials, vol. 16, no. 1, p. 33, 2022.
  • 20. Gondia A., Ezzeldin M., and El-Dakhakhni W., “Mechanics-guided genetic programming expression for shear-strength prediction of squat reinforced concrete walls with boundary elements,” Journal of Structural Engineering, vol. 146, no. 11, p. 04020223, 2020.
  • 21. Keshtegar B., Nehdi M. L., Kolahchi R., Trung N.-T., and Bagheri M., “Novel hybrid machine leaning model for predicting shear strength of reinforced concrete shear walls,” Engineering with Computers, pp. 1–12, 2022.
  • 22. Keshtegar B., Nehdi M. L., Trung N.-T., and Kolahchi R., “Predicting load capacity of shear walls using SVR-RSM model,” Applied Soft Computing, vol. 112, p. 107739, 2021.
  • 23. Le-Nguyen K., Trinh H., Banihashemi S., and Pham T., “Machine Learning Approaches for Lateral Strength Estimation in Squat Shear Walls: A Comparative Study and Practical Implications,” Expert Systems with Applications, vol. 239, p. 122458, Nov. 2023, doi: 10.1016/j.eswa.2023.122458
  • 24. Mangalathu S., Jang H., Hwang S.H., Jeon J.S., Data-driven machine-learning-based seismic failure mode identification of reinforced concrete shear walls, Eng. Struct. 208 (2020) 110331. doi: 10.1016/j.engstruct.2020.110331
  • 25. Barkhordari M. S. and Massone L. M., “Ensemble techniques and hybrid intelligence algorithms for shear strength prediction of squat reinforced concrete walls,” 1, vol. 8, no. 1, p. 37, 2023.
  • 26. Massone L. M. and Melo F., “General solution for shear strength estimate of RC elements based on panel response,” Engineering Structures, vol. 172, pp. 239–252, 2018.
  • 27. Ning C.-L. and Li B., “Probabilistic development of shear strength model for reinforced concrete squat walls,” Earthquake Engineering & Structural Dynamics, vol. 46, no. 6, pp. 877–897, 2017.
  • 28. S. Sato et al., “Behavior of Shear Wall Using Various Yield Strength of Rebar Part 1: An Experimental Study,” 1989.
  • 29. Baek J.-W., Park H.-G., Shin H.-M., and Yim S.-J., “Cyclic loading test for reinforced concrete walls (Aspect Ratio 2.0) with grade 550 MPa (80 ksi) shear reinforcing bars,” ACI Structural Journal, vol. 114, no. 3, p. 673, 2017.
  • 30. Teng, S., and Chandra, J. (2016), ‘Cyclic Shear Behavior of High-Strength Concrete Structural Walls.’ ACI Structural Journal, V. 113, No. 6, November-December 2016, pp. 1335–1345.
  • 31. Wolschlag, C. (1993). Experimental investigation of the response of R/C structural walls subjected to static and dynamic loading. University of Illinois at Urbana-Champaign.
  • 32. Wang, T.Y., Bertero, V.V. and Popov, E.P. (1975). ‘Hysteretic Behaviour of Reinforced Concrete Framed Walls.’ Earthquake Engineering Research Center, University of California, Berkeley, CA.
  • 33. Vallenas, J.M., Bertero, V.V. and Popov, E.P. (1979). ‘Hysteretic Behaviour of Reinforced Concrete Structural Walls.’ Earthquake Engineering Research Center, University of California, Berkeley, CA.
  • 34. Hirosawa M., “Past experimental results on reinforced concrete shear walls and analysis on them,” Kenchiku Kenkyu Shiryo, vol. 6, pp. 33–34, 1975.
  • 35. F. Barda, Shear strength of low-rise walls with boundary elements. Lehigh University, 1972.
  • 36. J. Antebi, S. Utku, and R. J. Hansen, The response of shear walls to dynamic loads. Massachusetts Institute of Technology, Department of Civil and Sanitary …, 1960.
  • 37. Carlos E. Ospina, Gerd Birkle, Widianto Widianto, Sudheera R. Fernando, Sumudinie Fernando, Ann Christine Catlin, et al, “ACI 445/fib WG 2.2.3 Punching Shear Test Database 1,” 2019. [Online]. https://datacenterhub.org/deedsdv/publications/view/436
  • 38. Friedman J. H., “Greedy function approximation: a gradient boosting machine,” Ann. Statist., vol. 29, no. 5, pp. 1189–1232, Oct. 2001, doi: 10.1214/aos/1013203451
  • 39. Pham T. A., Thi Vu H.-L., and Thi Duong H.-A., “Improving deep neural network using hyper-parameters tuning in predicting the bearing capacity of shallow foundations,” Journal of Applied Science and Engineering, vol. 25, no. 2, pp. 261–273, Sep. 2021, doi: 10.6180/jase.202204_25(2).0012
  • 40. J. Mockus, “The Bayesian approach to global optimization,” in System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31–September 4, 1981, Springer, 2005, pp. 473–481.
  • 41. Saunders L. J., Russell R. A., and Crabb D. P., “The coefficient of determination: what determines a useful R2 statistic?,” Investigative ophthalmology & visual science, vol. 53, no. 11, pp. 6830–6832, 2012.
  • 42. Hodson T. O., “Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not,” Geoscientific Model Development Discussions, vol. 2022, pp. 1–10, 2022.
  • 43. Pedregosa F. et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • 44. Dresher M., Karlin S., and Shapley L. S., “Polynomial games,” Contributions to the Theory of Games I, vol. 24, pp. 161–180, 1950.
  • 45. ACI 318–14. Building Code Requirements for Structural Concrete (ACI 318–14) and Commentary, American Concrete Institute, Farmington Hills (MI), 2014.
  • 46. Dar A., Konstantinidis D., and El-Dakhakhni W. W., “Evaluation of ASCE 43–05 Seismic Design Criteria for Rocking Objects in Nuclear Facilities,” Journal of Structural Engineering, vol. 142, no. 11, p. 04016110, 2016.
  • 47. Wood S. L., “Shear strength of low-rise reinforced concrete walls,” Structural Journal, vol. 87, no. 1, pp. 99–107, 1990.

Decision Letter 0

Afaq Ahmad

14 Aug 2024

PONE-D-24-29804

The research explores the predictive capacity of the shear strength of reinforced concrete walls with different cross-sectional shapes using the XGBoost model

PLOS ONE

Dear Dr. Pham,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 28 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Afaq Ahmad, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.  

The American Journal Experts (AJE) (https://www.aje.com/) is one such service that has extensive experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. Please note that having the manuscript copyedited by AJE or any other editing services does not guarantee selection for peer review or acceptance for publication. 

Upon resubmission, please provide the following: 

● The name of the colleague or the details of the professional service that edited your manuscript

● A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

● A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. We note that your Data Availability Statement is currently as follows: All relevant data are within the manuscript and its Supporting Information files

Please confirm at this time whether or not your submission contains all raw data required to replicate the results of your study. Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods (https://journals.plos.org/plosone/s/data-availability#loc-minimal-data-set-definition).

For example, authors should submit the following data:

- The values behind the means, standard deviations and other measures reported;

- The values used to build graphs;

- The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study.

If your submission does not contain these data, please either upload them as Supporting Information files or deposit them to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories.

If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. If data are owned by a third party, please indicate how others may request data access.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Author needs to provide the description of some critical parts of the study.

1- Abstract could be more informative by providing results. I prefer to see some results in the abstract.

2- The introduction needs to be more emphasized on the research work with a detailed explanation of the whole process considering past, present and future scope. How the present study gives more accurate results than previous studies? It needs to be strengthened in terms of recent research in this area with possible research gaps. It is strongly recommended to add a recent literature.

3- Please avoid the basic details about the methodology in the introduction portion, the introduction portion, please use only the latest reference. Please reduce these sections.

4- Please describe the important and novelty of the selected problem, data details. Please provide details about the selected problem. Please include the validation process on the unique problem.

5- In the conclusion section, the limitations of this study, suggested improvements of this work, and future directions should be added

The author needs to address the abovementioned points for the betterment of the manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors present a well-written and structured manuscript on the shear strength estimation of reinforced concrete walls. Specifically, in this study, machine learning techniques such as XGBoost models are trained and the optimal among them is proposed to predict the shear strength of reinforced concrete walls with different cross-sectional shapes. The conclusions are well-supported by the quantity and quality of the data and results. However, some minor comments and suggestions are provided to aid in further enhancing the manuscript's overall quality.

I. The abstract should be concise, focusing on the methodology employed and the key findings. A clear and succinct summary will enhance the paper's readability and impact.

II. Consider adding a section after the introduction titled "Research Significance" to emphasize the need for further research on this subject and highlight the novelty and innovation of your work. This will provide context and motivation for your study.

III.

IV. In order to assess the reliability of soft computing models, researchers often use statistical indices such as MSE. The authors are encouraged to include a recently proposed performance index, the a20 index (see the last equation in Table 8 of https://doi.org/10.1016/j.ultras.2024.107347 and eq. 11 in s00521-021-06004-8).

V. To bolster their results, the authors are urged to include in the database used for model training both the experimental ‘true’ values and those predicted by the optimal proposed model, along with corresponding publication details.

VI. Transitions between sections should be smoother to enhance readability.

VII. The conclusion section should be modified and improved.

VIII. The literature review presented in the manuscript is not sufficiently comprehensive. The authors are encouraged to refer to extensive state-of-the-art reports in soft computing techniques such as ‘Predicting the shear strength of reinforced concrete beams using Artificial Neural Networks,’ ‘Convolution-based ensemble learning algorithms to estimate the bond strength of the corroded reinforced concrete,’ ‘Predicting the thermal conductivity of soils using integrated approach of ANN and PSO with adaptive and time-varying acceleration coefficients,’ ‘Analysis and prediction of the effect of Nanosilica on the compressive strength of concrete with different mix proportions and specimen sizes using various numerical approaches,’ ‘Predicting the unconfined compressive strength of granite using only two non-destructive test indexes,’ ‘Genetic prediction of ICU hospitalization and mortality in COVID‐19 patients using artificial neural networks,’ and ‘Developing bearing capacity model for geogrid-reinforced stone columns improved soft clay utilizing MARS-EBS hybrid method.’. Detailed and in depth state-of-the-art report can be found in https://doi.org/10.1016/j.jobe.2023.108369

IX. How do you address the potential overfitting of the hybrid model? A short paragraph about this crucial issue will also add more value in their work.

X. Adding a short section on the limitations of the proposed models, titled 'Limitations and Future Work,' will enhance the value of the submitted work.

XI. A thorough proofreading is essential to address typos and language errors. Improving the manuscript's English usage will enhance its overall readability and professionalism.

Addressing these specific comments will significantly enhance the quality and suitability of your manuscript for publication in an international journal.

Reviewer #2: please

1- add correlation matrix of inputs

2- add Ref for equation 2 and 3

3- add criteria for stopping optimization

4- add some ref to support the results of the section 4:

The analysis results show clearly, the flange length (X5) and wall length (X2) has the greatest influence on the shear strength value of RC walls. This proves that, in addition to the wall body, the flange also significantly participates in the wall's shear capacity.

- cite recent related paper like what follows to enrich introduction and compare your paper with them and the gap you cover:

"Ensemble techniques and hybrid intelligence algorithms for shear strength prediction of squat reinforced concrete walls." 1, 8(1), 37. https://doi.org/10.12989/acd.2023.8.1.039

"Failure mode detection of reinforced concrete shear walls using ensemble deep neural networks." International Journal of Concrete Structures and Materials 16, no. 1 (2022): 33. https://doi.org/10.1186/s40069-022-00522-y

"Response estimation of reinforced concrete shear walls using artificial neural network and simulated annealing algorithm." In Structures (Vol. 34, pp. 1155-1168). Elsevier. https://www.sciencedirect.com/science/article/pii/S2352012421007657

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Nov 27;19(11):e0312531. doi: 10.1371/journal.pone.0312531.r002

Author response to Decision Letter 0


25 Sep 2024

RESPONSES TO THE REVIEWERS’ COMMENTS

Dear Editor and reviewers, we thank you for your kind and constructive comments, which have helped us improve the quality of our work.

Responses to Editor:

1- Abstract could be more informative by providing results. I prefer to see some results in the abstract.

Response: We have added further results to the Abstract, highlighted in red, as follows:

The results show that Gaussian Process emerged as the most efficient solution compared to other optimization algorithms, providing the lowest Mean Square Error and achieving a prediction R2 of 0.998 for the training set, 0.972 for the validation set and 0.984 for the test set, while BO - Random Forest and Random Search performed as well on the training and test sets as Gaussian Process but significantly worse on the validation set, specifically R2 on the validation set of BO - Random Forest and Random Search were 0.970 and 0.969 respectively over the entire dataset including all cross-sectional shapes of the RC wall.

2- The introduction needs to be more emphasized on the research work with a detailed explanation of the whole process considering past, present and future scope. How the present study gives more accurate results than previous studies? It needs to be strengthened in terms of recent research in this area with possible research gaps. It is strongly recommended to add a recent literature.

Response: We have added analyses of the limitations of existing studies, the research gaps, and some recently published literature, highlighted in red in the Introduction section:

3- Please avoid the basic details about the methodology in the introduction portion, the introduction portion, please use only the latest reference. Please reduce these sections.

Response: We consider the Abstract section to be quite concise and to highlight the main results. If this section were shortened, we fear that readers would not find the research results interesting.

4- Please describe the important and novelty of the selected problem, data details. Please provide details about the selected problem. Please include the validation process on the unique problem.

Response: We have added Section 2, Research significance, highlighted in red in the revised manuscript, as follows:

2. Research significance

To address the limitations of previous studies, this paper proposes the following approach to apply the ML model in predicting the shear strength of reinforced concrete walls:

1) The large dataset of RC walls collected and processed includes 1057 samples with three different cross-sectional shapes.

2) A detailed study was performed on the XGBoost model, with the parameter sets for the XGBoost model determined through three different optimization methods.

3) The role of input variables is evaluated using SHAP values for the XGBoost model, providing an explanation of the model's predictive ability.

4) The prediction ability of the XGBoost model is compared with standard design codes and existing benchmark models.

By implementing this method, the study is expected to overcome the limitations of previous studies and provide a more effective method to evaluate the shear capacity of RC walls.

5- In the conclusion section, the limitations of this study, suggested improvements of this work, and future directions should be added

The author needs to address the abovementioned points for the betterment of the manuscript.

Respond: We thank the editor for these comments. We have added limitations and solutions at the end of the conclusion section, as follows:

However, it should be noted that the XGBoost model does not have extrapolation capabilities, so the model's accuracy is only guaranteed within the range of values it was trained on. This can be improved by using a more general training dataset and using machine learning models that are not limited in extrapolation capabilities.

Responses to Reviewer 1

Dear Reviewer #1, thank you for your kind and constructive comments, which helped us to improve the quality of our work, especially the comments on addressing the overfitting problem and the limitations of the study.

1. Comment 1

The abstract should be concise, focusing on the methodology employed and the key findings. A clear and succinct summary will enhance the paper's readability and impact.

Respond: This is useful advice. We have rewritten the abstract to clarify the research content and its contributions.

2. Comment 2

Consider adding a section after the introduction titled "Research Significance" to emphasize the need for further research on this subject and highlight the novelty and innovation of your work. This will provide context and motivation for your study.

Respond: We appreciate this comment; it makes the manuscript much clearer. We have added Section 2, Research significance, whose text is quoted in full in our response to the editor's point 4.

3. Comment 3

In order to assess the reliability of soft computing models, researchers often use statistical indices such as MSE. The authors are encouraged to include a recently proposed performance index, the a20 index (see the last equation in Table 8 of https://doi.org/10.1016/j.ultras.2024.107347 and eq. 11 in s00521-021-06004-8).

Respond: We thank the reviewer for this comment. We agree that an additional performance indicator makes the results more objective. However, we have no experience working with the a20 index, so we decided to use the slightly more popular Mean Absolute Error (MAE) indicator.

Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left| y_j - y_{t,j} \right|$ (4)

where $y_j$ is the actual shear strength of the jth sample in the dataset; $y_{t,j}$ is the predicted shear strength of the jth sample obtained from the ML model; $\bar{y}$ is the mean value of the actual shear strength of the dataset; N is the total number of samples in the dataset.
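
As a quick illustration of Eq. (4), the sketch below computes the MAE for a handful of values. The function name `mean_absolute_error` and the numbers are illustrative assumptions, not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/N) * sum(|y_j - y_t,j|), following Eq. (4)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_t) for y, y_t in zip(actual, predicted)) / len(actual)


# Hypothetical shear strengths (not taken from the study's database).
actual = [100.0, 250.0, 400.0, 550.0]
predicted = [110.0, 240.0, 420.0, 550.0]
mae = mean_absolute_error(actual, predicted)
```

Because MAE is expressed in the same units as the shear strength, it can be read directly as a typical prediction error.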

4. Comment 4

To bolster their results, the authors are urged to include in the database used for model training both the experimental ‘true’ values and those predicted by the optimal proposed model, along with corresponding publication details.

Respond: We will consider publishing the data in the article. The data collection method is described in detail in Section 3.1. In addition, any data requests will be answered by the corresponding author.

5. Comment 5

Transitions between sections should be smoother to enhance readability.

Respond: We thank the reviewer for this comment. We have edited the transitional passages between sections.

6. Comment 6

The conclusion section should be modified and improved.

Respond: We thank the reviewer for this comment, which we found very helpful. The conclusion has been rewritten as recommended.

7. Comment 7

The literature review presented in the manuscript is not sufficiently comprehensive. The authors are encouraged to refer to extensive state-of-the-art reports in soft computing techniques such as ‘Predicting the shear strength of reinforced concrete beams using Artificial Neural Networks,’ ‘Convolution-based ensemble learning algorithms to estimate the bond strength of the corroded reinforced concrete,’ ‘Predicting the thermal conductivity of soils using integrated approach of ANN and PSO with adaptive and time-varying acceleration coefficients,’ ‘Analysis and prediction of the effect of Nanosilica on the compressive strength of concrete with different mix proportions and specimen sizes using various numerical approaches,’ ‘Predicting the unconfined compressive strength of granite using only two non-destructive test indexes,’ ‘Genetic prediction of ICU hospitalization and mortality in COVID‐19 patients using artificial neural networks,’ and ‘Developing bearing capacity model for geogrid-reinforced stone columns improved soft clay utilizing MARS-EBS hybrid method.’ A detailed and in-depth state-of-the-art report can be found at https://doi.org/10.1016/j.jobe.2023.108369

Respond: We thank the reviewer for this comment. We found these studies useful, so we have cited two relevant studies in the revised manuscript:

Recently, Machine Learning (ML) based models have demonstrated their effectiveness in forecasting the shear strength of various structural elements such as beams [11], [12], [13,14]

[13]. L. Cavaleri, M.S. Barkhordari, C.C. Repapis, D.J. Armaghani, D.V. Ulrikh, P.G. Asteris, Convolution-based ensemble learning algorithms to estimate the bond strength of the corroded reinforced concrete, Construction and Building Materials 359 (2022) 129504.

[14]. P.G. Asteris, D.J. Armaghani, G.D. Hatzigeorgiou, C.G. Karayannis, K. Pilakoutas, Predicting the shear strength of reinforced concrete beams using Artificial Neural Networks, Computers and Concrete, An International Journal 24 (2019) 469–488.

8. Comment 8

How do you address the potential overfitting of the hybrid model? A short paragraph about this crucial issue will also add more value in their work.

Respond: We found this very useful and have added content about solutions to avoid overfitting in section 4.1 of the revised manuscript.

It is important to note that, to avoid overfitting during training and optimization, two techniques were applied: (1) subsampling and (2) K-fold cross-validation (CV). Subsampling uses only a proportion of the training samples when building each tree, which helps create simpler trees and avoid overfitting. K-fold CV is applied to the training set itself, so that the model is both trained and validated during the optimization process: each fold of the training set is used in turn for validation while the remaining folds are used for training, leading to training results that avoid overfitting.
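
The two safeguards described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the fold count (5), the sampling fraction (0.8), and the helper names `k_fold_indices` and `subsample_rows` are assumptions, since the response does not state the values used. In XGBoost itself, row sampling per tree is controlled by the `subsample` parameter.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def subsample_rows(train_idx, fraction, rng):
    """Draw a random fraction of the training rows without replacement."""
    n_keep = max(1, int(len(train_idx) * fraction))
    return rng.sample(train_idx, n_keep)


# Hypothetical sizes: 20 samples, 5 folds, 80% of rows per tree.
folds = list(k_fold_indices(n_samples=20, k=5))
rng = random.Random(1)
rows_for_one_tree = subsample_rows(folds[0][0], fraction=0.8, rng=rng)
```

During hyperparameter optimization, a candidate parameter set would be scored by averaging the validation error over all folds, so no single lucky split can make an overfitted model look good.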

9. Comment 9

Adding a short section on the limitations of the proposed models, titled 'Limitations and Future Work,' will enhance the value of the submitted work.

Respond: We thank the reviewer for this comment, which we found helpful. However, we felt that a separate section was unnecessary, so we added a short paragraph about the study's limitations and possible solutions at the end of the conclusion, as follows:

However, it should be noted that the XGBoost model does not have extrapolation capabilities, so the model's accuracy is only guaranteed within the range of input variable values it was trained on. This can be improved by using a more general training dataset and using machine learning models that are not limited in extrapolation capabilities.
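
The extrapolation limit can be seen in a toy piecewise-constant model, sketched below as a stand-in for a tree ensemble (whose predictions are sums of constant leaf values). The helpers `fit_leaves` and `predict` and the data are hypothetical and are not part of the study.

```python
import bisect


def fit_leaves(xs, ys, n_leaves):
    """Fit equal-count leaves over sorted x; each leaf predicts its mean y.

    Assumes len(xs) >= n_leaves.
    """
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_leaves
    thresholds, values = [], []
    for i in range(n_leaves):
        last = i == n_leaves - 1
        chunk = pairs[i * size:] if last else pairs[i * size:(i + 1) * size]
        values.append(sum(y for _, y in chunk) / len(chunk))
        if not last:
            thresholds.append(pairs[(i + 1) * size][0])  # first x of the next leaf
    return thresholds, values


def predict(model, x):
    """Return the constant value of the leaf that x falls into."""
    thresholds, values = model
    return values[bisect.bisect_right(thresholds, x)]


# Training data follow y = 10 * x on x in [1, 10].
xs = [float(i) for i in range(1, 11)]
ys = [10.0 * x for x in xs]
model = fit_leaves(xs, ys, n_leaves=5)

inside = predict(model, 9.5)     # within the training range
outside = predict(model, 100.0)  # far beyond the training range
```

Both queries fall into the last leaf and return the same constant, although the underlying trend would give a much larger value at x = 100. This is why predictions outside the trained input range should not be trusted.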

10. Comment 10

A thorough proofreading is essential to address typos and language errors. Improving the manuscript's English usage will enhance its overall readability and professionalism.

Respond: We appreciate the reviewers' comments. We have tried our best to correct the manuscript.

Responses to Reviewer 2

Dear Reviewer #2, thank you for your kind and constructive comments, which helped us to improve the quality of our work.

1. Comment 1

Add correlation matrix of inputs.

Respond: We thank the reviewer for this comment, which we found very helpful. We have added the correlation matrix of the inputs as Figure 2 in the revised manuscript, together with some analysis of the correlations between input variables:

Figure 2. Correlation matrix of the features for the 1057-sample dataset

Figure 2 shows the correlation matrix of the data set, which includes 13 input variables and 1 output variable. The matrix displays the correlation coefficients between each pair of variables, where a correlation value of 1 represents a perfect positive correlation, -1 represents a perfect negative correlation, and 0 represents no correlation. The correlation matrix helps us understand the relationship between different variables and how they relate to each other. Initial analysis shows that there are both positive and negative correlations between variables and that pairs of highly correlated attributes are more interdependent. Specifically, the highest correlation coefficient is 0.89 between the two variables X8 and X10, demonstrating a close relationship between these characteristics. Additionally, geometrical parameters and loads applied to the wall have the highest correlation with output performance. Understanding the correlation matrix can help determine which features are important to the resulting characterization and which features are redundant, useful for further analysis and modeling.
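
For readers who want to reproduce this kind of matrix, the sketch below computes pairwise Pearson correlation coefficients in plain Python. The column values are made up, and the labels only echo the X-naming used in the paper; they are not the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a dict-of-dicts matrix of pairwise correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns.
data = {
    "X2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X5": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly proportional to X2
    "V": [5.0, 4.0, 3.0, 2.0, 1.0],    # decreases as X2 increases
}
corr = correlation_matrix(data)
```

Here `corr["X2"]["X5"]` is close to 1 and `corr["X2"]["V"]` is close to -1, matching the interpretation of the matrix entries given above.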

2. Comment 2

Add Ref for equation 2 and 3.

Respond: We agree with the reviewer's comment. We have added references for Equations 2 and 3 as follows:

To evaluate the performance of the established models, statistical parameters, including the coefficient of determination (R2) [41], Root Mean Square Error (RMSE) [42], and Mean Absolute Error (MAE) [42],

3. Comment 3

Add criteria for stopping optimization

Respond: We found the reviewer's comment very helpful. We have added the stopping condition for the optimization to the manuscript as follows (lines 222-223):

The optimization process stops once the algorithm has performed at least 100 iterations without the best result changing.
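
One possible reading of this criterion is "stop after 100 consecutive iterations without improvement". The sketch below implements that reading with a plain random search as a stand-in optimizer. The names `random_search` and `patience`, the `max_iter` safety cap, and the toy objective are assumptions made for illustration, not the authors' implementation.

```python
import random


def random_search(objective, sample, patience=100, max_iter=10_000, seed=0):
    """Minimize `objective`, stopping after `patience` iterations without improvement."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    stale = 0   # iterations since the best value last improved
    n_iter = 0
    while n_iter < max_iter and stale < patience:
        x = sample(rng)
        val = objective(x)
        n_iter += 1
        if val < best_val:
            best_x, best_val, stale = x, val, 0
        else:
            stale += 1
    return best_x, best_val, n_iter


# Toy objective standing in for a cross-validated error; its minimum is at x = 3.
best_x, best_val, n_iter = random_search(
    objective=lambda x: (x - 3.0) ** 2,
    sample=lambda rng: rng.uniform(0.0, 10.0),
)
```

With this rule the search always runs more than 100 iterations, because the counter restarts every time a better candidate is found.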

4. Comment 4

Add some ref to support the results of the section 4:

The analysis results show clearly, the flange length (X5) and wall length (X2) has the greatest influence on the shear strength value of RC walls. This proves that, in addition to the wall body, the flange also significantly participates in the wall's shear capacity.

Respond: We thank the reviewer for this comment. We found the analysis of the input variables' impact somewhat lacking, so we rewrote it as follows:

Based on the results, it can be inferred that the flange length (X5) and wall length (X2) are the most important features affecting the shear strength of reinforced concrete walls. More specifically, when the flange length (X5) increases to its maximum value (red points), the corresponding SHAP value increases in the positive direction to more than 500. This shows that the shear strength of the wall increases significantly with the flange length. Meanwhile, when the wall length (X2) increases, the maximum SHAP value of this variable reaches about 1200, showing that this variable has an even greater impact on the shear strength of the wall: the longer the wall, the higher the shear strength.
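
To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values by brute force for a three-feature toy function. The function `toy_model`, its coefficients, and the baseline are hypothetical and unrelated to the fitted XGBoost model. Practical SHAP tools use much faster algorithms for tree ensembles; this version only illustrates the definition.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features absent from a coalition take their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]      # add feature i to the coalition
            new = model(current)
            phi[i] += new - prev   # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]


def toy_model(features):
    """Hypothetical 'strength' function with one interaction term."""
    wall_length, flange_length, thickness = features
    return 2.0 * wall_length + 1.0 * flange_length + 0.5 * wall_length * thickness


x = [4.0, 3.0, 2.0]
baseline = [1.0, 1.0, 1.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions in `phi` add up to the difference between the prediction at `x` and the prediction at the baseline, which is the property that lets a large positive SHAP value be read as "this feature pushed the predicted strength up".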

5. Comment 5

Cite recent related papers such as the following to enrich the introduction, compare your paper with them, and state the gap you cover:

"Ensemble techniques and hybrid intelligence algorithms for shear strength prediction of squat reinforced concrete walls." 1, 8(1), 37. https://doi.org/10.12989/acd.2023.8.1.039

"Failure mode detection of reinforced concrete shear walls using ensemble deep neural networks." International Journal of Concrete Structures and Materials 16, no. 1 (2022): 33. https://doi.org/10.1186/s40069-022-00522-y

Response estimation of reinforced concrete shear walls using artificial neural network and simulated annealing algorithm. In Structures (Vol. 34, pp. 1155-1168). Elsevier. https://www.sciencedirect.com/science/article/pii/S2352012421007657

Respond: We thank the reviewer for this comment. We found these studies very meaningful, so we have cited two of them in the Introduction, namely:

[19] M.S. Barkhordari, L.M. Massone, Failure mode detection of reinforced concrete shear walls using ensemble deep neural networks, International Journal of Concrete Structures and Materials 16 (2022) 33.

[25] M.S. Barkhordari, L.M. Massone, Ensemble techniques

Attachment

Submitted filename: Response to Reviewers.docx

pone.0312531.s006.docx (260.4KB, docx)

Decision Letter 1

Afaq Ahmad

9 Oct 2024

The research explores the predictive capacity of the shear strength of reinforced concrete walls with different cross-sectional shapes using the XGBoost model

PONE-D-24-29804R1

Dear Dr. Pham,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Afaq Ahmad, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The authors have adeptly addressed the constructive comments and suggestions provided during the review process, showcasing their commitment to enhancing the quality and rigor of their manuscript. Their thoughtful revisions and meticulous attention to detail have resulted in a strengthened and more robust final version of the paper. As such, it is recommended that the manuscript be accepted for publication, as it contributes significantly to the existing body of knowledge in the field and reflects the dedication of the authors to scholarly excellence.

Reviewers' comments:

Acceptance letter

Afaq Ahmad

14 Oct 2024

PONE-D-24-29804R1

PLOS ONE

Dear Dr. Pham,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr Afaq Ahmad

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data. Data from Fig 3.

    (CSV)

    pone.0312531.s001.csv (12.4KB, csv)
    S2 Data. Data from Fig 4.

    (CSV)

    pone.0312531.s002.csv (470B, csv)
    S3 Data. Data from Fig 7.

    (CSV)

    pone.0312531.s003.csv (50.1KB, csv)
    S4 Data. Data from Fig 8.

    (CSV)

    pone.0312531.s004.csv (64.8KB, csv)
    S1 Table. Example of data used in Table 1.

    (CSV)

    pone.0312531.s005.csv (30.9KB, csv)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0312531.s006.docx (260.4KB, docx)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
