Scientific Reports. 2024 Jun 25;14:14590. doi: 10.1038/s41598-024-64386-w

Prediction and reliability analysis of shear strength of RC deep beams

Khaled Megahed
PMCID: PMC11199521  PMID: 38918511

Abstract

This study explores the capability of machine learning (ML) to predict the shear strength of reinforced concrete deep beams (RCDBs). Eight typical ML models, i.e., symbolic regression (SR), XGBoost (XGB), CatBoost (CATB), random forest (RF), LightGBM, support vector regression (SVR), artificial neural networks (ANN) and Gaussian process regression (GPR), are selected and compared using a database of 840 samples with 14 input features. The hyperparameters of the ML models are tuned using Bayesian optimization (BO). The comparison shows that CatBoost is the most reliable and accurate ML model (R² = 0.997 and 0.947 in the training and testing sets, respectively). In addition, simple and practical design expressions for RCDBs with a physical meaning and acceptable accuracy (an average prediction-to-test ratio of 0.935 and a standard deviation of 0.198) are proposed based on the SR model. The shear strength predicted by the ML models is also compared with that of classical mechanics-driven shear models, including two prominent design codes (ACI 318 and EC2) and two previous mechanical models; this comparison indicates that the ML approach is more reliable and accurate than these conventional methods. Finally, a reliability-based design is conducted on two ML models and their reliability is compared with that of the two code standards, showing that the ML models achieve higher reliability than the code standards.

Keywords: Deep beams, Symbolic regression (SR), Support vector regression (SVR), XGBoost (XGB), CatBoost (CATB), Random forest (RF), Gaussian process regression (GPR), Artificial neural networks (ANN), Bayesian optimization (BO) technique, Reliability-based design

Subject terms: Civil engineering, Statistics, Scientific data, Computer science

Introduction

Reinforced concrete deep beams (RCDBs), characterised by a small shear span-to-depth ratio (typically below 2.5) [1–3], are commonly used in structures such as the lower floors of buildings, transfer girders and pile caps because their shear strength is higher than that of slender beams. Despite their widespread application, the design of RCDBs remains challenging because the parameters governing their shear behaviour act nonlinearly. The predominant failure mode of RCDBs is shear failure, which is often sudden and catastrophic and therefore poses significant safety risks. Various approaches to the shear strength of RCDBs have been investigated, including machine learning methods [2–11], the strut-and-tie model [12–14], the compression field method [15] and finite element analysis [16]. However, traditional design methods, such as the strut-and-tie model (STM) or mechanism analysis, often fail to capture the complex relationships among the parameters that govern shear strength, yielding imprecise and conservative predictions compared with test results. Likewise, the available design provisions, e.g., ACI 318 [17] and EC2 [18], and other models [13, 14] provide simple procedures for calculating the shear capacity of RCDBs, but their conservatism and their discrepancy with test results mean that none of them offers a comprehensive model that approximates the shear capacity of RCDBs accurately.

Recently, new models have been proposed to improve the prediction of shear capacity in deep beams. Chen et al. [19] introduced the cracking STM, which integrates the STM approach with the diagonal crack pattern and the strain distribution in the horizontal reinforcement. Chetchotisak et al. [20] presented a modified interactive STM for RCDBs that relies on two distinct load-carrying mechanisms: the inclined strut and the truss. This model refines the strut mechanism of the interactive STM and incorporates empirical constants into the Mohr–Coulomb failure criterion to define a new concrete failure mode. Fan et al. [21] proposed an STM for unsymmetrically loaded RC deep beams in which the geometry of the compression nodal zones is determined using Mohr's circle and the minimum strain energy criterion. Although these models agree well with experimental data, they require tedious calculations.

Machine learning (ML) has become a promising tool in many areas of engineering, offering an alternative way to address engineering problems. ML algorithms, including support vector machines, artificial neural networks, genetic algorithms and ensemble learning methods, have been used extensively to predict the shear strength of RCDBs [2–11]. For example, Ma et al. [2] implemented six ML models to predict the shear strength of RCDBs and compared their performance with five previous closed-form models. Nguyen et al. [3] implemented seven ML models for the same purpose and found Gaussian process regression (GPR) to be the most reliable and accurate. Feng et al. [6] studied four typical ensemble learning models, namely random forests, gradient boosting regression trees, adaptive boosting and extreme gradient boosting (XGBoost), to predict the shear capacity of RCDBs using a dataset of 271 samples, with hyperparameters tuned by grid search. XGBoost gave the best prediction accuracy (R² = 0.992 and 0.917 in the training and testing sets, respectively). However, the error metrics on the testing set are nearly 3–8 times those on the training set, indicating overfitting. Tiwari et al. [7] applied eight ML models to the shear capacity of RCDBs and also found XGBoost to be the most accurate. Ashour et al. [4] used gene expression programming to develop an empirical expression for the shear strength of RCDBs from 141 test results. Shahnewaz et al. [9] and Wakjira [10] used a genetic algorithm to predict the shear strength of RCDBs. Liang et al. [22] developed a symbolic regression (SR) model based on the Modified Compression Field Theory to analyse the punching shear resistance of fibre-reinforced polymer (FRP) reinforced concrete slabs.

A review of the literature shows that only a few studies [23, 24] have examined the safety of RC deep beams designed according to code practice. Aguilar et al. [23] evaluated the reliability of deep beams designed with the strut-and-tie method according to ACI 318. They found that ACI 318 design practice increases the likelihood of nonductile failure and suggested reliability-based strength reduction factors of 0.65 for struts and 0.90 for ties. Muendacha et al. [24] conducted a safety-based evaluation of shear design methods for RC deep beams using strut-and-tie models (STMs) in accordance with international concrete codes, treating the variability of load actions and member resistances as random variables. Their findings indicated that deep beams of normal-strength concrete designed with these STMs achieve a satisfactory safety level, and they proposed probability-based reduction factors to achieve a target reliability index greater than 3.5. Regarding the integration of reliability analysis with ML, Shen et al. [25] combined Monte Carlo simulation with an ML-based surrogate model to calibrate the reliability of slab–column joints against punching shear.

These studies show that ML can predict the shear strength of RCDBs accurately. However, most models rely on simple search algorithms, such as grid search, to tune their hyperparameters, which limits how well the models can be refined. Most recent studies also lack a real-world practical application and do not address the gap between theory and practical implementation. Although many ML models achieve excellent results, deriving an explicit design formula from them is difficult, and their black-box, hard-to-interpret nature hinders their use in engineering design. Moreover, previous ML studies focus primarily on prediction accuracy and do not perform reliability-based design to bridge the gap between ML and practical engineering applications. Many studies also develop separate models for specific beam cases, such as beams with or without web reinforcement [2, 10]; this approach lacks generality and introduces fluctuations in the results. In addition, most expressions obtained with ML techniques, i.e., gene expression programming (GEP) and genetic algorithms (GA), lack a clear interpretation and physical meaning and are overly complex [4, 9, 10]. Table 1 summarises the ML models and formulas used in previous studies, together with their results.

Table 1.

Summary of previous ML models in predicting RCDBs shear strength.

Reference | Category (number)* | Models | Statistical criteria
Wakjira [10] | WOR (371) | GP: $V_{th} = 0.0456\, f_c'^{\,0.619}\, \rho_l^{\,0.411}\, (a/d)^{-0.874}\, b_w d$ | μ = 0.82, CoV = 0.305
Ashour [4] | All (141) | GP: V=bwhfc-4.56+1.68adρl2+2.45+0.1ad2-1.16ad+3.12ρtρl+0.3ρhw+0.4ρvw | μ = 1.11, Std = 0.21
Shahnewaz [9] | All (381) | GA: Vu=bwhfc25-14ad0.23+0.85ρlρhwρvw0.1-35adρhwρvw116-200adρlρhwρvw2.65 | μ = 0.99, CoV = 0.232
Shahnewaz [9] | All (381) | GA: Vu=bwhfc1.74-2ad0.044+0.5ρ0.14 | μ = 1.01, CoV = 0.257
Cheng [5] | All (106) | EMARS, BPNN, RBFNN, SVM; EMARS is the best model; grid search with cross-validation | EMARS: training MAPE = 5.67, R² = 0.989; testing MAPE = 5.887, R² = 0.973
Feng [6] | All (271) | DT, SVM, ANN, RF, AdaBoost, GBRT, XGBoost; XGBoost is the best model; grid search with cross-validation | XGBoost: training R² = 0.999, MAPE = 0.74; testing R² = 0.928, MAPE = 10.44% (overfitting)
Hameed [11] | All (271) | LWR, RF, MLR, ELM; LWR is the best model; grid search with cross-validation | LWR: training RMSE = 22.563, MAE = 13.249, a20-index = 98.89; testing RMSE = 57.776, MAE = 33.933, a20-index = 85.87
Liu [8] | All (267) | LR, SVR, ANN, RF, XGBoost, NGBoost with Bayesian optimisation; NGBoost is the best model | NGBoost: R² = 0.9045, RMSE = 38.7976 kN
Tiwari [7] | All (271) | DT, SVR, RF, GB, adaptive boosting, XGBoost, voting regression; XGBoost is the best model; grid search with cross-validation | XGBoost: training R² = 0.999, MAPE = 0.78, RMSE = 1.45 kN; testing R² = 0.928, MAPE = 9.79, RMSE = 47.76 (overfitting); μ = 1.00, CoV = 6.38%
Nguyen [3] | All (518) | LR, ANN, SVR, DT, GPR, XGBoost with Bayesian optimisation; GPR is the best model | GPR: training R² = 0.99, MAE = 12.77, RMSE = 18.84 kN; validation R² = 0.89, MAE = 41.72, RMSE = 71.06 kN; testing R² = 0.94, MAE = 38.44, RMSE = 63.38
Ma [2] | All (457) | kNN, DTM, RFM, GBDT, CatBoost, XGBoost; XGBoost is the best model; grid search with cross-validation | XGBoost: training R² = 0.992, MAE = 0.148, RMSE = 0.26; testing R² = 0.917, MAE = 0.531, RMSE = 0.777; WOR*: μ = 1.03, Std = 0.128; WVR*: μ = 1.005, Std = 0.073; WHR*: μ = 1.003, Std = 0.077; WVHR*: μ = 1.01, Std = 0.084
This study | All (840) | CatBoost; SR | CatBoost training: μ = 1.005, CoV = 0.062, a20-index = 0.9894, MAPE = 4.41, RMSE = 36.8 kN; CatBoost testing: μ = 1.026, CoV = 0.141, a20-index = 0.899, MAPE = 9.32, RMSE = 160.9 kN; SR (WOR)*: μ = 1.003, CoV = 0.207, a20-index = 0.68, MAPE = 16.80, RMSE = 115.9 kN; SR (WWR)*: μ = 1.004, CoV = 0.192, a20-index = 0.78, MAPE = 13.70, RMSE = 196.7 kN

*WOR and WWR stand for beams without and with web reinforcement, respectively.

The present study makes several novel contributions. First, it develops unified ML-based models for the shear strength of RCDBs that combine both beam cases, i.e., with and without web reinforcement, in a single predictive model, whereas many previous studies predicted each type independently [2, 10]. The ML results are also compared with those of mechanics-driven models, including two prominent design codes (the American code ACI 318 [17] and the European code EC2 [18]) and two previous mechanics-based models [13, 14], to validate the performance of the developed ML models. Second, Bayesian optimisation (BO) is adopted to select the optimal hyperparameters of the ML models, in contrast to the less advanced search techniques, such as grid search, commonly found in the literature. Third, simple and practical design expressions for RCDBs are proposed based on the symbolic regression (SR) model; these expressions are easy to interpret and show high accuracy compared with previous closed-form models. Finally, a reliability-based design assessment is conducted on two ML models and two code standards to evaluate the reliability of using ML models in practical design applications.

Experimental database of RC deep beams

The shear mechanism of RCDBs is shown schematically in Fig. 1. To construct robust ML models and investigate the influencing parameters, a dataset of 840 RCDB tests was compiled from the existing literature and from the database assembled by Chetchotisak et al. [20]; it comprises 322 specimens without web reinforcement (WOR) and 518 specimens with web reinforcement (WWR). Details of the database are provided in the Supplementary data. According to various experimental and theoretical studies [12–14, 26], the shear capacity of RCDBs is governed by several shear components, typically the strength of the concrete, the longitudinal reinforcement and the web reinforcement. Therefore, 14 design features were selected as input variables, grouped into five categories [26]: (1) geometric dimensions: beam height (h), effective height (d), width (bw), shear span (a) and shear span-to-depth ratio (a/d); (2) concrete property: concrete strength (fc′); (3) bottom longitudinal reinforcement properties: reinforcement ratio (ρl) and strength (fyl); (4) web reinforcement properties: vertical web reinforcement (VWR) ratio (ρv) and strength (fyv), and horizontal web reinforcement (HWR) ratio (ρh) and strength (fyh); and (5) bearing plates: top plate width (wtp) and bottom plate width (wbp). The output is the shear strength index of the RCDBs, vn = Vu/(bw h fc′), where Vu is the shear capacity. Table 2 summarises the statistics of the output and the 14 input features in the database.

Figure 1.


Dimensions of an RC deep beam.

Table 2.

Statistical features of the experimental dataset.

Variable | Symbol (unit) | Type | Min | Max | Mean | Std | Skewness | Kurtosis
Beam height | h (mm) | Input | 152 | 2100 | 564 | 291 | 2.076 | 5.407
Beam effective height | d (mm) | Input | 137 | 2000 | 499 | 271 | 2.16 | 5.962
Beam width | bw (mm) | Input | 51 | 914 | 196 | 120 | 2.601 | 9.865
Shear span | a (mm) | Input | 80 | 4375 | 637 | 465 | 2.629 | 9.976
Shear span-to-depth ratio | a/d | Input | 0.27 | 2.502 | 1.304 | 0.541 | 0.349 | −0.499
Concrete strength | fc′ (MPa) | Input | 11.3 | 120.1 | 40.2 | 21.6 | 1.122 | 0.698
Bottom reinforcement ratio | ρl | Input | 0.003 | 0.113 | 0.02 | 0.011 | 1.704 | 7.3
Bottom reinforcement strength | fyl (MPa) | Input | 267 | 1330 | 470 | 131 | 2.828 | 13.993
Vertical web reinforcement ratio | ρv | Input | 0 | 0.029 | 0.003 | 0.004 | 2.493 | 9.767
Vertical web reinforcement strength | fyv (MPa) | Input | 0 | 1051 | 273 | 227 | 0.103 | −0.627
Horizontal web reinforcement ratio | ρh | Input | 0 | 0.032 | 0.002 | 0.003 | 3.553 | 17.956
Horizontal web reinforcement strength | fyh (MPa) | Input | 0 | 855 | 206 | 230 | 0.426 | −1.329
Top plate width | wtp (mm) | Input | 10 | 914 | 146 | 107 | 3.094 | 12.72
Bottom plate width | wbp (mm) | Input | 10 | 610 | 136 | 82 | 2.534 | 8.657
Shear strength index | vn = Vu/(bw h fc′) | Output | 0.011 | 0.293 | 0.134 | 0.053 | 0.384 | −0.287

The Pearson correlation coefficient (r) is used in this study to assess the strength of the linear relationship between any two features [27]. It ranges from −1.0 to 1.0: a value of −1.0 indicates a perfect negative linear relationship, 1.0 a perfect positive linear relationship, and 0 no linear correlation. As illustrated in Fig. 2, the Pearson correlation matrix shows that most input features are only weakly correlated with one another. However, relatively high correlations are observed between the VWR/HWR ratios and the VWR/HWR strengths, and between the widths of the top and bottom bearing plates. The former arises because 322 of the 840 specimens have no web reinforcement (ρv = fyv = ρh = fyh = 0), which produces a spurious correlation. The latter arises because a significant portion of the tests were conducted with identical plate widths. Among all input variables, a/d, fc′, ρh and ρv have the strongest correlation with the shear strength index Vu/(bw h fc′), with r values of −0.91, −0.39, 0.24 and 0.22, respectively. Hence, increasing a/d significantly reduces the shear strength index. Similarly, increasing the concrete strength reduces the strength index, consistent with the more brittle failure of higher-strength concrete, whereas increasing the VWR/HWR ratios raises the index and enhances the ductility of RCDBs. These observations agree well with the mechanical behaviour and experimental results of RCDBs [12–14, 26].
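The normalised output and the correlation coefficient behind Fig. 2 are straightforward to compute. The sketch below is a minimal illustration that assumes Vu is given in kN and all dimensions in mm, so that vn is dimensionless; the function names and the three specimens are hypothetical and are not taken from the database.

```python
import numpy as np

def shear_strength_index(V_u_kN, b_w_mm, h_mm, fc_MPa):
    """Shear strength index v_n = V_u / (b_w * h * fc').

    V_u is converted from kN to N so that N / (mm * mm * N/mm^2) is dimensionless.
    """
    V_u = np.asarray(V_u_kN, dtype=float) * 1e3
    return V_u / (np.asarray(b_w_mm, dtype=float)
                  * np.asarray(h_mm, dtype=float)
                  * np.asarray(fc_MPa, dtype=float))

def pearson_r(x, y):
    """Pearson correlation coefficient r between two feature vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical specimens for illustration only.
v_n = shear_strength_index([500.0, 400.0, 300.0], [200, 200, 200], [500, 500, 500], [40, 40, 40])
a_over_d = [1.0, 1.5, 2.0]
print(v_n)                        # 0.125, 0.1 and 0.075
print(pearson_r(a_over_d, v_n))   # approximately -1 for this perfectly linear toy data
```

For a full feature matrix, `np.corrcoef(features, rowvar=False)` returns the whole matrix of pairwise r values at once.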

Figure 2.


Correlation matrix for the RC deep beams database.

Research significance

This study makes novel contributions in several respects. First, it introduces unified machine learning models for predicting the shear strength of reinforced concrete deep beams (RCDBs). Second, it employs Bayesian optimisation for hyperparameter tuning. Third, it proposes simple and practical design expressions based on symbolic regression that show high accuracy compared with previous mechanism-based models. Finally, a reliability-based design assessment evaluates the reliability of using machine learning models in practical design applications.

ML algorithms

In this study, eight typical ML models are selected to predict the shear strength of RCDBs: symbolic regression (SR) [28, 29], Gaussian process regression (GPR) [30], artificial neural network (ANN), light gradient-boosting machine (LightGBM) [31], random forest (RF) [32], categorical boosting (CatBoost) [33], extreme gradient boosting (XGBoost) [34] and support vector regression (SVR) [35]. Their predictive performance is then evaluated and compared. In general, ensemble learning tends to be more accurate and stable than individual models [2, 6–8].

Random forest, proposed by Breiman [32], is an ensemble learning method based on bagging: bootstrap sampling is used to create training subsets for weak learners (such as decision trees), and the regression or classification decision is made by averaging or voting. Several key parameters, including the number of trees, the maximum number of features considered at each split and the maximum tree depth, significantly affect the training results. CatBoost, LightGBM and XGBoost, on the other hand, are boosting-based ensemble methods, which combine weak learners into a strong learner through an iterative process [36]. CatBoost excels at handling categorical features without separate preprocessing of non-numerical features [33]; it addresses gradient bias and improves generalisation through unbiased (ordered) boosting. LightGBM [31] uses histogram-based split finding and grows trees leaf-wise (best-first), whereas XGBoost [27] grows trees level-wise (depth-wise) by default; as a result, LightGBM generally trains faster and handles large databases better than XGBoost. The following subsections describe two of these models, CatBoost and symbolic regression, in more detail.

CatBoost model

CatBoost [33, 37] is a gradient boosting algorithm that differs from other gradient boosting algorithms in its use of ordered boosting, an efficient modification of the standard procedure. This modification avoids target leakage and reduces prediction shift during training [33]. CatBoost is well suited to small datasets and handles categorical features natively: a feature with few categories can be replaced with a new binary feature for each category (one-hot encoding), while other categorical features are encoded with ordered target statistics. Another advantage of CatBoost is its use of random permutations when estimating leaf values during the selection of the tree structure [33], which helps to overcome the overfitting commonly associated with traditional gradient boosting algorithms. Furthermore, CatBoost uses oblivious (symmetric) binary decision trees as base predictors.

Following Dorogush et al. [33], CatBoost can be outlined as follows. Let Ti denote the model built after constructing the first i trees, and let gi(Xk, Yk) denote the gradient on the k-th training sample after constructing i trees. For this gradient to be unbiased with respect to Ti, Ti must be trained without the observation Xk. Because unbiased gradients are needed for every training example, this is impossible in standard training, where each model is fitted to all observations. CatBoost resolves this with the following trick: for each example Xk, a separate model Mk is trained that is never updated with a gradient estimate for that example. Mk is then used to estimate the gradient on Xk, and this estimate is used to score the resulting tree. Figure 3a shows a flowchart of this procedure, in which Loss(y, a) is the loss function being optimised, y is the label value and a is the formula (model) value.
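The key idea, that the residual (negative gradient) for Xk must come from a model that has never seen Xk, can be illustrated with a deliberately simplified sketch. Here each Mk is assumed to be nothing more than the mean target of the examples that precede Xk in one random permutation, and the loss is assumed to be squared error; CatBoost itself fits gradient-boosted oblivious trees and maintains several permutations, so this is not its implementation. The function name is hypothetical.

```python
import numpy as np

def ordered_residuals(y, rng=None):
    """Ordered residuals for squared-error loss under one random permutation.

    For each example k, the prediction comes from a stand-in model M_k that has
    only seen the examples placed before k in the permutation (here: their mean
    target), so the residual of k is never computed by a model trained on k.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    perm = rng.permutation(len(y))
    residuals = np.empty_like(y)
    running_sum = 0.0
    for pos, k in enumerate(perm):
        pred = running_sum / pos if pos > 0 else 0.0  # M_k uses perm[:pos] only
        residuals[k] = y[k] - pred                    # negative gradient of 0.5 * (y - a)^2
        running_sum += y[k]                           # k becomes visible to later models
    return residuals, perm

res, perm = ordered_residuals([1.0, 2.0, 4.0, 8.0], rng=0)
print(perm, res)
```

The first example in the permutation keeps its raw target as residual, and every later example is compared only with the examples that came before it, which is the leakage-free property that ordered boosting relies on.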

Figure 3.


Flow charts of the introduced ML models. (a) CatBoost, (b) Symbolic regression.

Symbolic regression and proposed equations

Symbolic regression (SR) [28, 29] is a genetic programming technique [38] that searches for simple, interpretable analytic formulas that best fit a given dataset by exploring a predefined space of mathematical expressions and functions. SR is treated as a multi-objective optimisation problem that balances predictive accuracy against model complexity. Genetic programming is typically used in SR to apply the principles of natural selection and evolution, iteratively refining candidate mathematical expressions until satisfactory models are obtained. This paper uses the Python library PySR [32] to search for simple, interpretable expressions for the shear capacity of RCDBs.

The SR algorithm starts by constructing an initial population of random combinations of operators or functions (e.g., +, −, /, *, ^) and terminals, i.e., input variables and constants, so that each individual in the population is a tree-like expression. Individuals are selected probabilistically, with preference for the best-performing ones. The selected individuals undergo mutation (Fig. 4a,b) or crossover (Fig. 4c) to produce a new generation. The fitness function used to identify the best individuals in each generation is defined as [39]:

$$\ell(E) = \ell_{\mathrm{pred}}(E)\cdot\exp\!\left(\mathrm{frecency}[C(E)]\right) \qquad (1)$$

where ℓpred(E) is the prediction loss (here the mean absolute error), C(E) is the complexity of expression E (the total number of nodes in the expression), and frecency[C(E)] measures the frequency and recency of expressions with complexity C(E) in the population. This term penalises complexities that are already common in the population, which prevents excessive growth and redundancy in the generated expressions and balances error minimisation against simplicity. Table 3 lists the SR parameters used to generate the expressions, and the core steps of SR are shown in Fig. 3b.
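Eq. (1) can be made concrete with a small expression-tree sketch. The tuple-based tree format, the helper names and the frecency values passed in are all hypothetical: PySR computes frecency internally from the population history, and its own complexity measure may weight nodes differently. The sketch only shows how the node count C(E), the MAE-based ℓpred(E) and the exponential penalty combine.

```python
import math
import numpy as np

# Hypothetical tree format: (operator, left, right); leaves are variable names or constants.
OPS = {"+": np.add, "*": np.multiply, "/": np.divide, "^": np.power}

def complexity(expr):
    """C(E): total number of nodes (operators, variables and constants)."""
    if isinstance(expr, tuple):
        _, left, right = expr
        return 1 + complexity(left) + complexity(right)
    return 1

def evaluate(expr, data):
    """Evaluate the tree given a dict that maps variable names to arrays."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, data), evaluate(right, data))
    if isinstance(expr, str):
        return np.asarray(data[expr], dtype=float)
    return float(expr)

def sr_loss(expr, data, y, frecency):
    """Eq. (1): l(E) = l_pred(E) * exp(frecency[C(E)]), with l_pred = mean absolute error."""
    l_pred = float(np.mean(np.abs(evaluate(expr, data) - np.asarray(y, dtype=float))))
    return l_pred * math.exp(frecency.get(complexity(expr), 0.0))

expr = ("*", 1.5, ("^", "x", 0.5))   # 1.5 * x^0.5, five nodes
data, y = {"x": [1.0, 4.0, 9.0]}, [1.5, 3.0, 5.0]
print(complexity(expr), sr_loss(expr, data, y, {}), sr_loss(expr, data, y, {5: 1.0}))
```

With the same prediction error, an expression whose complexity is crowded in the population (larger frecency) receives a larger loss, which is what steers the search towards under-explored sizes.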

Figure 4.


Mutation and crossover operations in SR model. (a) A mutation operation on expression tree, (b) a mutation operation on input variable, (c) a crossover operation between two trees.

Table 3.

The parameters of the SR model used in generating expressions.

Parameter | Value
Number of generations | 200
Total number of populations | 50
Population size | 20
Maximum length of expressions (total number of nodes) | 30 (WWR), 22 (WOR)
Parsimony (factor controlling the expression complexity) | 0.02
Allowed binary operators | +, *, ^, /
Loss function | Mean absolute error
Constraints | {‘^’: (−1, 10)} (a)
Nested constraints | ‘^’: {‘^’: 0, ‘/’: 1}, ‘/’: {‘/’: 0, ‘^’: 1} (b)
Model selection | Accuracy

(a) The constraint ‘^’: (−1, 10) means that the left argument (base) of a power may have any complexity (−1 denotes no limit), whereas the right argument (exponent) may contain at most 10 nodes.

(b) The nested constraints specify how many times a combination of operators can be nested. The constraint ‘/’:{‘/’:0,‘^’:1} indicates that ‘/’ may never appear within ‘/’, but ‘^’ can be nested once in ‘/’.
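The settings in Table 3 map fairly directly onto keyword arguments of PySR's scikit-learn-style `PySRRegressor`. The sketch below assumes a recent PySR release, in which the loss is passed as `elementwise_loss` (older releases call this argument `loss`); the names `SR_KWARGS` and `make_sr_model` are hypothetical, and pysr (with its Julia backend) is only imported when a model is actually built.

```python
# Keyword arguments for pysr.PySRRegressor mirroring Table 3 (WWR case).
SR_KWARGS = dict(
    niterations=200,                       # number of generations
    populations=50,                        # total number of populations
    population_size=20,                    # individuals per population
    maxsize=30,                            # max. expression size in nodes (22 for WOR)
    parsimony=0.02,                        # complexity penalty factor
    binary_operators=["+", "*", "^", "/"],
    elementwise_loss="L1DistLoss()",       # absolute error, averaged to give the MAE
    constraints={"^": (-1, 10)},           # any base, exponent of at most 10 nodes
    nested_constraints={"^": {"^": 0, "/": 1}, "/": {"/": 0, "^": 1}},
    model_selection="accuracy",
)

def make_sr_model(**overrides):
    """Build a PySRRegressor from SR_KWARGS; e.g. pass maxsize=22 for WOR beams."""
    from pysr import PySRRegressor  # imported lazily: requires pysr and Julia
    return PySRRegressor(**{**SR_KWARGS, **overrides})
```

For example, `make_sr_model(maxsize=22).fit(X, y)` would run the search for beams without web reinforcement, with `X` holding the candidate input parameters and `y` the target.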

Selecting the optimal expression requires numerous iterations and a thorough investigation of each one. These iterations involve trying various custom functions, different sets of operators and many combinations of input features that could affect the shear strength of RCDBs [40]. The candidate parameters include the most significant features identified from the Pearson correlation matrix, such as the shear span-to-depth ratio (a/d), the concrete strength (fc′) and the reinforcement ratios (ρl, ρv, ρh). Parameters from previous equations (Table 6) are also considered, such as the web reinforcement contributions (ρv fyv, ρh fyh) and the angle between the strut and the longitudinal axis (θ). In addition, the author introduced several dimensionless parameters, including the vertical and horizontal web reinforcement contribution factors (ρv fyv/fc′, ρh fyh/fc′) and the shear strength index Vu/(bw h fc′). The SR algorithm generates different expressions in each iteration from various combinations of these parameters, and each resulting equation undergoes exhaustive evaluation and refinement. The selection criteria weigh equation complexity, accuracy and interpretability. For RCDBs without web shear reinforcement, the following equation is derived:

$$\frac{V_n}{b_w h} = 1.5\, f_c'^{\,0.85-0.22\,a/d}\; \rho_l^{\,0.29-(a/d)\,\rho_l}, \qquad a/d \le 2.5,\quad \rho_l \le 0.1 \qquad (2)$$

Table 6.

Summary of previous mechanical models in predicting RCDBs shear strength. MW stands for Matamoros and Wong's formula.

Code Standard Formulas
ACI31817 VACI=0.85βsfcbwwssinθ,ws=1.8wtcosθ+wtp+wbpsinθ/2
EC218 Vu,EU=0.85βsfcbwwssinθ,ws=[1.85wtcosθ+wtp+wbpsinθ]/2
Matamoros and Wong13 VM-W=Ccfcbwws+Cwvρvbwa3fyv+Cwhρhbwd3fyh, Cc=0.3/ad, Cwv=1,Cwh=3(1-a/d)
Russo et al.14 VRusso=0.76kχfcsinθ+0.35adρvfyv+0.25tanθρhfyhbwd,χ=0.74r3-1.28r2+0.22r+0.87,r=fc/105
Current study

Without web reinf.: $\frac{V_n}{b_w h} = 1.5\, f_c'^{\,0.85-0.22\,a/d}\, \rho_l^{\,0.29-(a/d)\,\rho_l}$, with $a/d \le 2.5$ and $\rho_l \le 0.1$

With web reinf.: Vnbwh=29ρl+3.8ρl0.3fc0.3Ψvhad+0.47,Ψvh=ρhfyhfc+ρvfyvfc

where βs is the strut coefficient, θ is the angle between the strut and the longitudinal axis, ws and wt are the widths of the strut and the tie, εs is the tensile strain in the tie, ρv is the VWR ratio and ρh is the HWR ratio; the χ function is defined for 10 ≤ fc′ ≤ 105 MPa.

For RCDBs with web shear reinforcement, the following equation is extracted:

Vnbwh=29ρl+3.8ρl0.3fc0.3Ψvhad+0.47,Ψvh=ρhfyhfc+ρvfyvfc 3

where Ψvh is the web shear reinforcement contribution factor. The proposed equations provide a comprehensive yet simple framework for predicting the shear strength of RCDBs with a meaningful physical interpretation. For RCDBs without web shear reinforcement (Eq. 2), increasing the longitudinal reinforcement ratio ρl or the concrete compressive strength increases the shear capacity, whereas increasing the shear span-to-depth ratio reduces it. These trends agree with the conclusions of Ashour et al. [4], who identified a/d and ρl as the most significant parameters governing shear behaviour. For RCDBs with web shear reinforcement (Eq. 3), the shear strength increases with increasing concrete strength, longitudinal reinforcement ratio ρl and web shear reinforcement contribution Ψvh, and with decreasing a/d. These observations agree well with the mechanical behaviour and experimental results of RCDBs [12–14, 26, 41]. Furthermore, the developed expressions are simpler, more robust and more physically meaningful than the GEP and GA expressions of the previous studies listed in Table 1.
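Eq. (2) and the factor Ψvh are easy to evaluate numerically, which also allows the trends described above to be checked. The sketch below assumes that Eq. (2) reads Vn/(bw h) = 1.5 fc′^(0.85 − 0.22 a/d) · ρl^(0.29 − (a/d) ρl), with fc′ in MPa and the result in MPa, valid for a/d ≤ 2.5 and ρl ≤ 0.1; this reading of the extracted formula is an assumption, and Eq. (3) itself is not coded here. The function names are hypothetical.

```python
import numpy as np

def v_eq2_without_web(fc, a_d, rho_l):
    """V_n / (b_w h) in MPa for beams without web reinforcement.

    Assumed reading of Eq. (2): 1.5 * fc**(0.85 - 0.22*a_d) * rho_l**(0.29 - a_d*rho_l).
    """
    fc, a_d, rho_l = (np.asarray(v, dtype=float) for v in (fc, a_d, rho_l))
    if np.any(a_d > 2.5) or np.any(rho_l > 0.1):
        raise ValueError("Eq. (2) is stated for a/d <= 2.5 and rho_l <= 0.1")
    return 1.5 * fc ** (0.85 - 0.22 * a_d) * rho_l ** (0.29 - a_d * rho_l)

def psi_vh(rho_h, fyh, rho_v, fyv, fc):
    """Web reinforcement contribution factor of Eq. (3): (rho_h*fyh + rho_v*fyv) / fc'."""
    return (rho_h * fyh + rho_v * fyv) / fc

v = v_eq2_without_web(40.0, 1.0, 0.02)
print(float(v), float(v) / 40.0)   # about 5.3 MPa, i.e. a strength index of about 0.13
print(psi_vh(0.003, 400.0, 0.003, 400.0, 40.0))
```

Under this reading, raising fc′ or ρl increases the result and raising a/d decreases it, in line with the trends stated for Eq. (2).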

Data preprocessing and hyperparameter Bayesian optimisation technique

In this study, min–max scaling is used to normalise the data and mitigate the adverse effects of input features with very different magnitudes and units. After normalisation, the dataset is randomly partitioned into a training subset (80% of the samples) and a testing subset (the remaining 20%).
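A minimal scikit-learn sketch of this preprocessing step follows. The helper name and the synthetic data (same shape as the database) are hypothetical. One design choice differs from the text on purpose: the split is done first and the scaler is fitted on the training subset only, a common variant that keeps test-set statistics out of the normalisation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def split_and_scale(X, y, test_size=0.2, seed=42):
    """Random 80/20 split, then min-max scaling of every input feature to [0, 1].

    The scaler is fitted on the training subset only and reused for the testing
    subset, so test statistics do not leak into the normalisation.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)   # may fall slightly outside [0, 1]
    return X_train, X_test, y_train, y_test, scaler

# Synthetic stand-in with the same shape as the database (840 samples x 14 features).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(840, 14))
y = rng.uniform(0.011, 0.293, size=840)
X_tr, X_te, y_tr, y_te, scaler = split_and_scale(X, y)
print(X_tr.shape, X_te.shape)   # (672, 14) (168, 14)
```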

The performance of most ML algorithms relies heavily on their hyperparameters, which are set before training, so proper tuning is essential for optimal predictive performance. Finding the best hyperparameters requires trying various sets and selecting the combination that performs best on the validation data. Traditional techniques such as grid search (GS) and random search (RS) can be exhaustive and time-consuming, especially for models with many hyperparameters and a large search space. In contrast, Bayesian optimisation (BO) uses surrogate models, e.g., Gaussian processes and tree-structured Parzen estimators (TPE) [34], to choose the next hyperparameter combination based on the history of previously evaluated combinations. This strategy minimises redundant evaluations, enabling BO to reach the optimal combination in fewer iterations than GS and RS [42]. This study adopts TPE [34] to optimise the ML models because of its superior robustness compared with other surrogate models [42]. The mean absolute percentage error (MAPE) on the validation set is chosen as the objective function. TPE builds a probability model of the objective function and uses the expected improvement (EI), defined in Eq. (4), to select the most promising hyperparameters to evaluate in the true objective function [43]:

$$\mathrm{EI}_{s^*}(z) = \frac{\text{constant w.r.t. } z}{\gamma + (1-\gamma)\,\dfrac{g(z)}{l(z)}} \qquad (4)$$

where z is a hyperparameter combination drawn from the search space and s* is a threshold chosen as the γ-quantile of the observed objective values s, so that p(s < s*) = γ. The densities l(z) and g(z) are built from two groups of trials: l(z) from those whose objective values fall below the threshold and g(z) from those whose values exceed it. Maximising EI therefore amounts to drawing hyperparameter samples with the largest ratio l(z)/g(z). Finally, cross-validation was applied to assess the effectiveness of the models, avoid overfitting and obtain accurate predictions for the testing data. Table 4 presents the optimal hyperparameters of the ML models.
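The selection rule behind Eq. (4) can be sketched for a single continuous hyperparameter. The code below is a simplified, hand-written illustration and not the TPE implementation of any optimisation library: it assumes Gaussian kernel density estimates (`scipy.stats.gaussian_kde`) for l(z) and g(z), a fixed γ = 0.25 and a toy objective standing in for the validation MAPE. Candidates are drawn from l(z), and the one with the largest l(z)/g(z) is evaluated next, which is the step that maximises EI in Eq. (4). The function names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_propose(z_obs, s_obs, bounds, gamma=0.25, n_candidates=64, rng=None):
    """Propose the next value of one continuous hyperparameter z (lower s is better)."""
    rng = np.random.default_rng(rng)
    z_obs = np.asarray(z_obs, dtype=float)
    s_obs = np.asarray(s_obs, dtype=float)
    s_star = np.quantile(s_obs, gamma)            # threshold s* with p(s < s*) ~ gamma
    l = gaussian_kde(z_obs[s_obs <= s_star])      # l(z): density of the "good" trials
    g = gaussian_kde(z_obs[s_obs > s_star])       # g(z): density of the remaining trials
    cand = np.clip(l.resample(n_candidates, rng).ravel(), *bounds)  # sample from l(z)
    ratio = l(cand) / np.maximum(g(cand), 1e-12)  # EI in Eq. (4) grows with l(z)/g(z)
    return float(cand[np.argmax(ratio)])

def tpe_minimise(objective, bounds, n_init=20, n_iter=30, seed=0):
    """Random start-up trials followed by n_iter TPE proposals; returns the best trial."""
    rng = np.random.default_rng(seed)
    z = list(rng.uniform(*bounds, size=n_init))
    s = [objective(v) for v in z]
    for _ in range(n_iter):
        z_next = tpe_propose(z, s, bounds, rng=rng)
        z.append(z_next)
        s.append(objective(z_next))
    best = int(np.argmin(s))
    return z[best], s[best]

# Toy objective standing in for the validation MAPE as a function of one hyperparameter.
best_z, best_s = tpe_minimise(lambda v: (v - 0.3) ** 2, bounds=(0.0, 1.0))
print(best_z, best_s)
```

Production TPE implementations additionally handle many mixed-type hyperparameters, priors and adaptive bandwidths; the sketch keeps only the l(z)/g(z) ranking that Eq. (4) describes.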

Table 4.

The optimal hyperparameters for ML models.

ML model Optimal hyperparameters
CatBoost iterations = 1696, learning_rate = 0.0906, depth = 4, subsample = 0.389, colsample_bylevel = 0.784, min_data_in_leaf = 10
GPR Kernel: Constant*RBF + Constant*Matern + Constant*WhiteKernel + Constant* RationalQuadratic, gpr.alpha = 0.002
LightGBM n_estimators = 1972, learning_rate = 0.0329, max_depth = 50, num_leaves = 10, boosting_type = Gradient Boosting Decision Tree
XGBoost n_estimators = 1968, max_depth = 41, learning_rate = 0.0611, booster = ‘dart’, gamma = 0.01
RandomForest random_state = 1000, n_estimators = 1134, max_depth = 22, min_samples_leaf = 2, max_features = ‘log2’, bootstrap = False
ANN number of hidden layers = 1, number of neurons in the hidden layer = 12
SVR log(C) = 0.9988, log(epsilon) = −74.88644548, log(gamma) = −1.72300541
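As an illustration of how the Table 4 entries translate into code, the random-forest hyperparameters can be passed to scikit-learn's `RandomForestRegressor`. This assumes the study used scikit-learn (the parameter names in Table 4 match its API, but the text does not state the library); `n_jobs=-1` is an added convenience, and the data are synthetic placeholders with 14 inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Random-forest hyperparameters from Table 4, expressed with scikit-learn
# (assumption: the library is not named in the text).
rf = RandomForestRegressor(
    n_estimators=1134,
    max_depth=22,
    min_samples_leaf=2,
    max_features="log2",   # int(log2(14)) = 3 candidate features per split
    bootstrap=False,       # every tree is grown on the full training set
    random_state=1000,
    n_jobs=-1,             # added for speed; not part of Table 4
)

# Synthetic placeholder data with 14 inputs (not the experimental database).
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 14))
y = X @ np.linspace(0.1, 1.0, 14) + rng.normal(0.0, 0.05, size=300)
rf.fit(X, y)
print(rf.score(X, y))      # training R^2 on the synthetic data
```

Because `bootstrap=False`, the trees differ only through the random feature subsets drawn at each split, which is why `max_features` matters more here than in a bootstrapped forest.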

Performance and results of ML models

In this section, the performance of the developed ML models is compared. Details of the established ML models, including hyperparameter tuning and results, are provided in the Supplementary data. In Fig. 5, scatter plots depict the relationship between the experimental and predicted results for the different ML models. For most of the developed models, the data points cluster closely around the diagonal line, indicating strong agreement between model predictions and test results and hence high prediction accuracy. Table 5 lists the evaluation metrics used to assess the implemented models, i.e., the coefficient of determination (R2), the mean (μ) and coefficient of variation (CoV) of the prediction-to-test ratio, the mean absolute percentage error (MAPE), the root mean squared error (RMSE), and the a20-index, defined as follows:

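$$R^2 = 1-\frac{\sum_{i=1}^{n}(\hat{y}_i-y_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2},\quad \mu=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{y}_i}{y_i},\quad \mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i}{y_i}-1\right|,\quad \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2} \qquad (5)$$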

where ŷi and yi are the predicted and actual output values of the i-th specimen, respectively, ȳ is the mean of the actual observations, and n is the number of samples in the database. The a20-index44 is the fraction of specimens whose prediction-to-test ratio ŷi/yi lies within 0.80–1.20.
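The metrics of Eq. (5) can be computed directly from paired predictions and test values, as in the sketch below. Two details are assumptions because the text does not define them explicitly: CoV is taken as the sample standard deviation of ŷi/yi divided by μ, and the a20 interval is treated as inclusive at 0.80 and 1.20. The function name and the four-specimen example are hypothetical.

```python
import math
import statistics


def evaluation_metrics(y_pred, y_true):
    """Metrics of Eq. (5) plus CoV of the prediction-to-test ratio and the a20-index."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need at least two paired predictions and test values")
    ratios = [p / t for p, t in zip(y_pred, y_true)]
    y_bar = sum(y_true) / n
    ss_res = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mu = sum(ratios) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "mu": mu,
        "CoV": statistics.stdev(ratios) / mu,            # assumption: sample std of y_hat/y over mu
        "MAPE": 100.0 / n * sum(abs(r - 1.0) for r in ratios),
        "RMSE": math.sqrt(ss_res / n),
        "a20": sum(0.8 <= r <= 1.2 for r in ratios) / n,  # inclusive bounds (assumption)
    }


if __name__ == "__main__":
    print(evaluation_metrics([110, 190, 330, 400], [100, 200, 300, 400]))
```

For this example call, the ratios are 1.10, 0.95, 1.10, and 1.00, so μ = 1.0375, MAPE = 6.25%, and all four specimens fall inside the a20 band.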

Figure 5.


Comparison between proposed equations and ML models for training and testing datasets.

Table 5.

Comparison of the developed ML models.

Metrics Training data Testing data All data
CatB GPR LGBM CatB GPR LGBM CatB GPR LGBM Prop.Eqn (WOR) Prop.Eqn (WWR)
Mean μ 1.005 1.006 1.008 1.026 1.023 1.026 1.01 1.01 1.012 1.003 1.004
CoV 0.062 0.062 0.076 0.141 0.151 0.154 0.085 0.088 0.098 0.207 0.192
R2 0.997 0.997 0.991 0.933 0.947 0.875 0.986 0.988 0.971 0.917 0.937
MAPE (%) 4.41 4.44 5.32 9.32 10.27 9.61 5.39 5.60 6.18 16.80 13.70
RMSE (kN) 36.8 39.6 63.6 160.9 143.3 219.9 79.1 73.2 113.6 115.9 196.7
a20-index 0.994 0.993 0.981 0.899 0.857 0.887 0.975 0.965 0.962 0.68 0.78

As shown in Table 5, all introduced ML models display μ, R2, and a20-index values close to 1.0 and small CoV, MAPE, and RMSE values. The MAPE values of the CATB model are 4.41% and 9.32% in the training and testing sets, respectively, the lowest among the models, compared with 4.44% and 10.27% for the GPR model and 5.32% and 9.61% for the LGBM model. For all ML models, CoV and MAPE on the testing data are roughly twice those on the training data; the testing-set values nevertheless remain low, suggesting limited overfitting. Furthermore, the CATB model gives μ values of 1.005 and 1.026, R2 values of 0.997 and 0.933, and a20-index values of 0.994 and 0.899 in the training and testing sets, respectively, all close to 1.00. Overall, the CATB model achieves the lowest MAPE and CoV and the highest a20-index, and hence the best balance between training and testing accuracy, although the GPR model attains a slightly higher testing R2 (0.947) and a lower testing RMSE (143.3 kN).

While the CATB, GPR, and LGBM models exhibit superior results, deriving an explicit design formula from them is challenging, and their black-box, difficult-to-interpret nature hinders their practical implementation in engineering design. This study therefore introduces straightforward and practical explicit design formulas derived through the SR technique. As shown in Table 5, the proposed equations yield μ values of 1.003 and 1.004, R2 values of 0.917 and 0.937, and CoV values of 0.207 and 0.192 for RCDBs without web reinforcement (WOR) and with web reinforcement (WWR), respectively. Although they are less accurate than the ML models (e.g., MAPE of 16.80% and 13.70% versus 5.39% for CATB on all data), these SR-derived formulas are more accessible and easier to interpret, which supports their practical use in engineering applications.

Comparisons with closed-form models

In this section, the proposed equations are compared with four existing closed-form models (listed in Table 6): two design codes, i.e., ACI 318-1917 and EC218, and the equations proposed by Matamoros and Wong (MW)13 and Russo et al.14. Table 7 summarises the predictive performance of these models and the proposed equations for two reinforcement configurations, i.e., beams without web reinforcement (WOR) and beams with web reinforcement (WWR). The (μ, CoV) values obtained by the proposed equations are (1.003, 0.207) and (1.004, 0.192) for the WOR and WWR cases, respectively, indicating better predictive stability and robustness than the existing closed-form models. Figure 6 presents scatter plots of the experimental versus predicted results for the entire database using the proposed expressions and the four closed-form models. In Fig. 6, the ACI 318-19, EC2, and MW expressions perform similarly, with points lying predominantly above the diagonal, indicating that these models tend toward conservative predictions (μ of 0.699, 0.677, and 0.807, respectively). In contrast, the prediction-to-test ratios of the proposed equations are concentrated around unity, with (μ, CoV) values of (1.003, 0.198), the best among the closed-form models. The CATB model performs better still, with (μ, CoV) values of (1.01, 0.085), highlighting the effectiveness of ML techniques for predicting the shear strength of RCDBs.

Table 7.

Comparison of the proposed equations, the CATB model, and closed-form models.

Metrics Without web reinforcement (WOR) With web reinforcement (WWR) Overall
MW13 Russo14 EC218 ACI17 Prop MW13 Russo14 EC218 ACI17 Prop MW13 Russo14 EC218 ACI17 CatB Prop
Mean μ 0.782 1.083 0.589 0.669 1.003 0.822 1.075 0.732 0.718 1.004 0.807 1.078 0.677 0.699 1.01 1.003
CoV 0.308 0.251 0.299 0.283 0.207 0.432 0.233 0.35 0.409 0.192 0.392 0.24 0.353 0.372 0.085 0.198
R2 0.824 0.866 0.487 0.671 0.917 0.634 0.942 0.722 0.559 0.937 0.664 0.932 0.692 0.579 0.986 0.935
MAPE (%) 43.08 17.93 82.88 62.32 16.80 70.09 13.57 56.70 65.93 13.70 59.74 15.28 66.73 64.544 5.389 14.89
RMSE (kN) 168.6 147.3 288 230.6 115.9 474.2 189.3 413.2 520.6 196.7 386.7 174.4 370.2 433 79.1 170.3
a20-index 0.301 0.621 0.087 0.177 0.68 0.407 0.763 0.27 0.241 0.78 0.367 0.708 0.2 0.217 0.975 0.742

Figure 6.


Comparison between proposed equations and previous models.

Figure 7 illustrates the prediction errors of the design standards, the previous models, and the developed ML models. In Fig. 7a, the CATB, GPR, and LGBM models are highly precise, with over 81% of test samples falling within the 10% error range. In contrast, the MW and Russo formulas place 21% and 39% of test samples within the same error range, respectively, and the ACI 318 and EC2 provisions perform worse still, capturing only 11% and 16% of test samples, respectively. These results highlight the superior accuracy of the ML models, particularly CATB, GPR, and LGBM, in predicting the shear strength of RCDBs compared with traditional design standards. In Fig. 7b, the proposed equation and the Russo formula perform comparably in the WOR case, with a slight advantage for the proposed equation that is also reflected in Table 7 by its smaller CoV (0.207 versus 0.251 for Russo). In the WWR case, the proposed equation gives slightly better predictions than the Russo formula (CoV of 0.192 versus 0.233) and outperforms the ACI 318-19, EC2, and MW models, capturing almost twice as many test samples as the MW formula, and four times as many as ACI 318 and EC2, within the same error ranges. Although the proposed equations and the Russo formula give comparable results, the proposed equations are more straightforward. Furthermore, most performance metrics of the proposed equations in Table 7 surpass those of the previous mechanical models; the exception is the WWR case, where the Russo formula attains a slightly higher R2 (0.942 versus 0.937), a lower MAPE (13.57% versus 13.70%), and a lower RMSE (189.3 versus 196.7 kN).

Figure 7.


Prediction errors of design standards and established ML models. (a) The proposed equations, ML models and previous models, (b) The proposed equations and previous models for WOR and WWR cases.

Feature importance analysis

Evaluating the influence of the input parameters on the shear strength is a critical aspect of designing RCDBs. This study employs the SHapley Additive exPlanations (SHAP) method to analyse the impact of the input parameters on the shear strength parameter, Vu/(bw·h)45. Figures 8a and 8b display the SHAP feature importance of each input feature for the WOR and WWR databases, respectively. A positive SHAP value indicates that a feature raises the predicted strength index relative to the average prediction, whereas a negative value indicates that it lowers it. The shear span-to-depth ratio (a/d), concrete strength (fc′), and longitudinal reinforcement ratio (ρl) are the most influential design parameters for both WOR and WWR RCDBs. In addition, the vertical and horizontal web reinforcement ratios (ρv, ρh) are the fourth and fifth most important features for the WWR database. The remaining features are ranked in descending order of importance in Fig. 8.
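SHAP attributes a prediction to its input features using Shapley values. The sketch below computes exact Shapley values by enumerating feature coalitions, with absent features replaced by baseline values; this enumeration is only practical for a handful of features, and for tree ensembles such as CatBoost the shap library's TreeExplainer is the usual tool. The three-feature model, its coefficients, and the baseline are hypothetical and are not fitted to the RCDB database.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Features outside a coalition take their baseline values (an interventional
    simplification). Cost grows as 2**n_features, so use only for tiny models.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_strength_index(z):
    """Hypothetical model: z = [fc (MPa), a/d, rho_l]; coefficients are illustrative only."""
    fc, a_d, rho_l = z
    return 0.05 * fc - 1.2 * a_d + 2.0 * rho_l * fc


if __name__ == "__main__":
    print(exact_shapley(toy_strength_index, [40.0, 1.0, 0.02], [30.0, 1.5, 0.01]))
```

With x = [40, 1.0, 0.02] and baseline [30, 1.5, 0.01], the contributions are 0.8, 0.6, and 0.7 and sum to f(x) − f(baseline) = 2.1. The a/d contribution is positive here because a/d lies below its baseline, which is how a negative effect of a/d appears in a SHAP summary plot.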

Figure 8.


Feature importance for inputs influencing the shear strength of RC deep beams. (a) Database without web reinforcement, (b) database with web reinforcement.

Additionally, except for the a/d ratio and beam height (h), all other input variables have a positive or mixed impact on the strength index. Increasing the concrete strength, the reinforcement ratios (ρl, ρv, ρh), and the reinforcement yield strength enhances the shear strength of RCDBs, whereas the a/d ratio and beam height (h) influence it negatively. The negative impact of the a/d ratio agrees with the experimental results of Kani41, which showed that beams exhibit higher shear resistance at lower a/d values. Furthermore, increasing the beam height (h) reduces the shear resistance because a deeper beam weakens shear transfer by aggregate interlock across the critical shear crack and releases relatively more energy at failure, which further reduces the shear resistance46.

Reliability analysis

This section presents the reliability indices for the shear strength of RCDBs obtained with the CATB model and the two proposed equations. It also assesses the design factors of two existing code standards, ACI 318-1917 and EC218. The limit state function g for the shear strength of RCDBs47 can be defined as:

$$g = R - Q = \frac{V_{uc}}{\theta_R} - (D + L) \qquad (6)$$

where R is the random shear resistance of the RCDB, defined as the predicted shear capacity (Vuc) divided by the prediction-to-test ratio θR, and Q is the random load effect, comprising the dead load (D) and live load (L). Vuc is calculated for each model in Table 6 with the partial resistance factors taken as unity, using the random values of the design variables given in Table 8. Using the distribution fitting tool in MATLAB, θR was found to be best fitted by a lognormal distribution, with the mean and CoV for each model listed in Table 9. The nominal loads Dn and Ln can be computed from the design resistance Vd for a given live-to-dead load ratio (Ln/Dn) as follows:

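$$V_d\!\left(\frac{f_{ck}}{\gamma_c}, \frac{f_y}{\gamma_s}\right) \ \text{or} \ \phi V_n(f_{ck}, f_y) = S_d = \gamma_D D_n + \gamma_L L_n \qquad (7)$$
$$D_n = \frac{V_d \ \text{or} \ \phi V_n}{\gamma_D + \gamma_L k}, \qquad L_n = k\,D_n \qquad (8)$$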

where k is the live-to-dead load ratio Ln/Dn. The reduced design resistance Vd is obtained either by dividing the characteristic strengths of concrete and steel (fck and fy) by the material partial factors (γc and γs)18 or by multiplying the nominal resistance (Vn) by a strength reduction factor (ϕ)17. Vd is then balanced against the factored design load effect Sd to ensure a suitable safety margin, where Sd is obtained by multiplying the nominal dead and live loads (Dn and Ln) by the partial load factors γD and γL, respectively, and summing the results. These partial factors are summarised in Table 9 for each code standard.
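Eq. (8) simply rearranges Eq. (7) for a prescribed live-to-dead ratio k. A minimal sketch follows; the function name and the example design resistance of 580 kN (e.g., ϕVn with ϕ = 0.58 and a hypothetical Vn = 1000 kN) are illustrative assumptions.

```python
def nominal_loads(design_resistance, gamma_d, gamma_l, k):
    """Return (D_n, L_n) that exactly balance Eq. (7) for live-to-dead ratio k = L_n / D_n."""
    if design_resistance <= 0 or k < 0:
        raise ValueError("design resistance must be positive and k non-negative")
    d_n = design_resistance / (gamma_d + gamma_l * k)   # Eq. (8)
    l_n = k * d_n
    return d_n, l_n


if __name__ == "__main__":
    # ACI 318 load factors from Table 9, k = 1.0, hypothetical phi * V_n = 580 kN
    print(nominal_loads(580.0, 1.2, 1.6, 1.0))
```

With the ACI 318 load factors (γD = 1.2, γL = 1.6) and k = 1.0, Dn = Ln = 580/2.8 ≈ 207.1 kN, and 1.2Dn + 1.6Ln recovers the 580 kN design resistance.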

Table 8.

Statistical properties of random variables.

Properties Variables Mean CoV (%) / Std. Distribution Space Refs.
Geometry bw (mm) bw+2.286 4.826 (mm) Normal 200 47,48
h (mm) h 2.0 Normal {1000, 2000, 3000} 49,50
d (mm) d-4.826 12.7 (mm) Normal 0.97 (h) 47,48
a/d a/d Deterministic {0.5, 1.25, 2.0} 47
Material fc (MPa) χfc* 10.1 Normal {20, 40, 60} 51,52
fy (MPa) 1.2fy 8.3 Lognormal {235, 355, 420} 50,53,54
fyh (MPa) 1.2fy 8.3 Lognormal 235 50,53,54
fyv (MPa) 1.2fy 8.3 Lognormal 235 50,53,54
Reinforcement ρl ρl 1.25 Normal {0.004, 0.012, 0.02} 50,54,55
ρhw ρhw 1.25 Normal {0.002, 0.06, 0.01} 50,54,55
ρvw ρvw 1.25 Normal {0.002, 0.06, 0.01} 50,54,55
Load k (load ratio) k Deterministic {0.5, 1.25, 2.0}
D (dead load) 1.05D 0.1 Normal 24,56
L (live load) L 0.18 Gumbel 47,56

*$\chi = 3.0469 - 0.13543\,f_c + 0.31743\,(0.1 f_c)^2 - 0.02413\,(0.1 f_c)^3$ 51.
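The footnote of Table 8 is read here as χ = 3.0469 − 0.13543fc + 0.31743(0.1fc)² − 0.02413(0.1fc)³ with fc in MPa; this reading of the garbled extraction is an assumption and should be checked against ref. 51. Under that reading, the sketch below evaluates χ at the three strength levels of Table 8 and draws normally distributed concrete strengths with mean χfc and CoV 10.1%. The function names are hypothetical.

```python
import random


def concrete_bias(fc):
    """Bias factor chi for nominal concrete strength fc (MPa), as read from the Table 8 footnote."""
    u = 0.1 * fc
    return 3.0469 - 0.13543 * fc + 0.31743 * u ** 2 - 0.02413 * u ** 3


def sample_concrete_strength(fc_nominal, rng, cov=0.101):
    """Draw one random fc: Normal with mean chi*fc and CoV 10.1 % (Table 8)."""
    mean = concrete_bias(fc_nominal) * fc_nominal
    return rng.gauss(mean, cov * mean)


if __name__ == "__main__":
    rng = random.Random(0)
    for fc in (20, 40, 60):                      # the three levels listed in Table 8
        draws = [sample_concrete_strength(fc, rng) for _ in range(10_000)]
        print(fc, round(concrete_bias(fc), 4), round(sum(draws) / len(draws), 1))
```

Under this reading, χ ≈ 1.415, 1.164, and 1.137 for fc = 20, 40, and 60 MPa, so the mean sampled strength is about 14–42% above the nominal value.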

Table 9.

Load and resistance factors, distributions of the prediction-to-test ratio θR, and recommended strength reduction factors ϕ.

ACI31817 EC218 CATB Prop.Eqn (WOR) Prop.Eqn (WWR)
Load factor γ
 Dead load γD 1.2 1.35 1.2 1.2** 1.2**
 Live load γL 1.6 1.5 1.6 1.6 1.6
Best-fit distribution θR (Table 7) Lognormal Lognormal Lognormal Lognormal Lognormal
 Mean (Table 7) 0.699 0.677 1.01 1.003 1.004
 CoV (Table 7) 0.372 0.353 0.085 0.207 0.192
 Target reliability β 3.559 3.818 5.29* 3.5** 3.5**
 Evaluated strength reduction factor ϕ for the target reliability index 0.58 0.78 1.0 0.76 0.78

*Target reliability β=5.29 evaluated for strength reduction factor ϕ=1.0.

**Load factors and target reliability β are assumed to be identical to those of ACI 318.

The safety level of structures can be measured by the reliability index β, a factor related to the failure probability Pf, as follows57:

$$\beta = -\Phi^{-1}(P_f) \qquad (9)$$

where Φ is the standard normal cumulative distribution function. Monte Carlo simulation (MCS) is employed to determine the reliability index owing to its simplicity, insensitivity to problem dimensionality, and satisfactory accuracy56. In MCS, the failure probability is estimated as

$$P_f = \frac{N_{fail}}{N} \qquad (10)$$

where N and Nfail are the total number of simulations and the number of failed simulations (those for which the limit state function is violated, i.e., g ≤ 0), respectively. To estimate the reliability index of the design codes accurately, the uncertainty of all input variables, including material properties, geometry, and loads, should be considered47. Thirteen random variables are considered in this study; their statistical properties are summarised in Table 8. The random variable values are sampled from their respective distribution functions (Table 8) over a wide range of geometric and material parameters of RCDB configurations: three concrete compressive strengths fc′ = {20, 40, 60} MPa, three beam heights h = {1000, 2000, 3000} mm, three values each of the longitudinal, vertical web, and horizontal web reinforcement ratios, three longitudinal steel yield stresses fyl = {235, 355, 420} MPa, four a/d ratios {0.5, 1.0, 1.5, 2.0}, and four Ln/Dn ratios {0.5, 1.0, 1.5, 2.0}. In total, 3 × 3 × 3 × 3 × 3 × 3 × 4 × 4 = 11,664 beam configurations are considered for each model. The target reliability indices stipulated by the ACI 31817 and EC218 provisions for the shear strength of RCDBs are 3.5 and 3.8, respectively. The accuracy of MCS depends on the number of samples N; achieving a reliability index of 3.8 with acceptable accuracy (CoV of 5%) requires N = 5,528,430 samples58.
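Two numbers in the paragraph above can be checked directly. The configuration count is the product of the level counts, and the quoted N agrees with the usual crude Monte Carlo estimate N = (1 − Pf)/(CoV²·Pf) evaluated at Pf = Φ(−3.8) ≈ 7.23 × 10⁻⁵ and CoV = 0.05; that this is the expression used in ref. 58 is an assumption. The function and constant names are hypothetical.

```python
import math
from statistics import NormalDist


def required_mcs_samples(beta_target, cov_target):
    """Crude-MCS sample size giving a coefficient of variation cov_target for the Pf estimate."""
    pf = NormalDist().cdf(-beta_target)          # Pf = Phi(-beta), the inverse of Eq. (9)
    return math.ceil((1.0 - pf) / (cov_target ** 2 * pf))


# Levels quoted in the text: fc', h, rho_l, rho_v, rho_h, f_yl (3 each); a/d and Ln/Dn (4 each)
LEVELS = [3, 3, 3, 3, 3, 3, 4, 4]
N_CONFIGURATIONS = math.prod(LEVELS)             # 11,664 beam configurations

if __name__ == "__main__":
    print(N_CONFIGURATIONS, required_mcs_samples(3.8, 0.05))
```

Because N scales with 1/Pf, each increase in the target β raises the required sample count sharply, which is why the count is set by the highest target (β = 3.8).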

As illustrated in Fig. 9, the strength reduction factors (ϕ) for the proposed equations at a reliability index of 3.5 are 0.76 and 0.78 for the cases without (WOR) and with (WWR) web reinforcement, respectively. The higher factor in the WWR case is consistent with the lower CoV of θR for that case (0.192 versus 0.207), which is attributed to the more ductile behaviour of beams with web reinforcement. The strength reduction factors corresponding to the target reliability for the shear design of RCDBs according to ACI 318 and EC2 are 0.58 and 0.78, respectively. Although the strength reduction factor for the proposed equation without web reinforcement (ϕ = 0.76) is comparable to that of EC2 (ϕ = 0.78), their θR distributions differ markedly. As detailed in Table 9, the proposed equation yields a mean θR close to 1.0, whereas EC2 yields a smaller mean of 0.677. According to Eqs. (6) and (10), a smaller mean θR (i.e., a more conservative prediction) lowers the failure probability for a given ϕ and therefore permits a relatively high strength reduction factor. EC2 thus attains its ϕ through systematic conservatism combined with a large scatter (CoV of 0.353), whereas the proposed equations attain a comparable ϕ with nearly unbiased predictions and much smaller scatter. Therefore, the reliability associated with the proposed equations surpasses that obtained by applying the code standards. Moreover, Fig. 9 shows that the CATB model used with ϕ = 1.0 achieves a high reliability index of 5.29. This is attributed to the low CoV of the CATB model (0.085) compared with the other models, as listed in Table 9, which demonstrates the potential of ML models to improve the predictive accuracy and reliability of shear strength design for RCDBs.
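The β–ϕ relationship in Fig. 9 is obtained by repeating the Monte Carlo estimate of Eqs. (6), (9), and (10) over a range of ϕ. The sketch below reproduces that loop for a single normalised design point only. It assumes Vuc = Vn = 1 (so θR, D, and L are the only random variables), reads the Table 8 load entries as CoVs of 10% for D (mean 1.05Dn, normal) and 18% for L (mean Ln, Gumbel), and uses the ACI 318 θR statistics from Table 9 with the ACI load factors and k = 1.0. Because it omits the geometric and material variables and the 11,664-configuration grid, its β values are not expected to match Fig. 9; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_params(mean, cov):
    """(mu, sigma) of ln(X) for a lognormal X with the given mean and CoV."""
    sigma = math.sqrt(math.log(1.0 + cov ** 2))
    return math.log(mean) - 0.5 * sigma ** 2, sigma


def gumbel_max(rng, mean, cov):
    """Type I largest-value (Gumbel) variate with the given mean and CoV (inverse CDF)."""
    scale = mean * cov * math.sqrt(6.0) / math.pi
    loc = mean - 0.5772156649 * scale            # Euler-Mascheroni constant
    u = rng.random()
    while u == 0.0:                              # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


def reliability_index(phi, theta_mean, theta_cov, gamma_d=1.2, gamma_l=1.6,
                      k=1.0, n=100_000, seed=0):
    """Crude-MCS estimate of beta for one normalised design point.

    Simplifying assumption: V_uc = V_n = 1, so only theta_R, D and L are random.
    """
    rng = random.Random(seed)
    mu_ln, sigma_ln = lognormal_params(theta_mean, theta_cov)
    v_n = 1.0
    d_n = phi * v_n / (gamma_d + gamma_l * k)    # Eq. (8) with phi * V_n
    l_n = k * d_n
    n_fail = 0
    for _ in range(n):
        resistance = v_n / rng.lognormvariate(mu_ln, sigma_ln)   # R = V_uc / theta_R
        dead = rng.gauss(1.05 * d_n, 0.10 * 1.05 * d_n)         # assumed CoV 10 %
        live = gumbel_max(rng, l_n, 0.18)                        # assumed CoV 18 %
        if resistance - (dead + live) <= 0.0:                    # g <= 0, Eq. (6)
            n_fail += 1
    if n_fail == 0:
        return math.inf                          # too few samples to resolve Pf
    return -NormalDist().inv_cdf(n_fail / n)     # Eq. (9) with Pf from Eq. (10)


if __name__ == "__main__":
    # Sweep phi using the ACI 318 theta_R statistics of Table 9 (mean 0.699, CoV 0.372)
    for phi in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"phi = {phi:.2f}  beta ~ {reliability_index(phi, 0.699, 0.372):.2f}")
```

Because the same seed is reused for every ϕ and the sampled loads scale linearly with ϕ, the estimated β can only decrease as ϕ increases, which is the trend plotted in Fig. 9.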

Figure 9.


Variation of reliability index β in terms of strength reduction factor ϕ for the proposed equations, EC2 and ACI318.

Conclusions

In conclusion, this study compiled a comprehensive database of 840 experimental tests on the shear strength of RCDBs from various research papers. It employed eight machine learning models optimised using the Bayesian optimization (BO) technique. In addition, expressions were proposed for designing RCDBs. From the evaluation results, the following conclusions can be drawn:

  • The CATBoost, GPR, and LGBM models exhibited outstanding accuracy and stability, surpassing traditional design standards. The CATBoost model demonstrated the best prediction accuracy and generalisation ability, outperforming other ML models.

  • The explicit design formulas derived through symbolic regression are straightforward and robust, offering greater simplicity than previous approaches while retaining acceptable accuracy.

  • Comparison with closed-form models and design standards, such as ACI 318-19 and EC2, highlighted the efficiency of the proposed equations, which displayed superior predictive stability and robustness.

  • SHAP analysis revealed that increasing the concrete strength, the reinforcement ratios (ρl, ρv, ρh), and their yield strength increases the shear strength parameter Vu/(bw·h) of RCDBs, whereas increasing the a/d ratio and beam height (h) decreases it.

  • The reliability analysis indicated that the CATBoost model and proposed equations surpassed code standards regarding reliability and accuracy.

In summary, the ML-based approach is a promising tool for accurately predicting the shear strength of RCDBs and provides valuable insights for engineering applications.


Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions

K.M. is responsible for material preparation, data collection, analysis and preparing the figures.

Funding

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Data availability

All data generated or analysed during this study are included in this published article and available in a public repository: https://github.com/kmegahed/Deep-beam-ML-models.

Competing interests

The author declares no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-64386-w.

References

  • 1. MacGregor JG, Wight JK, Teng S, Irawan P. Reinforced Concrete: Mechanics and Design. Prentice Hall; 1997.
  • 2. Ma C, Wang S, Zhao J, Xiao X, Xie C, Feng X. Prediction of shear strength of RC deep beams based on interpretable machine learning. Constr. Build. Mater. 2023;387:131640. doi: 10.1016/j.conbuildmat.2023.131640.
  • 3. Le Nguyen K, Thi Trinh H, Nguyen TT, Nguyen HD. Comparative study on the performance of different machine learning techniques to predict the shear strength of RC deep beams: Model selection and industry implications. Expert Syst. Appl. 2023;230:120649. doi: 10.1016/j.eswa.2023.120649.
  • 4. Ashour AF, Alvarez LF, Toropov VV. Empirical modelling of shear strength of RC deep beams by genetic programming. Comput. Struct. 2003;81(5):331–338. doi: 10.1016/S0045-7949(02)00437-6.
  • 5. Cheng MY, Cao MT. Evolutionary multivariate adaptive regression splines for estimating shear strength in reinforced-concrete deep beams. Eng. Appl. Artif. Intell. 2014;28:86–96. doi: 10.1016/j.engappai.2013.11.001.
  • 6. Feng D-C, Wang W-J, Mangalathu S, Hu G, Wu T. Implementing ensemble learning methods to predict the shear strength of RC deep beams with/without web reinforcements. Eng. Struct. 2021;235:111979. doi: 10.1016/j.engstruct.2021.111979.
  • 7. Tiwari A, Gupta AK, Gupta T. A robust approach to shear strength prediction of reinforced concrete deep beams using ensemble learning with SHAP interpretability. Soft Comput. 2023. doi: 10.1007/s00500-023-09495-w.
  • 8. Liu MY, Li Z, Zhang H. Probabilistic shear strength prediction for deep beams based on Bayesian-optimized data-driven approach. Buildings. 2023;13(10):1–16. doi: 10.3390/buildings13102471.
  • 9. Shahnewaz M, Rteil A, Alam MS. Shear strength of reinforced concrete deep beams—A review with improved model by genetic algorithm and reliability analysis. Structures. 2020;23:494–508. doi: 10.1016/j.istruc.2019.09.006.
  • 10. Wakjira T, Ibrahim M, Sajjad B, Ebead U. Shear capacity of reinforced concrete deep beams using genetic algorithm. IOP Conf. Ser. Mater. Sci. Eng. 2020;910(1):012002. doi: 10.1088/1757-899X/910/1/012002.
  • 11. Hameed MM, Khaleel F, AlOmar MK, Mohd Razali SF, Alsaadi MA. Optimising the selection of input variables to increase the predicting accuracy of shear strength for deep beams. Complexity. 2022. doi: 10.1155/2022/6532763.
  • 12. Park J, Kuchma D. Strut-and-tie model analysis for strength prediction of deep beams. ACI Struct. J. 2007;104:657–666.
  • 13. Matamoros AB, Wong KH. Design of simply supported deep beams using strut-and-tie models. ACI Struct. J. 2003;100(6):704–712.
  • 14. Russo G, Pauletta M, Venir R. Reinforced concrete deep beams-shear strength model and design formula. ACI Struct. J. 2005;102(3):429. doi: 10.14359/14414.
  • 15. Vecchio FJ, Collins MP. The modified compression-field theory for reinforced concrete elements subjected to shear. ACI J. 1986;19(16):219–231.
  • 16. Tang CY, Tan K-H. Interactive mechanical model for shear strength of deep beams. J. Struct. Eng. ASCE. 2004;130:1534–1544. doi: 10.1061/(ASCE)0733-9445(2004)130:10(1534).
  • 17. ACI Committee 318. Building Code Requirements for Structural Concrete (ACI 318-19) and Commentary (ACI 318R-19). American Concrete Institute; 2019.
  • 18. Hendy CR, Smith DA. Designers' Guide to EN 1992-2: Eurocode 2: Design of Concrete Structures: Part 2: Concrete Bridges. Thomas Telford; 2007.
  • 19. Chen H, Yi WJ, Hwang HJ. Cracking strut-and-tie model for shear strength evaluation of reinforced concrete deep beams. Eng. Struct. 2018;163:396–408. doi: 10.1016/j.engstruct.2018.02.077.
  • 20. Chetchotisak P, Teerawong J, Yindeesuk S. Modified interactive strut-and-tie modeling of reinforced concrete deep beams and corbels. Structures. 2022;45:284–298. doi: 10.1016/j.istruc.2022.08.116.
  • 21. Fan S, Zhang Y, Ma Y-X, Tan KH. Strut-and-tie and finite element modelling of unsymmetrically-loaded deep beams. Structures. 2022;36:805–821. doi: 10.1016/j.istruc.2021.12.037.
  • 22. Liang S, Shen Y, Gao X, Cai Y, Fei Z. Symbolic machine learning improved MCFT model for punching shear resistance of FRP-reinforced concrete slabs. J. Build. Eng. 2023;69:106257. doi: 10.1016/j.jobe.2023.106257.
  • 23. Aguilar V, Barnes RW, Nowak A. Strength reduction factors for ACI 318 strut-and-tie method for deep beams. ACI Struct. J. 2022;119(2):103–112.
  • 24. Muendacha D, Teerawong J, Chetchotisak P. A safety-based evaluation of strut-and-tie methods for shear design of RC deep beams in accordance with international concrete codes. Eng. Appl. Sci. Res. 2020;47(2):137–144. doi: 10.14456/easr.2020.14.
  • 25. Shen L, Shen Y, Liang S. Reliability analysis of RC slab-column joints under punching shear load using a machine learning-based surrogate model. Buildings. 2022;12(10):1750. doi: 10.3390/buildings12101750.
  • 26. Ismail KS. Shear Behaviour of Reinforced Concrete Deep Beams. University of Sheffield; 2016.
  • 27. Schober P, Boer C, Schwarte LA. Correlation coefficients: Appropriate use and interpretation. Anesth. Analg. 2018;126(5):1763–1768. doi: 10.1213/ANE.0000000000002864.
  • 28. Koza JR. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 1994;4(2):87–112. doi: 10.1007/BF00175355.
  • 29. Udrescu S-M, Tegmark M. AI Feynman: A physics-inspired method for symbolic regression. Sci. Adv. 2020;6(16):eaay2631. doi: 10.1126/sciadv.aay2631.
  • 30. Rasmussen CE, Williams CKI, et al. Gaussian Processes for Machine Learning. Springer; 2006.
  • 31. Ke G, et al. LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30; 2017. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.
  • 32. Breiman L. Random forests. Mach. Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324.
  • 33. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. CoRR; 2018. Available: http://arxiv.org/abs/1810.11363.
  • 34. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016:785–794. doi: 10.1145/2939672.2939785.
  • 35. Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process. Lett. 1999;9(3):293–300. doi: 10.1023/A:1018628609742.
  • 36. Schapire RE. The strength of weak learnability. Mach. Learn. 1990;5(2):197–227. doi: 10.1007/BF00116037.
  • 37. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, vol. 31; 2018.
  • 38. Goldberg DE, Holland JH. Genetic algorithms and machine learning. Mach. Learn. 1988;3(2):95–99. doi: 10.1023/A:1022602019183.
  • 39. Cranmer M. Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. 2023. Available: http://arxiv.org/abs/2305.01582.
  • 40. Megahed K, Mahmoud NS, Abd-Rabou SEM. Prediction of the axial compression capacity of stub CFST columns using machine learning techniques. Sci. Rep. 2024;14(1):2885. doi: 10.1038/s41598-024-53352-1.
  • 41. Kani G. How safe are our large reinforced concrete beams? J. Proc. 1967;64(3):128–141.
  • 42. Yang L, Shami A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing. 2020;415:295–316. doi: 10.1016/j.neucom.2020.07.061.
  • 43. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, vol. 24; 2011. Available: https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf.
  • 44. Asteris PG, Mokos VG. Concrete compressive strength using artificial neural networks. Neural Comput. Appl. 2020;32(15):11807–11826. doi: 10.1007/s00521-019-04663-2.
  • 45. Wang J, Lu R, Cheng M. Application of ensemble model in capacity prediction of the CCFST columns under axial and eccentric loading. Sci. Rep. 2023;13(1):9488. doi: 10.1038/s41598-023-36576-5.
  • 46. Chen H, Yi W-J, Ma ZJ. Shear size effect in simply supported RC deep beams. Eng. Struct. 2019;182:268–278. doi: 10.1016/j.engstruct.2018.12.062.
  • 47. Nasrollahzadeh K, Aghamohammadi R. Reliability analysis of shear strength provisions for FRP-reinforced concrete beams. Eng. Struct. 2018;176:785–800. doi: 10.1016/j.engstruct.2018.09.016.
  • 48. Mirza SA, MacGregor JG. Probabilistic study of strength of reinforced concrete members. Can. J. Civ. Eng. 1982;9(3):431–448. doi: 10.1139/l82-053.
  • 49. Sýkora M, Holický M, Marková J. Verification of existing reinforced concrete bridges using the semi-probabilistic approach. Eng. Struct. 2013;56:1419–1426. doi: 10.1016/j.engstruct.2013.07.015.
  • 50. Yang IH, Joh C, Kim B-S. Structural behavior of ultra high performance concrete beams subjected to bending. Eng. Struct. 2010;32(11):3478–3487. doi: 10.1016/j.engstruct.2010.07.017.
  • 51. Nowak AS, Szerszen MM. Calibration of design code for buildings (ACI 318): Part 1—Statistical models for resistance. ACI Struct. J. 2003;100(3):377–382.
  • 52. Eamon C, Jensen E. Reliability analysis of RC beams exposed to fire. J. Struct. Eng. 2013;139:212–220. doi: 10.1061/(ASCE)ST.1943-541X.0000614.
  • 53. Hess PE, Bruchman D, Assakkaf IA, Ayyub BM. Uncertainties in material and geometric strength and load variables. Nav. Eng. J. 2002;114(2):139–166. doi: 10.1111/j.1559-3584.2002.tb00128.x.
  • 54. Abbas YM. Shear behavior of ultra-high-performance reinforced concrete beams—Finite element and uncertainty quantification study. Structures. 2023;47:2365–2380. doi: 10.1016/j.istruc.2022.12.060.
  • 55. Al-Harthy AS, Frangopol DM. Reliability assessment of prestressed concrete beams. J. Struct. Eng. 1994;120(1):180–199. doi: 10.1061/(ASCE)0733-9445(1994)120:1(180).
  • 56. Nowak AS, Collins KR. Reliability of Structures. CRC Press; 2012.
  • 57. Rackwitz R, Flessler B. Structural reliability under combined random load sequences. Comput. Struct. 1978;9(5):489–494. doi: 10.1016/0045-7949(78)90046-9.
  • 58. Soong TT, Grigoriu M. Random vibration of mechanical and structural systems. NASA STI/Recon Tech. Rep. A. 1993;93:14690.
  • 59. Nowak AS. Calibration of LRFD bridge code. J. Struct. Eng. 1995;121(8):1245–1251. doi: 10.1061/(ASCE)0733-9445(1995)121:8(1245).
