Abstract
Reinforced concrete (RC) beam design currently faces two significant challenges: the substantial carbon footprint of cementitious materials and the lack of practical automated tools for simultaneous structural-environmental optimization. To address these challenges, this paper proposes a two-stage framework that uses artificial neural networks (ANNs) and deep reinforcement learning (RL) to automate the design of sustainable, low-carbon RC beams. In the first stage, following a comparative analysis of 14 machine learning algorithms, an ANN was selected for its superior predictive accuracy. The trained ANN predicts concrete compressive strength (coefficient of determination, R² ≈ 0.85) and carbon dioxide (CO₂) emissions (R² ≈ 0.99), the critical parameters for subsequent optimization, with a loss function value of 0.15 and a mean absolute error of 0.38. The second stage formulates RC beam design as a decision-making process solved by deep RL, using a Proximal Policy Optimization (PPO) agent. The agent operates within a 13-dimensional parametric action space, encompassing geometric and material composition variables, and interacts with a 26-variable state space to balance structural integrity with environmental sustainability. A customized RL environment was created to optimize designs for minimal CO₂ emissions and to evaluate compliance with ACI 318-19 flexural design criteria. In comparative benchmarking, PPO-optimized designs yield 43.35–75.04% lower CO₂ emissions than those from an Advantage Actor-Critic (A2C) agent. The framework also provides automated ACI 318-19 code compliance, optimized utilization of supplementary cementitious materials (SCMs), and improved structural efficiency through intelligent selection of geometric parameters. The code is available as open source, and a web-based interface facilitates the dissemination of research outcomes.
Keywords: Sustainable design, Deep reinforcement learning, Artificial intelligence, Carbon emission, Optimization, Concrete
Subject terms: Engineering, Mathematics and computing
Introduction
Literature review
Accurate prediction of concrete compressive strength is paramount to ensure the safety and stability of buildings, bridges, and critical infrastructures. Furthermore, reliable prediction of compressive strength can yield substantial savings in both time and resources. Consequently, numerous investigations have been conducted on using machine learning to predict the compressive strength of concrete. For example, Zhang et al. investigated the prediction of the compressive strength of high-performance concrete, emphasizing the interpretability of machine learning models. Their findings indicated that age, water-to-cement ratio, slag, and water are the most influential input parameters1. Moreover, Hosseinzadeh et al. examined the compressive strength estimation of concrete incorporating recycled aggregate and fly ash, and the results revealed that the Extreme Gradient Boosting (XGBoost) algorithm attained 95% accuracy for compressive strength estimation2. Similarly, Manan et al. employed an ANN to predict not only compressive strength but also split tensile strength and modulus of elasticity for concrete containing recycled aggregates, achieving a high R² value of 0.93 for compressive strength during the training phase3.
Further research has investigated the use of machine learning to develop sustainable concrete with favorable mechanical properties, such as lightweight sandstone concrete with varying proportions of silica fume4 and concrete containing waste tires, the latter having been shown to outperform conventional concrete in properties such as energy absorption and ductility5. Likewise, the challenge of predicting multiple strength properties for recycled aggregate concrete has been addressed through various machine learning models, with studies showing that algorithms like Random Forest can effectively model both compressive and tensile strength simultaneously, providing a valuable tool for optimizing sustainable concrete mixes6. Compressive strength is intrinsically linked to the durability and service life of concrete structures, and precise estimation of related durability parameters, such as chloride ion penetration, improves the assessment of resistance to environmental degradation and of structural safety7. Additionally, Yang et al. achieved an accuracy of 0.9847 in predicting chloride concentration in the interfacial zone using deep neural networks8.
Recent research has further broadened the application of machine learning for predicting properties across a diverse range of sustainable concrete types. For instance, Shaaban et al. successfully used various machine learning models, with XGBoost performing best, to forecast the compressive strength of high-strength concrete9. Similarly, Yu et al. employed a stacking ensemble model to predict both the permeability and compressive strength of pervious concrete, demonstrating the utility of advanced ensemble methods10. The focus has also extended to Ultra-High-Performance Concrete, where Onyelowe et al. developed frameworks to predict a suite of properties (compressive strength, flexural strength, slump, and porosity) for mixes incorporating various industrial byproducts, thereby linking predictive modeling to sustainable material use11. This connection to sustainability is also evident in the work of Khan et al., who used machine learning to predict the mechanical performance of concrete made with recycled concrete aggregate12.
However, these studies, while advancing predictive accuracy, are primarily focused on forecasting material properties and do not address the challenge of automated design optimization. They establish what a concrete mix can achieve, but do not provide a framework for an agent to autonomously decide on an optimal, low-carbon design for a structural member like a beam.
Given the importance of flexural behavior in concrete, Li et al. investigated the flexural strength of concrete containing SCMs and achieved an R² of 93%13. Similarly, Nguyen et al. examined the flexural behavior of RC beams incorporating recycled aggregate, Carbon Fiber Reinforced Polymer (CFRP), and fly ash and attained an R² of 98% on a test dataset14. Further contributing to this area, Manan and Zhang developed and compared several machine learning models to predict the flexural strength of RC beams reinforced with Fiber Reinforced Polymer bars, concluding that an ANN model provided the highest accuracy15. Broadening the prediction of mechanical properties beyond flexural strength, Feng et al. predicted the shear strength of deep RC beams using a dataset of 271 experimental data points and a combination of four machine learning algorithms. The combined model reached an R² of 0.928, a significant improvement over the individual machine learning models16. Benzaamia et al. developed a deep learning model optimized with OPTUNA, an automatic hyperparameter optimization software framework, to predict the compressive strength of concrete confined with CFRP, reporting an R² of 93% and a mean absolute percentage error of 7.89%17.
The integration of machine learning in structural engineering has evolved beyond traditional prediction tasks to encompass comprehensive analysis and optimization frameworks. To bridge the gap between theory and practice, Elshaarawy et al. developed a user-friendly graphical user interface that displayed the CatBoost model results for estimating concrete compressive strength18. Advanced applications of machine learning and deep learning now include estimating permeability from Scanning Electron Microscopy (SEM) images19, evaluating models for identifying subsurface defects in concrete using infrared thermography20, and predicting the durability of concrete against freeze-thaw damage21.
Further exemplifying this trend, Shokrnia et al. integrated metaheuristic optimization algorithms (like particle swarm optimization and grey wolf optimizer) with machine learning models (adaptive neuro-fuzzy inference system and extreme learning machine) to enhance the predictive accuracy for the compressive strength of fiber-reinforced concrete22. While this represents an important step in combining machine learning with optimization, the optimization was aimed at improving the predictive model’s parameters rather than performing an automated, objective-driven design of the concrete member itself.
However, a key limitation of many machine learning models is their “black-box” nature, which can hinder trust and practical adoption. To address this, knowledge-guided frameworks have been developed to make Artificial Intelligence (AI) models interpretable. Guo et al. pioneered this approach for designing ultra-high-performance geopolymer, integrating a knowledge graph to ensure predictions were compliant with domain-specific physicochemical principles23. This concept was further advanced in a multi-agent collaboration framework for designing ultra-high-performance concrete, where a Large Language Model was employed to automate the creation of the knowledge graph, thus improving the scalability and efficiency of the design process24.
The growing emphasis on sustainable construction has driven the development of machine learning approaches for carbon footprint assessment and reduction. Given the importance of concrete's carbon footprint, Wudil et al. applied machine learning, considering complex factors such as raw material source, alkaline activator manufacturing, and high curing temperature, to develop an accurate and reliable methodology for estimating the carbon footprint of fly ash geopolymer concrete25. Similarly, Al-Fakih et al. integrated machine learning models to predict the carbon footprint of geopolymer concrete based primarily on ground-granulated blast furnace slag, obtaining a highly precise prediction model26. Recent work has also focused on the environmental impact of using recycled concrete powder (RCP) as a cement substitute, employing Life Cycle Assessment (LCA) to quantify reductions in environmental and human health impacts. These studies confirm that replacing cement with RCP significantly lowers the carbon footprint, with machine learning models such as Gradient Boosting used to accurately predict the compressive strength of these sustainable mixes27,28.
Furthermore, Ren et al. optimized the mixture ratio of cement with manufactured sand using machine learning and substantially reduced carbon emissions compared with traditional combined approaches29. Advanced computational approaches have also been developed to address the complex, nonlinear nature of carbon emission estimation. Recognizing the limitations of conventional statistical methods in handling such nonlinear and complex CO₂ estimation problems, Ghorbal et al. introduced a novel ensemble approach combining dual-path recurrent neural networks (DPRNNs) with the ninja metaheuristic optimization algorithm (NiOA) and attained an R² of 97.3%. In this framework, the DPRNNs capture long-term and short-term dependencies in time series data, and NiOA fine-tunes the parameters of the DPRNNs30.
Golafshani et al. also investigated the mitigation of carbon footprints in recycled concrete incorporating SCMs, and their findings demonstrated that employing machine learning algorithms such as CatBoost reduced the carbon footprint by 5 to 31.5%31.
The complexity of modern structural design requires sophisticated optimization approaches that can simultaneously address multiple competing objectives. Quantifying the optimal amount of SCMs for cement replacement to curtail carbon emissions while preserving concrete performance and cost-effectiveness is crucial32. Cao et al. proposed a model that addresses this challenge by combining Bayesian optimization, natural gradient boosting, and the non-dominated sorting genetic algorithm33. Similarly, Zhang et al. proposed a framework combining multi-objective optimization with deep learning, in which a multi-objective genetic algorithm explored various combinations of Self-Centering Braces and Buckling Restrained Braces to simultaneously minimize Peak Inter-story Drift, Residual Inter-story Drift, and Peak Floor Accelerations. The framework achieved an R² score of 0.98 and accurately predicted Peak Inter-story Drift and Peak Floor Accelerations34. Such optimization techniques have also been applied in other domains, such as large-scale recycled asphalt pavement mixtures35 and ultra-high-performance concrete mixtures with high steel fiber content36.
Because deep neural networks require substantial training data, RL offers an alternative for structural design: the agent acquires its own data by interacting with the environment. For example, Jeong et al. introduced an approach using the Deep Deterministic Policy Gradient (DDPG) algorithm to optimize RC beams, aiming to minimize their cost while adhering to building codes. The result was an agent that could generate cost-effective beams that comply with the design codes37.
Conventional methodologies for designing steel frame structures depend primarily on manual calculations and finite element methods. To automate this process, Fu et al. developed the Frame RL framework to facilitate the design of steel frame structures and applied it to a complex high-rise structure38.
The application of RL extends to structural maintenance and monitoring systems, which are crucial for ensuring long-term structural performance. Cheng et al. employed the Markov learning process and deep RL to enhance the safety and capacity assessment of existing bridges39. Other studies have demonstrated RL's capability to balance cost and collapse probability40 and to facilitate life cycle management for large-scale structures41. In infrastructure maintenance, Han et al. proposed an intelligent decision-making model for asphalt pavement maintenance programs that improved performance by 17.2% compared with a traditional ANN model42. Similarly, Yao et al. introduced an RL-based decision-making framework to determine repair strategies, optimize long-term cost-effectiveness, and manage pavement conditions43. Maintenance is essential for prolonging infrastructure lifespan, and pavement is one of the fundamental infrastructure assets requiring consistent upkeep44.
Advanced applications of RL in structural engineering encompass multi-objective optimization and complex structural systems. Zhang et al. implemented multi-agent deep RL to overcome limitations in automated spatial layout algorithms, with findings demonstrating that the trained agent increased the efficiency of AI-driven intelligent design processes in building renovation45. For building performance optimization, Pan et al. addressed energy consumption, carbon emissions, and thermal comfort using the DDPG algorithm, achieving a balance among multiple objectives for green building performance46.
To contextualize our work, Table 1 summarizes key prior research. This table outlines prominent studies that apply RL to structural optimization and civil engineering applications, highlighting their core findings and the specific research gaps that motivate the present study.
Table 1.
Key applications of RL in structural and civil engineering.
| Reference | Focus Area & Topic | Methodology & Key Findings | Identified Gap or Relation to Current Study |
|---|---|---|---|
| Lin et al.47 | Automated design of steel-concrete composite beams | PPO - Automated beam design minimizing material cost while adhering to Chinese design codes | Validates PPO for beam design. However, it optimizes for cost instead of carbon emissions and addresses composite rather than RC beams. |
| Du & Li48 | Optimize precast concrete production for earliness/tardiness and electricity cost | DQN with Evolutionary Algorithm framework - Novel three-part DQN topology achieved 16–41% better Hypervolume and 204–418% better Inverted Generational Distance metrics than competing algorithms | Addresses the lack of scheduling models for the complex Distributed Flexible Job Shop Problem with group scheduling and time-of-use electricity costs, which are critical in real-world precast manufacturing. |
| Liu et al.49 | Automate clash-free reinforcement design for precast wall panels | Generative Adversarial Network plus deep RL framework - Pix2pix generates preliminary layouts, DQN refines for clash resolution, reducing engineering time by 80% | Addresses lack of automated clash resolution tools for complex prefabricated elements where traditional optimization fails. |
| Kim et al.50 | Optimize precast concrete production scheduling to minimize tardiness | DQN agent - Outperformed traditional dispatching rules by 4–12% Tardiness reduction with 77% average winning rate | Addresses limitations of static dispatching rules and high computational cost of metaheuristics with fast, adaptive real-time scheduling. |
| Jeong & Jo37 | Automate cost-effective RC beam design compliant with ACI 318 | DDPG with 1D Convolutional Neural Network - Generated code-compliant, economically optimized designs within 110% of near-optimal costs in 0.15 s per design | Focused exclusively on cost optimization. Our work addresses environmental sustainability by minimizing carbon dioxide emissions. |
| Zhang et al.45 | Automate architectural spatial layout for building renovation | Multi-Agent DDPG - Agents learned functional room layouts, achieving significant rewards after approximately 650,000 training steps | Focuses on architectural layout without structural engineering constraints or environmental metrics such as embodied carbon. |
| Pan et al.46 | Multi-objective Building Information Modeling (BIM) based green building design optimization | DDPG with deep neural network - Achieved 13.19% building performance improvement, outperforming Non-dominated Sorting Genetic Algorithms II and III across different climate zones | Expands deep RL from a single structural component to a holistic BIM-based building system, addressing broader green objectives beyond embodied carbon. |
| Lai et al.51 | Optimize sustainable maintenance policies for transportation networks | Hierarchical multi-reward Branching Dueling Q-Network framework - The hierarchical multi-reward approach significantly outperformed Branching Dueling Q-Network, DDPG, and routine maintenance strategies | Focuses on network-level maintenance over the lifecycle, whereas our study optimizes the initial design of a single structural component for low-carbon performance. |
| Fu et al.38 | Automate steel frame structure design for safe, economical solutions | Physics-informed PPO with Finite Element Method - Produced code-compliant, economical designs in less than 1 s, outperforming manual design and Genetic Algorithm | Developed for steel frames with economic objective, did not address RC beam material properties or environmental carbon minimization. |
| Li et al.52 | Intelligent sustainable building energy management system | Attention-based DQN framework - 12% reduction in Mean Absolute Percentage Error for photovoltaic forecasting, 10% cost reduction, 15% increase in photovoltaic utilization, 20% satisfaction enhancement | Addresses RL limitations in energy management by integrating a high-accuracy prediction module for intermittent renewable energy sources. |
| van Remmerden et al.40 | Multi-objective infrastructural maintenance optimization balancing cost and collapse probability | Multi-objective deep Centralized Multi-Agent Actor-Critic algorithm - Successfully learned policies outperforming rule-based heuristics in Amsterdam quay walls case study | Validates multi-objective deep RL for infrastructure maintenance, reinforcing methodology’s potential beyond single-objective optimization in civil engineering. |
| Anwar & Zhang53 | Optimize building retrofit strategies under seismic hazard | A2C with performance-based simulation - Converged faster than PPO and DQN, more efficient than Non-dominated Sorting Genetic Algorithm II, effectively reduced damages and costs | Addresses traditional optimization limitations with deep RL framework for intelligent, multi-objective risk optimization of building portfolios. |
Research significance
Innovative design methods are imperative for advancing sustainable structural engineering, particularly for RC beams, which are a primary source of carbon dioxide emissions in construction. Machine learning has been extensively applied to predict concrete properties, and traditional optimization techniques, such as Genetic Algorithms and Particle Swarm Optimization, are well established; however, these approaches are typically constrained to lower-dimensional or less restrictive problems.
In contrast, the automated design of RC beams presents a formidable challenge due to its high-dimensional and intricately constrained nature. This complexity arises from the need to comply with stringent building codes while simultaneously minimizing embodied carbon, characterized in this study by a 13-parameter action space (encompassing geometric and material specifications) and a 26-parameter state space (reflecting design inputs and performance outputs). Deep RL is a well-suited paradigm for such high-dimensional, multifaceted problems.
This research introduces a decision-making framework tailored to low-carbon RC beam design, capitalizing on deep RL's proficiency in managing sequential decision processes within complex environments. RC beam design entails a series of interdependent choices across a mixed continuous-discrete, high-dimensional action space, a domain where deep RL agents excel by iteratively learning optimal policies. Traditional optimization methods struggle with the numerous non-linear constraints imposed by standards such as ACI 318-19, which are difficult to encode explicitly. Deep RL, however, handles these constraints implicitly through interaction with the environment, guided by a reward structure that penalizes code violations and prioritizes minimal CO₂ emissions.
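To make the idea of a mixed continuous-discrete action space concrete, the sketch below shows one common way to map a PPO policy output in [−1, 1]¹³ onto bounded design parameters, rounding the discrete entries to integers. The parameter names, the geometric and reinforcement bounds, and the choice of which entries are discrete are hypothetical; only the eight mix-component bounds are taken from the minimum and maximum values in Table 2. The study's actual action definitions are given in its ‘Action space’ section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    """Bounds for one entry of a (hypothetical) 13-dimensional action vector."""
    name: str
    low: float
    high: float
    discrete: bool = False  # discrete entries are rounded to the nearest integer


# Hypothetical action specification: 3 geometric + 2 reinforcement + 8 mix entries.
# Mix bounds (lb/ft^3) are the min/max values reported in Table 2.
SPECS = [
    ParamSpec("width_in", 10.0, 24.0),
    ParamSpec("height_in", 16.0, 36.0),
    ParamSpec("cover_in", 1.5, 3.0),
    ParamSpec("bar_count", 2.0, 8.0, discrete=True),
    ParamSpec("bar_size", 4.0, 11.0, discrete=True),  # US bar designations #4 to #11
    ParamSpec("water", 4.46, 13.86),
    ParamSpec("cement", 3.25, 32.77),
    ParamSpec("slag", 0.0, 18.42),
    ParamSpec("fly_ash", 0.0, 13.48),
    ParamSpec("silica_fume", 0.0, 3.75),
    ParamSpec("fine_aggregate", 14.67, 71.79),
    ParamSpec("coarse_aggregate", 15.92, 81.47),
    ParamSpec("superplasticizer", 0.0, 0.64),
]


def decode_action(action):
    """Map a policy output in [-1, 1]^13 to named, bounded design parameters."""
    if len(action) != len(SPECS):
        raise ValueError(f"expected {len(SPECS)} entries, got {len(action)}")
    params = {}
    for a, spec in zip(action, SPECS):
        a = max(-1.0, min(1.0, float(a)))  # clip to the policy's output range
        value = spec.low + (a + 1.0) / 2.0 * (spec.high - spec.low)  # affine rescale
        params[spec.name] = int(round(value)) if spec.discrete else value
    return params


if __name__ == "__main__":
    print(decode_action([0.0] * 13))
```

Clipping before rescaling keeps every decoded design inside its bounds even when the Gaussian policy samples outside [−1, 1], so the environment never has to handle out-of-range inputs.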
The robustness of this approach is validated through several key analyses. A reward sensitivity analysis confirms that the agent's policy can be effectively tuned, while a convergence analysis across multiple random seeds demonstrates the stability and reproducibility of the training process. Crucially, a comparative benchmark against the A2C algorithm establishes the superior performance of the proposed PPO agent, which consistently achieves designs with substantially lower carbon emissions. The adaptable reward shaping balances two competing objectives, environmental sustainability and structural integrity, steering the agent toward solutions that are both eco-conscious and robust.
Central to this work is a novel two-stage framework: an ANN first predicts critical properties, such as compressive strength and CO₂ emissions, which the deep RL agent then integrates into its state representation and reward formulation. The selection of this ANN is not arbitrary; it is substantiated by a comprehensive comparative analysis of 14 distinct machine learning algorithms, which demonstrated its superior predictive accuracy. Furthermore, the model’s exceptional performance, particularly for CO₂ emissions (R² ≈ 0.99), was rigorously verified using a 5-fold cross-validation procedure, ensuring its generalizability and establishing a reliable data-driven foundation for the RL agent.
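The 5-fold cross-validation mentioned above partitions the 886 mixes into five disjoint validation folds, with each model trained on the remaining four. The stdlib-only sketch below, analogous in spirit to scikit-learn's `KFold(n_splits=5, shuffle=True)`, illustrates the index bookkeeping. The seed, the function name, and the fold-size convention (the first `n % k` folds receive one extra sample) are assumptions rather than details reported in the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    # The first n_samples % k folds get one extra sample so every index is used once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


for fold, (train, val) in enumerate(k_fold_indices(886, k=5), start=1):
    print(fold, len(train), len(val))
```

Each sample appears in exactly one validation fold, so averaging the five validation scores gives an estimate of generalization error that uses every mix once for testing.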
This synergy enables data-driven, outcome-informed design decisions. By leveraging the unique strengths of deep RL, this study bridges a critical gap in automating and optimizing the sophisticated process of low-carbon RC beam design. The resulting framework, embodied by an intelligent agent trained within a custom-built environment, delivers a computationally efficient, data-driven solution with strong potential to advance sustainable design practices in the construction sector. Its practical applicability is further demonstrated through a user-friendly web application that allows users to generate tailored, low-carbon RC beam designs based on specific input parameters. The proposed framework is illustrated in Fig. 1.
Fig. 1.
The two-stage framework for intelligent low-carbon RC beam design.
Methodology
Stage 1 – deep neural networks
Dataset preparation
The dataset used for the deep learning model, which originally contained no carbon emission values, originates from the author's prior publication54. To fill this gap, the environmental impact of the 886 concrete mix designs, quantified as embodied carbon, was evaluated using a standardized LCA framework. The analysis was conducted in SimaPro software, with background data sourced from the Ecoinvent database. The system boundary for this study was defined as “cradle-to-gate,” encompassing all upstream processes required to produce 1 m³ of RC. This scope includes the extraction and processing of raw materials, the manufacturing of all cementitious components and chemical admixtures, and the energy consumed during the concrete mixing process.
Processes beyond the plant gate, such as transportation to the construction site, the construction phase, the use phase, and end-of-life scenarios, were excluded. Transportation of all constituent materials to the batching plant is accounted for within this boundary through Ecoinvent's “market for” processes, which bundle production and average transport data. To handle the environmental burden of co-products, the APOS (Allocation at the Point of Substitution) system model was employed. For SCMs such as blast furnace slag and silica fume, datasets based on the “Cut-off” approach were specifically employed; this approach allocates none of the upstream burdens of their original industrial processes to the SCMs. All greenhouse gas emissions were aggregated into a single metric, kilograms of CO₂ equivalent (kg CO₂ eq), using 100-year global warming potentials from the Greenhouse Gas (GHG) Protocol.
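Aggregating individual greenhouse gases into kg CO₂ eq amounts to a weighted sum in which each gas mass is multiplied by its 100-year global warming potential (GWP100). The sketch below uses the IPCC AR5 GWP100 values without climate-carbon feedbacks (CO₂ = 1, CH₄ = 28, N₂O = 265) purely for illustration; the characterization factors actually applied by the GHG Protocol method in SimaPro may differ, and the function name and inventory values are hypothetical.

```python
# Illustrative IPCC AR5 100-year GWPs (no climate-carbon feedbacks).
# These are an assumption for this sketch, not the study's exact factors.
GWP100_AR5 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def co2_equivalent(emissions_kg, gwp=GWP100_AR5):
    """Aggregate per-gas emissions (kg) into a single kg CO2 eq value."""
    unknown = set(emissions_kg) - set(gwp)
    if unknown:
        raise KeyError(f"no GWP factor for: {sorted(unknown)}")
    return sum(mass * gwp[gas] for gas, mass in emissions_kg.items())


# Hypothetical inventory for 1 m^3 of concrete (placeholder values).
print(co2_equivalent({"CO2": 100.0, "CH4": 1.0, "N2O": 0.1}))  # 154.5
```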
The new dataset encompasses a wide variety of mix designs featuring 12 variables: water, cement, slag, fly ash, silica fume, fine aggregate, coarse aggregate, superplasticizer, fresh density, compressive strength test age, compressive strength, and carbon emission. The inputs to the deep learning model are the first 10 variables, and the outputs are the last two: compressive strength and carbon emission. The dataset includes SCMs, which affect concrete properties and reduce CO₂ emissions. Statistical information on the collected dataset, including the concrete mix designs, is given in Table 2.
Table 2.
Statistical analysis of concrete mixture properties and CO₂ emissions.
| Variable | Count | Mean | Standard Deviation | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Water (lb/ft³) | 886 | 9.76 | 2.23 | 4.46 | 8.74 | 10.3 | 11.24 | 13.86 |
| Cement (lb/ft³) | 886 | 20.89 | 5.79 | 3.25 | 15.96 | 21.48 | 24.97 | 32.77 |
| Slag (lb/ft³) | 886 | 2.22 | 4.01 | 0 | 0 | 0 | 5.22 | 18.42 |
| Fly Ash (lb/ft³) | 886 | 1.12 | 2.57 | 0 | 0 | 0 | 0 | 13.48 |
| Silica Fume (lb/ft³) | 886 | 0.41 | 0.87 | 0 | 0 | 0 | 0 | 3.75 |
| Fine Aggregate (lb/ft³) | 886 | 49.12 | 9.76 | 14.67 | 44.15 | 48.46 | 52.13 | 71.79 |
| Coarse Aggregate (lb/ft³) | 886 | 54.34 | 16.76 | 15.92 | 43.2 | 59.87 | 66.19 | 81.47 |
| Superplasticizer (lb/ft³) | 886 | 0.1 | 0.12 | 0 | 0 | 0.06 | 0.2 | 0.64 |
| Fresh Density (lb/ft³) | 886 | 138.09 | 14.61 | 85.15 | 133.89 | 143.58 | 148.6 | 154.01 |
| Compressive Strength Test Age (days) | 886 | 28.49 | 15.19 | 7 | 28 | 28 | 28 | 91 |
| Compressive Strength (psi) | 886 | 6780.8 | 1628.27 | 2780.01 | 5691.84 | 6755.94 | 7939.92 | 11603.04 |
| CO₂ Emission (lb CO₂ eq/ft³) | 886 | 21.68 | 5.18 | 4.82 | 16.96 | 22.16 | 25.26 | 31.93 |
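The summary statistics in Table 2 follow the conventions of a typical `describe` routine: sample standard deviation and quartiles computed by linear interpolation between order statistics. A minimal stdlib sketch is shown below, assuming these conventions (the study does not state which tool produced the table). It also includes the lb/ft³ to kg/m³ conversion (1 lb/ft³ ≈ 16.0185 kg/m³), which relates the tabulated quantities to the 1 m³ functional unit of the LCA. The function names are hypothetical.

```python
from statistics import mean, quantiles, stdev

# 1 lb/ft^3 = 0.45359237 kg / 0.028316846592 m^3
LB_PER_FT3_TO_KG_PER_M3 = 0.45359237 / 0.028316846592


def describe(values):
    """Count, mean, sample std, min, quartiles, and max (the columns of Table 2)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")  # linear interpolation
    return {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


def to_kg_per_m3(lb_per_ft3):
    """Convert a content expressed in lb/ft^3 to kg/m^3."""
    return lb_per_ft3 * LB_PER_FT3_TO_KG_PER_M3


# Example: the mean cement content in Table 2 expressed per cubic meter.
print(round(to_kg_per_m3(20.89), 1))
```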
Figure 2 presents histograms of each numerical variable related to concrete mix design and CO₂ emissions. The Y-axis represents the number of observations in each interval. Taller bars indicate a greater concentration of data points, which helps in identifying outliers and making decisions about data preprocessing. The overlaid density curve aids interpretation of each histogram and highlights where the data are most concentrated. Figure 3 visualizes the correlation matrix among concrete components, with a focus on their relationship to CO₂ emissions. The matrix uses a color scale ranging from −1 (dark blue, strong negative correlation) to 1 (light yellow, strong positive correlation).
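Each cell of a correlation heatmap such as Fig. 3 is typically a Pearson correlation coefficient between two columns. Assuming Pearson correlation (the default in common data-analysis tools; the study does not name the coefficient), the matrix can be sketched as follows. The function names and the column values in the usage example are placeholders, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict {name_a: {name_b: r}} over all pairs of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Placeholder columns, not data from the study.
demo = {"cement": [15.0, 20.0, 25.0], "co2": [16.0, 21.0, 26.5], "slag": [5.0, 2.0, 0.0]}
matrix = correlation_matrix(demo)
print(round(matrix["cement"]["co2"], 3), round(matrix["cement"]["slag"], 3))
```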
Fig. 2.
Distribution of key variables in the concrete mix dataset.
Fig. 3.
Correlation matrix between different components of concrete mix.
Figure 4 shows pair plots of the relationships between CO₂ emissions and various concrete mix components. The diagonal plots display kernel density estimates (KDEs) of each variable's distribution, and the off-diagonal plots depict bivariate relationships as two-dimensional KDE contours. The cement content plot shows a strong, near-linear correlation with CO₂ emissions, underscoring cement's role as the primary contributor to CO₂ emissions in concrete. This aligns with the known high carbon footprint of cement production.
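The diagonal panels of Fig. 4 are one-dimensional kernel density estimates. The sketch below implements a Gaussian KDE with Scott's rule bandwidth (h = σ·n^(−1/5)), the default of SciPy's `gaussian_kde`, on which seaborn's KDE plots are built; whether the study's figures used these exact settings is an assumption, and the function name is illustrative.

```python
from math import exp, pi, sqrt
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a density function f(x) estimated from 1-D samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        bandwidth = stdev(samples) * n ** (-1 / 5)  # Scott's rule for d = 1
    norm = 1.0 / (n * bandwidth * sqrt(2.0 * pi))

    def density(x):
        # Average of Gaussian bumps centred on each sample.
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


f = gaussian_kde([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(f(3.0), 4))
```

Because each sample contributes a normalized Gaussian, the estimate integrates to one; a smaller bandwidth reveals more structure but also more noise.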
Fig. 4.
Pair plots of concrete components and CO2 emission.
Carbon emissions calculation
Because the collected dataset contained no carbon emission values, emissions were computed independently for each of the 886 mix designs and then broken down by mix design component. Figure 5(a) shows the total carbon emissions, and Figs. 5(b) to 5(i) show the contribution of each individual mix component.
Fig. 5.
Analysis of carbon emissions in 886 concrete mixing designs.
In the component subplots (Fig. 5b–i), points with zero emissions correspond to mix designs in which that specific material was absent. These zero values therefore indicate no carbon contribution from that component, not zero total emissions for the entire mix. Summary statistics of the carbon emissions obtained with SimaPro are given in the last row of Table 2.
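The per-component breakdown in Fig. 5 can be expressed as the product of each ingredient's content and its cradle-to-gate emission factor, summed over the eight components. The sketch below illustrates this bookkeeping. The emission factors are placeholders chosen for illustration only, not the Ecoinvent values used in the study, and the function name and example mix are hypothetical. A component absent from a mix contributes exactly zero, matching the zero-valued points in Fig. 5b–i.

```python
# Placeholder emission factors (lb CO2 eq per lb of material). These values are
# illustrative assumptions only, NOT the Ecoinvent factors used in the study.
PLACEHOLDER_FACTORS = {
    "water": 0.0003,
    "cement": 0.9,
    "slag": 0.08,
    "fly_ash": 0.01,
    "silica_fume": 0.02,
    "fine_aggregate": 0.005,
    "coarse_aggregate": 0.005,
    "superplasticizer": 1.5,
}


def carbon_breakdown(mix, factors=PLACEHOLDER_FACTORS):
    """Per-component and total embodied carbon for one mix.

    `mix` maps component names to contents in lb/ft^3; components missing
    from the mix are treated as absent and contribute zero.
    """
    contributions = {name: mix.get(name, 0.0) * ef for name, ef in factors.items()}
    return contributions, sum(contributions.values())


# Hypothetical mix without fly ash or silica fume (lb/ft^3).
mix = {"water": 10.0, "cement": 21.0, "slag": 2.0, "fine_aggregate": 48.0,
       "coarse_aggregate": 60.0, "superplasticizer": 0.1}
parts, total = carbon_breakdown(mix)
print(parts["fly_ash"], round(total, 2))
```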
Implementation of deep neural networks
ANNs are computational systems inspired by biological neural networks, such as those found in animals, and consist of interconnected nodes (neurons) arranged in layers. These neurons transmit information, allowing the network to learn from data and make predictions. ANNs excel in applications such as pattern recognition, classification, and regression and are used in fields such as machine vision and natural language processing. The neural network designed in this study takes the mix design as input and predicts compressive strength and carbon emission as outputs. These predictions are passed to the next stage, the RL process, which has important implications for optimizing the mix design to achieve the desired performance while minimizing environmental impact. The input layer has 10 variables, followed by three hidden layers with 128, 64, and 32 neurons, respectively; each hidden layer uses Rectified Linear Unit (ReLU) activation with batch normalization and dropout. The inputs and outputs were scaled to prevent features with large magnitudes from dominating and to improve training stability.
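The architecture described above (10 inputs; hidden layers of 128, 64, and 32 ReLU units with batch normalization and dropout; two outputs) can be sketched as an inference-time forward pass in plain Python. This is not the authors' code: the layer ordering (linear, batch normalization, ReLU, dropout), He-uniform initialization, z-score scaling, and all names are assumptions. At inference, dropout is the identity and batch normalization uses stored running statistics, initialized here to the identity transform; a real model would load trained weights instead.

```python
import math
import random

LAYER_SIZES = [10, 128, 64, 32, 2]  # inputs -> three hidden layers -> 2 outputs


def init_layers(sizes, seed=0):
    """Randomly initialised weights plus identity batch-norm statistics."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = math.sqrt(6.0 / fan_in)  # He-uniform initialisation for ReLU
        layers.append({
            "W": [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)],
            "b": [0.0] * fan_out,
            # Running statistics and affine parameters of batch normalization.
            "bn_mean": [0.0] * fan_out, "bn_var": [1.0] * fan_out,
            "gamma": [1.0] * fan_out, "beta": [0.0] * fan_out,
        })
    return layers


def standardize(x, means, stds):
    """Z-score scaling (an assumed choice; the study only says 'scaled')."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def forward(x, layers, eps=1e-5):
    """Inference-mode forward pass: dropout is inactive, BN uses running stats."""
    h = x
    for i, layer in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(layer["W"], layer["b"])]
        if i < len(layers) - 1:  # hidden layers only; the output layer is linear
            h = [g * (v - m) / math.sqrt(var + eps) + bt
                 for v, m, var, g, bt in zip(h, layer["bn_mean"], layer["bn_var"],
                                            layer["gamma"], layer["beta"])]
            h = [max(0.0, v) for v in h]  # ReLU
    return h  # [scaled compressive strength, scaled CO2 emission]


layers = init_layers(LAYER_SIZES)
print(forward(standardize([1.0] * 10, [0.5] * 10, [2.0] * 10), layers))
```

The two outputs are in scaled units; mapping them back to psi and lb CO₂ eq/ft³ requires inverting whatever output scaling was used during training.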
To validate the selection of a robust and high-performing predictive model for the first stage of the proposed framework, a comprehensive comparative analysis of various machine learning algorithms was undertaken. The performance of 14 distinct models was evaluated for predicting concrete compressive strength, with the results summarized in the bar chart presented in Fig. 6.
Fig. 6.
Performance comparison of various machine learning models for predicting concrete compressive strength.
As illustrated, the selected ANN model achieved an R² score of approximately 0.85. It performed on par with or slightly better than the other leading machine learning models, including Light Gradient Boosting Machine (LightGBM) (R² = 0.848), XGBoost (R² = 0.834), and Random Forest (R² = 0.826). This evaluation suggests that an R² value of 0.85 is a strong result for this prediction task, given the inherent variability and complex, non-linear relationships characteristic of real-world concrete mix data. Consequently, the ANN was confirmed as a suitable and effective model, establishing a reliable predictive foundation for the subsequent deep RL optimization stage.
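For reference, the R² values compared in Fig. 6 follow the standard coefficient of determination, the same definition used by scikit-learn's `r2_score`. Below is a minimal plain-Python version; the numbers in the usage line are illustrative and are not from the study's dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((y - mean) ** 2 for y in y_true)                # total sum of squares
    return 1.0 - ss_res / ss_tot


# Illustrative strengths (MPa-like values) and predictions.
example_r2 = r2_score([30.0, 40.0, 50.0, 60.0], [32.0, 38.0, 51.0, 59.0])
```

Here SS_res = 10 and SS_tot = 500, so R² = 0.98. A model that always predicts the mean scores 0, which is why an R² near 0.85 on noisy mix data still indicates a substantial share of explained variance.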
Stage 2 – deep reinforcement learning
Deep RL is particularly well-suited for the automated design of RC beams, a problem characterized by a high-dimensional action space and complex, non-linear constraints derived from design codes. In this framework, an RL agent learns an optimal design policy through iterative interactions with a custom-built structural design environment.
At each step, the agent observes the current design, represented by a state vector. This 26-variable state provides a comprehensive snapshot of the design, including loading conditions, geometric properties, concrete mix components, and critical performance metrics calculated according to ACI 318-19 (detailed in the ‘States (flexural design of RC beam based on ACI 318-19)’ section). Based on this state, the agent selects an action from a 13-dimensional space of material and geometric parameters that define the beam (detailed in the ‘Action space’ section). The environment then returns a numerical reward signal to guide the learning process. This reward function is custom-designed to penalize designs that violate ACI 318-19 constraints while rewarding valid designs that minimize CO₂ emissions. It thus steers the agent toward structurally sound and environmentally sustainable solutions (detailed in the ‘Reward functions’ section).
The agent’s objective is to learn a policy, π(a|s), that maximizes the cumulative discounted reward. This process enables the agent to autonomously navigate the complex trade-offs between structural integrity and environmental impact without requiring explicitly encoded rules for every design constraint. The loads and beam length are sampled randomly from the ranges in Table 3, which also lists the associated material constants.
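The cumulative discounted reward that the policy maximizes can be illustrated for a single episode. The sketch computes the return at every step with γ = 0.99, the value in Table 5. The reward sequence in the usage line is hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 3-step episode: two constraint violations, then a valid design.
episode_returns = discounted_returns([-1.0, -1.0, 1.5])
```

Because γ is close to 1, a valid low-carbon design at the end of an episode still raises the value of the earlier steps that led to it.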
Table 3.
Design variables and material properties.
| Design Variable | Value or Range |
|---|---|
| Dead Load (kip/ft) | 1.5∼4.5 |
| Live Load (kip/ft) | 1.5∼4.5 |
| Beam Length (ft) | 15∼30 |
| Rebar Yield Strength, fy (ksi) | 60 |
| Unit Weight of Concrete, γc (lb/ft³) | 150 |
| Unit Weight of Steel, γs (lb/ft³) | 490 |
States (flexural design of RC beam based on ACI 318-19)
States in this study represent the current condition of the RC beam design problem, encompassing loading conditions, geometric properties, material properties, and mix design parameters. There are 26 states in total. At the beginning of each episode, the dead load (state0), live load (state1), and beam length (state2) are assigned randomly within the ranges specified in Table 3. The agent then designs the beam over a fixed number of steps per episode (the step size, equal to 10). In the initial step of each episode, the structural components and mix design are also selected randomly to encourage exploration.
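The episode structure just described can be sketched as a reset-and-rollout loop. The following are hypothetical stand-ins, not the study's environment code:

- uniform sampling of the loads and span;
- zero placeholders for the remaining state entries;
- the `policy` and `step_fn` callables.

```python
import random

# Ranges from Table 3: dead load and live load (kip/ft), beam length (ft).
DL_RANGE, LL_RANGE, SPAN_RANGE = (1.5, 4.5), (1.5, 4.5), (15.0, 30.0)
N_STATES = 26
STEPS_PER_EPISODE = 10  # "step size" in Table 5


def reset(rng):
    """Sample a new loading case into state0..state2 (uniform sampling is assumed).
    The other 23 entries would hold the random initial design and the derived
    quantities; they are left as zero placeholders here."""
    state = [0.0] * N_STATES
    state[0] = rng.uniform(*DL_RANGE)
    state[1] = rng.uniform(*LL_RANGE)
    state[2] = rng.uniform(*SPAN_RANGE)
    return state


def run_episode(policy, step_fn, rng):
    """Roll out one fixed-length episode and return the summed reward.
    policy(state) -> action and step_fn(state, action) -> (state, reward)
    are hypothetical callables standing in for the agent and environment."""
    state = reset(rng)
    total = 0.0
    for _ in range(STEPS_PER_EPISODE):
        action = policy(state)
        state, reward = step_fn(state, action)
        total += reward
    return total
```

A fixed episode length keeps every rollout the same size, which makes filling a fixed-capacity PPO buffer straightforward.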
The state vector includes structural parameters such as width (state3), depth (state4), main rebar size (state5), main rebar quantity (state6), compressive strength (predicted in Stage 1, state7), and carbon emissions (predicted in Stage 1, state8). It also encompasses the mix design components: cement (state15), water (state16), fly ash (state17), slag (state18), silica fume (state19), fine aggregate (state20), coarse aggregate (state21), superplasticizer (state22), fresh density of the concrete mix (state23), and compressive strength test age (state24). Flexural design parameters governed by ACI 318-19 are also integrated, as outlined below.
When a beam is subjected to a bending moment (e.g., a positive moment), compressive strains develop at the top of the beam and tensile strains at the bottom. These strains produce compressive and tensile stresses. Because concrete is strong in compression but weak in tension, tension reinforcement is provided to carry the tensile stresses, which increases the flexural capacity of the beam and improves its performance under bending. The depth of the equivalent rectangular compression block is obtained using Eq. (1), where c is the distance from the neutral axis to the extreme compression fiber. An isometric view is shown in Fig. 7.
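$$a = \beta_1 c \qquad (1)$$
$$\beta_1 = \begin{cases} 0.85, & 2500 \le f'_c \le 4000\ \text{psi} \\ 0.85 - \dfrac{0.05\,(f'_c - 4000)}{1000}, & 4000 < f'_c < 8000\ \text{psi} \\ 0.65, & f'_c \ge 8000\ \text{psi} \end{cases} \qquad (2)$$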
Fig. 7.
Equivalent Stress Blocks for Strength Design - Isometric View.
ACI 318-19 defines three zones of net tensile strain:

- Tension-controlled zone: εt ≥ εty + 0.003, with φ = 0.90.
- Compression-controlled zone: εt ≤ εty, with φ = 0.65.
- Transition zone: εty < εt < εty + 0.003, with φ = 0.65 + 0.25(εt − εty)/0.003.

Here εty = fy/Es, which may be taken as 0.002 for Grade 60 reinforcement, so the tension-controlled limit becomes 0.005. The net tensile strain in the extreme layer of reinforcement, εt, is determined by Eq. (3). In this equation, d is the effective depth of the beam (the distance from the extreme compression fiber to the centroid of the tension reinforcement), and a is the depth of the equivalent rectangular stress block. β₁ is the stress block factor, a function of the concrete’s compressive strength as defined in Eq. (2).
$$\varepsilon_t = 0.003\,\frac{d - c}{c} = 0.003\,\frac{\beta_1 d - a}{a} \qquad (3)$$
The parameters for flexural design, the depth of the compressive stress block (a), the nominal flexural strength (Mn), the reinforcement ratio (ρ), and the minimum required steel ratio (ρmin), are obtained from Eq. (4) through (7), respectively. The variables in these fundamental equations are defined as follows: As is the total area of the tension reinforcement; fy is the specified yield strength of the reinforcement (60 ksi in this study); f’c is the specified compressive strength of the concrete; and b is the width of the beam’s compression face. These calculations are central to ensuring the beam’s structural capacity meets design requirements.
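$$a = \frac{A_s f_y}{0.85\, f'_c\, b} \qquad (4)$$
$$M_n = A_s f_y \left(d - \frac{a}{2}\right) \qquad (5)$$
$$\rho = \frac{A_s}{b\,d} \qquad (6)$$
$$\rho_{\min} = \max\!\left(\frac{3\sqrt{f'_c}}{f_y},\ \frac{200}{f_y}\right) \quad (f'_c \text{ and } f_y \text{ in psi}) \qquad (7)$$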
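The flexural checks in Eqs. (1) to (7) can be combined into one routine. The sketch below makes these assumptions:

- a singly reinforced rectangular section;
- the ‘Depth’ action taken as the effective depth d;
- standard nominal bar areas;
- fy = 60 ksi from Table 3;
- εty = 0.002 for Grade 60 steel.

The function names are illustrative, not the authors' code.

```python
import math

# Nominal bar areas (in^2) for standard US bar sizes #6 to #10.
BAR_AREA = {6: 0.44, 7: 0.60, 8: 0.79, 9: 1.00, 10: 1.27}


def beta1(fc):
    """Stress-block factor of Eq. (2); fc in psi."""
    if fc <= 4000.0:
        return 0.85
    if fc >= 8000.0:
        return 0.65
    return 0.85 - 0.05 * (fc - 4000.0) / 1000.0


def phi_factor(eps_t, eps_ty=0.002):
    """Strength reduction factor by strain zone (eps_ty = 0.002 for Grade 60)."""
    if eps_t >= eps_ty + 0.003:
        return 0.90                                  # tension-controlled
    if eps_t <= eps_ty:
        return 0.65                                  # compression-controlled
    return 0.65 + 0.25 * (eps_t - eps_ty) / 0.003    # transition zone


def flexure(b, d, bar, n_bars, fc, fy=60000.0):
    """Singly reinforced rectangular section; b, d in inches, fc and fy in psi."""
    As = n_bars * BAR_AREA[bar]
    a = As * fy / (0.85 * fc * b)                    # Eq. (4)
    c = a / beta1(fc)                                # Eq. (1)
    eps_t = 0.003 * (d - c) / c                      # Eq. (3)
    mn = As * fy * (d - a / 2.0) / 12000.0           # Eq. (5), lb-in -> kip-ft
    return {
        "a": a,
        "eps_t": eps_t,
        "phi_mn": phi_factor(eps_t) * mn,
        "rho": As / (b * d),                                      # Eq. (6)
        "rho_min": max(3.0 * math.sqrt(fc) / fy, 200.0 / fy),    # Eq. (7)
    }


# Design 1 of Table 8: b = 10 in, d = 50 in, 5 #9 bars, f'c = 4570.6 psi.
design1 = flexure(b=10.0, d=50.0, bar=9, n_bars=5, fc=4570.6)
```

Under these assumptions, Design 1 evaluates to φMn ≈ 1038 kip-ft, εt ≈ 0.013, and ρmin ≈ 0.0034. These are close to the tabulated 1037.5 kip-ft, 0.0130, and 0.0034, which supports reading the ‘Depth’ action as the effective depth.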
The primary criterion for strength design is that the design strength must equal or exceed the required strength, Eq. (8). The required strength is computed from the service loads multiplied by load factors. Various loads can be considered; this study considers the dead load (DL), live load (LL), and self-weight of the beam (Eq. (9)). Eq. (10) gives the load combination used, where D denotes dead load and L denotes live load. The moment due to these loads (excluding self-weight) is given by Eq. (11), and the total design moment is obtained from Eq. (12). The load factors provide an additional margin of safety for RC structures.
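$$\phi M_n \ge M_u \qquad (8)$$
[Eq. (9): self-weight of the beam]
$$U = 1.2D + 1.6L \qquad (10)$$
[Eq. (11): moment due to the factored dead and live loads]
[Eq. (12): total design moment including self-weight]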
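Because Eqs. (9), (11), and (12) are not reproduced above, the following sketch fills them in with hypothetical choices that are common in textbook practice but are not confirmed by the study:

- a simply supported span with uniformly distributed loads, so M = wL²/8;
- self-weight computed from γc and the gross section b × h only, with γs ignored;
- self-weight treated as part of the dead load in Eq. (10).

```python
def factored_moment(dl, ll, span, b, h, gamma_c=150.0):
    """Hypothetical reading of Eqs. (9)-(12): simply supported span, uniform load.
    dl, ll in kip/ft; span in ft; b, h in inches; gamma_c in lb/ft^3 (Table 3).
    Self-weight uses plain concrete only (gamma_s of Table 3 is ignored here)."""
    w_sw = gamma_c * (b * h / 144.0) / 1000.0     # self-weight, kip/ft
    w_u = 1.2 * (dl + w_sw) + 1.6 * ll            # Eq. (10) with self-weight in D
    return w_u * span ** 2 / 8.0                  # kip-ft


def is_adequate(phi_mn, mu):
    """Eq. (8): design strength must be at least the required strength."""
    return phi_mn >= mu


# Web-application scenario loads with a 10 in x 50 in section (h = d assumed).
mu = factored_moment(dl=3.7, ll=2.8, span=19.0, b=10.0, h=50.0)
```

For these inputs, the self-weight is about 0.52 kip/ft and the factored moment is about 430.7 kip-ft.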
This holistic representation enables the agent to understand the complex relationships between material selection (actions), geometric configuration (actions), and the resulting structural performance (states), thereby facilitating the learning of optimal, code-compliant, and low-carbon designs.
Agent
PPO55 is an RL algorithm designed for stability and sample efficiency. Policy gradient methods can suffer from excessively large policy updates and from difficulty sustaining exploration while remaining stable. PPO addresses these issues by clipping its objective function so that each updated policy stays close to the previous one. PPO employs Generalized Advantage Estimation (GAE) to calculate the advantage, which balances the bias-variance trade-off. The advantage reflects how much better it is to take action aₜ in state sₜ than the average action in that state. It is obtained from Eq. (13), where δₜ is the temporal-difference error, δₜ = rₜ + γV(sₜ₊₁) − V(sₜ).
$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^{l}\,\delta_{t+l} \qquad (13)$$
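Equation (13) is usually evaluated with a single backward pass over a trajectory segment. The sketch below does this for one segment with no intermediate episode boundaries; handling `done` flags is omitted. γ = 0.99 and λ = 0.95 are the Table 5 defaults. Returns are formed as advantage plus value, a common choice for the critic targets in Eq. (15).

```python
def compute_gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Advantages of Eq. (13) for one segment without intermediate episode ends.
    values[t] = V(s_t); next_value = V(s_T), or 0.0 if s_T is terminal."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = values[t + 1] if t + 1 < len(values) else next_value
        delta = rewards[t] + gamma * v_next - values[t]    # TD error delta_t
        running = delta + gamma * lam * running           # sum of (gamma*lam)^l delta_{t+l}
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values)]  # critic targets R_t
    return advantages, returns
```

Setting λ = 0 reduces each advantage to the one-step TD error (low variance, more bias). Setting λ = 1 with a zero value function recovers the plain discounted return (no bootstrapping bias, higher variance).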
PPO comprises two networks, an actor and a critic. The actor network determines the action, and the critic network evaluates it by estimating the state-value function V(s). Dense layers are used to extract meaningful features in both networks. The specifications of the actor and critic networks and their comparison are presented in Table 4.
Table 4.
Comparison of the actor and critic networks used in this study.
| | Actor Network | Critic Network |
|---|---|---|
| Goal | Chooses actions based on the current state | Estimates the value V(s) for a given state |
| Inputs | States | States |
| Outputs | Action | A single scalar value |
| Networks | Dense (256, activation='ReLU') → Batch Normalization → Dense (128, activation='ReLU') → Batch Normalization → Mean: Dense (action dimension = 13, activation='tanh') | Dense (256, activation='ReLU') → Batch Normalization → Dense (128, activation='ReLU') → Batch Normalization → Dense (1, activation='linear') |
| Activation function | Tanh for the mean output | Linear |
| Role of learning | Learns policy (state-to-action mapping) | Learns the state-value function |
The loss of the actor network is obtained from Eq. (14). The terms are defined as follows:

- rᵢ(θ) = πθ(aᵢ|sᵢ) / πθold(aᵢ|sᵢ) is the ratio of new-to-old policy probabilities.
- Ai is the advantage estimate.
- ε is the hyperparameter that controls the clipping range (0.2 in this study).
- β is the entropy coefficient (0.03 in this study).
- H(πθ) is the entropy of the policy distribution.

The clip function limits abrupt policy changes. The min operator takes the more pessimistic of the clipped and unclipped objectives, so the model gains nothing from moving far from the previous policy.
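$$L^{\mathrm{actor}}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \min\!\Big(r_i(\theta)\,A_i,\ \operatorname{clip}\big(r_i(\theta),\,1-\epsilon,\,1+\epsilon\big)\,A_i\Big) - \beta\,H(\pi_\theta) \qquad (14)$$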
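Below is a per-minibatch evaluation of Eq. (14) in plain Python, taking log-probabilities as inputs. The clip ratio ε = 0.2 and entropy coefficient β = 0.03 follow Table 5. In practice this would be written in an autodiff framework so that gradients flow through the new log-probabilities. The scalar version here only illustrates the arithmetic.

```python
import math


def ppo_actor_loss(new_log_probs, old_log_probs, advantages, entropy,
                   clip_eps=0.2, ent_coef=0.03):
    """Eq. (14) for one minibatch; entropy is the mean policy entropy H."""
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(lp_new - lp_old)                     # r_i(theta)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate += min(ratio * adv, clipped * adv)          # pessimistic bound
    return -surrogate / len(advantages) - ent_coef * entropy  # minimized by the optimizer
```

With a probability ratio of 1.5 and a positive advantage, the clipped term (1.2) wins, so pushing the policy further brings no extra benefit. With a negative advantage, the unclipped term is the smaller one, so the penalty is not capped.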
The loss of the critic network is computed using Eq. (15), where Rᵢ is the return estimate (discounted cumulative reward), V(sᵢ) is the value-function prediction for state sᵢ, and N is the batch size. Because PPO is on-policy and GAE requires complete trajectory segments, a buffer was used to store information such as states, actions, rewards, and action probabilities.
$$L^{\mathrm{critic}} = \frac{1}{N}\sum_{i=1}^{N}\big(R_i - V(s_i)\big)^2 \qquad (15)$$
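Here is a minimal sketch of the on-policy buffer described above, with capacity 2048 and minibatches of 256 from Table 5. The fields mirror the text: states, actions, rewards, and probabilities (stored as log-probabilities). Storing the critic's value estimates as well is an assumption, made because GAE needs them.

```python
import random


class RolloutBuffer:
    """On-policy storage for one PPO update (capacity and batch size from Table 5)."""

    def __init__(self, capacity=2048):
        self.capacity = capacity
        self.clear()

    def clear(self):
        self.states, self.actions, self.rewards = [], [], []
        self.log_probs, self.values = [], []

    def add(self, state, action, reward, log_prob, value):
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.log_probs.append(log_prob)
        self.values.append(value)

    def is_full(self):
        return len(self.states) >= self.capacity

    def minibatch_indices(self, batch_size=256, rng=None):
        """Shuffle once per epoch and yield index lists of length batch_size."""
        rng = rng or random.Random()
        idx = list(range(len(self.states)))
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            yield idx[start:start + batch_size]


# Update schedule implied by Table 5: 10 epochs over 2048 samples in batches of 256,
# after which the buffer is cleared because PPO is on-policy.
```

Clearing the buffer after each update is what makes the algorithm on-policy: every gradient step uses data collected by the current (or immediately preceding) policy.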
The hyperparameters employed in the PPO algorithm are summarized in Table 5. They control how the algorithm behaves and significantly affect its performance and convergence stability.
Table 5.
PPO agent hyperparameters.
| Hyperparameter | Value |
|---|---|
| Buffer Capacity | 2048 |
| Batch Size | 256 |
| Gamma | 0.99 |
| GAE Lambda | 0.95 |
| Clip Ratio | 0.2 |
| Actor (Policy) Learning Rate | 3e-4 |
| Critic (Value) Learning Rate | 1e-3 |
| Entropy Coefficient | 0.03 |
| Epochs of Updating | 10 |
| Step Size (steps per episode) | 10 |
The selection of the PPO agent’s hyperparameters, detailed in Table 5, was guided by a two-step approach designed to ensure both stability and performance. The initial values for core parameters such as Gamma, GAE Lambda, and the Clip Ratio were taken from foundational PPO research55, using configurations that have shown strong performance across various complex control tasks.
While other algorithms such as DDPG have been applied to RC beam design37, PPO was chosen for three key advantages in the context of this study. First, its on-policy nature and policy clipping mechanism ensure greater training stability, which is critical for navigating the high-dimensional state-action space without destructive updates. Second, its stochastic policy is inherently better suited to the problem’s mixed discrete-continuous action space (e.g., standard rebar sizes and variable material quantities), providing more natural exploration than the external noise required by deterministic policies. Lastly, PPO’s recognized robustness and reduced sensitivity to hyperparameter tuning made it a more practical choice, allowing the research to focus on the sustainable design objectives.
Action space
Table 6 shows the action space implemented in this study, which includes both structural design parameters and concrete mix components. The action space comprises 13 distinct parameters that together define the complete design of the RC beam. Their ranges were chosen so that the agent explores realistic and practical designs. The concrete mix design portion of the action space contains eight components, expressed in pounds per cubic foot. Their upper and lower limits are based on the collected dataset summarized in Table 2. These ranges keep the concrete mix practical and constructible while still allowing the agent to explore a wide range of designs.
Table 6.
Action space parameters.
| Parameter | Range |
|---|---|
| Width (in) | 10∼20 |
| Depth (in) | 10∼50 |
| Main Rebar Size | #6, #7, #8, #9, #10 |
| Number of Main Rebars | 2∼6 |
| Water (lb/ft³) | 4∼14 |
| Cement (lb/ft³) | 3.5∼33 |
| Slag (lb/ft³) | 5∼19 |
| Fly Ash (lb/ft³) | 2∼14 |
| Silica Fume (lb/ft³) | 0.8∼3.8 |
| Fine Aggregate (lb/ft³) | 20∼72 |
| Coarse Aggregate (lb/ft³) | 15∼81 |
| Superplasticizer (lb/ft³) | 0.03∼0.65 |
| Compressive Strength Test Age (days) | 7, 28, 60, 90 |
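Table 6 mixes continuous ranges with discrete choices. The text does not specify how the actor's tanh outputs (Table 4) are turned into a design. This sketch assumes one plausible decoding: continuous entries are rescaled linearly, and discrete entries are chosen by equal-width bins. The ordering of the 13 entries follows Table 6, which is also an assumption.

```python
# Table 6 in row order: tuples are continuous (low, high), lists are discrete options.
SPEC = [
    ("width_in", (10.0, 20.0)),
    ("depth_in", (10.0, 50.0)),
    ("rebar_size", [6, 7, 8, 9, 10]),
    ("n_bars", [2, 3, 4, 5, 6]),
    ("water", (4.0, 14.0)),            # lb/ft^3
    ("cement", (3.5, 33.0)),
    ("slag", (5.0, 19.0)),
    ("fly_ash", (2.0, 14.0)),
    ("silica_fume", (0.8, 3.8)),
    ("fine_agg", (20.0, 72.0)),
    ("coarse_agg", (15.0, 81.0)),
    ("superplasticizer", (0.03, 0.65)),
    ("test_age_days", [7, 28, 60, 90]),
]


def decode(action):
    """Map a 13-dim action in [-1, 1] (tanh mean of the actor) to a beam design.
    Continuous entries are rescaled linearly; discrete entries use equal-width bins."""
    if len(action) != len(SPEC):
        raise ValueError("expected a 13-dimensional action")
    design = {}
    for u, (name, domain) in zip(action, SPEC):
        u = max(-1.0, min(1.0, u))        # clip sampled actions to the tanh range
        t = (u + 1.0) / 2.0               # map to [0, 1]
        if isinstance(domain, tuple):
            lo, hi = domain
            design[name] = lo + t * (hi - lo)
        else:
            design[name] = domain[min(int(t * len(domain)), len(domain) - 1)]
    return design
```

Clipping before decoding matters because Gaussian samples around the tanh mean can fall outside [−1, 1]. Binning lets one continuous policy output select standard rebar sizes and test ages without a separate discrete head.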
Reward functions
In RL, the reward function is a core component that guides the agent through the learning process by providing feedback. Its purpose is to enable the agent to identify the actions and states that lead to the desired outcomes. This feedback helps the agent explore the environment and discover strategies that yield higher rewards. Over time, the agent learns to prioritize the actions most likely to maximize its cumulative reward.
Negative reward function
Negative reward functions, often referred to as penalties, are a foundational element of RL algorithms. They signal undesirable behaviors or states and steer the agent toward alternative actions that lead to better outcomes. Negative rewards deter the agent from repeating mistakes or violating constraints, so it learns to avoid those situations and concentrate on optimal trajectories. Equations (16) to (20) define the negative reward functions used in this study, which constrain the agent to perform the flexural design correctly and adhere to code requirements. Equations (21) to (23) constrain the agent to produce a feasible mix design.
[Eqs. (16) to (20): negative rewards for the flexural design constraints]
[Eqs. (21) to (23): negative rewards for mix design feasibility]
Positive reward function
Equation (24) defines the positive reward in the RL environment for RC beam design. The positive reward is applied only when the negative reward criteria are fulfilled, that is, when the sum of the negative reward components is zero and all design constraints are met. This is crucial because it ensures that only designs satisfying all structural and geometric requirements are rewarded, integrating environmental sustainability with practical engineering limits. The reward function thereby incentivizes the RL agent to find structurally efficient and environmentally optimal solutions.
$$R^{+} = \alpha\, e^{-\beta\,\mathrm{CO_2}} \qquad (24)$$
This relationship is a negative exponential function: as CO₂ emissions increase, the positive reward decreases exponentially. The equation therefore strengthens the agent’s incentive to reduce CO₂ emissions while meeting all design criteria. The scaling factor, α, and the decay rate, β, are critical hyperparameters that shape the agent’s behavior.
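The positive reward can be sketched directly from the description above. The exponential form and the gating on zero penalties follow the text. Two details are assumptions of this sketch: representing the penalties as a list of non-positive numbers, and the default values α = 2 and β = 0.06 (the final choice reported below).

```python
import math


def positive_reward(co2, penalties, alpha=2.0, beta=0.06):
    """Eq. (24) as described: alpha * exp(-beta * CO2), granted only when every
    negative-reward term is zero. CO2 uses the units of the emission model."""
    if any(p != 0.0 for p in penalties):
        return 0.0
    return alpha * math.exp(-beta * co2)


# Reward gained by halving emissions from 12 to 6, for each tested decay rate.
gains = {beta: positive_reward(6.0, [0.0], beta=beta) / positive_reward(12.0, [0.0], beta=beta)
         for beta in (0.015, 0.03, 0.06)}
```

The ratio equals e^(6β): about 1.09, 1.20, and 1.43 for β = 0.015, 0.03, and 0.06. This is why larger decay rates push the agent harder toward low-carbon designs, while α only rescales the reward.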
To further validate the design of this reward function and demonstrate the robustness of the learned policy, a sensitivity analysis was conducted on these key parameters. The agent was trained under nine distinct reward conditions, with the scaling factor (α) selected from {1, 2, 4} and the decay rate (β) from {0.015, 0.03, 0.06}. The results, including the reward curves for each parameter combination and a summary of performance metrics, are presented in Fig. 8 and Table 7, respectively.
Fig. 8.
Sensitivity Analysis of Reward Function Parameters.
Table 7.
Performance metrics for reward function sensitivity analysis.
| α | β | Avg Reward (Last 500 Episodes) | Avg CO₂ (Last 500 Episodes of Successful Designs) |
|---|---|---|---|
| 1 | 0.015 | 0.3486 | 11.75 |
| 1 | 0.03 | 0.2565 | 9.71 |
| 1 | 0.06 | 0.3425 | 8.53 |
| 2 | 0.015 | 0.8846 | 14.5 |
| 2 | 0.03 | 0.6365 | 13.96 |
| 2 | 0.06 | 0.6269 | 9.5 |
| 4 | 0.015 | 1.8607 | 13.21 |
| 4 | 0.03 | 1.5896 | 11.1 |
| 4 | 0.06 | 1.2294 | 8.86 |
The analysis reveals that the scaling factor, α, primarily adjusts the magnitude of the final reward without significantly altering the fundamental learning trajectory. In contrast, the decay rate, β, plays a more critical role in guiding the agent’s optimization strategy and highlights a crucial trade-off between optimization aggressiveness and learning stability. A low decay rate (e.g., β = 0.015) results in stable learning but yields designs with higher CO₂ emissions, as the incentive for reduction is weaker. Conversely, a high decay rate (e.g., β = 0.06) aggressively pushes the agent towards lower-CO₂ solutions but can introduce volatility into the training process, as evidenced by the more jagged learning curves in Fig. 8.
Based on this analysis, the parameters α = 2 and β = 0.06 were selected for the final model presented in this study. This choice prioritizes the primary objective of minimizing carbon emissions, achieving significantly lower CO₂ values than less aggressive decay rates, while maintaining an acceptable level of training stability. This investigation confirms that the learning framework is robust and that the agent’s policy can be effectively tuned by adjusting the reward function’s parameters.
Results and discussions
Deep neural network results (stage 1)
Figure 9 depicts the loss function during Stage 1 training, which shows how well the model fits the input data; the goal of training is to minimize this loss. The X-axis corresponds to the epoch count, where each epoch represents a complete pass over the training dataset. The Y-axis displays the value of the loss function, which measures the difference between the model predictions and the ground-truth values. Both the training and validation losses decrease rapidly in the initial epochs, which is typical and indicates that the model quickly learns the basic patterns. The losses then converge, indicating stable and robust performance. Early stopping was implemented to mitigate overfitting, halting training once the validation loss stopped improving for a set number of epochs.
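Early stopping, as used in Stage 1, can be sketched as a small patience counter on the validation loss. The patience and `min_delta` values below are hypothetical, since the study does not report them.

```python
class EarlyStopping:
    """Stop when validation loss has not improved by more than min_delta
    for `patience` consecutive epochs (patience value is hypothetical)."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
            return False
        self.wait += 1
        return self.wait >= self.patience


stopper = EarlyStopping(patience=3)
history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
stopped_at = None
for epoch, loss in enumerate(history):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

In Keras, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)` provides the same behavior; the study does not state which implementation it used.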
Fig. 9.
Loss function plot for training and validation.
Figure 10 illustrates the mean absolute error progression during training. This plot quantifies the magnitude of the difference between the predicted and actual values. The lower the mean absolute error, the higher the prediction accuracy of the model. The mean absolute error value is calculated from Eq. (25).
Fig. 10.
Mean absolute error plot.
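$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert \qquad (25)$$
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.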
Figure 11 depicts the mean squared error (MSE) of the model. MSE is a common metric for evaluating the accuracy of a model and measures the average squared deviation between the predicted values and the actual values. The mathematical formulation of MSE is given in Eq. (26).
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^2 \qquad (26)$$
Fig. 11.
MSE plot.
Figure 12 illustrates the root mean squared error (RMSE), the square root of the MSE. It is expressed in the same units as the predicted quantity, and lower values indicate better prediction accuracy. The RMSE is defined in Eq. (27).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^2} \qquad (27)$$
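Equations (25) to (27) translate directly into code. The function names and sample values below are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Eq. (25): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Eq. (26): mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Eq. (27): square root of the MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


# Illustrative values only: absolute errors of 1, 0, and 2.
actual, predicted = [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]
scores = (mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Here MAE = 1.0, MSE ≈ 1.667, and RMSE ≈ 1.291. RMSE exceeds MAE because squaring weights the single larger error more heavily.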
Fig. 12.
RMSE plot.
Figure 13 (a) illustrates the model’s performance in predicting compressive strength by comparing predicted and actual values. The diagonal is the ideal line, along which the model’s estimate matches the actual value exactly. No data points deviate significantly from this line, suggesting minimal outliers. Similarly, Fig. 13 (b) demonstrates the model’s performance in estimating CO₂ emissions. The plot reveals a strong linear correlation between the actual and predicted values, with points lying almost exactly on the ideal line. This underscores how effectively the model has learned to estimate CO₂ emissions.
Fig. 13.
Predicted vs. actual value plot for: (a) compressive strength; (b) carbon emission.
While the initial train-validation split demonstrated strong performance, a more rigorous validation was conducted to ensure the generalizability of the models and to substantiate the exceptionally high R² value (≈ 0.99) observed for CO₂ emission prediction. To this end, a 5-fold cross-validation procedure was implemented. The entire dataset was partitioned into five folds, and for each iteration, a model was trained on four folds and evaluated on the held-out fifth fold. This process was repeated until every fold had served as the evaluation set. Each model was a fresh instance of the neural network architecture described previously, which incorporates Batch Normalization, Dropout, and L2 regularization to promote robustness.
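The 5-fold procedure can be sketched as an index generator plus a mean ± standard deviation summary. The study states neither the shuffling seed nor whether the sample standard deviation was used, so both are assumptions here.

```python
import random
import statistics


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded shuffle (assumed)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Usage outline (train_fresh_model and evaluate_r2 are hypothetical):
# scores = [evaluate_r2(train_fresh_model(train), test)
#           for train, test in kfold_indices(len(dataset))]
# mean_r2, std_r2 = summarize(scores)
```

Training a fresh model in every fold, as the text describes, ensures that no fold's test samples influence the weights used to score it.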
The results of the 5-fold cross-validation for the CO₂ emission model, detailed in Fig. 14, demonstrated remarkable consistency. The model yielded a mean R² score of 0.9911 ± 0.0017, a low mean RMSE of 0.4853 ± 0.0584, and a mean absolute error of 0.3835 ± 0.0593. The extremely low standard deviations across all metrics confirm that the high predictive accuracy is not an artifact of a specific data partition but a robust feature of the model.
Fig. 14.
Results of the 5-Fold Cross-Validation Analysis.
Deep reinforcement learning results (stage 2)
Figure 15 shows the reward the agent receives during training. The plot shows an upward trend, indicating that the agent is learning and improving its performance over time. Early in training, the rewards are mostly negative because the agent is exploring the environment and making suboptimal decisions. As training progresses, the rewards become consistently positive. This suggests that the agent is learning an effective policy that minimizes CO₂ emissions while satisfying the structural design constraints.
Fig. 15.
Evolution of agent reward during deep RL training.
Figure 16 depicts the carbon emissions resulting from the selected actions (both structural and concrete components). Initially, because of exploration and the randomness of action selection, the emissions form a dispersed cloud of points, each corresponding to a design generated by the agent. Through iterative training over thousands of steps, the agent learns to produce designs with lower carbon emissions. Figure 17 depicts the exploration of the structural design parameters within the action space during the RL process. These patterns reveal that the agent systematically considers different structural configurations for each parameter rather than converging prematurely on a single value.
Fig. 16.
CO2 emission reduction based on agent decision during the training process.
Fig. 17.
Action space exploration values of geometric components during training.
Figure 18 maps the exploration of the concrete mix components in the action space. Each subplot shows the amount of a particular component (cement, water, fly ash, slag, silica fume, fine aggregate, coarse aggregate, or superplasticizer) in pounds per cubic foot (lb/ft³) on the Y-axis. These patterns indicate that the agent does not settle prematurely on any single mix proportion when selecting actions.
Fig. 18.
Action space exploration values of mixing design during training.
Figure 19 illustrates a 3D RC beam showing an optimal design produced by the trained agent, including the geometric design and the mix design under the specified loading conditions.
Fig. 19.
3D illustration of a RC beam with predicted structural and mix design parameters by the trained agent.
To validate the stability and reproducibility of the deep RL training process, the agent was trained three separate times using different random seeds (21, 123, and 999). This ensures that the observed performance is not an artifact of a specific random initialization but rather a consistent outcome of the learning framework.
Figure 20 presents the learning curves for these three independent runs. While the inherent stochasticity of the agent’s exploration leads to some variance in the reward trajectory between runs, all three training sessions exhibit a clear and consistent convergence pattern. Each run begins with negative rewards, indicative of the initial exploration phase where constraints are often violated, and steadily progresses towards a stable region of high positive rewards.
Fig. 20.
Training Convergence and Reproducibility Analysis.
The convergence of all three runs to a similar final reward level of approximately 1.5 demonstrates the robustness of the PPO agent and the reward structure. This confirms that the agent can reliably learn an effective policy for designing low-carbon, code-compliant RC beams, providing confidence in the generalizability of the trained model.
Building a web application for intelligent decision-making RC beam design
A notable outcome of this research is the development of a web application with a Flask backend. The software serves as a practical interface for users to interact with the trained deep RL model and generate optimized RC beam designs. As demonstrated in Fig. 21, the user enters input parameters such as beam length (L), DL, and LL. The application then retrieves the best matching designs from the successful designs generated during training.
Fig. 21.
User input parameters for the web application, accessible at https://MaterialAI.ir/RL.
Upon selecting the “Run” button, the prediction module uses a nearest-neighbor search to identify and rank the RL-generated designs most similar to the user’s input. To support this, all successful designs are stored in an Excel file during training, containing DL, LL, L, and the corresponding structural and mix designs. To ensure that DL, LL, and L contribute in a balanced way, each of these variables is normalized using Eq. (28). This mitigates the disproportionate influence of variables with large magnitudes, such as beam length.
[Eq. (28): normalization of L, DL, and LL]
A K-dimensional tree (KD tree), a binary space-partitioning data structure that enables efficient nearest-neighbor search, is constructed from the DL, LL, and L values of the successful designs. The KD tree is then queried for the ten designs closest to the input in the normalized space, based on Euclidean distance, and these are ranked by reward (higher rewards correspond to lower carbon emissions). Finally, the three designs with the lowest carbon footprints are presented to the user. This interactive workflow bridges the gap between advanced computational methods and real-world structural engineering practice, making advanced optimization techniques accessible to the broader engineering community. The web application is accessible at https://MaterialAI.ir/RL.
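The retrieval step can be sketched with a small KD tree written from scratch. A library class such as SciPy's `scipy.spatial.KDTree` would normally be used; this version avoids dependencies. It assumes min-max normalization for Eq. (28) and a hypothetical record schema with DL, LL, L, and reward fields. As in the text, the ten nearest designs are re-ranked by reward and the top three are returned.

```python
import heapq
import math


def build_kdtree(points, idxs=None, depth=0):
    """Nodes are (index, axis, left, right); the split axis cycles through coordinates."""
    if idxs is None:
        idxs = list(range(len(points)))
    if not idxs:
        return None
    axis = depth % len(points[0])
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return (idxs[mid], axis,
            build_kdtree(points, idxs[:mid], depth + 1),
            build_kdtree(points, idxs[mid + 1:], depth + 1))


def knn(tree, points, query, k):
    """Indices of the k points nearest to `query` (Euclidean), nearest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, index)

    def visit(node):
        if node is None:
            return
        i, axis, left, right = node
        d = math.dist(points[i], query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
        diff = query[axis] - points[i][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the k-th best.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return [i for _, i in sorted(heap, reverse=True)]


def recommend(designs, query, k=10, top=3):
    """designs: dicts with 'DL', 'LL', 'L', 'reward' keys (hypothetical schema).
    Min-max normalization stands in for Eq. (28), whose exact form is assumed."""
    keys = ("DL", "LL", "L")
    raw = [[d[key] for key in keys] for d in designs]
    lo = [min(col) for col in zip(*raw)]
    span = [(max(col) - min(col)) or 1.0 for col in zip(*raw)]

    def norm(row):
        return [(v - l) / s for v, l, s in zip(row, lo, span)]

    points = [norm(r) for r in raw]
    tree = build_kdtree(points)
    nearest = knn(tree, points, norm([query[key] for key in keys]), k)
    # Higher reward corresponds to lower CO2, so rank the neighbours by reward.
    nearest.sort(key=lambda i: designs[i]["reward"], reverse=True)
    return [designs[i] for i in nearest[:top]]
```

Normalizing before building the tree matters: without it, beam length (15 to 30 ft) would dominate the Euclidean distance over loads of 1.5 to 4.5 kip/ft.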
Figure 22 displays the output generated by the web application for a specific set of user inputs. The results are organized into input parameters, beam dimensions, material properties and CO₂ emissions, design parameters, concrete mix design, and performance. Notably, the application provides optimized values for beam depth and width, rebar size, and number of rebars. It also gives a detailed concrete mix design specifying the amounts of water, cement, slag, fly ash, silica fume, fine and coarse aggregates, and superplasticizer. These optimized parameters reflect the decision-making process of the deep RL agent. The agent balances the competing goals of structural performance, compliance with ACI 318-19, and environmental considerations, particularly the minimization of CO₂ emissions.
Fig. 22.
The output generated by the web application includes the complete design of the concrete beam.
Web application evaluation
To demonstrate the practical utility of the web application, a scenario-based demonstration and performance assessment is presented. Consider a design engineer tasked with designing a beam with a length of 19.00 ft, a dead load of 3.70 kip/ft, and a live load of 2.80 kip/ft. The user enters these values into the web application’s interface. Upon execution, the backend uses a KD tree search to identify the ten most similar successful designs in the pre-computed database of RL-generated solutions.
Table 8 presents the detailed output for the top seven candidates retrieved by the system. This includes the input parameters of the retrieved designs, their geometric and material properties, and key performance indicators. The designs are ranked primarily by their Euclidean distance to the user’s input and secondarily by the reward value achieved during RL training, which is correlated with lower CO₂ emissions.
Table 8.
Detailed design candidates for a sample input scenario.
| Design Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| d (in) | 50.0 | 50.0 | 47.5 | 50.0 | 50.0 | 50.0 | 47.2 |
| b (in) | 10.0 | 12.6 | 10.0 | 10.9 | 10.0 | 10.6 | 13.9 |
| Main Rebar Size (#) | 9.00 | 10.0 | 10.0 | 10.0 | 9.00 | 10.0 | 9.00 |
| No. of Main Rebars | 5.00 | 5.00 | 6.00 | 6.00 | 5.00 | 4.00 | 5.00 |
| f’c (psi) | 4570.6 | 5046.5 | 6833.9 | 6035.6 | 4000.9 | 3438.3 | 4604.9 |
| φMn (kip-ft) | 1037.5 | 1324.9 | 1489.6 | 1570.2 | 1025.1 | 1028.7 | 999.4 |
| Mu (kip-ft) | 397.1 | 430.8 | 422.4 | 426.2 | 397.1 | 425.3 | 405.2 |
| εt | 0.0130 | 0.0140 | 0.0099 | 0.0107 | 0.0115 | 0.0100 | 0.0181 |
| ρmin | 0.0034 | 0.0036 | 0.0041 | 0.0039 | 0.0033 | 0.0033 | 0.0034 |
| CO₂ Emissions (lb/ft³) | 6.20 | 6.35 | 6.50 | 7.76 | 8.84 | 10.0 | 11.6 |
| Reward | 1.66 | 1.65 | 1.65 | 1.58 | 1.53 | 1.48 | 1.41 |
The results in Table 8 demonstrate the tool’s practical utility. First, all seven candidates satisfy the primary structural design criterion, with the design moment capacity (φMn) safely exceeding the required moment (Mu). This confirms the tool’s reliability in generating code-compliant designs.
Second, the ranking system allows for a clear trade-off analysis. For instance, Design 1 offers the lowest CO₂ emissions (6.20 lb/ft³) and the highest reward (1.66). However, an engineer might also evaluate Design 3, which has comparable proximity to the input but provides a 50% increase in concrete compressive strength for only a 5% increase in CO₂ emissions. This ability to explore a ranked solution space, rather than being given a single opaque answer, is a significant feature that enhances the tool’s real-world impact.
Comparative benchmarking and performance validation
To empirically evaluate the performance of the proposed PPO-based framework, a comparative analysis was conducted against the A2C algorithm56, a well-established on-policy deep RL method. While other algorithms like DDPG are common, DDPG’s deterministic policy is designed for continuous action spaces. It is inherently incompatible with the mixed discrete-continuous action space of this problem (which includes discrete choices for rebar sizes and test ages). A2C’s stochastic policy, however, naturally accommodates this mixed action space, making it a suitable and rigorous benchmark.
For a fair and direct comparison, both the PPO and A2C agents were implemented with identical neural network architectures and trained within the same custom environment. The evaluation was performed across nine distinct test cases, with varying beam lengths, dead loads, and live loads to cover a representative range of practical design scenarios.
The comparative results, presented in Table 9, demonstrate the superior performance of the PPO algorithm. Across all nine test cases, the PPO agent achieved a higher final reward, indicating a better ability to find solutions that satisfy all design constraints while minimizing the environmental objective. Most notably, the PPO agent consistently generated designs with significantly lower embodied carbon. The A2C-optimized designs had 43.35% to 75.04% higher CO₂ emissions than the PPO-optimized designs, which corresponds to PPO reducing emissions by roughly 30% to 43% relative to A2C. This substantial improvement highlights the effectiveness of PPO’s policy clipping mechanism and exploration strategy in navigating the complex optimization landscape of sustainable structural design, validating its selection for this framework.
Table 9.
Performance comparison between A2C and PPO algorithms for RC beam design optimization across nine test cases.
| Case | Input L (ft) | Input DL (kip/ft) | Input LL (kip/ft) | A2C Reward | PPO Reward | Better Reward | A2C vs. PPO Reward (%) | A2C CO₂ (lb/ft³) | PPO CO₂ (lb/ft³) | A2C vs. PPO CO₂ (%) | Lower CO₂ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 24.6 | 1.58 | 2.33 | 1.50 | 1.67 | PPO | 10.06 | 9.59 | 6.06 | 58.34 | PPO |
| 2 | 28.4 | 1.76 | 2.77 | 1.46 | 1.66 | PPO | 11.91 | 10.50 | 6.27 | 67.36 | PPO |
| 3 | 15.4 | 2.16 | 3.02 | 1.50 | 1.66 | PPO | 9.85 | 9.65 | 6.20 | 55.76 | PPO |
| 4 | 15.4 | 2.1 | 3.45 | 1.47 | 1.67 | PPO | 12.24 | 10.36 | 6.01 | 72.45 | PPO |
| 5 | 23.2 | 2.16 | 3.27 | 1.53 | 1.69 | PPO | 9.57 | 8.95 | 5.60 | 59.83 | PPO |
| 6 | 27.1 | 1.52 | 3.92 | 1.51 | 1.65 | PPO | 8.44 | 9.37 | 6.43 | 45.74 | PPO |
| 7 | 25.5 | 2.52 | 1.97 | 1.57 | 1.69 | PPO | 7.12 | 8.14 | 5.68 | 43.35 | PPO |
| 8 | 29.4 | 2.51 | 1.78 | 1.55 | 1.68 | PPO | 7.91 | 8.56 | 5.82 | 47.24 | PPO |
| 9 | 16.5 | 4.04 | 3.31 | 1.40 | 1.63 | PPO | 14.1 | 11.82 | 6.75 | 75.04 | PPO |
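The percentage columns in Table 9 can be reproduced from the tabulated values, which also makes their normalization explicit: both differences are divided by the PPO value. The CO₂ column therefore states how much higher the A2C emissions are than the PPO emissions, while the reduction achieved by PPO relative to A2C is 1 − PPO/A2C. The sketch below transcribes the table; small differences from the printed percentages (a few tenths of a percentage point) come from rounding of the tabulated inputs.

```python
# (case, A2C reward, PPO reward, A2C CO2, PPO CO2) transcribed from Table 9; CO2 in lb/ft^3
ROWS = [
    (1, 1.50, 1.67, 9.59, 6.06), (2, 1.46, 1.66, 10.50, 6.27), (3, 1.50, 1.66, 9.65, 6.20),
    (4, 1.47, 1.67, 10.36, 6.01), (5, 1.53, 1.69, 8.95, 5.60), (6, 1.51, 1.65, 9.37, 6.43),
    (7, 1.57, 1.69, 8.14, 5.68), (8, 1.55, 1.68, 8.56, 5.82), (9, 1.40, 1.63, 11.82, 6.75),
]


def table9_percentages(a2c_r, ppo_r, a2c_co2, ppo_co2):
    """Return (reward gap %, A2C CO2 excess %, PPO CO2 reduction %)."""
    reward_gap = 100.0 * (ppo_r - a2c_r) / ppo_r          # "A2C vs. PPO Reward (%)"
    co2_excess = 100.0 * (a2c_co2 - ppo_co2) / ppo_co2    # "A2C vs. PPO CO2 (%)"
    co2_reduction = 100.0 * (1.0 - ppo_co2 / a2c_co2)     # PPO saving relative to A2C
    return reward_gap, co2_excess, co2_reduction


for case, *vals in ROWS:
    gap, excess, reduction = table9_percentages(*vals)
    print(f"case {case}: reward gap {gap:5.2f}%, A2C CO2 excess {excess:5.2f}%, "
          f"PPO reduction {reduction:5.2f}%")
```

On the tabulated values, the PPO reduction ranges from about 30.2% (Case 7) to about 42.9% (Case 9).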
Concluding remarks
This research shows how deep RL can navigate the complex, multi-objective optimization landscape of RC beam design. Integrating an ANN predictor with a deep RL agent yields a framework that addresses the fundamental challenge of balancing structural performance with environmental sustainability in construction materials.
To identify the best predictive model, 14 machine learning algorithms were compared; the ANN matched or slightly exceeded the strongest alternatives, including LightGBM (R² = 0.848), XGBoost (R² = 0.834), and Random Forest (R² = 0.826). The two-stage framework addresses critical limitations of existing optimization approaches through several key innovations:
The selected ANN model achieved high predictive accuracy for CO₂ emissions (R² ≈ 0.99) and compressive strength (R² ≈ 0.85); training converged after 526 epochs with a loss function value of 0.15 and a mean absolute error (MAE) of 0.38. Five-fold cross-validation confirmed robust performance, with a mean R² of 0.9911 ± 0.0017, a root mean square error (RMSE) of 0.4853 ± 0.0584, and an MAE of 0.3835 ± 0.0593.
The deep RL framework operated in a 13-dimensional action space combining discrete and continuous parameters, coupled with a 26-parameter state space. A convergence analysis across multiple random seeds showed a clear and reproducible learning progression: rewards rose from negative values and stabilized at approximately 1.5, indicating successful policy convergence.
The PPO agent learned to satisfy the ACI 318-19 design constraints implicitly through interaction with the environment, offering an adaptable alternative to explicit constraint-handling methods while still exploring a wide range of code-compliant design solutions.
The framework achieved substantial environmental benefits. In a comparative benchmark against the A2C algorithm, the A2C designs emitted 43.35–75.04% more CO₂ than the PPO designs (equivalently, PPO reduced emissions by roughly 30–43%), confirming PPO's superior optimization performance while maintaining structural integrity.
The customized reward function, whose effectiveness was verified through a comprehensive reward sensitivity analysis, balances competing objectives by combining penalty-based constraint enforcement with sustainability-driven positive rewards. The web-based implementation uses a KD tree for nearest-neighbor search, providing ranked design alternatives that connect the research outcomes with practical engineering applications.
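The reward structure summarized above, with penalties for constraint violations and a positive reward for lower embodied carbon, can be sketched as a single function. This is a minimal illustration under stated assumptions, not the paper's reward: the inputs (`demand_capacity_ratio` = Mu/φMn, a boolean strain check, `co2`, and a reference value `co2_ref`), the penalty scale, and the shape of the sustainability bonus are all hypothetical.

```python
def design_reward(demand_capacity_ratio, strain_ok, co2, co2_ref,
                  penalty_scale=2.0, feasible_bonus=1.0):
    """Hypothetical penalty-based reward for one candidate beam design.

    Infeasible designs always score below zero; feasible designs earn a fixed
    bonus plus a term that grows as CO2 falls below the reference value.
    """
    violation = max(0.0, demand_capacity_ratio - 1.0)   # how far Mu exceeds phi*Mn
    if violation > 0.0 or not strain_ok:
        return -penalty_scale * (1.0 + violation)       # penalty grows with violation
    savings = max(0.0, min(1.0, (co2_ref - co2) / co2_ref))  # clipped to [0, 1]
    return feasible_bonus + savings


print(design_reward(0.76, True, co2=6.06, co2_ref=10.0))  # feasible, low carbon
print(design_reward(1.15, True, co2=5.00, co2_ref=10.0))  # under-strength: penalized
```

Making every infeasible design score below every feasible one is a simple way to let an agent learn constraints implicitly; a reward sensitivity analysis of the kind described above is what would justify particular weights.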
The framework's ability to learn complex constraint relationships while optimizing for sustainability metrics suggests potential for broader infrastructure design applications. The open-source code and the web-based interface (accessible at https://MaterialAI.ir/RL) make the framework available to researchers and practicing engineers.
In conclusion, this work establishes deep RL as a robust and viable methodology for automated, sustainable structural design. Once trained, the agent produces optimized designs almost instantaneously while balancing multiple competing objectives, which makes the approach efficient and scalable at the point of use. This is a meaningful step toward intelligent, environmentally conscious tools that can support the construction industry's transition to a more sustainable future.
Research limitations and future directions
Limitations of the current study
The scope and applicability of the current framework are defined by several key limitations that also provide clear avenues for future research.
First, regarding scope and generalizability, the technical analysis is confined to the flexural design of simply supported RC beams. Consequently, other critical design aspects required by ACI 318-19, such as shear design, torsional resistance, and serviceability limit states (e.g., long-term deflection and crack width control), were not integrated into the deep RL agent's state representation or reward structure.
Second, the training process involves a significant computational cost. While the trained agent can generate optimized designs almost instantaneously, the initial training of the deep RL model is resource-intensive: it requires a large number of interactions with the environment and careful hyperparameter tuning to converge to a well-performing policy. This upfront investment, while typical for deep RL applications, is a practical consideration when developing and retraining models for different design scenarios.
Finally, the framework's multi-objective formulation prioritizes CO₂ emissions and is hard-coded to the ACI 318-19 design standard. This focus omits other trade-offs that shape real-world projects (e.g., cost and schedule), and the reliance on ACI 318-19 limits the framework's direct use in regions that follow other prominent standards, such as Eurocode.
Future research directions
This study establishes a methodology for RC beam optimization. Future work can extend this framework in several key areas:
System-Level Optimization: Extend the framework to other critical structural elements such as columns, slabs, and entire building frames. This will likely involve transitioning to a multi-agent RL approach, where individual agents collaborate to design an optimized, interactive structural system.
Comprehensive Design Verification: Incorporate a broader range of design checks beyond flexure, including shear and torsion design, serviceability limits (e.g., deflection and crack width control), and the detailing requirements for seismic-resistant structures, to create a more holistic and practical design tool.
Adaptation to Novel Materials and Structural Types: Adapt the methodology to optimize other structural forms, such as prestressed concrete members, steel-concrete composite beams, or structures employing emerging materials like ultra-high-performance concrete and timber-concrete composites.
BIM Integration for Automated Workflows: Develop a direct plugin for Building Information Modeling (BIM) software. This would enable the agent to read inputs from a digital model, perform the optimization, and write the optimized design data back into the BIM environment, integrating the tool seamlessly into modern design workflows.
Multi-Hazard Resilience and Life-Cycle Optimization: Expand the optimization objectives to include performance under various hazards, such as seismic loading, wind, and fire. Furthermore, the reward function could be broadened to encompass the whole building life cycle, including operational energy, maintenance costs, and end-of-life deconstruction.
Enhancing Computational Efficiency: Investigate more sample-efficient RL algorithms (e.g., model-based RL) or transfer learning techniques to reduce the computational cost and time required for training, making the approach more scalable and accessible.
Author contributions
Alireza Hosseinzadeh: Formal Analysis, Software, Methodology, Conceptualization, Investigation, Writing - Original Draft, Data Curation, Visualization. Mehdi Dehestani: Conceptualization, Methodology, Supervision, Project Administration, Investigation, Writing - Review & Editing.
Data availability
Code Availability: The source code for this study has been made publicly available on GitHub and can be accessed at https://github.com/alirezahzz/Low-Carbon-RC-Beam-Deep-Reinforcement-Learning.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Zhang, Y. et al. Predicting the compressive strength of high-performance concrete using an interpretable machine learning model. Sci. Rep. 14 (1), 1–18. 10.1038/s41598-024-79502-z (2024).
- 2. Hosseinzadeh, M., Dehestani, M. & Hosseinzadeh, A. Prediction of mechanical properties of recycled aggregate fly ash concrete employing machine learning algorithms. J. Build. Eng. 76, 107006. 10.1016/j.jobe.2023.107006 (2023).
- 3. Manan, A. et al. Sustainable optimization of concrete strength properties using artificial neural networks: a focus on mechanical performance. Mater. Res. Express 12 (2). 10.1088/2053-1591/adb003 (2025).
- 4. Ghoniem, A. G. & Aboul Nour, L. A. Mechanical properties prediction of sandstone concrete with varying compaction levels and silica fume ratios using machine-learning approaches. Constr. Build. Mater. 460, 139817. 10.1016/j.conbuildmat.2024.139817 (2025).
- 5. Vadivel, T. S., Suseelan, A., Karthick, K., Safran, M. & Alfarhood, S. Experimental investigation and machine learning prediction of mechanical properties of rubberized concrete for sustainable construction. Sci. Rep. 14 (1), 22725. 10.1038/s41598-024-73504-7 (2024).
- 6. Manan, A., Pu, Z., Ahmad, J. & Umar, M. Multi-targeted strength properties of recycled aggregate concrete through a machine learning approach. Eng. Comput. (Swansea, Wales) 42 (1), 388–430. 10.1108/EC-07-2024-0635 (2025).
- 7. Hosseinzadeh, M., Mousavi, S. S., Hosseinzadeh, A. & Dehestani, M. An efficient machine learning approach for predicting concrete chloride resistance using a comprehensive dataset. Sci. Rep. 13 (1). 10.1038/s41598-023-42270-3 (2023).
- 8. Yang, Y., Chen, H., Peng, J. & Dong, Y. Machine learning-based probabilistic prediction model for chloride concentration in the interfacial zone of precast and cast-in-place concrete structures. Structures 72, 108224. 10.1016/j.istruc.2025.108224 (2025).
- 9. Shaaban, M., Amin, M., Selim, S. & Riad, I. M. Machine learning approaches for forecasting compressive strength of high-strength concrete. Sci. Rep. 15 (1), 1–12. 10.1038/s41598-025-10342-1 (2025).
- 10. Yu, F., Chu, W., Zhang, R., Gao, Z. & Yang, Y. Predicting the permeability and compressive strength of pervious concrete using a stacking ensemble machine learning approach. Sci. Rep. 15 (1), 1–14. 10.1038/s41598-025-08479-0 (2025).
- 11. Onyelowe, K. C. et al. Developing machine learning frameworks to predict mechanical properties of ultra-high performance concrete mixed with various industrial byproducts. Sci. Rep. 15 (1). 10.1038/s41598-025-08780-y (2025).
- 12. Khan, A. et al. Enhancing concrete strength for sustainability using a machine learning approach to improve mechanical performance. Sci. Rep. 15 (1), 1–20. 10.1038/s41598-025-02648-x (2025).
- 13. Li, Y., Liu, Y., Lin, H. & Jin, C. Study of flexural strength of concrete containing mineral admixtures based on machine learning. Sci. Rep. 13 (1), 1–15. 10.1038/s41598-023-45522-4 (2023).
- 14. Nguyen, T. H. et al. Optimizing flexural strength of RC beams with recycled aggregates and CFRP using machine learning models. Sci. Rep. 14 (1), 1–25. 10.1038/s41598-024-79287-1 (2024).
- 15. Manan, A., Zhang, P., Ahmad, S. & Ahmad, J. Prediction of flexural strength in FRP bar reinforced concrete beams through a machine learning approach. Anti-Corrosion Methods Mater. 71 (5), 562–579. 10.1108/ACMM-12-2023-2935 (2024).
- 16. Feng, D. C., Wang, W. J., Mangalathu, S., Hu, G. & Wu, T. Implementing ensemble learning methods to predict the shear strength of RC deep beams with/without web reinforcements. Eng. Struct. 235, 111979. 10.1016/j.engstruct.2021.111979 (2021).
- 17. Benzaamia, A., Ghrici, M., Rebouh, R., Pilakoutas, K. & Asteris, P. G. Predicting the compressive strength of CFRP-confined concrete using deep learning. Eng. Struct. 319, 118801. 10.1016/j.engstruct.2024.118801 (2024).
- 18. Elshaarawy, M. K., Alsaadawi, M. M. & Hamed, A. K. Machine learning and interactive GUI for concrete compressive strength prediction. Sci. Rep. 14 (1), 1–26. 10.1038/s41598-024-66957-3 (2024).
- 19. Li, Y., Ma, Y., Tan, K. H., Qian, H. & Liu, T. Microstructure-informed deep learning model for accurate prediction of multiple concrete properties. J. Build. Eng. 98, 1–15. 10.1016/j.jobe.2024.111339 (2024).
- 20. Ta, Q. T., Mac, V. H., Huh, J., Yim, H. J. & Lee, G. Automatic detection of subsurface defects in concrete structures using state-of-the-art deep learning-based object detectors on the infrared dataset. Eng. Struct. 329, 119829. 10.1016/j.engstruct.2025.119829 (2025).
- 21. Luo, D., Qiao, X. & Niu, D. A predictive model for the freeze-thaw concrete durability index utilizing the DeepLabv3+ model with machine learning. Constr. Build. Mater. 459, 139788. 10.1016/j.conbuildmat.2024.139788 (2025).
- 22. Shokrnia, H., KhodabandehLou, A., Hamidi, P. & Ashrafzadeh, F. Prediction of compressive strength of fiber-reinforced concrete containing silica (SiO2) based on metaheuristic optimization algorithms and machine learning techniques. Sci. Rep. 10.1038/s41598-025-05146-2 (2025).
- 23. Guo, P., Meng, W. & Bao, Y. Knowledge-guided data-driven design of ultra-high-performance geopolymer (UHPG). Cem. Concr. Compos. 153, 105723. 10.1016/j.cemconcomp.2024.105723 (2024).
- 24. Guo, P., Jiang, Z., Meng, W. & Bao, Y. Multi-agent collaboration for knowledge-guided data-driven design of ultra-high-performance concrete (UHPC) incorporating solid wastes. Cem. Concr. Compos. 164, 106230. 10.1016/j.cemconcomp.2025.106230 (2025).
- 25. Wudil, Y. S., Al-Fakih, A., Al-Osta, M. A. & Gondal, M. A. Intelligent optimization for modeling carbon dioxide footprint in fly ash geopolymer concrete: a novel approach for minimizing CO2 emissions. J. Environ. Chem. Eng. 12 (1), 111835. 10.1016/j.jece.2023.111835 (2024).
- 26. Al-Fakih, A., Al-wajih, E., Saleh, R. A. A. & Muhit, I. B. Ensemble machine learning models for predicting the CO2 footprint of GGBFS-based geopolymer concrete. J. Clean. Prod. 472, 143463. 10.1016/j.jclepro.2024.143463 (2024).
- 27. Manan, A. et al. Machine learning prediction of recycled concrete powder with experimental validation and life cycle assessment study. Case Stud. Constr. Mater. 21, e04053. 10.1016/j.cscm.2024.e04053 (2024).
- 28. Manan, A. et al. Environmental and human health impact of recycle concrete powder: an emergy-based LCA approach. Front. Environ. Sci. 12, 1–20. 10.3389/fenvs.2024.1505312 (2024).
- 29. Ren, Q. et al. Optimizing mix design of concrete with manufactured sand for low embodied carbon and desired strength using machine learning. Constr. Build. Mater. 457, 139407. 10.1016/j.conbuildmat.2024.139407 (2024).
- 30. Ben Ghorbal, A., Grine, A., Elbatal, I., Almetwally, E. M. & Eid, M. M. Predicting carbon dioxide emissions using deep learning and ninja metaheuristic optimization algorithm. Sci. Rep. 15, 1–28. 10.1038/s41598-025-86251-0 (2025).
- 31. Golafshani, E. M., Behnood, A., Kim, T., Ngo, T. & Kashani, A. A framework for low-carbon mix design of recycled aggregate concrete with supplementary cementitious materials using machine learning and optimization algorithms. Structures 61, 106143. 10.1016/j.istruc.2024.106143 (2024).
- 32. Manan, A., Zhang, P., Majdi, A., Alattyih, W. & Ahmad, J. Utilizing waste materials in concrete: a review on mechanical and sustainable performance. Green Mater. 1–18. 10.1680/jgrma.24.00122 (2025).
- 33. Cao, Y. et al. Enhancing mix proportion design of low carbon concrete for shield segment using a combination of Bayesian optimization-NGBoost and NSGA-III algorithm. J. Clean. Prod. 465, 142746. 10.1016/j.jclepro.2024.142746 (2024).
- 34. Zhang, R., Wang, W. & Alam, M. S. Deep learning-aided optimization framework for hybrid braced structures to support life-cycle cost-based design. Eng. Struct. 327, 119603. 10.1016/j.engstruct.2024.119603 (2025).
- 35. Suebsuk, J. et al. Concrete mix design: optimizing recycled asphalt pavement in Portland cement concrete. Constr. Build. Mater. 455, 139180. 10.1016/j.conbuildmat.2024.139180 (2024).
- 36. Li, W. et al. Optimized mix design method of ultra-high performance concrete (UHPC) and effect of high steel fiber content: mechanical performance and shrinkage properties. J. Build. Eng. 97, 110746. 10.1016/j.jobe.2024.110746 (2024).
- 37. Jeong, J. H. & Jo, H. Deep reinforcement learning for automated design of reinforced concrete structures. Comput. Civ. Infrastruct. Eng. 36 (12), 1508–1529. 10.1111/mice.12773 (2021).
- 38. Fu, B., Gao, Y. & Wang, W. A physics-informed deep reinforcement learning framework for autonomous steel frame structure design. Comput. Civ. Infrastruct. Eng., 3125–3144. 10.1111/mice.13276 (2024).
- 39. Cheng, M. & Frangopol, D. M. A decision-making framework for load rating planning of aging bridges using deep reinforcement learning. J. Comput. Civ. Eng. 35, 1–16. 10.1061/(asce)cp.1943-5487.0000991 (2021).
- 40. Van Remmerden, J., Kenter, M., Roijers, D. M. & Andriotis, C. Deep multi-objective reinforcement learning for utility-based infrastructural maintenance optimization. Neural Comput. Appl. 10.1007/s00521-024-10954-0 (2025).
- 41. Yang, D. Y. Adaptive risk-based life-cycle management for large-scale structures using deep reinforcement learning and surrogate modeling. J. Eng. Mech. 148 (1), 1–15. 10.1061/(asce)em.1943-7889.0002028 (2022).
- 42. Han, C., Ma, T. & Chen, S. Asphalt pavement maintenance plans intelligent decision model based on reinforcement learning algorithm. Constr. Build. Mater. 299, 124278. 10.1016/j.conbuildmat.2021.124278 (2021).
- 43. Yao, L., Dong, Q., Jiang, J. & Ni, F. Deep reinforcement learning for long-term pavement maintenance planning. Comput. Civ. Infrastruct. Eng. 35 (11), 1230–1245. 10.1111/mice.12558 (2020).
- 44. Yao, L., Leng, Z., Jiang, J. & Ni, F. A multi-agent reinforcement learning model for maintenance optimization of interdependent highway pavement networks. Comput. Civ. Infrastruct. Eng., 2951–2970. 10.1111/mice.13234 (2024).
- 45. Zhang, Z., Guo, Z., Zheng, H., Li, Z. & Yuan, P. F. Automated architectural spatial composition via multi-agent deep reinforcement learning for building renovation. Autom. Constr. 167, 105702. 10.1016/j.autcon.2024.105702 (2024).
- 46. Pan, Y., Shen, Y., Qin, J. & Zhang, L. Deep reinforcement learning for multi-objective optimization in BIM-based green building design. Autom. Constr. 166, 105598. 10.1016/j.autcon.2024.105598 (2024).
- 47. Lin, C. H., Fu, B., Zhang, L., Li, N. & Tong, G. S. Intelligent design of steel–concrete composite beams based on deep reinforcement learning. Structures 70, 107666. 10.1016/j.istruc.2024.107666 (2024).
- 48. Du, Y. & Li, J. A deep reinforcement learning based algorithm for a distributed precast concrete production scheduling. Int. J. Prod. Econ. 268, 109102. 10.1016/j.ijpe.2023.109102 (2024).
- 49. Liu, P. et al. Automated clash resolution for reinforcement steel design in precast concrete wall panels via generative adversarial network and reinforcement learning. Adv. Eng. Inform. 58. 10.1016/j.aei.2023.102131 (2023).
- 50. Kim, T., Kim, Y. W., Lee, D. & Kim, M. Reinforcement learning approach to scheduling of precast concrete production. J. Clean. Prod. 336, 130419. 10.1016/j.jclepro.2022.130419 (2022).
- 51. Lai, L., Dong, Y., Andriotis, C. P., Wang, A. & Lei, X. Synergetic-informed deep reinforcement learning for sustainable management of transportation networks with large action spaces. Autom. Constr. 160, 105302. 10.1016/j.autcon.2024.105302 (2024).
- 52. Li, D., Zhao, Y. & Xi, H. A novel two-stage reinforcement learning framework for sustainable building energy management systems. J. Build. Eng. 98, 111475. 10.1016/j.jobe.2024.111475 (2024).
- 53. Anwar, G. A. & Zhang, X. Deep reinforcement learning for intelligent risk optimization of buildings under hazard. Reliab. Eng. Syst. Saf. 247, 110118. 10.1016/j.ress.2024.110118 (2024).
- 54. Hosseinzadeh, M., Samadvand, H., Hosseinzadeh, A., Mousavi, S. S. & Dehestani, M. Concrete strength and durability prediction through deep learning and artificial neural networks. Front. Struct. Civ. Eng. 10.1007/s11709-024-1124-9 (2024).
- 55. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at http://arxiv.org/abs/1707.06347 (2017).
- 56. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. Proc. 33rd Int. Conf. Mach. Learn. (ICML) 4, 2850–2869 (2016).