Abstract
In this work we aim to mimic the human ability to acquire the intuition to estimate the performance of a design from visual inspection and experience alone. We study the ability of convolutional neural networks to predict static and dynamic properties of cantilever beams directly from their raw cross-section images. Using pixels as the only input, the resulting models learn to predict beam properties such as volume maximum deflection and eigenfrequencies with 4.54% and 1.43% mean average percentage error, respectively, compared with the finite-element analysis (FEA) approach. Training these models does not require prior knowledge of theory or relevant geometric properties, but rather relies solely on simulated or empirical data, thereby making predictions based on ‘experience’ as opposed to theoretical knowledge. Since this approach is over 1000 times faster than FEA, it can be adopted to create surrogate models that could speed up the preliminary optimization studies where numerous consecutive evaluations of similar geometries are required. We suggest that this modelling approach would aid in addressing challenging optimization problems involving complex structures and physical phenomena for which theoretical models are unavailable.
Keywords: artificial design intuition, shape optimization, biomimetics, deep learning, frequency analysis, surrogate model
1. Goals and motivations
Arriving at an optimal design of a part, component or structure often requires evaluating countless candidate solutions with similar geometries. This is commonly accomplished by performing finite-element analysis (FEA) for each candidate design. FEA is computationally expensive and time consuming, especially when the design problem involves complex geometries and complex nonlinear physics, and when sufficiently accurate theoretical models are unavailable. As finite-element models cannot learn, but rather solve each problem anew, they cannot benefit from prior solutions of geometrically similar problems.
To overcome the computational burden of full FEA during optimization, engineers sometimes use surrogate models trained on sample cases to quickly predict approximate mechanical properties of candidate solutions without requiring a complete analysis. Once established, surrogate models can be used in the early exploratory phase of an optimization process.
Surrogate models are commonly created using machine learning (ML) techniques, allowing them to predict approximate structural properties when trained on examples from a relevant design domain [1–3]. Typically, an ML algorithm receives the relevant physical and geometric design variables as input, and predicts the desired properties as outputs. For example, to predict beam deflection, the model would be provided with design variables such as beam length, material properties and second moments. Using a dataset of input variables and corresponding outputs, an ML model can be trained to represent a function that approximates the resulting beam deflection.
Unfortunately, the relevant physical and geometric input variables for predicting a particular property are often not known in advance, especially when modelling complex phenomena, such as the energy absorption of a vehicle crash box or the lift of a flexing insect wing. On the other hand, it is common knowledge that the second moment is a relevant input variable for predicting a beam’s natural frequencies. In design spaces where both analytical models and knowledge of relevant design variables are lacking, surrogate models that learn the relevant variables would allow us to explore and optimize structures.
This was the motivation behind the present study, as a part of which an ML algorithm was adopted in an attempt to create a surrogate model from raw data inputs (such as image pixels) alone, i.e. without prior identification of the relevant input variables. If successful, the proposed strategy could be extended to tackle other problems for which the relevant input variables are unknown, such as those involving complex, nonlinear interactions: buckling under impact, aero-elastic vibrations or the dynamic behaviour of nano-reinforced micro-structures with complex geometries. For example, the analysis of mechanical properties of micro-scale cantilever beams with carbon nanotube (CNT) reinforcements or structures with embedded CNTs or silica carbide tubes require FEA, since they are impossible to analyse with analytical methods [4,5]. Thus, having a modelling approach that is able to identify the relevant parameters from a visual representation could facilitate the exploration of these design spaces where humans lack visual intuition.
Based on the results presented in this work, we believe that distilling such ‘design intuition’ through deep learning could lead to faster and more automated design optimization, especially for recurring problems that lack precise mathematical formulation.
1.1. Deep neural network models for visual intuition
Deep learning methods, and in particular convolutional neural networks (CNNs), have been successfully used in a wide range of design applications, from generating car silhouettes to predicting molecule structures [6–9]. Furthermore, CNNs have the capacity to model complex phenomena, such as classifying objects in images, recognizing actions in videos and predicting the dynamic behaviour of a lava lamp or a candle flame, as well as to predict the behaviour of another agent in a game of hide and seek [10–12]. However, in most cases, CNN models are trained on raw image data, without access to user-selected relevant variables or analytical models.
As humans, we have a naturally developed non-parametric visual intuition for our environment and the materials and structures that we typically interact with. This allows us to appreciate, for example, that a thinner beam has a lower natural frequency than a thicker beam of the same length, and that the tip of a longer beam will deflect more than that of a shorter beam with the same cross-section. Our intuition for shapes and materials similarly allows us to rule out flawed designs without relying on theoretical knowledge or analytical models.
Hence, the aim of the present study is to determine whether CNNs, when used as surrogate fitness measures for design optimization, can provide a similar visual design intuition. A trained engineer would struggle to guess the first eigenfrequency (f1) of the beam shown in figure 1 solely from seeing its cross-section, even if provided with the information on the beam’s material and geometry. On the other hand, our proposed model, which capitalizes on efficient graphics processing unit (GPU) parallelization, provided a correct answer to this particular question in approximately 2.3 ms with 1.61% mean absolute error (based on the model presented in electronic supplementary material, appendix F–A). To achieve these results, we trained a CNN on cross-sectional images of randomly generated cantilever beams to predict FEA outcomes, such as volume maximum stress, beam deflection and eigenfrequency. The resulting model was subsequently applied as a surrogate fitness measure to identify beam geometries that fulfil predetermined configurations, such as a specific set of eigenfrequencies.
Figure 1.
A twisted cantilever beam that is clamped at one end: the beam’s first three eigenfrequencies (f1, f2 and f3) are shown as they were estimated using FEA, an analytical solution and our trained model. The FEA solution serves as the ground truth.
In sum, the model presented here was trained on previously computed FEA solutions and was effectively used as a surrogate fitness measure for design space exploration. To demonstrate the feasibility of our proposed approach, we developed the necessary modules to generate data, train a CNN model, predict mechanical properties and generate designs for cantilever beams. As a proof of concept, we used our method to identify cantilever beams with a specific set of the first three eigenfrequencies. The main contribution of this study stems from a novel method for analysing and optimizing cantilever beams using a CNN that infers the beam’s mechanical properties from its cross-section.
2. Background
Engineers rely on computational approaches such as finite-element method (FEM) to optimize their designs. During the design process, analysis results guide engineers’ design decisions. This process can be enhanced through FEM as it partitions a complex problem (shape) into a mesh made up of many small sub-problems (shapes) for which solutions can be easily found, and an overarching solution pertaining to the whole shape can be derived. As each mesh element needs to be solved independently, the computational effort required is directly related to the number of mesh components and their respective degrees of freedom [13]. An adequately scaled mesh with the right mesh components is therefore crucial to finding an accurate solution within a reasonable time. To facilitate meshing for common users, modern algorithms generate adaptive meshes that are refined in areas with more complex geometric features, such as curved surfaces or tight corners. Adaptive meshes reduce the computation time while improving the overall result accuracy.
Finding a design that exhibits specific physical behaviours, such as a beam geometry that resonates at a particular set of frequencies or deflects by a predetermined amount given a specific load and constraints, is a trial-and-error process and requires optimization algorithms to complete effectively. These algorithms require fitness measures to evaluate candidate solutions and guide their search. Consequently, optimization speed is mainly determined by the time it takes to evaluate the fitness of a design. Since FEA requires extensive computational resources, it makes optimizations time intensive.
In our approach, this issue is overcome by supplementing FEA with a CNN model that learns from previously evaluated cases to guide future design exploration. The trained model provides an approximate quantitative measure for a design’s performance three orders of magnitude faster than an FEA analysis conducted using the same hardware. Akin to a skilled design engineer, our model learns from prior experience and gradually gains ‘design intuition’ which it can apply when presented with a new design.
Given that training deep neural network models tends to be a data-intensive process, to demonstrate that our model does not suffer from this shortcoming, we provide evidence confirming that it yields reasonable levels of accuracy even when trained on a combined training and validation dataset only 80 data points in size (see §5.3). Furthermore, we show that our cross-section image resolution can be reduced by a factor of 4 while still producing excellent results (see electronic supplementary material, appendix D). Given the performance measures highlighted above and the current advances in the use of CNNs for modelling complex physical phenomena, the approach presented in this work has the potential for use in a wide range of applications, especially those involving more complex geometries.
Neural networks can produce unexpected results when applied to areas for which training data are sparse or absent. Consequently, the engineering community has been justifiably cautious to adopt opaque data-driven models for critical applications, as false models could result in catastrophic outcomes. This risk can be mitigated by clearly defining the search space and ensuring that it is adequately spanned by the dataset. Additionally, it must be noted that models trained on FEA results are inherently less accurate than FEA. To mitigate such concerns, in §5.1, we demonstrate the accuracy of our approach by comparing our model's performance with widely accepted analytical solutions. Finally, we propose trained models as a supplement to FEA. Our approach quickly provides an acceptable solution that can either be verified or perfected using numerical models.
3. Literature review
Surrogate-based optimization has been extensively studied in the engineering literature and has historically relied on classical ML methods rather than deep learning models. Yet, despite the adoption of more powerful ML methods, the implementation steps outlined by Forrester & Keane [1] have remained the same: (i) selection of variables to be optimized; (ii) sampling the design space; (iii) surrogate model selection and training; (iv) exploration of the search space through the surrogate model; and (v) improvements to the surrogate model based on the obtained results or verification of a satisfactory solution against the ground truth. Our proposed approach is based on this same framework, albeit with changes to the surrogate model selection and data representation.
Authors of recent studies recognize the power of deep learning in the context of structural simulation. For example, Rizzo & Caracoglia [14] predicted the critical flutter velocity of suspension bridges using an artificial neural network (ANN). Similarly, Le & Caracoglia [15] used an ANN as a surrogate model of fragility to improve the performance assessment of vertical structures under severe wind conditions. In both cases, adoption of neural networks yielded significant performance improvements over numerical analysis. Although CNNs are mainly used in image classification and segmentation, as well as natural language processing, they have been found to perform better than other types of ANNs in engineering applications. For example, Oh et al. [16] and Zhang et al. [17] demonstrated that CNNs can be effectively employed for structural health monitoring using data in both the time and frequency domains. CNNs can accept vectors of time series data as input in order to learn the relevant features, and then effectively make use of the inherent intertemporal patterns to output the predicted labels.
In their study, Finol et al. [18] used both one-dimensional and two-dimensional CNNs to effectively model eigenvalue problems in mechanics. While an eigenvalue problem for the prediction of the first three eigenfrequencies of cantilever beams is also a focus of the present investigation, unlike Finol et al., we provide our CNN with only a grey-scale image of the beam cross-section, without predefining the relevant input variables. As noted earlier, neural networks can be adopted to effectively model complex systems. However, training them requires considerable data resources, without making any inferences from known laws of physics. Karpatne et al. [19] attempted to overcome these drawbacks by introducing physics-guided neural networks (PGNNs), which not only produce more physically accurate results but also require less data to train. In a similar vein, Zhang et al. [20] proposed a physics-guided convolutional neural network (PhyCNN) for seismic response modelling. In both cases, the proposed models were better at generalizing while helping to counteract overfitting. Moreover, Zhang et al. [20] showed that, by presenting frequency data as images in the frequency domain, these can serve as CNN input. Thus, they applied their PhyCNN as a surrogate model for structural response prediction. In this work, we show that even a less complex CNN, without incorporating laws of physics or sophisticated encodings, can be used as a surrogate fitness model for geometric optimization. As humans, we rely on our inherent ability to recognize shapes in our estimation of the physical properties of our environment. We mimic this capacity by using a bitmap image of the beam cross-section as the representation for our model input. A bitmap image can be kept constant in size while representing any shape independent of the number of vertices used to describe the cross-section.
4. Material and methods
The remainder of this article is organized into four sections: §4.1 Dataset generation, §4.2 Datasets, §4.3 Neural network and §4.4 Optimization algorithm. Thus, after presenting our dataset generation approach, we introduce the datasets used in our experiments. Next, we explain the different neural network configurations explored, as well as the architecture of the model carried forward. Finally, we address the adoption of a random search algorithm for geometric optimization of beam cross-sections in combination with our model as a surrogate fitness measure.
4.1. Dataset generation
Our data generation process comprises random beam generation, static analysis and frequency analysis, for which we adopted the FreeCAD API to generate a computer-aided design (CAD) model of each random beam [21], followed by static and eigenfrequency analysis of each beam using COMSOL Multiphysics v. 5.4 [22]. More detailed information on data generation, as well as our source code, can be found in electronic supplementary material, appendix H.
All beams considered in our simulations were assumed to be manufactured using aluminium with Young’s modulus E = 70 × 109 N mm−2 and density ρ = 2.7 × 10−6 kg mm−3. This was an arbitrary choice, given that the study aim was to demonstrate a CNN-based modelling approach, rather than examine behaviours of cantilever beams. As the goal of the analyses was to automatically process a range of beams, geometric nonlinearity was enabled in the solver to increase the accuracy for outliers that may violate the geometric linearity assumption.
4.1.1. Random beam generation
Our investigation commenced with the generation of a random polygonal cross-section for each beam, which was then copied, scaled, rotated and offset to create linearly extruded, tapered and twisted beams using the loft command. The resulting beams were stored in the dataset as three-dimensional (3D) CAD models, while the original cross-section was stored as a bitmap image. When generating the cross-section shape, the number of vertices, average radius, variance in angular spacing between vertices (irregularity) and variance in radii between vertices of the same shape (spikiness) were provided. The vertices of each cross-section in our datasets have a uniformly randomly sampled radius and number of vertices in the 24–63 mm and 3–30 range, respectively. The polar coordinate angle offsets and the radii of the polygon coordinates were sampled from a uniform distribution and a trimmed normal distribution, respectively. We adjusted the data generation process to minimize meshing errors during the automated FEA. The complete cross-section generation algorithm can be found in electronic supplementary material, appendix B.
4.1.2. Static analysis
For each beam in the dataset, we performed static analysis on a clamped beam with a load at its tip. To automate the data analysis, we made use of the native COMSOL meshing functionality, which generates an adaptive tetrahedral mesh. We automated the static analysis for our datasets in our custom COMSOL app (see electronic supplementary material, appendix I), using the load cases presented in table 1.
Table 1.
Dataset specification table.
| dataset | linearly extruded | 30° twisted | linearly extruded | 50% tapered | 15° twisted | 15° twisted and 50% tapered | 30° twisted and 50% tapered |
|---|---|---|---|---|---|---|---|
| (SlenderBeamDS) | (TwistedBeamDS) | (Linear_DS) | (TA50_DS) | (TW15_DS) | (TW15TA50_DS) | (TW30TA50_DS) | |
| size | 17 501 | 17 501 | 5001 | 5001 | 5001 | 5001 | 5001 |
| twist angle (°) | 0 | 30 | 0 | 0 | 15 | 15 | 30 |
| taper factora | 1 | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 |
| irregularity | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
| spikiness | 0.1 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 |
| load case (N) | [2000, 0, 0] | [1750, 0, 0] | [1750, 0, 0] | [1750, 0, 0] | [1750, 0, 0] | [1750, 0, 0] | [1750, 0, 0] |
aScale factor used to multiply the radii of the bottom cross-section vertices to obtain the top cross-section of the beam.
4.1.3. Frequency analysis
Similar to the approach adopted for static analysis, when performing frequency analysis, we sequentially iterated over each beam in a custom COMSOL application (see electronic supplementary material, appendix I), with the beam clamped at one end (figure 1). For each beam, we computed the first three eigenfrequencies and normalized participation factors (npf).
To avoid inflated eigenfrequency results, the mesh used for FEA needs to be as fine as possible. COMSOL Multiphysics offers nine preset mesh refinement levels, ranging from level 1 (extremely fine) to level 9 (coarse). Through repeated experiments, we found that level 3 (finer) yielded the best trade-off between accuracy and computation time. We found that the computation time quadruples when the mesh level is reduced from 3 to 2, while the increase in accuracy is comparatively small.
4.2. Datasets
Two datasets—SlenderBeamDS and TwistedBeamDS—were initially generated, respectively consisting of beams with linearly extruded cross-sections and beams that are both extruded and twisted. This distinction was made as the deflection and eigenfrequency of the beams comprising the SlenderBeamDS dataset can be determined analytically, whereas a common closed-form solution for twisted beams is not available. To study the influence of beam geometry on model performance, we generated six additional datasets using the first 5001 cross-sections stored in the TwistedBeamDS dataset, each of which contains beams with a different amount of twist and/or taper, as shown in table 1. For example, TW30TA50_DS contains beams with a 30° twist and a 0.5 taper factor, i.e. the cross-section is scaled by 50% over the length of the beam. Thus, beams within the same dataset differ from each other only in their cross-section geometry, as described in electronic supplementary material, appendix G.
4.3. Neural network
As the objective of the present study is to propose a deep learning approach for predicting static and dynamic behaviours of cantilever beams from a single 2D cross-section image, the model is required to infer all other relevant variables, such as beam length or material properties. Thus, to limit the search space, all variables except the cross-section geometry were assumed to be constant for the entire dataset. In the analyses, three neural networks—ConvNet (see electronic supplementary material, appendix A), ConvNet extended and fully connected (see electronic supplementary material, appendix A)—were explored. ConvNet and ConvNet extended differ only in the number of convolutional layers. The fully connected network used in our tests consists of four fully connected layers. We found that the ConvNet extended (figure 2), the deepest CNN we tested, outperformed the other two architectures at predicting the first three eigenfrequencies (see electronic supplementary material, Table I in appendix A). The neural network was programmed in such a way that the output layer dimension changes if the network is trained to predict a different number of properties. For example, if the network were solely predicting the maximum beam deflection, the output layer would have one rather than three dimensions.
Figure 2.
Neural network diagram of ConvNet extended for predicting the first three eigenfrequencies of a cantilever beam.
Unlike classical computer vision methods, CNNs learn the relevant features for predicting the output during training. Each layer is computed from the previous one by sliding a ‘window’ called the kernel over the input tensor and applying the convolution operation. Moreover, as CNNs are able to extract learned features from data, they do not require any feature engineering, as described in detail by LeCun et al. [23] and O’Shea & Nash [24].
All our models were implemented in PyTorch and were trained and run on a desktop computer with Nvidia RTX2080Ti GPUs. Moreover, network size was adjusted based on the input image dimensions (img_size) and the number of labels (number_of_labels) to be predicted.
Each dataset was segregated into training, evaluation and testing subsets in a 64:16:20 ratio. For a dataset comprising 17 500 datapoints, this resulted in a 11 200:2800:3500 split. We trained each network using the Adam optimizer in combination with the mean squared error (MSE) (equation (4.1)) as the loss function [25]. Unless specified otherwise, all models were trained with a learning rate (LR) of 0.0001,
| 4.1 |
4.3.1. ConvNet extended architecture details
The ConvNet Extended network (figure 2) comprises five convolutional layers and two fully connected layers and takes as an anti-aliased grey-scale image of the beam cross-section as input. The first three convolutional layers have 32 5 × 5 convolutions of stride 1 with padding 2, followed by batch normalization. The fourth convolutional layer has 64 5 × 5 convolutions of stride 1 with padding 2, followed by batch normalization and a max pool layer with a 2 × 2 convolution applied at stride 1. Finally, the fifth convolutional layer has 32 5 × 5 convolutions of stride 1 with padding 2, followed by batch normalization and a max pool layer with a 2 × 2 convolution applied at stride 1. All convolutional layers use ReLU activations [26]. The sixth layer is a 32*img_size/4*img_size/4 to 1024 size fully connected layer and the final layer is a 1024 to number_of_labels fully connected layer.
4.4. Optimization algorithm
To assess our model’s effectiveness as a surrogate model, we used a trained model in combination with random search to identify beams that exhibit a specific set of first three eigenfrequencies [f1, f2, f3] and confirmed our findings using FEA.
For the optimization, the best performing model (manually selected from multiple training sessions) was trained on the TwistedBeamDS dataset with a mean average percentage error (MAPE) of 2.03%. The training graph of the model used in this experiment, as well as a table showing a detailed breakdown of the model’s performance, can be found in electronic supplementary material, appendix F–A. During the optimization with the random search algorithm, we randomly generated new beam cross-section vertices using algorithm 2 (see electronic supplementary material, appendix B). These vertices were subsequently used to create a cross-section image, as was done during data generation. After evaluating the resulting cross-section image using our model, we computed the MSE between the desired values for [f1, f2, f3] and the predicted values, saving the best-performing cross-section geometry after each iteration.
5. Results
The results yielded by our analyses are presented under the following four headings: Model performance, Geometric optimization of beam cross sections, Data efficiency and Generalizability. In the section related to model performance, we show that our approach is not only superior to a closed-form solution, but can be applied to scenarios for which no readily available closed-form solution exists. The optimization algorithm section is designated for our experimental set-up and the results yielded by applying our approach to identify specific beam configurations with desired properties. Next, in the Data efficiency section, we discuss the data requirements of our model, and present the results obtained by training the model on various problems and datasets without tuning the relevant network variables in the Generalizability section.
5.1. Model performance
To demonstrate the practical utility of the proposed approach, we compared the accuracy of our model with that of the analytical solution for both volume maximum beam deflection and eigenfrequency. As a closed-form solution for total volume maximum displacement and eigenfrequencies of a twisted cantilever beam is highly complex, the formula for linearly extruded beams was used in comparisons related to both linear and twisted beams [27]. Unlike the analytical solution, our FEA was not constrained by the assumption of geometric linearity. As a result, larger discrepancies between the analytical and the finite-element solution were expected for beams that do not satisfy the geometric linearity criterion,
| 5.1a |
| 5.1b |
| 5.1c |
| 5.1d |
To analytically compute the maximum deflection and the first three eigenfrequencies of each beam, the second moment along the principal axes must be considered, which was computed with respect to the base coordinate frame during data generation. This allowed equations (5.1a)–(5.1d) to be applied to compute the second moment along the principal axes, as well as the z-axis rotation needed to transform the applied force vector into the principal axis coordinate frame.
5.1.1. Total deflection
We solved for the maximum deflection δmax of a clamped cantilever beam subjected to a point load at its tip using equation (5.2), where Fp is the point force applied at the beam tip, Ip is the second moment along the principal axis and E is Young’s modulus, as described in §4.2. The load case considered in this scenario can be found in table 1. A derivation of equation (5.2) is provided by Gere & Goodno [28]. The point force Fp was attained by representing the load case used in the FEA analysis in terms of the eigenbasis of the second moment,
| 5.2 |
As shown in figure 3a, our model is over four percentage points more accurate than the analytical solution for linearly extruded beams. Moreover, when applied to both linearly extruded and twisted beams, our model is approximately 0.2 mm more accurate in predicting volume maximum total displacement (figure 3). One advantage of our approach stems from the assumptions imposed on the analytical solution related to the beam shape and material composition, leading to constraints such as linear bending. Thus, for beams that violate these assumptions, the analytical solution is naturally less accurate. Most importantly, our approach in combination with numerical approximations is a viable strategy for exploring the design spaces of complex problems that lack a known closed-form solution.
Figure 3.
Comparison of errors generated by the analytical solution (blue bar) and the model prediction (orange bar) of total displacement with respect to the ground truth FEA solution based on test data only. (a,b) show the results related to the linearly extruded beams comprising the SlenderBeamDS dataset and the twisted beams included in the TwistedBeamDS dataset, respectively (see table 1 for dataset details). The error bars represent the 95% confidence interval. The graphs confirm that our approach outperforms the analytical solution for free vibration in both scenarios.
5.1.2. First three eigenfrequencies [f1, f2, f3]
The eigenfrequency models used in this experiment were trained with 0.00001 LR and 100 batch size, which were determined based on the validation set performance.
We next solved for the eigenfrequencies using the Euler–Bernoulli equations for free vibration of a cantilever beam using equations (5.3a)–(5.3d) in combination with the root solutions provided in electronic supplementary material, Table II in appendix C. ρ denotes material density, A stands for cross section area and L is the beam length. For a detailed derivation please see Han et al. [29],
| 5.3a |
| 5.3b |
| 5.3c |
| 5.3d |
Our analyses revealed that the first three eigenmodes of the beams in our datasets occur along the two principal axes tangential to the beam length. Thus, we solved equation (5.3c) analytically in both directions and selected the lowest three eigenfrequencies among those found. It is worth noting that computing more than the first three eigenfrequencies may require solutions for all principal directions.
Our findings indicate that the proposed trained model, ConvNetExtended, outperforms the analytical solution for eigenfrequencies when applied to not only twisted beams but also linearly extruded beams, as shown in figure 4. The trained model predicted the total displacement and the first three eigenfrequencies with half the percentage error of the analytical model when applied to linearly extruded beams (see figures 3a and 4a). When applied to twisted beams, the trained model outperformed the analytical solution in terms of evaluating total displacement by over three percentage points (figure 3b). When predicting the first three eigenfrequencies of twisted beams, the trained model outperformed the analytical solution by a small percentage margin, while being significantly more accurate with respect to the mean absolute error (figure 4b). In sum, our approach is not only more accurate than the analytical solution, but also more versatile, as it can be trained on diverse datasets and produces accurate results for various types of beam geometries that may not have an obvious closed-form solution, such as beams that are both tapered and twisted, as demonstrated in §5.4.
Figure 4.
Comparison of errors generated by the analytical solution (blue bar) and the model prediction (orange bar) of total displacement with respect to the ground truth FEA solution based on test data only. (a,b) show the results yielded by applying our model to linearly extruded beams included in the SlenderBeamDS dataset and the twisted beams comprising the TwistedBeamDS dataset, respectively (see table 1 for dataset details). The error bars represent the 95% confidence interval. The graphs show that our model outperforms the analytical solution for free vibration in both scenarios.
5.2. Geometric optimization of beam cross sections
In this section, we study the performance of our models when employed as a surrogate fitness measure in the search for a beam cross-section geometry that exhibits a specific set of the first three eigenfrequencies. For this purpose, we used a model that was trained to predict the first three eigenfrequencies based on the TwistedBeamDS dataset as our surrogate fitness model. Next, a random search algorithm was applied to identify cross-section designs that correspond to twisted beams with a desired set of first three eigenfrequencies. All geometries that met these criteria after a fixed number of evaluations were assessed via FEA and were used to compute the accuracy of our approach.
First, we randomly selected 100 sets of first three eigenfrequencies [f1, f2, f3] from beams in the TwistedBeamDS test set. As our model was trained on the TwistedBeamDS dataset, this strategy guaranteed that a beam that satisfies the desired eigenfrequency configuration exists, while excluding the possibility that the model had previously been presented with this particular set of eigenfrequencies. Second, we ran 10 optimization searches for each set of eigenfrequencies using a random search algorithm. Each search was terminated after 100 000 evaluations. Finally, the identified beam cross-sections were extruded into 3D CAD beams that were analysed in COMSOL Multiphysics to determine their respective true first three eigenfrequencies.
Our findings confirmed that the model developed as a part of this study was effective at guiding the random search algorithm towards beam cross-section geometries that provided a good fit to the target eigenfrequencies. The proposed solutions cumulatively deviated by 8.18 Hz on average over the three target eigenfrequencies, resulting in 2.39% MAPE. The identified cross-section geometries, once extruded with a 30° twist and analysed using FEA, exhibited eigenfrequencies very close to the specified target. The three examples in figure 5 show that the search produced cross-sections that differ in geometry, size and vertex count to attain a beam geometry that best matches the target eigenfrequencies. The samples were manually selected from different frequency ranges to provide insight into the pertinent cross-section geometries.
Figure 5.
Three examples of the beam configurations yielded by the optimization experiment.
We found that on average meshing and analysing one beam in COMSOL Multiphysics takes 4.3 s on an AMD Ryzen Threadripper 2950× CPU using meshing refinement level 3, while evaluating a new beam cross-section using our model on an Nvidia RTX2080Ti GPU was completed within 2.3 ms. As a result, using FEA instead of our trained model for this parameter search would extend the 4-min duration of each 100 000-step optimization to a week. Additionally, we were able to shorten the time required for the 100 optimization runs further by running seven surrogate model instances in parallel on one Nvidia RTX2080Ti. The time comparison between GPU parallelized code and code that runs on the CPU is not an even one, but is used in this instance to show the significant performance difference of the two methods on desktop computer hardware.
Analysing the frequency response and tuning the components to resonate or not resonate at particular frequencies is relevant in many practical applications, from civil engineering to space travel. For example, our approach can be used when working on a mechanical resonator design, as it can identify geometries that vibrate at specific frequencies. Similarly, this approach can speed up the search for geometries that avoid specific eigenfrequencies, for example when designing machine parts that must avoid resonance with motor vibrations.
5.3. Data efficiency: prediction accuracy as a function of dataset size
Thus far, we assessed the performance of our approach using a large dataset. However, for design intuition to be useful, it should be adequately trained with a reasonably small amount of data. In a surrogate-based optimization approach as outlined in §3, the data quantity required to train the model determines its practical utility. Thus, in this section, we show that our model can be trained effectively on relatively small datasets without compromising its high prediction accuracy.
For this purpose, we partitioned the TwistedBeamDS dataset into differently sized subsets, and trained four models on each subset to predict the first three eigenfrequencies. We reserved 20% (i.e. 3500 data points) of the TwistedBeamDS for testing, and computed the mean average percentage error for each model based on the models’ performance on that test set. Each subset was split in a 64:16:20 ratio: 64% of the data was used for training, 16% for validation and the remaining 20% was omitted, since we used the previously selected 3500 data point test set for testing. The dataset size in figure 6 refers to the total number of data points in the subset; assuming a 64:16:20 split, for a subset consisting of 100 samples, 80 would be designated for training and validation.
Figure 6.

Bar graph of the model’s mean absolute percentage error versus the dataset size used to train the model.
The obtained results are presented in figure 6. As indicated by the green trend line, the models trained on a larger dataset generally outperformed those trained on less data. However, it is also apparent that our approach performed less than 2.3 percentage points worse with 8.89% MAPE when trained on 80 data points relative to the full dataset comprising 17 500 data points. These findings confirm that even a small initial dataset is sufficient to provide intuition about the design space, and can be a useful starting point for further design exploration.
5.4. Generalizability: model performance on different labels and beam types
To determine whether our approach can be applied to model various beam geometries, we trained three models on every combination of 12 labels and 7 datasets and reported the results in electronic supplementary material, Table III in appendix E. Based on our previous tests, we found that our approach performs well even if the input image is of low resolution (see electronic supplementary material, appendix D). For this evaluation, we trained the models using images with 64 × 64 pixel resolution to reduce the memory requirements and the overall computation time. The results show the average MAPE and the standard deviation for all three models based on all dataset/label combinations.
Overall, our approach achieved an average of 4.07% MAPE, considering the best performing model for each dataset and label combination. This high level of accuracy was attained without tuning the training-relevant variables of the CNN for the respective labels. This is a particularly impressive achievement considering that it is not possible to deduce from a cross-section image whether a beam is twisted or tapered, or both. Twisted beams tend to bend out of plane, and thus produce different total displacement (TotDisp) results. Similarly, tapered beams exhibit a different frequency profile from linearly extruded beams owing to the difference in mass and shifted centre of mass. Hence, these results confirm that our approach can accurately predict both static and dynamic behaviours of linearly extruded, tapered, twisted, and tapered and twisted cantilever beams.
We also found that our model’s performance suffers when trained on non-normalized labels that span a wide range of values. This is not unusual for ML models and can be addressed by scaling or normalizing the model inputs and outputs. To mitigate this issue, we scaled the values for curl displacement in the y-direction (CurlDisp_Y) and principal strain in the x-direction (PrincStrain_X) and achieved performance comparable to that obtained for other labels (see electronic supplementary material, appendix E). Moreover, even though our model’s performance on volume maximum TotDisp and total von Mises stress (TotVonMises) was sufficiently accurate, careful input and output scaling or normalization would improve the accuracy even further.
Frequency analysis helps engineers to identify eigenfrequencies with significant mass participation to make design decisions or to inform further analysis. Hence, identifying a quantitative measure of an eigenfrequency’s impact on the analysed structure can inform further analyses. For this purpose, we included the root mean square (RMS) of the normalized participation factor (npf1_RMS to npf3_RMS) as a label in our analysis as a proxy measure for the mass participation of each predicted eigenfrequency. Our model successfully predicted the RMS of the normalized participation factor with an average MAPE of 2.11%.
6. Challenges and limitations
We focused our study on eigenfrequencies and maximum deflection to ascertain if our approach is capable of predicting both static and dynamic behaviours of different beam geometries. While we recognize that eigenfrequencies are rarely used in practice in the absence of their corresponding mode shapes, their exploration is beyond the current study, as it requires a neural network architecture that differs significantly from the one presented in this paper.
Since our proposed model cannot be more accurate than the data it was trained on, it is important that the right type of FEA is performed for the problem at hand. At the micro- and nano-scale, cantilever beams behave differently from at the macro-scale [30,31]. Therefore, the correct analysis approach must be used to get accurate data to train a surrogate model and verify optimization results.
Furthermore, many situations require engineers to study more than the first three eigenfrequencies. Although this scenario was not explored in this work, it is likely that the proposed approach is capable of predicting a larger number of eigenfrequencies without significant changes to the model.
Finally, based on the results presented in this paper, it cannot be speculated whether this approach generalizes across other applications, search spaces or geometries. Nonetheless, as it is capable of predicting different beam properties from a variety of beam geometries it has the potential for use in a wide range of applications.
7. Conclusion
We have demonstrated that FEA results can be used to impart visual design intuition to CNNs, and that the resulting models can be used to optimize the shape of a design and to search for specific physical properties. Our model effectively learned implicit physical and geometric characteristics, such as beam length, moment of inertia and material properties. Owing to its data efficiency, it can be incorporated into hybrid surrogate optimization approaches where a search space is randomly sampled to train a model that subsequently serves as a surrogate fitness measure to find better design solutions, which are numerically verified before being included in the dataset to improve the model.
We believe that adoption of our approach can significantly improve productivity in work environments where FEA is frequently performed on similar shapes, or a corpus of design-relevant solved analyses already exists. In such cases, the data pertaining to solved simulations can be used to train CNNs to expedite the work on future design optimization problems. A variety of cloud platforms already provide FEA capabilities to their customers, and have the data to capitalize on this method. By capturing each analysis and the corresponding results, deeper and more complex models with broader applications could be trained and used to guide design explorations.
Most importantly, our approach can be applied to poorly understood processes, as it is capable of implicitly learning to extract and compute the relevant physical and geometric variables from the raw visual data provided, without the need for manual feature engineering. Owing to its simplicity, it can serve not only as a means for furthering design optimization in established fields but also as a tool for exploring poorly understood yet important new design spaces. Our approach could create models that mimic expert intuition in areas where we lack expertise.
8. Future outlook
While acknowledging that the work presented in this paper had a very narrow focus—cantilever beams—the models employed are capable of implicitly learning relevant variables, such as material, beam length, twist angle, etc. Going forward, we plan to design deep learning models that accept these design-relevant parameters as inputs to inform their predictions. We also aim to improve our approach and increase its ability to generalize by exploring alternative neural network architectures or using conditional neural networks. We believe that supplementing FEA with CNNs will accelerate and expand our ability to explore search spaces and optimize designs.
Ultimately, we would like to apply this approach to optimizing structures with multiple members. These are the six steps that we believe are needed to employ the approach demonstrated in this paper to optimize more complex structures. (i) Identify the design parameters that will be optimized, and define the multi-member structure in terms of it. (ii) Find an appropriate representation that can be used as input for the model. While a cross-section image was sufficient for our application, a more complex structure may require multiple images, a 3D voxel representation or an alternative representation all together. (iii) Create a high-fidelity simulation. (iv) Generate an initial dataset that spans the desired design space, by randomly sampling the design space around the area of interest. (v) Train the CNN surrogate model on the generated dataset. (vi) Validate the found design solution against the high-fidelity simulation. If the model’s predicted solution does not accurately match the simulated solution, then sample more data points at and near the found configuration, add those data points to the generated dataset and return to step (v).
Acknowledgements
We thank Joni Mici and Robert Kwiatkowski for reviewing our work and providing feedback on this manuscript. We also thank Michael Ounsworth for sharing his implementation of a polar coordinate-based polygon generation algorithm, which we integrated into our data generation process; see https://stackoverflow.com/questions/8997099/algorithm-to-generate-random-2d-polygon.
Data accessibility
The appendix to this document is in the electronic supplementary material. Link to data generation repository: https://github.com/ResearchMedia/CantileverBeamDatasetGenerator. Link to neural network repository: https://github.com/ResearchMedia/CantileverNN. The datasets were published on Mendeley Data (doi:10.17632/y3m8xm6kfk.1).
Authors' contributions
P.M.W., experiments and manuscript. H.L., conceptualization and editing.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Funding
This work has been funded in part by the U.S. Defense Advanced Research Project Agency (DARPA) TRADES grant no. HR0011-17-2-0014. This work was also supported in part by the US National Science Foundation (NSF) AI Institute for Dynamical Systems (dynamicsai.org), grant no. 2112085.
References
- 1.Forrester AI, Keane AJ. 2009. Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 45, 50-79. ( 10.1016/j.paerosci.2008.11.001) [DOI] [Google Scholar]
- 2.Yan C, Yin Z, Shen X, Mi D, Guo F, Long D. 2020. Surrogate-based optimization with improved support vector regression for non-circular vent hole on aero-engine turbine disk. Aerosp. Sci. Technol. 96, 105332. ( 10.1016/j.ast.2019.105332) [DOI] [Google Scholar]
- 3.Raponi E, Bujny M, Olhofer M, Aulig N, Boria S, Duddeck F. 2019. Kriging-assisted topology optimization of crash structures. Comput. Methods Appl. Mech. Eng. 348, 730-752. ( 10.1016/j.cma.2019.02.002) [DOI] [Google Scholar]
- 4.Civalek S, Dastjerdi Ö, Akbas SD, Akgöz B. 2021. Vibration analysis of carbon nanotube-reinforced composite microbeams. Math. Methods Appl. Sci. 2020, 1-17. [Google Scholar]
- 5.Civalek Ö, Uzun B, Yaylı MÖ, Akgöz B. 2020. Size-dependent transverse and longitudinal vibrations of embedded carbon and silica carbide nanotubes by nonlocal finite element method. Eur. Phys. J. Plus 135, 1-28. ( 10.1140/epjp/s13360-020-00385-w) [DOI] [Google Scholar]
- 6.Gunpinar E, Coskun UC, Ozsipahi M, Gunpinar S. 2019. A generative design and drag coefficient prediction system for sedan car side silhouettes based on computational fluid dynamics. CAD Comput. Aided Des. 111, 65-79. ( 10.1016/j.cad.2019.02.003) [DOI] [Google Scholar]
- 7.Chang DT. 2019. Probabilistic generative deep learning for molecular design. (https://arxiv.org/abs/1902.05148)
- 8.Carrigan TJ, Dennis BH, Han ZX, Wang BP. 2012. Aerodynamic shape optimization of a vertical-axis wind turbine using differential evolution. ISRN Renew. Energy 2012, 1-16. ( 10.5402/2012/528418) [DOI] [Google Scholar]
- 9.Liu J, Yu F, Funkhouser T. 2018. Interactive 3D modeling with a generative adversarial network. In Proc. 2017 Int. Conf. on 3D Vision, 3DV 2017, Qingdao, China, 10–12 October 2017, pp. 126–134. New York, NY: IEEE.
- 10.Chen B, Hu Y, Kwiatkowski R, Song S, Lipson H. 2020. Robots playing hide and seek: towards machine visual perspective taking. In Proc. 2021 IEEE Int. Conf. on Robotics and Automation (ICRA), 30 May–5 June 2021, Xi'an, China, pp. 13 678–13 685. New York, NY: IEEE.
- 11.Chen B, Vondrick C, Lipson H. 2021. Visual behavior modelling for robotic theory of mind. Sci. Rep. 11, 424. ( 10.1038/s41598-020-77918-x) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chen B, Song S, Lipson H, Vondrick C. 2020. Visual hide and seek. In Proc. of the ALIFE 2020: The 2020 Conf. on Artificial Life, online, 13–18 July 2020, pp. 645-655. ASME. [Google Scholar]
- 13.Okereke M, Keates S. 2018. Finite element mesh generation. In Finite element applications, pp. 165–186. Springer Tracts in Mechanical Engineering. Springer International Publishing.
- 14.Rizzo F, Caracoglia L. 2020. Artificial neural network model to predict the flutter velocity of suspension bridges. Comput. Struct. 233, 106236. ( 10.1016/j.compstruc.2020.106236) [DOI] [Google Scholar]
- 15.Le V, Caracoglia L. 2020. A neural network surrogate model for the performance assessment of a vertical structure subjected to non-stationary, tornadic wind loads. Comput. Struct. 231, 106208. ( 10.1016/j.compstruc.2020.106208) [DOI] [Google Scholar]
- 16.Oh BK, Glisic B, Kim Y, Park HS. 2019. Convolutional neural network-based wind-induced response estimation model for tall buildings. Comput.-Aided Civ. Infrastruct. Eng. 34, 843-858. ( 10.1111/mice.12476) [DOI] [Google Scholar]
- 17.Zhang Y, Miyamori Y, Mikami S, Saito T. 2019. Vibration-based structural state identification by a 1-dimensional convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 34, 822-839. ( 10.1111/mice.12447) [DOI] [Google Scholar]
- 18.Finol D, Lu Y, Mahadevan V, Srivastava A. 2019. Deep convolutional neural networks for eigenvalue problems in mechanics. Int. J. Numer. Methods Eng. 118, 258-275. ( 10.1002/nme.6012) [DOI] [Google Scholar]
- 19.Daw A, Karpatne A, Watkins W, Read J, Kumar V. 2017. Physics-guided neural networks (PGNN): an application in lake temperature modeling. (https://arxiv.org/abs/1710.11431v3 [cs.LG])
- 20.Zhang R, Liu Y, Sun H. 2020. Physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modeling. Eng. Struct. 215, 110704. ( 10.1016/j.engstruct.2020.110704) [DOI] [Google Scholar]
- 21.Riegel J, Mayer W, van Havre Y. FreeCAD. See https://www.freecadweb.org/.
- 22.COMSOL Multiphysics. 2002. COMSOL Multiphysics®, v. 5.4. Stockholm, Sweden: COMSOL AB.
- 23.LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. 1989. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541-551. ( 10.1162/neco.1989.1.4.541) [DOI] [Google Scholar]
- 24.O’Shea K, Nash R. 2015. An introduction to convolutional neural networks. (https://arxiv.org/abs/1511.08458)
- 25.Kingma DP, Ba JL. 2015. Adam: a method for stochastic optimization. In 3rd Int. Conf. on Learning Representations, ICLR 2015—Conf. Track Proc., San Diego, CA, 7--9 May 2015.
- 26.Arora R, Basu A, Mianjy P, Mukherjee A. 2018. Understanding deep neural networks with rectified linear units. In 6th Int. Conf. on Learning Representations, ICLR 2018—Conf. Track Proc. Vancouver, Canada, 30 April--3 May 2018.
- 27.Sinha SK. 2007. Combined torsional-bending-axial dynamics of a twisted rotating cantilever Timoshenko beam with contact-impact loads at the free end. J. Appl. Mech. Trans. ASME 74, 505-522. ( 10.1115/1.2423035) [DOI] [Google Scholar]
- 28.Gere J, Goodno B. 2012. Mechanics of materials, 8th edn, pp. 57–80. Stanford, CT: Cengage Learning. [Google Scholar]
- 29.Han SM, Benaroya H, Wei T. 1999. Dynamics of transversely vibrating beams using four engineering theories. J. Sound Vib. 225, 935-988. ( 10.1006/jsvi.1999.2257) [DOI] [Google Scholar]
- 30.Demir Ç, Civalek Ö. 2017. On the analysis of microbeams. Int. J. Eng. Sci. 121, 14-33. ( 10.1002/mma.7069) [DOI] [Google Scholar]
- 31.Numanoğu HM, Akgöz B, Civalek Ö. 2018. On dynamic analysis of nanorods. Int. J. Eng. Sci. 130, 33-50. ( 10.1016/j.ijengsci.2018.05.001) [DOI] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The appendix to this document is in the electronic supplementary material. Link to data generation repository: https://github.com/ResearchMedia/CantileverBeamDatasetGenerator. Link to neural network repository: https://github.com/ResearchMedia/CantileverNN. The datasets were published on Mendeley Data (doi:10.17632/y3m8xm6kfk.1).





